
Mathematical Foundations

Number Systems

Computers use different number systems to represent data, each with specific use cases:

  • Binary (Base-2): Uses 0s and 1s (bits). Each bit represents a power of 2 (e.g., 1011 = \(1 \\cdot 2^3 + 0 \\cdot 2^2 + 1 \\cdot 2^1 + 1 \\cdot 2^0 = 8 + 0 + 2 + 1 = 11\) decimal). Binary is the native format for CPU registers, memory, and instructions.
  • Decimal (Base-10): Human-readable, used in user interfaces and financial calculations. Computers convert decimal inputs to binary for processing.
  • Hexadecimal (Base-16): Uses 0-9 and A-F (e.g., 2F = \(2 \\cdot 16^1 + 15 \\cdot 16^0 = 47\) decimal). Compact for representing binary (1 hex digit = 4 bits), common in memory addresses and debugging.
  • Octal (Base-8): Less common, used in legacy systems or Unix permissions (e.g., 777 octal for file permissions).
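As a minimal sketch of these base conversions, Python's built-in `int()` and `format()` can translate between all four systems (names here are illustrative):

```python
# Convert decimal 47 to each base using Python's built-in formatting.
n = 47
binary = format(n, "b")   # "101111"  (32 + 8 + 4 + 2 + 1 = 47)
hexa = format(n, "x")     # "2f"      (2*16 + 15 = 47)
octal = format(n, "o")    # "57"      (5*8 + 7 = 47)

# Parse back from each base with int(text, base).
assert int(binary, 2) == n
assert int(hexa, 16) == n
assert int(octal, 8) == n
print(binary, hexa, octal)
```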

Binary Number System

Basics: Bits, Bytes, and Positional Notation

  • Bits and Grouping:
    • A bit (binary digit) is the smallest unit, representing 0 or 1.
    • Bits are grouped for efficiency:
      • Nibble: 4 bits (e.g., 1011).
      • Byte: 8 bits (e.g., 10110011), the standard unit for memory addressing.
      • Word: Typically 16, 32, or 64 bits, depending on architecture (e.g., 64-bit in modern x86 CPUs).
    • MSB (Most Significant Bit): The leftmost bit, with the highest positional value.
    • LSB (Least Significant Bit): The rightmost bit, with the lowest value.

Conversions

  • Binary to Decimal: Sum the positional values where bits are 1.

  • Decimal to Binary: Repeatedly divide by 2, recording remainders (LSB first).

    • Example: Convert 13 to binary:
      13 ÷ 2 = 6 remainder 1
      6 ÷ 2 = 3 remainder 0
      3 ÷ 2 = 1 remainder 1
      1 ÷ 2 = 0 remainder 1
      Read remainders bottom-up: 1101.
  • Binary to Hex/Octal: Group bits (4 for hex, 3 for octal) and convert each group.

    • Example: Binary 10110011 → group as 1011 0011 → B (hex) and 3 (hex) → B3 hex.
    • Octal: pad to 010 110 011 → 2, 6, 3 → 263 octal.
  • Binary Fractions: Extend positional notation to the right of the binary point with negative powers (e.g., 0.101 = \(1 \\cdot 2^{-1} + 0 \\cdot 2^{-2} + 1 \\cdot 2^{-3} = 0.5 + 0 + 0.125 = 0.625\) decimal).

    • Decimal to binary fraction: Multiply fractional part by 2, record integer part (repeat until 0 or desired precision).
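The multiply-by-2 procedure above can be sketched in a few lines of Python (function name is illustrative):

```python
def frac_to_binary(frac, bits=8):
    """Convert a fraction in [0, 1) to a binary string by repeated doubling:
    multiply by 2, record the integer part, keep the fractional part."""
    digits = []
    for _ in range(bits):
        frac *= 2
        digits.append(str(int(frac)))
        frac -= int(frac)
        if frac == 0:       # terminated exactly
            break
    return "0." + "".join(digits)

print(frac_to_binary(0.625))  # 0.101
```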

Signed Binary Representations

Computers handle negative numbers using fixed bit widths (e.g., 8-bit, 16-bit).

  • Sign-Magnitude: MSB is sign (0=positive, 1=negative); rest is magnitude.
    • Example (4-bit): +3 = 0011, -3 = 1011.
    • Issues: Two zeros (+0=0000, -0=1000), complex arithmetic.
  • One's Complement: Negate by inverting bits.
    • Example (4-bit): +3 = 0011, -3 = ~0011 = 1100.
    • Issues: Still two zeros, end-around carry in addition.
  • Two's Complement (Most Common): Negate by inverting bits and adding 1.
    • Example (4-bit): +3 = 0011, -3 = ~0011 + 1 = 1100 + 1 = 1101.
    • Range (n bits): \(-2^{n-1}\) to \(2^{n-1}-1\) (e.g., 4-bit: -8 to 7).
    • Advantages: Single zero, seamless arithmetic (addition works for signed/unsigned).
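A quick sketch of two's-complement encoding and decoding at a fixed width, using Python's arbitrary-precision integers (helper names are illustrative):

```python
def twos_complement(value, bits=4):
    """Return the two's-complement encoding of value as a bit string."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(bit_str):
    """Decode a two's-complement bit string back to a signed integer."""
    n = int(bit_str, 2)
    if bit_str[0] == "1":           # MSB set -> negative
        n -= 1 << len(bit_str)
    return n

print(twos_complement(-3, 4))       # 1101
print(from_twos_complement("1101")) # -3
```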

Little-Endian and Big-Endian Representations

  • Big-Endian:

    • The most significant byte (MSB) is stored at the lowest memory address.
    • Matches human-readable order (e.g., writing 1234 from left to right).
    • Example: 32-bit integer 0x12345678 (decimal 305,419,896) in big-endian:
      • Memory layout (addresses increasing left to right): 12 34 56 78.
      • At address 0x1000: 12, 0x1001: 34, 0x1002: 56, 0x1003: 78.
  • Little-Endian:

    • The least significant byte (LSB) is stored at the lowest memory address.
    • Used by x86 and (in its default configuration) ARM.
    • Example: the same 32-bit integer 0x12345678 in little-endian:
      • Memory layout (addresses increasing left to right): 78 56 34 12.
      • At address 0x1000: 78, 0x1001: 56, 0x1002: 34, 0x1003: 12.
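Python's `struct` module makes the two byte orders easy to observe directly:

```python
import struct

value = 0x12345678
big = struct.pack(">I", value)     # ">" = big-endian: MSB first
little = struct.pack("<I", value)  # "<" = little-endian: LSB first

assert big == bytes([0x12, 0x34, 0x56, 0x78])
assert little == bytes([0x78, 0x56, 0x34, 0x12])
print(big.hex(), little.hex())     # 12345678 78563412
```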

Binary Arithmetic in Depth

  • Addition: Bit-by-bit with carry (e.g., 101 + 11 = 1000, carry propagates).
  • Subtraction: Add the two's complement of the subtrahend.
  • Multiplication: Shift and add (e.g., 101 × 10: bit 1 of the multiplier contributes 101 << 1 = 1010, bit 0 contributes nothing, so the result is 1010, i.e., 5 × 2 = 10).
  • Division: Shift and subtract, or use repeated subtraction.
  • Hardware handles these in the ALU, but software can emulate for big integers.
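As a sketch of the software emulation mentioned above, shift-and-add multiplication in plain Python (function name is illustrative):

```python
def multiply(a, b):
    """Shift-and-add multiplication: for each set bit of b,
    add a correspondingly shifted copy of a."""
    result = 0
    while b:
        if b & 1:        # lowest remaining bit of b is set
            result += a
        a <<= 1          # shift a left (multiply by 2)
        b >>= 1          # move to the next bit of b
    return result

print(multiply(5, 2))    # 10
```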

Bitwise Operations

Bitwise operations manipulate individual bits, offering fast, memory-efficient alternatives to arithmetic:

  • AND (&): Sets a bit to 1 if both inputs are 1 (e.g., 101 & 110 = 100).
  • OR (|): Sets a bit to 1 if either input is 1 (e.g., 101 | 110 = 111).
  • XOR (^): Sets a bit to 1 if inputs differ (e.g., 101 ^ 110 = 011).
  • NOT (~): Inverts bits (e.g., ~101 = 010 in 3-bit).
  • Shifts (<<, >>): Move bits left or right (e.g., 101 << 1 = 1010, doubling the value; 101 >> 1 = 010, halving it).

Use Cases:

  • Flags: Store multiple boolean values in one byte (e.g., permissions: read = 001, write = 010).
  • Optimization: Check if a number is odd (x & 1) faster than modulo (x % 2).
  • Masking: Extract specific bits (e.g., value & 0xFF gets the lowest 8 bits).

Floating-Point Arithmetic

IEEE 754 Representation

The IEEE 754 standard defines how floating-point numbers are stored, typically in 32-bit (single precision) or 64-bit (double precision) formats.

  • Components:

    • Sign Bit: 1 bit (0 = positive, 1 = negative).
    • Exponent: Stored with a bias (e.g., 127 for single precision) to represent positive and negative powers of 2.
    • Mantissa (Fraction): The significant digits, normalized to 1.xxxx (implicit leading 1 not stored).
  • Single Precision (32-bit):

    • Sign: 1 bit.
    • Exponent: 8 bits (biased by 127, range -126 to +127).
    • Mantissa: 23 bits (plus implicit 1, giving 24-bit precision).
    • Example:

      3.14 ≈ \(1.57 \\times 2^1\):

      - Sign: 0 (positive).
      - Exponent: 1 + 127 = 128 = 10000000 (binary).
      - Mantissa: fractional bits of 1.57 (normalized), rounded to 23 bits: 10010001111010111000011.
      - Stored as: 0 10000000 10010001111010111000011 (hex 0x4048F5C3).
    
  • Double Precision (64-bit):

    • Sign: 1 bit.
    • Exponent: 11 bits (biased by 1023, range -1022 to +1023).
    • Mantissa: 52 bits (plus implicit 1).
    • Offers higher precision and range but uses more memory and CPU cycles.
  • Special Values:

    • Zero: Exponent and mantissa all 0 (sign bit determines +0 or -0).
    • Infinity: Exponent all 1s, mantissa 0 (e.g., result of 1/0).
    • NaN (Not a Number): Exponent all 1s, non-zero mantissa (e.g., 0/0, sqrt(-1)).
    • Subnormal Numbers: Exponent all 0s, non-zero mantissa, for very small numbers (lose precision).
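A small sketch that reinterprets a float's bits to expose the three fields described above (assumes single precision via Python's `struct`; function name is illustrative):

```python
import struct

def float_fields(x):
    """Unpack x as IEEE 754 single precision and return
    (sign, biased exponent, mantissa) as integers."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31            # top bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits (biased by 127)
    mantissa = bits & 0x7FFFFF       # low 23 bits (implicit leading 1 not stored)
    return sign, exponent, mantissa

print(float_fields(1.0))             # (0, 127, 0): 1.0 = +1.0 x 2^(127-127)
print(float_fields(float("inf")))    # (0, 255, 0): exponent all 1s, mantissa 0
```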

Floating-Point Operations

The CPU’s FPU performs operations like addition, subtraction, multiplication, division, and square root:

  • Addition/Subtraction: Align exponents (shift smaller number’s mantissa), add/subtract mantissas, normalize result, and adjust exponent.
    • Example: 1.5 + 2.5 = 4.0 involves aligning \(1.5 \\times 2^0\) and \(1.25 \\times 2^1\), adding mantissas, and normalizing.
  • Multiplication: Multiply mantissas, add exponents (subtract bias), normalize.
  • Division: Divide mantissas, subtract exponents, normalize.
  • Rounding: Results are rounded to fit the mantissa (e.g., round-to-nearest, round-to-zero), causing precision loss.

Performance: Floating-point operations are slower than integer ones (e.g., 2-5 cycles for addition vs. 1 for integer addition) due to complex normalization and rounding. SIMD instructions (e.g., SSE/AVX on x86) accelerate vectorized floating-point operations. Why it matters: Operations can amplify precision errors, and their cost impacts performance in math-heavy applications like graphics or machine learning.

Precision and Errors

Floating-point numbers are approximations, leading to errors:

  • Rounding Errors: Finite mantissa bits cause small inaccuracies (e.g., 0.1 + 0.2 = 0.30000000000000004 in double precision).
  • Catastrophic Cancellation: Subtracting nearly equal numbers loses significant digits (e.g., 1.234567 - 1.234566 = 0.000001, but precision may drop).
  • Overflow/Underflow: Results too large/small yield infinity or subnormal numbers.
  • Non-Associativity: (a + b) + c ≠ a + (b + c) due to rounding (e.g., adding many small numbers to a large one).

Mitigation:

  • Use double precision for higher accuracy (but more memory).
  • Use fixed-point or decimal arithmetic (e.g., Python’s decimal, Java’s BigDecimal) for exact calculations (e.g., finance).
  • Rearrange computations to minimize cancellation (e.g., add small numbers first).
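The rounding error and the decimal-arithmetic mitigation are both easy to demonstrate with Python's standard `decimal` module:

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
print(0.1 + 0.2)                  # 0.30000000000000004
assert 0.1 + 0.2 != 0.3

# Decimal arithmetic keeps base-10 values exact -- suited to finance.
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
```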

Performance Considerations

  • Hardware Support: Modern CPUs have dedicated FPUs or SIMD units (e.g., Intel’s AVX-512 for 16 parallel float operations). Use compiler flags (e.g., -mavx) or intrinsics for optimization.
  • Memory Usage: Single precision (4 bytes) vs. double (8 bytes) affects cache efficiency and bandwidth.
  • Vectorization: Libraries like NumPy use SIMD for fast array operations.
  • Software Fallbacks: Without FPU support (e.g., some embedded systems), floating-point is emulated, slowing performance.

Binary-Coded Decimal (BCD)

BCD encodes decimal digits (0-9) using 4 bits per digit, ensuring exact decimal representation:

  • Representation:
    • Decimal 42 = 0100 0010 (4 = 0100, 2 = 0010).
    • Packed BCD: Two digits per byte (e.g., 01000010 for 42).
    • Unpacked BCD: One digit per byte (e.g., 00000100 00000010).
  • Advantages: Avoids floating-point errors, simplifies decimal display (e.g., calculators).
  • Disadvantages: Less efficient (e.g., 9999 needs 16 bits in BCD vs. 14 in binary).
  • Use Cases: Financial systems, embedded displays, legacy hardware (e.g., x86 BCD instructions like DAA).
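A minimal sketch of packed-BCD encoding, two decimal digits per byte as described above (function name is illustrative):

```python
def to_packed_bcd(n):
    """Encode a non-negative decimal integer as packed BCD bytes,
    two decimal digits per byte (high nibble first)."""
    digits = [int(d) for d in str(n)]
    if len(digits) % 2:                 # pad to an even digit count
        digits.insert(0, 0)
    return bytes((digits[i] << 4) | digits[i + 1]
                 for i in range(0, len(digits), 2))

print(to_packed_bcd(42).hex())    # 42  (one byte, 0x42)
print(to_packed_bcd(1234).hex())  # 1234
```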

Boolean Algebra

Boolean algebra is the mathematical language of digital logic. It operates on binary values (0 = false, 1 = true) using three core operations: AND, OR, and NOT. Every CPU instruction, memory flag, and hardware gate is built from Boolean functions. Mastering it lets you write branchless, cache-friendly, and ultra-fast code, debug low-level systems, and even design custom hardware.

Formal Foundations

A Boolean algebra is a 6-tuple (S, +, ·, ¬, 0, 1) where:

  • S = {0, 1} — the set of truth values
  • + = OR (logical disjunction)
  • · = AND (logical conjunction)
  • ¬ = NOT (complement)
  • 0 = identity element for OR
  • 1 = identity element for AND

The system satisfies commutativity, associativity, distributivity, identity, complement, and absorption laws.

Primitive Operations & Truth Tables

Operation  Symbol           Name         Truth Table
AND        A · B or A & B   Conjunction  00 → 0, 01 → 0, 10 → 0, 11 → 1
OR         A + B or A | B   Disjunction  00 → 0, 01 → 1, 10 → 1, 11 → 1
NOT        ¬A or ~A         Negation     0 → 1, 1 → 0

Derived Gates

Gate Expression When True
NAND ¬(A · B) False only if both true
NOR ¬(A + B) True only if both false
XOR A ⊕ B = (A · ¬B) + (¬A · B) True if inputs differ
XNOR ¬(A ⊕ B) True if inputs are equal

Universal Gates: NAND and NOR can each implement any Boolean function, so an entire circuit can be built from a single gate type. (NAND flash memory is named for the series arrangement of its cells, which resembles a NAND gate, not for this logical universality.)

Complete Boolean Laws

Law Expression
Identity A + 0 = A, A · 1 = A
Dominance A + 1 = 1, A · 0 = 0
Idempotence A + A = A, A · A = A
Complement A + ¬A = 1, A · ¬A = 0
Involution ¬(¬A) = A
Commutative A + B = B + A, A · B = B · A
Associative (A + B) + C = A + (B + C)
Distributive A · (B + C) = (A · B) + (A · C)
Absorption A + (A · B) = A, A · (A + B) = A
De Morgan ¬(A + B) = ¬A · ¬B, ¬(A · B) = ¬A + ¬B
Consensus (A·B) + (¬A·C) + (B·C) = (A·B) + (¬A·C)
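Because the domain is finite, every law in the table can be verified exhaustively. A quick sketch in Python enumerating all truth assignments (function name is illustrative):

```python
from itertools import product

def laws_hold():
    """Exhaustively verify De Morgan, absorption, and consensus
    over every combination of truth values."""
    for a, b, c in product([False, True], repeat=3):
        if (not (a or b)) != ((not a) and (not b)):   # De Morgan 1
            return False
        if (not (a and b)) != ((not a) or (not b)):   # De Morgan 2
            return False
        if (a or (a and b)) != a:                     # absorption
            return False
        lhs = (a and b) or ((not a) and c) or (b and c)   # consensus LHS
        rhs = (a and b) or ((not a) and c)                # consensus RHS
        if lhs != rhs:
            return False
    return True

print(laws_hold())  # True
```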

Canonical Forms

  • Sum of Products (SOP): F = Σ minterms Example: F(A,B) = A¬B + AB = m2 + m3

  • Product of Sums (POS): F = Π maxterms Example: F(A,B) = (A+B)(¬A+B) = M0 · M2

Every Boolean function has a unique canonical SOP and a unique canonical POS representation.

Simplification Techniques

  • Algebraic Reduction
F = A·B + A·¬B + ¬A·B
  = A·(B + ¬B) + ¬A·B
  = A·1 + ¬A·B
  = A + ¬A·B
  = A + B  (since A + ¬A·B = (A + ¬A)·(A + B) = A + B, by distributivity)
  • Karnaugh Map (K-map) For F(A,B,C) with 1s at 001, 011, 110, 111:
A\BC 00 01 11 10
0     0  1  1  0
1     0  0  1  1

→ Group: ¬A·C (pair: 001, 011) + A·B (pair: 110, 111) → F = ¬A·C + A·B

  • Quine-McCluskey: Used in EDA tools for functions with >6 variables.

Boolean Logic in the CPU (ALU)

The ALU contains parallel arrays:

  • AND gate array → bitwise AND
  • OR gate array → bitwise OR
  • XOR gate array → addition, subtraction
  • NOT array → inversion
  • MUX → selects result based on opcode

Example Opcode Table:

Opcode Operation
000 AND
001 OR
010 ADD
110 NOT

Branch conditions are Boolean functions of status flags: e.g., x86's JA (jump if above) is taken when ¬CF · ¬ZF.

Boolean Algebra in Code

Bitwise Tricks

// Is power of 2?
bool is_pow2(uint32_t x) {
    return x > 0 && (x & (x - 1)) == 0;
}

// Swap without temp (XOR trick; guard needed because the
// XOR swap zeroes the value when a and b alias the same object)
void swap(int* a, int* b) {
    if (a != b) {
        *a ^= *b;
        *b ^= *a;
        *a ^= *b;
    }
}

// Count set bits (Brian Kernighan)
int popcount(uint32_t x) {
    int count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}

Bit Masks & Permissions

#define FLAG_READ  (1 << 0)  // 0b0001
#define FLAG_WRITE (1 << 1)  // 0b0010
#define FLAG_EXEC  (1 << 2)  // 0b0100

uint8_t perms = 0;
perms |= FLAG_READ | FLAG_WRITE;
perms &= ~FLAG_EXEC;

if (perms & FLAG_READ) {
    printf("Read allowed\n");
}

Conditional Optimization

// Before (redundant)
if ((x > 0 && y > 0) || (x > 0 && z > 0)) { ... }

// After (distributive law)
if (x > 0 && (y > 0 || z > 0)) { ... }  // Fewer comparisons

Advanced Applications

Multiplexer (MUX)

Y = (S · A) + (¬S · B)

Used in ALU to select add/subtract path.

D Flip-Flop (Sequential Logic)

D Q(t+1)
0 0
1 1

Built from 6 NAND gates — foundation of registers.

Finite State Machine (FSM)

NextState = f(CurrentState, Input)
Output    = g(CurrentState, Input)

Example: Traffic light controller with Boolean state encoding.

SAT Solvers & Formal Verification

Reduce the circuit to CNF, then search for a satisfying assignment. Tools: Z3, MiniSat, ABC.

Cryptography & Security

  • One-Time Pad: C = P ⊕ K, P = C ⊕ K
  • AES S-box: 8→8 bit nonlinear Boolean function
  • Side-channel resistance: Use Boolean masking

Performance Impact

Operation CPU Cycles (x86) Energy
AND/OR 1 ~0.1 pJ
XOR 1 ~0.15 pJ
Shift 1 ~0.1 pJ
Compare 1–3 ~0.3 pJ

Combinatorics

Combinatorics is the study of counting, arrangement, and selection of objects. It's fundamental to probability, algorithm analysis, and system design. Understanding combinatorial principles helps you calculate complexity, design efficient algorithms, and reason about system scalability.

Fundamental Counting Principles

Multiplication Principle (Rule of Product)

If one operation can be performed in \(m\) ways and a second operation can be performed in \(n\) ways, then the two operations together can be performed in \(m \\times n\) ways.

  • Example: A password has 2 letters (26 choices each) followed by 3 digits (10 choices each).
    • Total passwords: \(26 \\times 26 \\times 10 \\times 10 \\times 10 = 676,000\)
  • Generalization: For \(k\) operations with \(n_1, n_2, \\ldots, n_k\) choices: \(n_1 \\times n_2 \\times \\cdots \\times n_k\)

Addition Principle (Rule of Sum)

If one task can be done in \(m\) ways and another task in \(n\) ways (mutually exclusive), then either task can be done in \(m + n\) ways.

  • Example: Choose a dessert: 5 cakes OR 3 pies = 8 total choices.

Inclusion-Exclusion Principle

For two sets \(A\) and \(B\):

\[|A \\cup B| = |A| + |B| - |A \\cap B|\]
  • Example: In a class of 30 students, 20 know Python, 15 know Java, and 8 know both.
    • Students knowing at least one: \(20 + 15 - 8 = 27\)
  • Three sets: \(|A \\cup B \\cup C| = |A| + |B| + |C| - |A \\cap B| - |A \\cap C| - |B \\cap C| + |A \\cap B \\cap C|\)

Permutations

A permutation is an ordered arrangement of objects. Order matters.

Permutations of n Distinct Objects

The number of ways to arrange \(n\) distinct objects is: $\(P(n, n) = n! = n \\times (n-1) \\times (n-2) \\times \\cdots \\times 2 \\times 1\)$

  • Example: Arrange 5 books on a shelf: \(5! = 120\) ways.

  • Factorial properties:

    • \(0! = 1\) (by definition)
    • \(n! = n \\times (n-1)!\)
    • For large \(n\): \(n! \\approx \\sqrt{2\\pi n} \\left(\\frac{n}{e}\\right)^n\) (Stirling's approximation)

Permutations of n Objects Taken r at a Time

The number of ways to arrange \(r\) objects from \(n\) distinct objects: $\(P(n, r) = \\frac{n!}{(n-r)!} = n \\times (n-1) \\times \\cdots \\times (n-r+1)\)$

  • Example: Choose and arrange 3 winners from 10 contestants: \(P(10, 3) = 10 \\times 9 \\times 8 = 720\)
  • Interpretation: First place: 10 choices, second: 9, third: 8.

Permutations with Repetition

When objects can be repeated: $\(n^r\)$

  • Example: 4-digit PIN with digits 0-9: \(10^4 = 10,000\) possibilities.

Permutations with Identical Objects

When some objects are identical, divide by the factorials of identical groups:

\[\\frac{n!}{n_1! \\times n_2! \\times \\cdots \\times n_k!}\]

where \(n_1 + n_2 + \\cdots + n_k = n\) (counts of identical objects).

  • Example: Arrange letters in "MISSISSIPPI" (1 M, 4 I, 4 S, 2 P):
    • Total: \(\\frac{11!}{1! \\times 4! \\times 4! \\times 2!} = \\frac{39,916,800}{1 \\times 24 \\times 24 \\times 2} = 34,650\)
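The MISSISSIPPI count above can be computed directly with the multinomial formula (function name is illustrative):

```python
from math import factorial

def arrangements(word):
    """Permutations of a word's letters: n! divided by the
    factorial of each identical-letter group's count."""
    counts = {}
    for ch in word:
        counts[ch] = counts.get(ch, 0) + 1
    total = factorial(len(word))
    for c in counts.values():
        total //= factorial(c)
    return total

print(arrangements("MISSISSIPPI"))  # 34650
```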

Combinations

A combination is a selection of objects where order does NOT matter.

Combinations of n Objects Taken r at a Time

The number of ways to choose \(r\) objects from \(n\) distinct objects:

\[C(n, r) = \\binom{n}{r} = \\frac{n!}{r!(n-r)!} = \\frac{P(n, r)}{r!}\]
  • Example: Choose 3 members from 10 people for a committee: \(C(10, 3) = \\frac{10!}{3! \\times 7!} = \\frac{10 \\times 9 \\times 8}{3 \\times 2 \\times 1} = 120\)
  • Key difference from permutations: \(\\binom{10}{3} = 120\) vs \(P(10, 3) = 720\) (order doesn't matter in committees).

Binomial Coefficients

The notation \(\\binom{n}{r}\) is called a binomial coefficient and appears in the binomial theorem:

\[(x + y)^n = \\sum\_{r=0}^{n} \\binom{n}{r} x^{n-r} y^r\]

Properties:

  • Symmetry: \(\\binom{n}{r} = \\binom{n}{n-r}\)
  • Pascal's Identity: \(\\binom{n}{r} = \\binom{n-1}{r-1} + \\binom{n-1}{r}\) (forms Pascal's triangle)
  • Sum: \(\\sum\_{r=0}^{n} \\binom{n}{r} = 2^n\)
  • Boundary: \(\\binom{n}{0} = \\binom{n}{n} = 1\)

Combinations with Repetition

When objects can be selected more than once:

\[\\binom{n+r-1}{r} = \\frac{(n+r-1)!}{r!(n-1)!}\]
  • Example: Choose 5 fruits from 3 types (apples, oranges, bananas), allowing duplicates:
    • \(\\binom{3+5-1}{5} = \\binom{7}{5} = 21\) ways
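Python's `math.comb` handles both plain combinations and, via the stars-and-bars identity above, combinations with repetition (the helper name is illustrative):

```python
from math import comb

def multichoose(n, r):
    """Combinations with repetition: C(n + r - 1, r)."""
    return comb(n + r - 1, r)

print(comb(10, 3))       # 120: committees of 3 from 10 people
print(multichoose(3, 5)) # 21: 5 fruits from 3 types with duplicates
```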

Combinatorial Identities

Vandermonde's Identity:

\[\\binom{m+n}{r} = \\sum\_{k=0}^{r} \\binom{m}{k} \\binom{n}{r-k}\]

Hockey-Stick Identity:

\[\\sum\_{i=r}^{n} \\binom{i}{r} = \\binom{n+1}{r+1}\]

Applications in Computer Science

Algorithm Analysis

  • Subset generation: \(2^n\) subsets of \(n\) elements (exponential complexity).
  • Permutation generation: \(n!\) orderings of a sequence of \(n\) distinct elements (brute-force search over permutations is infeasible).
  • Graph algorithms: \(\\binom{n}{2} = \\frac{n(n-1)}{2}\) possible edges in a graph with \(n\) vertices.

Hash Tables & Collisions

  • Birthday paradox: With \(n\) items and \(m\) buckets, probability of collision:
    • Approximation: \(1 - e^{-\\frac{n^2}{2m}}\)
    • For \(m=365\) (days), \(n=23\) gives ~50% collision probability.
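Both the exact birthday probability and the exponential approximation above are a few lines of Python (function names are illustrative):

```python
import math

def collision_probability(n, m):
    """Exact probability that at least two of n items share one of m
    equally likely buckets (generalized birthday problem)."""
    p_no_collision = 1.0
    for i in range(n):
        p_no_collision *= (m - i) / m
    return 1 - p_no_collision

def collision_approx(n, m):
    """The 1 - e^(-n^2 / 2m) approximation from the text."""
    return 1 - math.exp(-n * n / (2 * m))

print(collision_probability(23, 365))  # ~0.507
print(collision_approx(23, 365))       # ~0.516
```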

Cryptography

  • Key space: \(2^n\) possible keys for \(n\)-bit key (e.g., AES-256: \(2^{256}\)).
  • Password strength: \(|\\Sigma|^L\) where \(\\Sigma\) is character set, \(L\) is length.

Database Query Optimization

  • Join orderings: For \(n\) tables there are \(n!\) left-deep join orders (and more still when bushy trees are allowed), so the optimizer uses heuristics.

Code Example: Generating Combinations

def combinations(n, r):
    """Calculate C(n, r) using dynamic programming."""
    # Base cases
    if r > n or r < 0:
        return 0
    if r == 0 or r == n:
        return 1

    # Use Pascal's triangle
    dp = [[0] * (r + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(min(i, r) + 1):
            if j == 0 or j == i:
                dp[i][j] = 1
            else:
                dp[i][j] = dp[i-1][j-1] + dp[i-1][j]
    return dp[n][r]

# Generate all combinations recursively
def generate_combinations(arr, r):
    """Generate all r-combinations of arr."""
    if r == 0:
        return [[]]
    if len(arr) < r:
        return []

    result = []
    # Include first element
    for combo in generate_combinations(arr[1:], r - 1):
        result.append([arr[0]] + combo)
    # Exclude first element
    result.extend(generate_combinations(arr[1:], r))
    return result

Probability

Probability quantifies uncertainty and randomness. It's essential for randomized algorithms, machine learning, system reliability, game theory, and performance modeling. Understanding probability enables you to make informed decisions under uncertainty and design robust systems.

Basic Concepts

Sample Space and Events

  • Sample Space (\(\\Omega\)): The set of all possible outcomes of an experiment.
    • Example: Rolling a die: \(\\Omega = \\{1, 2, 3, 4, 5, 6\\}\)
  • Event (\(E\)): A subset of the sample space.
    • Example: "Rolling an even number": \(E = \\{2, 4, 6\\}\)
  • Elementary Event: An event with a single outcome.
  • Complement: \(\\bar{E} = \\Omega \\setminus E\) (event does NOT occur).

Probability Axioms (Kolmogorov)

For any event \(E\) in sample space \(\\Omega\):

  1. Non-negativity: \(P(E) \\geq 0\)
  2. Normalization: \(P(\\Omega) = 1\)
  3. Additivity: For mutually exclusive events \(E_1, E_2, \\ldots\): $\(P(E_1 \\cup E_2 \\cup \\cdots) = P(E_1) + P(E_2) + \\cdots\)$

Consequences:

  • \(P(\\emptyset) = 0\)
  • \(P(\\bar{E}) = 1 - P(E)\)
  • \(P(E) \\leq 1\) for any event \(E\)

Classical Probability (Equally Likely Outcomes)

If all outcomes are equally likely:

\[P(E) = \\frac{\\text{Number of favorable outcomes}}{\\text{Total number of outcomes}} = \\frac{|E|}{|\\Omega|}\]
  • Example: Probability of rolling a 6: \(P(\\{6\\}) = \\frac{1}{6}\)

Empirical Probability (Frequency)

For repeated experiments:

\[P(E) \\approx \\frac{\\text{Number of times E occurred}}{\\text{Total number of trials}}\]
  • Law of Large Numbers: As trials increase, empirical probability converges to true probability.

Conditional Probability and Independence

Conditional Probability

The probability of event \(A\) given that event \(B\) has occurred:

\[P(A | B) = \\frac{P(A \\cap B)}{P(B)} \\quad \\text{(if } P(B) > 0\\text{)}\]
  • Example: Draw a card from a deck. Given it's red, probability it's a heart:
    • \(P(\\text{Heart} | \\text{Red}) = \\frac{P(\\text{Heart} \\cap \\text{Red})}{P(\\text{Red})} = \\frac{13/52}{26/52} = \\frac{1}{2}\)

Multiplication Rule

\[P(A \\cap B) = P(A | B) \\times P(B) = P(B | A) \\times P(A)\]
  • Generalization: \(P(A_1 \\cap A_2 \\cap \\cdots \\cap A_n) = P(A_1) \\times P(A_2 | A_1) \\times P(A_3 | A_1 \\cap A_2) \\times \\cdots\)

Independence

Events \(A\) and \(B\) are independent if:

\[P(A \\cap B) = P(A) \\times P(B)\]

Equivalently: \(P(A | B) = P(A)\) (knowing \(B\) doesn't affect \(A\)).

  • Example: Tossing two coins: \(P(\\text{First heads} \\cap \\text{Second heads}) = \\frac{1}{2} \\times \\frac{1}{2} = \\frac{1}{4}\)

  • Mutual Independence: Events \(A_1, A_2, \\ldots, A_n\) are mutually independent if:

    $\(P(A\_{i_1} \\cap A\_{i_2} \\cap \\cdots \\cap A\_{i_k}) = P(A\_{i_1}) \\times P(A\_{i_2}) \\times \\cdots \\times P(A\_{i_k})\)$ for any subset \(\\{i_1, i_2, \\ldots, i_k\\}\).

Bayes' Theorem

\[P(A | B) = \\frac{P(B | A) \\times P(A)}{P(B)}\]

Extended form (with partition):

\[P(A_i | B) = \\frac{P(B | A_i) \\times P(A_i)}{\\sum\_{j=1}^{n} P(B | A_j) \\times P(A_j)}\]

where \(\\{A_1, A_2, \\ldots, A_n\\}\) partitions the sample space.

  • Example (Medical Testing):
    • Disease prevalence: \(P(\\text{Disease}) = 0.01\)
    • Test sensitivity: \(P(\\text{Positive} | \\text{Disease}) = 0.95\)
    • Test specificity: \(P(\\text{Negative} | \\text{No Disease}) = 0.90\)
    • Given positive test, probability of disease: $\(P(\\text{Disease} | \\text{Positive}) = \\frac{0.95 \\times 0.01}{0.95 \\times 0.01 + 0.10 \\times 0.99} \\approx 0.0876\)$
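The medical-testing calculation above, written out with the extended form of Bayes' theorem (function name is illustrative):

```python
def bayes_posterior(prior, sensitivity, specificity):
    """P(Disease | Positive): Bayes' theorem over the two-event
    partition {Disease, No Disease}."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity      # false-positive rate
    numerator = p_pos_given_disease * prior
    evidence = numerator + p_pos_given_healthy * (1 - prior)
    return numerator / evidence

print(bayes_posterior(0.01, 0.95, 0.90))  # ~0.0876
```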

Random Variables

A random variable \(X\) is a function that maps outcomes to real numbers: \(X: \\Omega \\rightarrow \\mathbb{R}\).

Discrete Random Variables

Takes countable values (e.g., integers).

Probability Mass Function (PMF):

\[p_X(x) = P(X = x)\]

Properties:

  • \(p_X(x) \\geq 0\) for all \(x\)
  • \(\\sum\_{x} p_X(x) = 1\)

Cumulative Distribution Function (CDF):

\[F_X(x) = P(X \\leq x) = \\sum\_{k \\leq x} p_X(k)\]

Continuous Random Variables

Takes uncountable values (e.g., real numbers).

Probability Density Function (PDF):

\[f_X(x) \\geq 0 \\quad \\text{and} \\quad \\int\_{-\\infty}^{\\infty} f_X(x) \\, dx = 1\]

CDF:

\[F_X(x) = P(X \\leq x) = \\int\_{-\\infty}^{x} f_X(t) \\, dt\]

Relationship:

\(f_X(x) = \\frac{d}{dx} F_X(x)\) (if differentiable).

Expected Value (Mean)

Discrete:

\[E[X] = \\mu = \\sum\_{x} x \\cdot p_X(x)\]

Continuous:

\[E[X] = \\mu = \\int\_{-\\infty}^{\\infty} x \\cdot f_X(x) \\, dx\]

Properties:

  • Linearity: \(E[aX + b] = aE[X] + b\)
  • Additivity: \(E[X + Y] = E[X] + E[Y]\) (always)
  • Independence: \(E[XY] = E[X] \\times E[Y]\) (if \(X\) and \(Y\) are independent)

Variance and Standard Deviation

Variance:

\[\\text{Var}(X) = \\sigma^2 = E[(X - \\mu)^2] = E[X^2] - (E[X])^2\]

Standard Deviation:

\[\\sigma = \\sqrt{\\text{Var}(X)}\]

Properties:

  • \(\\text{Var}(aX + b) = a^2 \\text{Var}(X)\)
  • \(\\text{Var}(X + Y) = \\text{Var}(X) + \\text{Var}(Y)\) (if \(X\) and \(Y\) are independent)

Common Discrete Distributions

Bernoulli Distribution (\(X \\sim \\text{Bernoulli}(p)\)):

  • Models single trial with success probability \(p\).
  • \(P(X = 1) = p\), \(P(X = 0) = 1-p\)
  • \(E[X] = p\), \(\\text{Var}(X) = p(1-p)\)

Binomial Distribution (\(X \\sim \\text{Binomial}(n, p)\)):

  • Number of successes in \(n\) independent Bernoulli trials.
  • \(P(X = k) = \\binom{n}{k} p^k (1-p)^{n-k}\) for \(k = 0, 1, \\ldots, n\)
  • \(E[X] = np\), \(\\text{Var}(X) = np(1-p)\)
  • Example: Number of heads in 10 coin flips: \(X \\sim \\text{Binomial}(10, 0.5)\)

Geometric Distribution (\(X \\sim \\text{Geometric}(p)\)):

  • Number of trials until first success.
  • \(P(X = k) = (1-p)^{k-1} p\) for \(k = 1, 2, \\ldots\)
  • \(E[X] = \\frac{1}{p}\), \(\\text{Var}(X) = \\frac{1-p}{p^2}\)

Poisson Distribution (\(X \\sim \\text{Poisson}(\\lambda)\)):

  • Models rare events over time/space.
  • \(P(X = k) = \\frac{\\lambda^k e^{-\\lambda}}{k!}\) for \(k = 0, 1, 2, \\ldots\)
  • \(E[X] = \\lambda\), \(\\text{Var}(X) = \\lambda\)
  • Example: Number of requests per second to a server.

Common Continuous Distributions

Uniform Distribution (\(X \\sim \\text{Uniform}(a, b)\)):

  • \(f_X(x) = \\frac{1}{b-a}\) for \(a \\leq x \\leq b\), 0 otherwise
  • \(E[X] = \\frac{a+b}{2}\), \(\\text{Var}(X) = \\frac{(b-a)^2}{12}\)

Exponential Distribution (\(X \\sim \\text{Exponential}(\\lambda)\)):

  • Models waiting times between events.
  • \(f_X(x) = \\lambda e^{-\\lambda x}\) for \(x \\geq 0\)
  • \(E[X] = \\frac{1}{\\lambda}\), \(\\text{Var}(X) = \\frac{1}{\\lambda^2}\)
  • Memoryless property: \(P(X > s+t | X > s) = P(X > t)\)

Normal (Gaussian) Distribution (\(X \\sim \\mathcal{N}(\\mu, \\sigma^2)\)):

  • \(f_X(x) = \\frac{1}{\\sigma\\sqrt{2\\pi}} e^{-\\frac{(x-\\mu)^2}{2\\sigma^2}}\)
  • \(E[X] = \\mu\), \(\\text{Var}(X) = \\sigma^2\)
  • Central Limit Theorem: Sum of many independent random variables approximates normal.
  • Standard Normal: \(Z \\sim \\mathcal{N}(0, 1)\) with CDF \(\\Phi(z)\)

Applications in Computer Science

Randomized Algorithms

  • QuickSort pivot selection: Random pivot gives \(O(n \\log n)\) expected time.
  • Monte Carlo methods: Estimate \(\\pi\) by random sampling.
  • Hash functions: Uniform distribution minimizes collisions.

System Reliability

  • Series system: \(P(\\text{System works}) = \\prod\_{i=1}^{n} P(\\text{Component } i \\text{ works})\)
  • Parallel system: \(P(\\text{System works}) = 1 - \\prod\_{i=1}^{n} (1 - P(\\text{Component } i \\text{ works}))\)

Load Balancing

  • Birthday paradox: With \(n\) requests and \(m\) servers, expected collisions help design hash-based load balancers.

Code Example: Monte Carlo Simulation

import random
import math

def estimate_pi(n_trials):
    """Estimate π using Monte Carlo method."""
    inside_circle = 0
    for _ in range(n_trials):
        x = random.uniform(-1, 1)
        y = random.uniform(-1, 1)
        if x**2 + y**2 <= 1:
            inside_circle += 1
    return 4 * inside_circle / n_trials

# Expected value: E[estimate] ≈ π
print(f"π estimate: {estimate_pi(1000000)}")  # ~3.141...

def binomial_pmf(n, p, k):
    """Calculate P(X = k) for X ~ Binomial(n, p)."""
    from math import comb
    return comb(n, k) * (p ** k) * ((1 - p) ** (n - k))

def expected_value_binomial(n, p):
    """E[X] for X ~ Binomial(n, p)."""
    return n * p

def variance_binomial(n, p):
    """Var(X) for X ~ Binomial(n, p)."""
    return n * p * (1 - p)

Statistics

Statistics provides methods to collect, analyze, interpret, and present data. It's crucial for A/B testing, performance monitoring, machine learning evaluation, and data-driven decision making. Understanding statistics enables you to draw valid conclusions from data and avoid common pitfalls.

Descriptive Statistics

Measures of Central Tendency

Mean (Arithmetic Average): $\(\\bar{x} = \\frac{1}{n} \\sum\_{i=1}^{n} x_i\)$

  • Properties: Sensitive to outliers.
  • Weighted mean: \(\\bar{x}\_w = \\frac{\\sum w_i x_i}{\\sum w_i}\)

Median:

  • Middle value when data is sorted (or average of two middle values if \(n\) is even).
  • Properties: Robust to outliers, 50th percentile.

Mode:

  • Most frequently occurring value(s).
  • Useful for categorical data.

Comparison:

  • Symmetric data: Mean ≈ Median ≈ Mode
  • Right-skewed: Mean > Median > Mode
  • Left-skewed: Mean < Median < Mode

Measures of Dispersion

Range: $\(\\text{Range} = \\max(x_i) - \\min(x_i)\)$

Variance (Sample): $\(s^2 = \\frac{1}{n-1} \\sum\_{i=1}^{n} (x_i - \\bar{x})^2\)$

Standard Deviation (Sample): $\(s = \\sqrt{s^2}\)$

Interquartile Range (IQR): $\(\\text{IQR} = Q_3 - Q_1\)$

where \(Q_1\) (25th percentile) and \(Q_3\) (75th percentile) are quartiles.

Coefficient of Variation: $\(\\text{CV} = \\frac{s}{\\bar{x}} \\times 100\\%\)$

(Relative measure of variability)

Percentiles and Quartiles

  • \(p\)-th percentile: Value below which \(p\\%\) of data falls.
  • Quartiles:
    • \(Q_1\): 25th percentile
    • \(Q_2\): 50th percentile (median)
    • \(Q_3\): 75th percentile

Five-Number Summary:

  • Minimum, \(Q_1\), Median, \(Q_3\), Maximum
  • Used in box plots.
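Python's standard `statistics` module computes all of these descriptive measures directly (the sample data here is illustrative):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # 5.0
median = statistics.median(data)  # 4.5 (average of the two middle values)
mode = statistics.mode(data)      # 4  (most frequent value)
stdev = statistics.stdev(data)    # sample standard deviation (n - 1 divisor)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles

print(mean, median, mode, round(stdev, 3))
```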

Probability Distributions in Practice

Sampling Distributions

Sample Mean Distribution:

  • If \(X_1, X_2, \\ldots, X_n\) are i.i.d. with mean \(\\mu\) and variance \(\\sigma^2\):
    • \(E[\\bar{X}] = \\mu\)
    • \(\\text{Var}(\\bar{X}) = \\frac{\\sigma^2}{n}\)
    • Central Limit Theorem: \(\\bar{X} \\approx \\mathcal{N}(\\mu, \\frac{\\sigma^2}{n})\) for large \(n\)

Standard Error

Standard Error of the Mean: $\(\\text{SE} = \\frac{\\sigma}{\\sqrt{n}} \\approx \\frac{s}{\\sqrt{n}}\)$

Measures variability of sample mean across different samples.

Inferential Statistics

Point Estimation

Estimator: A statistic used to estimate a population parameter.

Properties of Good Estimators:

  • Unbiased: \(E[\\hat{\\theta}] = \\theta\)
  • Consistent: \(\\hat{\\theta} \\xrightarrow{p} \\theta\) as \(n \\to \\infty\)
  • Efficient: Minimum variance among unbiased estimators

Common Estimators:

  • Sample mean: \(\\bar{X} = \\frac{1}{n}\\sum X_i\) (unbiased for \(\\mu\))
  • Sample variance: \(S^2 = \\frac{1}{n-1}\\sum(X_i - \\bar{X})^2\) (unbiased for \(\\sigma^2\))

Confidence Intervals

A confidence interval provides a range of plausible values for a parameter with a specified confidence level.

95% Confidence Interval for Mean (Large Sample, Known \(\\sigma\)): $\(\\bar{x} \\pm 1.96 \\times \\frac{\\sigma}{\\sqrt{n}}\)$

95% Confidence Interval for Mean (Large Sample, Unknown \(\\sigma\)): $\(\\bar{x} \\pm t\_{0.025, n-1} \\times \\frac{s}{\\sqrt{n}}\)$

where \(t\_{0.025, n-1}\) is the critical value from \(t\)-distribution.

Interpretation: If we repeat the sampling process many times, 95% of intervals will contain the true parameter.
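A minimal sketch of the large-sample interval, using the z critical value 1.96 from the formula above (for small samples a t critical value is more appropriate; the function name is illustrative):

```python
import math
import statistics

def confidence_interval_95(sample):
    """Large-sample 95% CI for the mean: x-bar +/- 1.96 * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error
    return mean - 1.96 * se, mean + 1.96 * se

lo, hi = confidence_interval_95([10, 12, 11, 13, 12, 11, 10, 12, 13, 11])
print(round(lo, 2), round(hi, 2))
```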

Hypothesis Testing

Null Hypothesis (\(H_0\)): Statement to be tested (often "no effect" or "no difference").

Alternative Hypothesis (\(H_1\) or \(H_a\)): Statement for which the test seeks evidence (typically the effect or difference of interest).

Test Statistic: A value computed from sample data used to decide between \(H_0\) and \(H_1\).

\(p\)-value: Probability of observing data as extreme (or more extreme) than observed, assuming \(H_0\) is true.

Decision Rule:

  • If \(p\)-value < \(\\alpha\) (significance level, e.g., 0.05), reject \(H_0\)
  • Otherwise, fail to reject \(H_0\)

Type I Error (\(\\alpha\)): Rejecting \(H_0\) when it's true (false positive).

Type II Error (\(\\beta\)): Failing to reject \(H_0\) when it's false (false negative).

Power: \(1 - \\beta\) (probability of correctly rejecting \(H_0\)).
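Both error rates can be estimated by simulation: run the test repeatedly with \(H_0\) true to check the false-positive rate stays near \(\\alpha\), and with a real effect present to estimate power. A sketch (the effect size of 0.5 standard deviations is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, trials, alpha = 30, 2000, 0.05

def rejection_rate(true_mean):
    """Fraction of one-sample t-tests of H0: mu = 0 that reject."""
    rejections = 0
    for _ in range(trials):
        sample = rng.normal(true_mean, 1.0, size=n)
        _, p = stats.ttest_1samp(sample, 0.0)
        rejections += p < alpha
    return rejections / trials

type_i = rejection_rate(0.0)   # H0 true: rate should be ≈ alpha
power = rejection_rate(0.5)    # H0 false (effect = 0.5 sd)
print(type_i, power)
```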

Common Tests

\(z\)-test (Large Sample, Known \(\\sigma\)): $\(z = \\frac{\\bar{x} - \\mu_0}{\\sigma / \\sqrt{n}}\)$

\(t\)-test (Small Sample, Unknown \(\\sigma\)): $\(t = \\frac{\\bar{x} - \\mu_0}{s / \\sqrt{n}}\)$

Under \(H_0\), the statistic follows a \(t\)-distribution with \(n-1\) degrees of freedom.

Chi-Square Test:

  • Tests independence in contingency tables.
  • \(\\chi^2 = \\sum \\frac{(O_i - E_i)^2}{E_i}\) where \(O_i\) = observed, \(E_i\) = expected.
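SciPy's `chi2_contingency` implements this test for contingency tables, computing the expected counts internally. A sketch (the 2×2 counts are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: rows = groups A/B, cols = converted / not
observed = np.array([[30, 70],
                     [45, 55]])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, dof = {dof}")
print(expected)  # expected counts under independence
```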

Correlation and Regression

Correlation

Pearson Correlation Coefficient: $\(r = \\frac{\\sum (x_i - \\bar{x})(y_i - \\bar{y})}{\\sqrt{\\sum (x_i - \\bar{x})^2 \\sum (y_i - \\bar{y})^2}} = \\frac{\\text{Cov}(X, Y)}{s_X s_Y}\)$

  • Range: \(-1 \\leq r \\leq 1\)
  • \(r = 1\): Perfect positive linear relationship
  • \(r = -1\): Perfect negative linear relationship
  • \(r = 0\): No linear relationship (but may have nonlinear relationship)

Interpretation:

  • \(|r| > 0.7\): Strong correlation
  • \(0.3 < |r| < 0.7\): Moderate correlation
  • \(|r| < 0.3\): Weak correlation

Causation vs. Correlation: Correlation does NOT imply causation.
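The caveat that \(r = 0\) only rules out a *linear* relationship is easy to demonstrate: a perfect parabola has Pearson correlation essentially zero despite exact dependence. A sketch:

```python
import numpy as np
from scipy import stats

x = np.linspace(-1, 1, 101)  # symmetric around 0
y = x ** 2                   # exact nonlinear (quadratic) dependence

r, p = stats.pearsonr(x, y)
print(r)  # ≈ 0: no linear relationship, yet y is fully determined by x
```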

Simple Linear Regression

Models relationship between dependent variable \(Y\) and independent variable \(X\): $\(Y = \\beta_0 + \\beta_1 X + \\epsilon\)$

Least Squares Estimates: $\(\\hat{\\beta_1} = \\frac{\\sum (x_i - \\bar{x})(y_i - \\bar{y})}{\\sum (x_i - \\bar{x})^2} = r \\frac{s_Y}{s_X}\)$

\[\\hat{\\beta_0} = \\bar{y} - \\hat{\\beta_1} \\bar{x}\]

Coefficient of Determination (\(R^2\)): $\(R^2 = r^2 = \\frac{\\text{Explained Variation}}{\\text{Total Variation}}\)$

  • Proportion of variance in \(Y\) explained by \(X\).
  • Range: \(0 \\leq R^2 \\leq 1\)
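The closed-form least-squares estimates above can be checked against `scipy.stats.linregress` on synthetic data (true coefficients and noise level are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=200)  # true beta0=2, beta1=3

# Closed-form least-squares estimates from the formulas above
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

res = stats.linregress(x, y)
print(b0, b1)           # matches scipy's intercept and slope
print(res.rvalue ** 2)  # R^2 = r^2 for simple linear regression
```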

Applications in Computer Science

A/B Testing

  • Hypothesis: \(H_0\): No difference between variants, \(H_1\): Variants differ
  • Metrics: Conversion rate, click-through rate, revenue
  • Statistical significance: \(p\)-value < 0.05
  • Sample size: Power analysis to detect meaningful differences
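A minimal sketch of an A/B conversion-rate comparison using a two-proportion \(z\)-test with a pooled standard error (the counts are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical results: (conversions, visitors) per variant
conv_a, n_a = 200, 5000   # 4.0% conversion
conv_b, n_b = 260, 5000   # 5.2% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / se
p_value = 2 * stats.norm.sf(abs(z))  # two-sided test
print(f"z = {z:.3f}, p = {p_value:.4f}")
```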

Performance Monitoring

  • Latency percentiles: \(p50\) (median), \(p95\), \(p99\)
  • Throughput: Mean requests per second with confidence intervals
  • Anomaly detection: Values beyond \(\\mu \\pm 3\\sigma\) (3-sigma rule)
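A sketch of these monitoring computations on synthetic latency data (the lognormal distribution and thresholds are illustrative; for skewed latencies, percentiles are usually more robust than the 3-sigma rule):

```python
import numpy as np

rng = np.random.default_rng(5)
latencies = rng.lognormal(mean=4.0, sigma=0.5, size=10_000)  # ms, right-skewed

# Latency percentiles
p50, p95, p99 = np.percentile(latencies, [50, 95, 99])
print(f"p50={p50:.1f}ms  p95={p95:.1f}ms  p99={p99:.1f}ms")

# 3-sigma rule: flag values outside mu ± 3*sigma
mu, sigma = latencies.mean(), latencies.std(ddof=1)
anomalies = latencies[np.abs(latencies - mu) > 3 * sigma]
print(f"{len(anomalies)} anomalies out of {len(latencies)}")
```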

Machine Learning Evaluation

  • Accuracy: \(\\frac{\\text{TP} + \\text{TN}}{\\text{TP} + \\text{TN} + \\text{FP} + \\text{FN}}\)
  • Precision: \(\\frac{\\text{TP}}{\\text{TP} + \\text{FP}}\)
  • Recall: \(\\frac{\\text{TP}}{\\text{TP} + \\text{FN}}\)
  • F1-Score: \(2 \\times \\frac{\\text{Precision} \\times \\text{Recall}}{\\text{Precision} + \\text{Recall}}\)
  • Confusion Matrix: TP, TN, FP, FN counts
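The four metrics computed from raw confusion-matrix counts (the counts are made up):

```python
# Hypothetical confusion-matrix counts
tp, tn, fp, fn = 80, 90, 10, 20

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of predicted positives, how many are real
recall = tp / (tp + fn)      # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, recall, f1)
```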

Database Query Optimization

  • Cardinality estimation: Using statistics to estimate result sizes
  • Selectivity: Fraction of rows matching a condition

Code Example: Statistical Analysis

import numpy as np
from scipy import stats

def descriptive_stats(data):
    """Calculate descriptive statistics."""
    return {
        'mean': np.mean(data),
        'median': np.median(data),
        'std': np.std(data, ddof=1),  # Sample std
        'variance': np.var(data, ddof=1),
        'min': np.min(data),
        'max': np.max(data),
        'q1': np.percentile(data, 25),
        'q3': np.percentile(data, 75),
        'iqr': np.percentile(data, 75) - np.percentile(data, 25)
    }

def confidence_interval_mean(data, confidence=0.95):
    """Calculate confidence interval for mean."""
    n = len(data)
    mean = np.mean(data)
    std = np.std(data, ddof=1)
    se = std / np.sqrt(n)

    alpha = 1 - confidence
    t_critical = stats.t.ppf(1 - alpha/2, df=n-1)

    margin = t_critical * se
    return (mean - margin, mean + margin)

def t_test_one_sample(data, mu0):
    """One-sample t-test: H0: μ = μ0 vs H1: μ ≠ μ0."""
    t_stat, p_value = stats.ttest_1samp(data, mu0)
    return {
        't_statistic': t_stat,
        'p_value': p_value,
        'reject_h0': p_value < 0.05
    }

def correlation_analysis(x, y):
    """Calculate correlation and perform test."""
    r, p_value = stats.pearsonr(x, y)
    return {
        'correlation': r,
        'p_value': p_value,
        'significant': p_value < 0.05
    }

# Example: Analyze response times
response_times = np.random.exponential(scale=100, size=1000)
stats_dict = descriptive_stats(response_times)
ci = confidence_interval_mean(response_times)
print(f"Mean: {stats_dict['mean']:.2f}ms")
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]ms")

Common Pitfalls

  1. Confusing correlation with causation: Ice cream sales and drowning are correlated (both increase in summer), but one doesn't cause the other.

  2. Multiple comparisons problem: Testing many hypotheses increases chance of false positives. Use Bonferroni correction: \(\\alpha\_{\\text{adjusted}} = \\frac{\\alpha}{n}\).

  3. Selection bias: Sample not representative of population.

  4. Survivorship bias: Only analyzing successful cases, ignoring failures.

  5. P-hacking: Manipulating data or tests until \(p\)-value < 0.05.

  6. Ignoring effect size: Statistical significance doesn't mean practical importance.

  7. Small sample sizes: Leads to unreliable estimates and low power.
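Pitfall 2 is easy to reproduce by simulation: test 20 true null hypotheses at \(\\alpha = 0.05\) repeatedly, and at least one false positive appears far more often than 5% of the time, while the Bonferroni-adjusted threshold restores control (a sketch; counts and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
m, n, trials, alpha = 20, 30, 500, 0.05

any_fp_raw = any_fp_bonf = 0
for _ in range(trials):
    # m independent t-tests, all with H0 true (mean really is 0)
    pvals = np.array([
        stats.ttest_1samp(rng.standard_normal(n), 0.0).pvalue
        for _ in range(m)
    ])
    any_fp_raw += (pvals < alpha).any()        # naive thresholding
    any_fp_bonf += (pvals < alpha / m).any()   # Bonferroni correction

print(any_fp_raw / trials)   # ≈ 1 - 0.95^20 ≈ 0.64
print(any_fp_bonf / trials)  # ≈ alpha = 0.05
```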