Sorting Algorithms¶
Sorting algorithms rearrange elements in a collection (e.g., an array or list) into a canonical order, typically non-decreasing (ascending). They are ubiquitous in computing, underpinning data processing in databases, search engines, and machine learning pipelines. Efficiency is measured by time complexity (comparisons/swaps), auxiliary space, stability (preserving the relative order of equal keys), and adaptivity (e.g., to nearly sorted input). Asymptotic analysis focuses on worst-case O(f(n)) for n elements; comparison-based sorts are bounded below by Ω(n log n) (information-theoretic: log₂(n!) ≈ n log₂ n − 1.44n comparisons are needed to distinguish n! permutations).
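The information-theoretic bound is easy to sanity-check numerically. A small Python sketch (illustrative; `lower_bound_exact` and `lower_bound_stirling` are ad-hoc names) compares log₂(n!) against the Stirling term n log₂ n − n log₂ e and shows the gap is only O(log n):

```python
import math

def lower_bound_exact(n):
    """Bits needed to distinguish all n! permutations: log2(n!)."""
    return math.log2(math.factorial(n))

def lower_bound_stirling(n):
    """Stirling's approximation: n*log2(n) - n*log2(e)."""
    return n * math.log2(n) - n * math.log2(math.e)

for n in (10, 100, 1000):
    print(n, round(lower_bound_exact(n), 1), round(lower_bound_stirling(n), 1))
```

The exact value always exceeds the approximation by roughly 0.5·log₂(2πn), consistent with the O(log n) error term above.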
1. Comparison-Based Sorting¶
Comparison-based sorting algorithms rely exclusively on pairwise comparisons (e.g., <, >, =) to determine element order, without exploiting specific key properties like integer ranges. This generality makes them suitable for arbitrary comparables (strings, objects), but imposes a theoretical lower bound of Ω(n log n) comparisons in the worst case, derived from the decision tree model: a sorting algorithm corresponds to a binary tree with n! leaves (permutations), requiring height at least log₂(n!) ≈ n log₂ n − 1.443n + O(log n) (Stirling's approximation). Quadratic sorts (O(n²)) dominate for small n (≤10³) due to simplicity and low constants, but scale poorly; they excel in adaptive scenarios (nearly sorted input) or educational contexts.
1. Bubble Sort¶
Bubble sort, also known as sinking sort, repeatedly steps through the list, compares adjacent elements, and swaps if out-of-order, "bubbling" larger elements to the end. It is a quintessential quadratic, named for the visual of large values rising like bubbles.
- Core Mechanics:
- Outer loop: n-1 passes (i=0 to n-2); each pass fixes one more element at the end, so after pass i the last i+1 positions are final.
- Inner loop: Compare j and j+1 (j=0 to n-i-2), swap if arr[j] > arr[j+1].
- Optimization: Early termination if no swaps in a pass (sorted prefix detected).
- Inversion Reduction: Each swap fixes one adjacent inversion; total swaps = initial inversions I ≤ n(n-1)/2.
- Complexity Analysis:
| Scenario | Comparisons | Swaps | Time Derivation | Space | Adaptive? |
|---|---|---|---|---|---|
| Worst (reverse) | Θ(n²) | Θ(n²) | ∑_{i=1}^{n-1} (n-i) = n(n-1)/2 | O(1) | No |
| Average (uniform random) | Θ(n²) | Θ(n²) | Expected inversions I = n(n-1)/4; each pass ~n/2 compares | O(1) | Yes (optimized) |
| Best (sorted) | Θ(n) | 0 | One pass detects no swaps | O(1) | Yes |
- Derivation: Worst: reverse sorted → every adjacent pair swaps each pass. Average: for a random permutation, P(arr[j] > arr[j+1]) = 1/2, so expected swaps = expected inversions = n(n-1)/4 = Θ(n²). Optimized: with a sorted suffix of length r, passes ≈ n - r.
- Stability Proof: Swaps only adjacent unequals; equals untouched, preserving order.
- Cache Effects: Sequential access excellent (prefetch); but many passes cause thrashing in large n.
- Advanced Variants:
- Cocktail Shaker Sort: Bidirectional passes (forward bubble large, backward small); reduces passes by ~14% avg. Complexity unchanged O(n²).
- Odd-Even Sort: Parallel variant; odd/even indexed pairs compare/swap simultaneously. PRAM time O(n) with n processors; used in odd-even transposition networks.
- Optimized with Block Swaps: For runs, merge blocks instead of bubbling (hybrid with insertion).
- Concurrent: Fine-grained locks per pair; lock-free via CAS swaps.
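The cocktail shaker variant above can be sketched in runnable Python (a minimal illustration, not a tuned implementation): forward passes bubble the max to the top end, backward passes sink the min, and both ends shrink each round.

```python
def cocktail_shaker_sort(a):
    """Bidirectional bubble sort: forward pass bubbles the max up,
    backward pass sinks the min down, shrinking both ends."""
    lo, hi = 0, len(a) - 1
    swapped = True
    while swapped and lo < hi:
        swapped = False
        for j in range(lo, hi):            # forward: push max toward hi
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        hi -= 1
        for j in range(hi, lo, -1):        # backward: push min toward lo
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                swapped = True
        lo += 1
    return a
```

Like the early-exit bubble sort, it terminates after a clean round, so sorted input costs one forward and one backward pass.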
- Pros/Cons and Optimizations:
- Pros: Simple (no recursion), in-place, stable; reveals inversions explicitly.
- Cons: O(n²) passes inefficient; non-adaptive without flag.
- Optimizations: Track last swap index (shrink inner loop); vectorized SIMD for adjacent compares (e.g., AVX2 in C++: 8 pairs/cycle).
- Theoretical Extensions: Viewed as a sorting network, the compares form a comparator graph; optimal networks need only O(n log n) comparators (AKS), though with impractical constants.
- Applications: Educational (visualize passes), small datasets (n<100), distributed bubbling in parallel merge (e.g., MPI implementations).
Pseudocode (Optimized Bubble):
function bubbleSort(arr: array<T>):
    n = arr.length
    for i from 0 to n-2:
        swapped = false
        last_swap = 0  // Optional: track to shrink inner loop
        for j from 0 to n-i-2:
            if arr[j] > arr[j+1]:
                swap arr[j], arr[j+1]
                swapped = true
                last_swap = j
        if not swapped: return  // Early exit
        // Optional: set next inner end = last_swap
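A runnable Python rendering of the optimized pseudocode (an illustrative sketch): the last-swap index both serves as the early-exit test and shrinks the next scan, since everything beyond it is already in place.

```python
def bubble_sort(a):
    """Optimized bubble sort: shrink the scan to the last swap position;
    a pass whose only swap is at index 0 (or none) leaves a sorted."""
    end = len(a) - 1
    while end > 0:
        last_swap = 0
        for j in range(end):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                last_swap = j
        if last_swap == 0:   # no out-of-order pair remains past index 0
            return a
        end = last_swap      # suffix beyond last swap is already sorted
    return a
```

On sorted input the first pass makes no swaps, giving the O(n) best case from the table above.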
2. Insertion Sort¶
Insertion sort constructs a sorted prefix [0..i] by inserting the i-th element into its position via shifts. It mimics human card sorting, efficient for incremental data.
- Core Mechanics:
- For i=1 to n-1: key = arr[i]; j=i-1; while j≥0 and arr[j]>key: arr[j+1]=arr[j], j--; arr[j+1]=key.
- Shifts rightward until hole found; one inversion fixed per shift.
- Natural Runs: Detect increasing sequences to skip (e.g., timsort precursor).
- Complexity Analysis:
| Scenario | Comparisons | Shifts | Time Derivation | Space | Adaptive? |
|---|---|---|---|---|---|
| Worst (reverse) | Θ(n²) | Θ(n²) | ∑_{i=1}^{n-1} i = n(n-1)/2 | O(1) | Yes |
| Average (uniform) | Θ(n²) | Θ(n²) | Expected shifts per i: i/2 | O(1) | Yes |
| Best (sorted) | Θ(n) | 0 | One compare per i | O(1) | Yes |
- Derivation: For insertion at i, expected shifts i/2 (uniform prior); total ∑ i/2 ≈ n²/4. Shifts = inversions I; sorted input has I = 0 → O(n).
- Stability Proof: Shifts preserve relative order (equals shift together).
- Cache Effects: Excellent locality (access [j..i] sequentially); outperforms quick for n<50.
- Advanced Variants:
- Binary Insertion Sort: Use binary search for position O(log i), but shifts still O(i) → O(n log n) compares, O(n²) time. Useful for large keys.
- Shell Sort: h-sorts the array over a decreasing gap sequence (e.g., Knuth's h = 3h+1: 1, 4, 13, 40, ...). Time O(n^{1.5}) average for that sequence; adaptive.
- Derivation: Gaps reduce inversions quickly across strides; Pratt's 2^p·3^q sequence gives O(n log² n) worst case.
- Library Sort: Gapped array (50% load) for insertions; O(n log n) expected.
- Concurrent: Threaded shifts (divide prefix); lock-free via atomic moves.
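The binary insertion variant is short in Python using the standard `bisect` module; `bisect_right` places a key after existing equals, preserving stability. A sketch:

```python
from bisect import bisect_right

def binary_insertion_sort(a):
    """Insertion sort with binary search for the insert position:
    O(n log n) comparisons, but the shifts still cost O(n^2) moves."""
    for i in range(1, len(a)):
        key = a[i]
        pos = bisect_right(a, key, 0, i)   # position in sorted prefix a[0..i-1]
        a[pos + 1:i + 1] = a[pos:i]        # shift the tail right by one
        a[pos] = key
    return a
```

As noted above, this helps only when comparisons are expensive relative to moves (e.g., long string keys).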
- Pros/Cons and Optimizations:
- Pros: Stable, online (sorts arriving elements), highly adaptive; low constants.
- Cons: Quadratic shifts; not for random large n.
- Optimizations: Sentinel (place the minimum at the front) removes the j ≥ 0 bound check; binary search for the position when compares are expensive.
- Theoretical Extensions: Related to patience sorting, which builds Young-tableau-like piles and yields the longest increasing subsequence in O(n log n).
- Applications: Timetable scheduling (insert events), small arrays in hybrids, online algorithms (streaming data).
Pseudocode (Standard Insertion):
function insertionSort(arr: array<T>):
    n = arr.length
    for i from 1 to n-1:
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j+1] = arr[j]
            j--
        arr[j+1] = key
3. Selection Sort¶
Selection sort divides into sorted prefix and unsorted suffix, selecting the minimum from suffix to append.
- Core Mechanics:
- For i=0 to n-2: min_idx = i; for j=i+1 to n-1: if arr[j] < arr[min_idx]: min_idx=j; swap arr[i], arr[min_idx].
- One pass per position; no early exit.
- Complexity Analysis:
| Scenario | Comparisons | Swaps | Time Derivation | Space | Adaptive? |
|---|---|---|---|---|---|
| Worst/Avg/Best | Θ(n²) | ≤ n-1 | ∑_{i=0}^{n-2} (n-i-1) = n(n-1)/2 | O(1) | No |
- Derivation: Always n(n-1)/2 compares regardless of input; swaps occur only when min_idx ≠ i (nearly every pass on random input, at most n-1 total).
- Stability Proof: Fails—the long-range swap can leapfrog equals (e.g., [2a, 2b, 1] → [1, 2b, 2a]: the 2a-before-2b order is lost).
- Cache Effects: Poor—scans full suffix each time, random accesses.
- Advanced Variants:
- Tournament Sort: Build comparison tree (O(n) build, O(n log n) extract); parallelizable.
- Cycle Sort: Minimize writes (O(n²) compares, O(n) writes); for write-heavy (EEPROM).
- Mechanics: Follow cycles in permutation: Start at i, find correct pos, rotate cycle.
- Bidirectional Selection: Select max/min alternately for two ends.
- Concurrent: Parallel min-find in suffix (reduce O(n/p)).
- Pros/Cons and Optimizations:
- Pros: Bounded swaps (n-1 max), simple; guaranteed O(n²).
- Cons: Non-adaptive, unstable; many compares.
- Optimizations: Early min tracking; vectorized min-scan (SIMD).
- Theoretical Extensions: As a selection network, minimal depth O(log n); relates to median-finding O(n).
- Applications: Flash memory (few writes), tournament scheduling, when swaps costly.
Pseudocode (Standard Selection):
function selectionSort(arr: array<T>):
    n = arr.length
    for i from 0 to n-2:
        min_idx = i
        for j from i+1 to n-1:
            if arr[j] < arr[min_idx]:
                min_idx = j
        if min_idx != i:
            swap arr[i], arr[min_idx]
Comparative Analysis and Cross-Cutting Themes¶
| Algorithm | Inversions Handled | Cache Locality | Parallel Potential | Hybrid Role |
|---|---|---|---|---|
| Bubble | Adjacent only | High (sequential) | High (odd-even) | Pre-sort runs |
| Insertion | All via shifts | Excellent (local) | Medium (threaded) | In timsort |
| Selection | Global min | Poor (full scans) | High (tournament) | Write-min |
- Inversion Tables: All quadratics reduce I by 1 per swap/shift; total time Θ(I + n).
- Stability Trade-Off: Bubble/insertion yes (O(1) per equal); selection no (jumps).
- Empirical: For n=1000 random, insertion ~2x faster than bubble (fewer cache misses); selection slowest.
- Extensions: Quantum bubble O(n log n) theoretical; distributed (MapReduce: local sort + global merge).
2. Linearithmic Sorts¶
Linearithmic sorts achieve the near-optimal O(n log n) time complexity for comparison-based sorting, leveraging divide-and-conquer paradigms (recursive partitioning), in-place partitioning (e.g., quicksort), or heapification (priority queues). These transcend quadratics by exploiting recursion depth O(log n) and per-level work O(n), as per the Master Theorem: For T(n) = a T(n/b) + f(n), with a=2, b=2, f(n)=O(n), Case 2 yields Θ(n log n). They balance trade-offs in space (O(n) for merge vs. O(1) for heap), stability (merge yes, quick no), and adaptability (hybrids detect runs). Exhaustively, we dissect merge sort (stable divide-conquer), quicksort (in-place partition), heapsort (guaranteed in-place), and hybrids (e.g., timsort: adaptive merge-insertion). Derivations use recurrences; pseudocode neutral (Python/C++ notes); variants cover parallelism and external; themes include pivot selection (quicksort entropy) and natural run detection (hybrids).
1. Merge Sort¶
Merge sort exemplifies pure divide-and-conquer: Recursively split into halves, sort, and merge sorted subarrays. Its stability and predictability make it foundational for external sorting.
- Core Mechanics:
- Divide: Mid = floor((low + high)/2); recurse on [low..mid], [mid+1..high].
- Conquer: Merge via two pointers on temp arrays, advancing smaller.
- Base: Singletons sorted.
- Run Detection: In hybrids, identify increasing runs to skip recursion.
- Complexity Analysis:
| Scenario | Time | Space | Stable? | Adaptive? |
|---|---|---|---|---|
| Worst/Avg/Best | Θ(n log n) | Θ(n) aux (temp) | Yes | No (pure) |
- Derivation: Recurrence T(n) = 2T(n/2) + Θ(n) for the merge. Unrolling: log n levels, each Θ(n) work → Θ(n log n). Space: Θ(n) temp + O(log n) recursion stack; in-place variants use O(1) aux but degrade to O(n²) worst via complex rotations.
- Stability Proof: In merge, when equal, take left (preserves relative order).
- Cache Effects: Top-down recursion adds call overhead and scattered subarray starts; bottom-up iterative passes are purely sequential and cache-friendlier.
- Advanced Variants:
- Bottom-Up Merge Sort: Iterative; start with size=1, double to n, merge pairs. O(n log n), no recursion stack.
- In-Place Merge Sort: Block-based (merge small runs in-place via rotations); time O(n log n) avg, O(n²) worst (complex rotations).
- External Merge Sort: For disk: k-way merge with a heap over runs; O((n/B) log_{M/B}(n/B)) I/Os, B=block size, M=memory.
- Parallel Merge Sort: Fork subarrays (e.g., C++ std::execution::par); PRAM O(log n) depth w/ n processors.
- Natural Merge Sort: Extend runs (e.g., timsort precursor); adaptive O(n log r) r=#runs.
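The bottom-up variant translates directly to Python; a minimal sketch with a single reusable buffer, merging runs of width 1, 2, 4, ... with no recursion stack:

```python
def merge_sort_bottom_up(a):
    """Iterative (bottom-up) merge sort: stable, O(n log n), one temp buffer."""
    n = len(a)
    temp = a[:]                      # allocate once, reuse every pass
    width = 1
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j, k = lo, mid, lo
            while i < mid and j < hi:
                if a[i] <= a[j]:     # <= favors left: keeps the merge stable
                    temp[k] = a[i]; i += 1
                else:
                    temp[k] = a[j]; j += 1
                k += 1
            temp[k:hi] = a[i:mid] if i < mid else a[j:hi]  # drain leftover
            a[lo:hi] = temp[lo:hi]
        width *= 2
    return a
```

Each doubling of `width` is one sequential pass over the array, which is the cache-friendly access pattern noted above.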
- Pros/Cons and Optimizations:
- Pros: Stable, worst-case guaranteed; parallel-friendly.
- Cons: O(n) space; not in-place.
- Optimizations: Reuse the temp array (allocate once); switch to insertion sort for small subarrays (lower constants, still O(n log n)).
- Theoretical Extensions: As a sorting network, depth O(log n); relates to FFT for O(n log n) convolutions.
- Applications: External files (databases), stable multi-key sorts, parallel computing (e.g., GPU thrust::sort).
Pseudocode (Recursive Top-Down):
function mergeSort(arr: array<T>, low: int, high: int):
    if low < high:
        mid = floor((low + high) / 2)
        mergeSort(arr, low, mid)
        mergeSort(arr, mid+1, high)
        merge(arr, low, mid, high)

function merge(arr: array<T>, low: int, mid: int, high: int):
    temp = allocate(high - low + 1)
    i = low, j = mid+1, k = 0
    while i <= mid and j <= high:
        if arr[i] <= arr[j]:  // Stable: <= favors left
            temp[k++] = arr[i++]
        else:
            temp[k++] = arr[j++]
    while i <= mid: temp[k++] = arr[i++]
    while j <= high: temp[k++] = arr[j++]
    for m from 0 to k-1: arr[low + m] = temp[m]
    free(temp)
2. Quicksort¶
Quicksort, by Hoare (1961), partitions around a pivot, recursing on subarrays. Its average-case prowess stems from balanced partitions, but worst-case requires randomization.
- Core Mechanics:
- Pivot choice: Last/middle/random/median-of-3.
- Partition: Hoare (two pointers cross) or Lomuto (one pointer).
- Recurse on the <pivot and >pivot subarrays; an =pivot region is optional (3-way).
- Tail-call optimization: Iterate on the larger half, recurse on the smaller (O(log n) stack).
- Complexity Analysis:
| Scenario | Time | Space | Stable? | Adaptive? |
|---|---|---|---|---|
| Worst (sorted, bad pivot) | O(n²) | O(n) stack | No | No |
| Average (random) | Θ(n log n) | O(log n) avg | No | Partial (3-way) |
| Best (balanced) | Θ(n log n) | O(log n) | No | No |
- Derivation: Average: pivot rank k uniform on [1, n] → T(n) = (1/n) ∑_{k=1}^{n} [T(k-1) + T(n-k)] + O(n), which solves to T(n) ≈ 2n ln n = Θ(n log n). Worst: unbalanced partitions give T(n) = T(n-1) + O(n) = O(n²).
- Stability Proof: Fails—partitions may reorder equals across pivot.
- Cache Effects: Excellent (sequential partition); recursion jumps mitigated by tail calls.
- Advanced Variants:
- Randomized Quicksort: Random pivot; O(n log n) expected (≈ 2n ln n ≈ 1.39 n log₂ n comparisons) and with high probability.
- 3-Way Quicksort (Dutch National Flag): Partition <, =, >; O(n) on duplicates (Bentley-McIlroy).
- Introsort: Hybrid quick + heap + insertion; switches to heap at O(log n) recursion depth for O(n log n) worst.
- Yaroslavskiy Dual-Pivot: Two pivots for JDK; O(1.9 n log n) empirical.
- Parallel Quicksort: Task-steal subarrays; Cilk O(log n) span.
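A sketch of the 3-way (Dutch national flag) partition with a randomized pivot, in Python: one pass groups keys into <, =, > regions, so runs of duplicates never recurse.

```python
import random

def quicksort_3way(a, lo=0, hi=None):
    """3-way quicksort: partitions into <pivot, =pivot, >pivot regions;
    linear time on arrays of all-equal keys."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot = a[random.randint(lo, hi)]     # randomized pivot choice
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]; lt += 1; i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]; gt -= 1
        else:
            i += 1                        # equal keys stay in the middle
    quicksort_3way(a, lo, lt - 1)         # recurse on < region
    quicksort_3way(a, gt + 1, hi)         # recurse on > region
    return a
```

The equal region `a[lt..gt]` is excluded from both recursive calls, which is the source of the O(n) behavior on duplicate-heavy input noted above.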
- Pros/Cons and Optimizations:
- Pros: In-place, fast constants, cache-efficient.
- Cons: Unstable, worst O(n²); recursion depth.
- Optimizations: Median-of-3 pivot (reduces entropy); insertion for small n (<10).
- Theoretical Extensions: As a randomized algorithm, Chernoff bounds on partition balance.
- Applications: General-purpose (C++ std::sort), in-memory databases, when space tight.
Pseudocode (Lomuto Partition, Randomized):
function quickSort(arr: array<T>, low: int, high: int):
    if low < high:
        // Randomize: swap arr[low + rand() % (high-low+1)], arr[high]
        pivot_idx = partition(arr, low, high)
        quickSort(arr, low, pivot_idx - 1)
        quickSort(arr, pivot_idx + 1, high)

function partition(arr: array<T>, low: int, high: int): int
    pivot = arr[high]
    i = low - 1
    for j from low to high-1:
        if arr[j] <= pivot:
            i++
            swap arr[i], arr[j]
    swap arr[i+1], arr[high]
    return i+1
3. Heapsort¶
Heapsort transforms the array into a heap (priority queue), then extracts max repeatedly, sifting down.
- Core Mechanics:
- Build max-heap: From bottom (n/2 to 1), sift-down.
- Sort: For i=n-1 to 1: Swap root/i, sift-down on [0..i-1].
- Sift-down: Swap with larger child, recurse.
- Complexity Analysis:
| Scenario | Time | Space | Stable? | Adaptive? |
|---|---|---|---|---|
| Worst/Avg/Best | Θ(n log n) | O(1) | No | No |
- Derivation: Build: ∑_{h=0}^{log n} (n/2^{h+1}) × O(h) = O(n), since ∑_{h≥0} h/2^h = 2 (geometric series). Extracts: n × O(log n) = O(n log n).
- Stability Proof: Fails—sift-down reorders equals arbitrarily.
- Cache Effects: Good build (bottom-up sequential); extracts jumpy.
- Advanced Variants:
- Bottom-Up Heapsort: Floyd's method; O(n) build alternative.
- Smoothsort: Leonardo numbers for adaptive (O(n) sorted, O(n log n) random); heap variants.
- Weak Heapsort: Relaxed invariants for cache; O(n log n).
- Parallel Heapsort: Concurrent build/extract; GPU O(n log log n).
- Pros/Cons and Optimizations:
- Pros: In-place, worst-case O(n log n); no recursion.
- Cons: Unstable; constants roughly 2x quicksort's; poor reference locality during the extract phase.
- Optimizations: d-ary heap (d=4 cache-better); ternary sift.
- Theoretical Extensions: Heap-based selection finds the k largest in O(n + k log n); exact k-th selection is O(n) via median-of-medians.
- Applications: When space critical (embedded), guaranteed time (real-time OS).
Pseudocode (Standard Heapsort):
function heapSort(arr: array<T>):
    n = arr.length
    // Build max-heap (0-based: children at 2i+1, 2i+2)
    for i from n/2 - 1 downto 0:
        siftDown(arr, n, i)
    for i from n-1 downto 1:
        swap arr[0], arr[i]
        siftDown(arr, i, 0)

function siftDown(arr: array<T>, n: int, i: int):
    largest = i
    left = 2*i + 1, right = 2*i + 2
    if left < n and arr[left] > arr[largest]: largest = left
    if right < n and arr[right] > arr[largest]: largest = right
    if largest != i:
        swap arr[i], arr[largest]
        siftDown(arr, n, largest)
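The same algorithm in runnable Python, with an iterative sift-down replacing the recursive one above (behavior is identical; this is an illustrative sketch):

```python
def heapsort(a):
    """In-place heapsort: Floyd build (sift-down from the last parent),
    then repeatedly swap the max root into the shrinking sorted suffix."""
    def sift_down(n, i):
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < n and a[l] > a[largest]:
                largest = l
            if r < n and a[r] > a[largest]:
                largest = r
            if largest == i:
                return                     # heap property restored
            a[i], a[largest] = a[largest], a[i]
            i = largest                    # continue down the tree

    n = len(a)
    for i in range(n // 2 - 1, -1, -1):    # build max-heap: O(n)
        sift_down(n, i)
    for end in range(n - 1, 0, -1):        # extract max n-1 times
        a[0], a[end] = a[end], a[0]
        sift_down(end, 0)
    return a
```

The iterative sift-down avoids recursion entirely, matching the "no recursion" advantage listed in the pros above.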
4. Hybrid Sorts¶
Hybrids fuse linearithmics with quadratics for real-world efficiency, detecting patterns (runs, small n) to switch strategies.
- Timsort (Merge + Insertion): Python/Java default; detects natural runs (increasing sequences).
- Mechanics: Find runs (min 32-64 elems), merge with insertion for small; galloping merge (exponential probe) for uneven.
- Complexity: O(n log n) worst; O(n) sorted; O(n) space.
- Derivation: r runs → O(n log r); worst r=n/32 → O(n log n). Galloping: O(m + log diff) merges.
- Stability: Yes (stable merge).
- Introsort (Quick + Heap + Insertion): C++ std::sort; recursion depth limit.
- Mechanics: Quick until depth >2 log n, switch heap; insertion for n<16.
- Complexity: O(n log n) worst (heap fallback).
- Derivation: Quick avg fast; heap guarantees.
- Other Hybrids: Grail sort (in-place stable merge); Wiki sort (block merge).
- Pros/Cons: Empirical speed (2-3x pure); complex code.
- Optimizations: Run length thresholds; SIMD merges.
Pseudocode Snippet (Timsort Run Detection):
function findRun(arr: array<T>, start: int, n: int): int
    i = start + 1
    while i < n and arr[i-1] <= arr[i]: i++  // Extend increasing run
    return i - start
// Then merge runs
Comparative Analysis and Cross-Cutting Themes¶
| Sort | Recursion Depth | Pivot/Partition | Parallel Ease | Empirical Speed (n=10^6) |
|---|---|---|---|---|
| Merge | O(log n) | N/A (split) | High | Baseline |
| Quick | O(log n) avg | Critical | Medium | 1.5-2x merge |
| Heap | O(1) (iter) | Heapify | Low | 1.2x merge |
| Timsort | O(log n) | Run-based | Medium | 2-3x quick |
- Pivot Entropy: Quicksort avg log n if balanced; median-of-3 reduces variance.
- External/Parallel: Merge for disk; quick for shared-memory.
- Stability Cost: Merge 20% slower than quick for non-stable needs.
3. Non-Comparison Sorts¶
Non-comparison sorts bypass the Ω(n log n) lower bound of comparison-based algorithms by exploiting key properties—such as fixed digit length in radix or bounded range in counting—enabling linear O(n) or O(n + k) time (k=range). These are radix or bucket variants, ideal for integers, strings, or floats with known distributions, but limited to discrete/ordinal keys (not general comparables). Theoretical foundations draw from counting arguments (e.g., pigeonhole for buckets) and digit decomposition (radix as multi-level counting). Exhaustively, we explore counting sort (range-bound), radix sort (multi-digit), and bucket sort (uniform hashing), including derivations (e.g., stable passes in LSD radix), pseudocode (neutral, Python/C++ adaptable), variants (MSD vs. LSD, parallel), stability proofs, cache effects, and concurrency. Space often O(n + k); assumptions: keys in [0..k-1] or d digits base b.
1. Counting Sort¶
Counting sort assumes keys in small range [0..k-1] (k ≤ n often), counting occurrences then placing via cumulative sums. It's a linear-time permutation, foundational for radix.
- Core Mechanics:
- Count phase: Array count[k] increments for each arr[i].
- Cumulative: count[j] += count[j-1] for j=1 to k-1 (right-to-left for stability).
- Place phase: For i = n-1 downto 0: output[count[arr[i]] - 1] = arr[i]; count[arr[i]]--.
- Handles negatives via offset (e.g., min_key subtract).
- Complexity Analysis:
| Scenario | Time | Space | Stable? | Adaptive? |
|---|---|---|---|---|
| Worst/Avg/Best | Θ(n + k) | Θ(n + k) aux (count + output) | Yes | No |
- Derivation: Three linear passes: count O(n + k), cumulative O(k), place O(n). Total Θ(n + k); if k = O(n), this is O(n).
- Stability Proof: Scanning right-to-left with cumulative counts assigns the last of equal keys to the highest remaining slot, so equal keys keep their original relative order.
- Cache Effects: Sequential access excellent; count array fits L1 if k small.
- Advanced Variants:
- Signed Counting: Shift keys: index = key - min_key; k = max - min +1.
- Multi-Key Counting: Chain for tuples (e.g., sort by (age, name): stable on name after age).
- Parallel Counting: Atomic increments on count (CAS); prefix-sum parallel O(n/p + log n).
- Sparse Variant: Hashmap for count if k >> n (O(n) space).
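Multi-key chaining can be demonstrated with a generic stable counting pass (`counting_sort_by` is a hypothetical helper name): sort by the secondary key first, then stably by the primary key.

```python
def counting_sort_by(records, key, k):
    """Stable counting sort of records by key(r) in range [0, k)."""
    count = [0] * k
    for r in records:
        count[key(r)] += 1
    for i in range(1, k):                 # cumulative (prefix) sums
        count[i] += count[i - 1]
    out = [None] * len(records)
    for r in reversed(records):           # right-to-left pass: stability
        count[key(r)] -= 1
        out[count[key(r)]] = r
    return out

# Sort (grade, section) pairs: a stable pass on section, then on grade,
# yields ordering by grade with ties broken by section.
data = [(2, 1), (1, 0), (2, 0), (1, 1)]
by_section = counting_sort_by(data, key=lambda r: r[1], k=2)
by_grade = counting_sort_by(by_section, key=lambda r: r[0], k=3)
# by_grade == [(1, 0), (1, 1), (2, 0), (2, 1)]
```

This least-significant-key-first chaining is exactly the mechanism LSD radix sort applies digit by digit in the next subsection.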
- Pros/Cons and Optimizations:
- Pros: Linear, stable, simple; no comparisons.
- Cons: k >> n explodes space; range-bound.
- Optimizations: In-place if output=arr (destructive); bitmap for bits (k=2^w).
- Theoretical Extensions: As a permutation network, O(n) depth; relates to parallel prefix computation.
- Applications: Small-range keys (e.g., exam scores 0-100), radix subroutine, GPU sorting (thrust::counting_sort).
Pseudocode (Standard Stable):
function countingSort(arr: array<int>, min_key: int, max_key: int):
    k = max_key - min_key + 1
    count = array<int>(k, 0)
    output = array<int>(arr.length)
    // Count occurrences
    for num in arr:
        count[num - min_key]++
    // Cumulative (prefix) sums
    for i from 1 to k-1:
        count[i] += count[i-1]
    // Place right-to-left for stability
    for i from arr.length-1 downto 0:
        idx = arr[i] - min_key
        output[count[idx] - 1] = arr[i]
        count[idx]--
    copy output to arr
2. Radix Sort¶
Radix sort treats keys as digit sequences (base b, d digits), sorting least/most significant digits via stable sub-sorts (usually counting). LSD (least) works for fixed-length; MSD (most) for variable.
- Core Mechanics (LSD):
- For each digit position exp=1, b, b², ..., b^{d-1}: Stable sort on (key / exp) % b.
- Sub-sort: Counting on digit 0..b-1.
- LSD: Low digits first; propagates to higher via stability.
- MSD: Recursive on prefixes; uneven trees.
- Complexity Analysis:
| Scenario | Time | Space | Stable? | Adaptive? |
|---|---|---|---|---|
| Worst/Avg/Best | Θ(d(n + b)) | Θ(n + b) | Yes | Partial (MSD) |
- Derivation: d passes × Θ(n + b) counting each. For word-size keys (d·log b = w = 32/64 bits), d = w/log b; with b = 256, d is a small constant → O(n). With b = 2, d = w passes → O(nw).
- Stability Proof: Inherent from counting sub-sorts; LSD ensures low-digit ties resolved by higher.
- Cache Effects: Sequential if b small; MSD recursive jumps.
- Advanced Variants:
- MSD Radix: Recurse on digit groups; adaptive for variable lengths (e.g., strings: shorter first).
- Mechanics: Partition into b buckets on the current digit; recurse into non-empty buckets; O(n·d) worst, but only as deep as needed to distinguish prefixes.
- Signed LSD: Offset or two-pass (abs then sign).
- Parallel Radix: Prefix-sum on counts; GPU O(n log log n / p).
- Hybrid MSD/LSD: LSD for fixed, MSD for sparse strings.
- Byte-Wise: b=256 for 8-bit chunks; American Flag variant (in-place buckets).
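A sketch of MSD radix sort on ASCII strings, with a dedicated bucket for exhausted keys so shorter strings sort first (illustrative; assumes single-byte characters):

```python
def msd_radix_sort_strings(strs, pos=0):
    """MSD radix sort for ASCII strings: bucket by the character at `pos`
    (bucket 0 = string exhausted), then recurse into each bucket."""
    if len(strs) <= 1:
        return strs
    buckets = [[] for _ in range(257)]    # 0 = end-of-string, 1..256 = chars
    for s in strs:
        b = 0 if pos >= len(s) else ord(s[pos]) + 1
        buckets[b].append(s)
    out = buckets[0]                      # exhausted strings are already placed
    for b in buckets[1:]:
        out += msd_radix_sort_strings(b, pos + 1)
    return out
```

Recursion stops as soon as a bucket has at most one string, which is the adaptivity to distinguishing prefixes described above.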
- Pros/Cons and Optimizations:
- Pros: Linear in n for fixed d; stable, fast for digits.
- Cons: d large or b>>n hurts; key-specific.
- Optimizations: Adaptive b (sqrt(n)); skip zero digits in MSD.
- Theoretical Extensions: As a multi-level bucket, O(n) for constant d; relates to van Emde Boas O(n log log u) for range u.
- Applications: IP addresses (4 bytes), strings (suffix arrays via radix), distributed (MapReduce: local radix + global).
Pseudocode (LSD Radix, Base 10):
function radixSort(arr: array<int>, d: int):  // d = floor(log10(max)) + 1
    base = 10  // Or 256 for byte-wise
    exp = 1
    for pass from 1 to d:
        countingSortOnDigit(arr, exp, base)  // Stable sub-sort
        exp = exp * base

function countingSortOnDigit(arr: array<int>, exp: int, base: int):
    count = array<int>(base, 0)
    output = array<int>(arr.length)
    for num in arr:
        digit = (num / exp) % base
        count[digit]++
    for i from 1 to base-1:
        count[i] += count[i-1]
    for i from arr.length-1 downto 0:
        digit = (arr[i] / exp) % base
        output[count[digit] - 1] = arr[i]
        count[digit]--
    copy output to arr
3. Bucket Sort¶
Bucket sort distributes elements into n buckets via hashing (e.g., floor(n * key) for [0,1)), then sorts buckets recursively or insertion.
- Core Mechanics:
- Hash: bucket[i] = floor(n * arr[i]) % n; append to list/bucket.
- Sort each bucket (insertion if small).
- Concatenate.
- Assumes roughly uniform keys; for non-uniform data, derive bucket boundaries from a sample/histogram (or balance loads via the power-of-two-choices trick).
- Complexity Analysis:
| Scenario | Time | Space | Stable? | Adaptive? |
|---|---|---|---|---|
| Worst (skewed) | O(n²) | O(n) | If sub-sort stable | Yes (small buckets) |
| Average (uniform) | Θ(n) | O(n) | Yes | Yes |
| Best (1/bucket) | Θ(n) | O(n) | Yes | Yes |
- Derivation: Average: n buckets, ~1 element each → O(n) total (O(1) sort per bucket); under uniformity, bucket sizes are Binomial(n, 1/n) with O(1) expectation. Worst: all keys land in one bucket → O(n²) insertion sort.
- Stability Proof: Stable sub-sorts + append order preserves.
- Cache Effects: Good if buckets contiguous; lists scatter.
- Advanced Variants:
- Proportional Buckets: Dynamic sizing for non-uniform (histograms).
- Recursive Bucket: MSD-like on fractions.
- Parallel Bucket: Hash to threads; local sort, merge.
- Flash Sort: Histogram + permutation fill; O(n) avg.
- Pros/Cons and Optimizations:
- Pros: Linear uniform; simple.
- Cons: Assumes distribution; unstable sub-sort hurts.
- Optimizations: Use counting per bucket if range small; overflow to quicksort.
- Theoretical Extensions: As a hashing sorter, O(n) expected w/ perfect hash.
- Applications: Uniform floats (graphics coords), external (disk buckets), ML data prep.
Pseudocode (Uniform [0,1) Keys):
function bucketSort(arr: array<float>):  // Assume 0 <= key < 1
    n = arr.length
    buckets = array<list<float>>(n)
    for key in arr:
        idx = floor(key * n)
        buckets[idx].append(key)
    for bucket in buckets:
        insertionSort(bucket)  // Or counting if discrete
    // Concatenate
    i = 0
    for bucket in buckets:
        for key in bucket:
            arr[i++] = key
Comparative Analysis and Cross-Cutting Themes¶
| Sort | Key Assumption | Passes/Sub-Sorts | Parallel Ease | Space Overhead |
|---|---|---|---|---|
| Counting | Bounded [0,k) | 3 linear | High (prefix) | O(k) |
| Radix LSD | Fixed d digits | d counting | Medium | O(n + b) |
| Bucket | Uniform dist | 1 hash + n small | High | O(n) |
- Stability Chain: All rely on sub-sort stability; radix propagates.
- Distribution Sensitivity: Bucket degrades on skew; radix robust for digits.
- Hybrids: Radix + quick for large k; counting in MSD leaves.
- Extensions: Quantum O(n √k) for range; distributed (Hadoop: local radix).
4. Advanced Topics: Stability, Hybrids, and External¶
- Stability: Preserves equals' order (merge, insertion yes; quick no). Use for multi-key (sort by second, stable on first).
- In-Place: O(1) aux (quick, heap); vs. O(n) (merge).
- Adaptive: Faster on sorted (insertion O(n); timsort hybrid).
- Hybrid Sorts: Timsort (Python): Merge + insertion for runs; introsort (C++ std::sort): Quick + heap + insertion.
- Complexity: O(n log n) worst; detects natural runs O(n).
- External Sorting: Disk-based (n >> RAM); polyphase/k-way merge, O((n/B) log_{M/B}(n/B)) I/Os, B=block size, M=memory.
- Mechanics: Chunk to RAM, sort, multi-way merge files.
- Parallel: Sample sort (PRAM O(log n)); bitonic O(log² n).
- Quantum: O(n log n) theoretical; practical Grover hybrids.
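The chunk-then-merge mechanics of external sorting can be sketched with in-memory lists standing in for on-disk run files (`external_sort` is a hypothetical helper name; `heapq.merge` performs the O(n log k) k-way merge over the run heads):

```python
import heapq
from itertools import islice

def external_sort(stream, chunk_size):
    """Sketch: sort fixed-size chunks in RAM (the 'runs'), then k-way
    merge them with a heap of run heads. Runs stand in for disk files."""
    it = iter(stream)
    runs = []
    while True:
        chunk = list(islice(it, chunk_size))   # one RAM-load of input
        if not chunk:
            break
        runs.append(sorted(chunk))             # one sorted run per chunk
    return list(heapq.merge(*runs))            # O(n log k) multi-way merge

print(external_sort([5, 2, 9, 1, 7, 3, 8, 0], chunk_size=3))
# [0, 1, 2, 3, 5, 7, 8, 9]
```

In a real implementation the runs are files and `heapq.merge` streams from open file handles, so only k run heads plus output buffers reside in memory at once.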
| Linearithmic | Time (Avg/Worst) | Space | Stable? | In-Place? |
|---|---|---|---|---|
| Merge | O(n log n) | O(n) | Yes | No |
| Quick | O(n log n)/O(n²) | O(log n) | No | Yes |
| Heap | O(n log n) | O(1) | No | Yes |
| Timsort | O(n log n)/O(n log n) | O(n) | Yes | Partial |
Comparative Analysis and Applications¶
| Algorithm | Best Case | Worst Input | Use Case |
|---|---|---|---|
| Bubble/Insertion | Nearly sorted | Reverse | Small/educational |
| Merge | Any | Any | External, stable |
| Quick | Random | Sorted | General in-place |
| Heap | Any | Any | Guaranteed time |
| Radix | Small keys | Large keys | Strings/ints |