
Searching Algorithms

Searching algorithms are fundamental procedures in computer science designed to locate a target element within a data structure, such as an array, list, or graph. They are pivotal for efficiency in applications like database queries, information retrieval, and machine learning feature matching. The choice of algorithm hinges on the data's organization (sorted/unsorted, bounded/unbounded), distribution (uniform/random), and access patterns (random/sequential). Asymptotic analysis—primarily worst-case time complexity O(f(n)) for input size n—guides selection, with space often O(1) auxiliary.

1. Linear Search

Linear (or sequential) search is the simplest algorithm: iterate through the elements until the target is found or the end of the structure is reached. It requires no preprocessing, making it the default choice for unsorted data.

  • Mechanics: Start from index 0; compare sequentially. Early exit on match.

  • Complexity:

    • Time: O(n) worst (target absent/end); O(1) best (first position).
    • Space: O(1) auxiliary.
    • Derivation: Up to n comparisons; with a uniformly random target position, the average is (n+1)/2 comparisons, i.e., Θ(n).
  • Variants:

    • Sentinel Linear Search: Append the target to the end of the array so the loop needs no bounds check (still O(n), but one fewer comparison per iteration).
    • Parallel Linear: Divide array into p chunks for p processors; O(n/p) time.
  • Pros/Cons: Universal (works on any structure); slow for large n; a sensible default for small or unsorted inputs.

  • Applications: Small lists, embedded systems, initial data validation.

Pseudocode:

function linearSearch(arr: array<T>, target: T): int
    for i from 0 to arr.length - 1:
        if arr[i] == target:
            return i
    return -1  // Not found
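The sentinel variant described above can be sketched in Python (a minimal illustration; the function name and the append/pop restore are illustrative choices, not from the source):

```python
def sentinel_linear_search(arr, target):
    """Linear search with a sentinel: append the target so the inner
    loop needs no bounds check, then verify whether the match was real."""
    n = len(arr)
    arr.append(target)           # sentinel guarantees the loop terminates
    i = 0
    while arr[i] != target:      # single comparison per iteration
        i += 1
    arr.pop()                    # restore the caller's array
    return i if i < n else -1    # i == n means only the sentinel matched

print(sentinel_linear_search([7, 3, 9, 4], 9))  # → 2
print(sentinel_linear_search([7, 3, 9, 4], 5))  # → -1
```

The saving is per-iteration: one comparison (`arr[i] != target`) instead of two (`i < n` and `arr[i] != target`).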

2. Binary Search

Binary search exploits sorted order to halve the search interval repeatedly, achieving logarithmic efficiency. Prerequisite: a sorted array (ascending or descending).

  • Mechanics: Compute mid; if target == arr[mid], done; if target < arr[mid], recurse into the left half; otherwise the right half.

  • Complexity:

    • Time: O(log n) worst (the interval halves each step, ~log₂ n comparisons); O(1) best (target at the first midpoint).
    • Space: O(1) iterative; O(log n) recursive stack.
    • Derivation: Initial interval n; after k steps the interval is n/2^k ≤ 1 → k ≥ log₂ n. Decision-tree view: a comparison tree of height h has at most 2^h leaves, and it must distinguish n+1 outcomes (n positions plus "not found") → h ≥ log₂(n+1).
  • Variants:

    • Bounded Binary: For known ranges (e.g., indices 0 to n-1).
    • Unbounded Binary: Exponential probe to bound (e.g., search [0, 2^k] where 2^{k-1} < n ≤ 2^k), then binary: O(log n).
    • Lower/Upper Bound: Find first ≥/> target (useful for multisets).
  • Pros/Cons: Extremely efficient on sorted data; unsorted data needs an O(n log n) sort upfront, and the algorithm fails outright if the order assumption is violated.

  • Applications: Dictionary lookups, version control (git bisect), sorted database indexes.

Pseudocode (Iterative):

function binarySearch(arr: array<T> (sorted), target: T): int
    low = 0, high = arr.length - 1
    while low <= high:
        mid = low + (high - low) / 2  // Avoid overflow
        if arr[mid] == target:
            return mid
        else if arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1
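The lower/upper-bound variant mentioned above is available directly in Python's standard library; a minimal sketch (helper names are illustrative):

```python
from bisect import bisect_left, bisect_right

def lower_bound(arr, target):
    """Index of the first element >= target (len(arr) if none)."""
    return bisect_left(arr, target)

def upper_bound(arr, target):
    """Index of the first element > target (len(arr) if none)."""
    return bisect_right(arr, target)

arr = [1, 2, 2, 2, 5]
print(lower_bound(arr, 2))                      # → 1
print(upper_bound(arr, 2))                      # → 4
print(upper_bound(arr, 2) - lower_bound(arr, 2))  # → 3 copies of 2
```

The difference of the two bounds counts occurrences in a multiset without scanning them, which plain binary search (returning any matching index) cannot do.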

3. Interpolation Search

Interpolation search assumes values are roughly uniformly distributed and estimates the target's position from its value, probing close to where the target should lie rather than at the midpoint.

  • Mechanics: Probe at estimated pos = low + (target - arr[low]) × (high - low) / (arr[high] - arr[low]).

  • Complexity:

    • Time: O(log log n) average (uniform); O(n) worst (adversarial, e.g., skewed).
    • Space: O(1).
    • Derivation: Under the uniform assumption, each probe reduces the expected interval size from m to about √m, so the interval shrinks to O(1) after ~log log n probes.
  • Variants: Double interpolation (probe from both ends); for non-uniform data, fall back to binary search.

  • Pros/Cons: Faster than binary on uniform (e.g., telecom routing tables); degrades on clustered data.

  • Applications: Sorted uniform arrays (e.g., IP lookups, phone books).

Pseudocode:

function interpolationSearch(arr: array<T> (sorted uniform), target: T): int
    low = 0, high = arr.length - 1
    while low <= high and target >= arr[low] and target <= arr[high]:
        if arr[low] == arr[high]:  // also guards the division below
            if arr[low] == target: return low
            return -1
        pos = low + floor((target - arr[low]) * (high - low) / (arr[high] - arr[low]))
        if arr[pos] == target:
            return pos
        else if arr[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1
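A runnable Python version of the pseudocode, using integer floor division for the probe (function name illustrative):

```python
def interpolation_search(arr, target):
    """Interpolation search over a sorted array of numbers.
    Fast (~O(log log n)) when values are roughly uniformly distributed."""
    low, high = 0, len(arr) - 1
    while low <= high and arr[low] <= target <= arr[high]:
        if arr[low] == arr[high]:          # also guards division by zero
            return low if arr[low] == target else -1
        # Probe where the target would sit under a uniform distribution
        pos = low + (target - arr[low]) * (high - low) // (arr[high] - arr[low])
        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1

print(interpolation_search([10, 20, 30, 40, 50], 30))  # → 2
print(interpolation_search([10, 20, 30, 40, 50], 35))  # → -1
```

Note the range guard `arr[low] <= target <= arr[high]`: it rejects out-of-range targets immediately and keeps `pos` inside the interval.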

4. Jump Search

Jump search divides the sorted array into blocks of size √n, jumping from block to block until it overshoots the target, then scanning the final block linearly.

  • Mechanics: Jump ahead in steps of √n until the current block's last element ≥ target; then scan that block linearly.

  • Complexity:

    • Time: O(√n) worst; O(√n) average.
    • Space: O(1).
    • Derivation: At most √n jumps (one comparison each) plus O(√n) scanning in the final block; block size m minimizes n/m + m, which is smallest at m = √n.
  • Variants: Optimized jump (adaptive size); for unbounded, exponential jumps.

  • Pros/Cons: Faster than linear search and simpler than binary; it only moves forward, which helps when backward seeks are expensive; requires sorted, random-access data.

  • Applications: Sorted data on media where backward seeks are costly (e.g., sequential-access storage).

Pseudocode:

function jumpSearch(arr: array<T> (sorted), target: T, n: int): int
    if n == 0: return -1
    step = floor(sqrt(n))
    prev = 0
    // Jump ahead while the current block's last element is below target
    while arr[min(prev + step, n) - 1] < target:
        prev = prev + step
        if prev >= n: return -1
    // Linear scan within the block [prev, min(prev + step, n))
    for i from prev to min(prev + step, n) - 1:
        if arr[i] == target: return i
    return -1
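The jump-block scheme can be sketched in runnable Python (function name illustrative):

```python
import math

def jump_search(arr, target):
    """Jump search over a sorted list: forward jumps of size
    floor(sqrt(n)), then a linear scan inside the candidate block."""
    n = len(arr)
    if n == 0:
        return -1
    step = math.isqrt(n)                 # block size floor(sqrt(n))
    prev = 0
    # Jump while the current block's last element is below the target
    while arr[min(prev + step, n) - 1] < target:
        prev += step
        if prev >= n:
            return -1
    # Linear scan within the block [prev, min(prev + step, n))
    for i in range(prev, min(prev + step, n)):
        if arr[i] == target:
            return i
    return -1

print(jump_search([1, 3, 5, 7, 9], 3))   # → 1
print(jump_search([1, 3, 5, 7, 9], 10))  # → -1
```

Comparing against the *last* element of each block before jumping is what guarantees the scan happens in the block that could actually contain the target.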

5. Hash-Based Search

For hash tables, search uses a hash function to index directly into buckets.

  • Mechanics: Compute index = h(key) mod table size; then walk the bucket's chain (separate chaining) or probe successive slots (open addressing).

  • Complexity:

    • Time: O(1) average; O(n) worst (collisions).
    • Space: O(n) table.
    • Derivation: A uniform hash keeps the load factor α = n/m bounded (α < 1 for open addressing) → expected O(1) probes per lookup.
  • Variants: Perfect hashing (no collisions, O(1) worst case); cuckoo hashing (evicts entries on collision, O(1) worst-case lookup).

  • Pros/Cons: Fastest average; hash quality critical.

  • Applications: Caches, databases (hash indexes).

Pseudocode (Chaining):

function hashSearch(table: HashTable, key): T
    index = hash(key) % table.size
    for node in table.buckets[index]:
        if node.key == key:
            return node.value
    return null
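A runnable separate-chaining table matching the pseudocode above (a minimal sketch; the class and method names are illustrative, not a real library API):

```python
class ChainedHashTable:
    """Minimal separate-chaining hash table for illustration."""

    def __init__(self, size=16):
        self.size = size
        self.buckets = [[] for _ in range(size)]   # one chain per bucket

    def put(self, key, value):
        bucket = self.buckets[hash(key) % self.size]
        for i, (k, _) in enumerate(bucket):
            if k == key:                           # overwrite existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        bucket = self.buckets[hash(key) % self.size]
        for k, v in bucket:                        # O(chain length) probe
            if k == key:
                return v
        return None                                # not found

t = ChainedHashTable(size=4)
t.put("alice", 1)
t.put("bob", 2)
print(t.get("alice"))  # → 1
print(t.get("carol"))  # → None
```

With a small `size`, several keys share a bucket, which is exactly the collision case that degrades lookups toward O(n).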

6. Graph Search Algorithms

Graph search locates nodes or paths; rather than indexing an array, it explores adjacency structure using queues or stacks.

  • Breadth-First Search (BFS): Uses a queue; explores level by level, yielding shortest paths in unweighted graphs.
    • Time: O(V + E); Space: O(V).
    • Derivation: Each vertex is enqueued once and each edge examined once.
  • Depth-First Search (DFS): Uses a stack or recursion; suited to path finding, cycle detection, and topological ordering.
    • Time: O(V + E); Space: O(V) stack.
  • Variants: Iterative Deepening (DFS with an increasing depth limit: O(b^d), complete like BFS); A* (heuristic-guided: f = g + h).
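BFS on an adjacency-dict graph can be sketched as follows (graph shape and function name are illustrative):

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Shortest path (fewest edges) in an unweighted graph given as an
    adjacency dict {node: [neighbors]}. Returns the path, or None.
    Stores whole paths in the queue for clarity; a parent map would
    use less memory."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path                    # first arrival = fewest edges
        for nxt in graph.get(node, []):
            if nxt not in visited:         # mark on enqueue, not dequeue
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_shortest_path(g, "A", "D"))  # → ['A', 'B', 'D']
```

Because the queue processes nodes in order of distance from the start, the first time the goal is dequeued its path is guaranteed minimal in edge count.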

7. Advanced and Specialized Searches

  • Ternary Search: Divide the interval into thirds with two probes; used for unimodal functions. Each step keeps 2/3 of the interval → log_{3/2} n ≈ 1.71 log₂ n iterations, still O(log n).
  • Fibonacci Search: Splits the interval at Fibonacci-number offsets; O(log n) like binary search, but uses only addition and subtraction (no division), with F_k ≈ φ^k / √5.
    • Derivation: Like binary search, but successive intervals have lengths F_{k-1} and F_{k-2}.
  • Exponential Search: For unbounded sorted: Double bound i=1,2,4,... until overshoot, then binary. O(log n).
  • Approximate: Locality-Sensitive Hashing (LSH) for high-dimensional data (sublinear approximate nearest-neighbor queries with high probability).
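Exponential search from the list above, sketched in Python using the standard library for the final binary phase (function name illustrative):

```python
from bisect import bisect_left

def exponential_search(arr, target):
    """Double the upper bound 1, 2, 4, ... until it passes the target,
    then binary-search the bracketed range [bound/2, bound]."""
    if not arr:
        return -1
    if arr[0] == target:
        return 0
    bound = 1
    while bound < len(arr) and arr[bound] < target:
        bound *= 2                       # exponential probe phase
    lo, hi = bound // 2, min(bound, len(arr) - 1)
    i = bisect_left(arr, target, lo, hi + 1)   # binary phase
    return i if i <= hi and arr[i] == target else -1

arr = list(range(0, 100, 3))             # 0, 3, 6, ..., 99
print(exponential_search(arr, 27))       # → 9
print(exponential_search(arr, 28))       # → -1
```

Both phases take O(log i), where i is the target's index, so the algorithm also serves as a binary search that adapts to targets near the front of a long (or unbounded) sorted sequence.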

8. Comparative Analysis

| Algorithm     | Prerequisite   | Time (Worst/Avg)    | Space | Best For                   |
| ------------- | -------------- | ------------------- | ----- | -------------------------- |
| Linear        | None           | O(n) / O(n)         | O(1)  | Unsorted, small n          |
| Binary        | Sorted         | O(log n) / O(log n) | O(1)  | Sorted arrays              |
| Interpolation | Sorted uniform | O(n) / O(log log n) | O(1)  | Uniform distributions      |
| Jump          | Sorted         | O(√n) / O(√n)       | O(1)  | Block/sequential access    |
| Hash          | Hash table     | O(n) / O(1)         | O(n)  | Key-value lookups          |
| BFS (Graph)   | Graph          | O(V+E) / O(V+E)     | O(V)  | Shortest unweighted paths  |
| Ternary       | Unimodal       | O(log n) / O(log n) | O(1)  | Optimization landscapes    |

9. Advanced Considerations: Trade-Offs and Extensions

  • Preprocessing: Sorting once at O(n log n), then answering queries at O(log n) each, beats repeated O(n) linear scans once the query count exceeds roughly log n.
  • Parallelism: p-ary search on a PRAM reduces binary search to O(log n / log p) with p processors; MapReduce-style frameworks handle distributed graph search.
  • Cache Effects: Linear search has the best per-access locality, but binary and jump search win overall by touching far fewer elements.
  • Randomized: Skip lists (probabilistic binary-like: O(log n) expected).
  • Quantum: Grover's algorithm searches unstructured data in O(√n) queries (theoretical).