Searching Algorithms¶
Searching algorithms are fundamental procedures in computer science designed to locate a target element within a data structure such as an array, list, or graph. They are pivotal for efficiency in applications like database queries, information retrieval, and machine learning feature matching. The choice of algorithm hinges on the data's organization (sorted/unsorted, bounded/unbounded), value distribution (uniform or skewed), and access patterns (random or sequential). Asymptotic analysis, primarily worst-case time complexity O(f(n)) for input size n, guides selection, with auxiliary space often O(1).
1. Linear Search¶
Linear (or sequential) search is the simplest, iterating through elements until the target is found or the end is reached. It requires no preprocessing, making it versatile for unsorted data.
- Mechanics: Start from index 0; compare each element to the target in turn. Exit early on a match.
- Complexity:
  - Time: O(n) worst (target absent or at the end); O(1) best (first position).
  - Space: O(1) auxiliary.
  - Derivation: Up to n comparisons; average over a uniform random target position is Θ(n/2) = O(n).
- Variants:
  - Sentinel Linear Search: Append the target to the array end; removes the bounds check from the inner loop (still O(n), but fewer instructions per iteration).
  - Parallel Linear Search: Divide the array into p chunks for p processors; O(n/p) time.
- Pros/Cons: Universal (works on any structure, sorted or not); slow for large n. A good default for small or unsorted data.
- Applications: Small lists, embedded systems, initial data validation.
Pseudocode:

    function linearSearch(arr: array<T>, target: T): int
        for i from 0 to arr.length - 1:
            if arr[i] == target:
                return i
        return -1  // Not found
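As a concrete sketch, the pseudocode above maps directly to Python; the sentinel variant from the Variants list is included as well (names and details here are illustrative, not a canonical implementation):

```python
def linear_search(arr, target):
    """Sequential scan; returns the index of the first match or -1."""
    for i, x in enumerate(arr):
        if x == target:
            return i
    return -1


def sentinel_linear_search(arr, target):
    """Sentinel variant: append the target so the inner loop needs no
    bounds check, then test whether we stopped before the sentinel.
    (The copy here is for safety; in-place versions avoid it.)"""
    a = list(arr)
    a.append(target)  # sentinel guarantees the loop terminates
    i = 0
    while a[i] != target:
        i += 1
    return i if i < len(arr) else -1
```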
2. Binary Search¶
Binary search exploits sorted order to halve the search interval repeatedly, achieving logarithmic efficiency. Prerequisite: Sorted array (ascending/descending).
- Mechanics: Compute the midpoint; if target == arr[mid], done; if target < arr[mid], continue in the left half; otherwise in the right half.
- Complexity:
  - Time: O(log n) worst (the interval halves each step, ~log₂ n comparisons); O(1) best (target at the first midpoint).
  - Space: O(1) iterative; O(log n) recursive stack.
  - Derivation: Initial interval n; after k steps the interval is n/2^k ≤ 1 → k ≥ log₂ n. Decision-tree view: a tree with 2^k leaves must cover ≥ n+1 outcomes (n positions plus "not found") → height ≥ log₂(n+1).
- Variants:
  - Bounded Binary: For known index ranges (e.g., 0 to n-1).
  - Unbounded Binary: Exponential probing to find a bound (search [2^{k-1}, 2^k] where 2^{k-1} < n ≤ 2^k), then binary within it: O(log n).
  - Lower/Upper Bound: Find the first element ≥ / > the target (useful for multisets and range queries).
- Pros/Cons: Highly efficient on sorted data; requires an O(n log n) sort upfront. Fails on unsorted input.
- Applications: Dictionary lookups, version control (git bisect), sorted database indexes.
Pseudocode (Iterative):

    function binarySearch(arr: array<T> (sorted), target: T): int
        low = 0, high = arr.length - 1
        while low <= high:
            mid = low + (high - low) / 2  // Avoid overflow
            if arr[mid] == target:
                return mid
            else if arr[mid] < target:
                low = mid + 1
            else:
                high = mid - 1
        return -1
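A Python sketch of the iterative version, together with the Lower Bound variant mentioned under Variants (the latter mirrors the standard library's `bisect_left`; function names here are illustrative):

```python
def binary_search(arr, target):
    """Iterative binary search on a sorted list; -1 if absent."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2  # Python ints don't overflow
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


def lower_bound(arr, target):
    """First index i with arr[i] >= target (len(arr) if none).
    Half-open interval [low, high); equivalent to bisect.bisect_left."""
    low, high = 0, len(arr)
    while low < high:
        mid = (low + high) // 2
        if arr[mid] < target:
            low = mid + 1
        else:
            high = mid
    return low
```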
3. Interpolation Search¶
Interpolation search estimates position based on uniform value distribution, probing closer to the target probabilistically.
- Mechanics: Probe at the estimated position pos = low + (target - arr[low]) × (high - low) / (arr[high] - arr[low]).
- Complexity:
  - Time: O(log log n) average (uniform values); O(n) worst (skewed or adversarial data).
  - Space: O(1).
  - Derivation: Under the uniformity assumption, each probe shrinks an interval of size m to about √m in expectation → O(log log n) probes.
- Variants: Double interpolation (probe from both ends); for non-uniform data, fall back to binary search.
- Pros/Cons: Faster than binary search on uniformly distributed keys (e.g., telecom routing tables); degrades badly on clustered data.
- Applications: Sorted, roughly uniform arrays (e.g., IP lookups, phone books).
Pseudocode:

    function interpolationSearch(arr: array<T> (sorted uniform), target: T): int
        low = 0, high = arr.length - 1
        while low <= high and target >= arr[low] and target <= arr[high]:
            if low == high:
                if arr[low] == target: return low
                return -1
            pos = low + (target - arr[low]) * (high - low) / (arr[high] - arr[low])
            if arr[pos] == target:
                return pos
            else if arr[pos] < target:
                low = pos + 1
            else:
                high = pos - 1
        return -1
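A minimal Python sketch of the probe formula above; a guard for equal endpoint values is added so runs of duplicate keys cannot cause a division by zero:

```python
def interpolation_search(arr, target):
    """Position probe assuming roughly uniform values; -1 if absent."""
    low, high = 0, len(arr) - 1
    while low <= high and arr[low] <= target <= arr[high]:
        if arr[low] == arr[high]:  # also covers low == high
            return low if arr[low] == target else -1
        # Linear estimate of where the target should sit
        pos = low + (target - arr[low]) * (high - low) // (arr[high] - arr[low])
        if arr[pos] == target:
            return pos
        elif arr[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1
```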
4. Jump Search¶
Jump search divides a sorted array into blocks of size √n, jumping ahead block by block and then scanning linearly within the block that may contain the target.
- Mechanics: Jump ahead in steps of size √n until an element ≥ the target is reached (the crossing point), then scan linearly within the preceding block.
- Complexity:
  - Time: O(√n) worst and average.
  - Space: O(1).
  - Derivation: At most √n jumps (one comparison each) + an O(√n) linear scan in the final block; the total is minimized at block size √n.
- Variants: Adaptive block sizes; for unbounded arrays, exponential jumps.
- Pros/Cons: Faster than linear search; like binary search it requires sorted data and random access, but it jumps backward at most once, which helps on media where seeking back is expensive.
- Applications: Sorted data on sequential-ish media (tapes, spinning disks), or as a simple middle ground between linear and binary search.
Pseudocode:

    function jumpSearch(arr: array<T> (sorted), target: T, n: int): int
        step = floor(sqrt(n))
        prev = 0
        // Jump until the last element of the current block reaches the target
        while prev < n and arr[min(prev + step, n) - 1] < target:
            prev = prev + step
        // Linear scan within the candidate block
        for i from prev to min(prev + step, n) - 1:
            if arr[i] == target:
                return i
        return -1
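A Python sketch of jump search (using `math.isqrt` for the block size; illustrative, not tuned):

```python
import math


def jump_search(arr, target):
    """Jump in sqrt(n)-sized steps, then scan the candidate block."""
    n = len(arr)
    if n == 0:
        return -1
    step = max(1, math.isqrt(n))
    prev = 0
    # Jump until the block's last element reaches the target
    while prev < n and arr[min(prev + step, n) - 1] < target:
        prev += step
    # Linear scan inside the block that may contain the target
    for i in range(prev, min(prev + step, n)):
        if arr[i] == target:
            return i
    return -1
```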
5. Hash-Based Search¶
For hash tables, search uses a hash function to index into buckets directly.
- Mechanics: Compute h(key) mod table size; resolve collisions by walking a chain (separate chaining) or probing the table itself (open addressing).
- Complexity:
  - Time: O(1) average; O(n) worst (all keys collide).
  - Space: O(n) for the table.
  - Derivation: With a uniform hash and load factor α < 1, the expected number of probes is O(1 + α) = O(1).
- Variants: Perfect hashing (no collisions; O(1) worst case); cuckoo hashing (evict and relocate on collision).
- Pros/Cons: Fastest average-case lookup; performance depends critically on hash quality and load factor.
- Applications: Caches, databases (hash indexes).
Pseudocode (Chaining):

    function hashSearch(table: HashTable, key): T
        index = hash(key) % table.size
        for node in table.buckets[index]:
            if node.key == key:
                return node.value
        return null
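A minimal separate-chaining table in Python, matching the pseudocode above (the class name and fixed bucket count are illustrative; a production table would also resize as the load factor grows):

```python
class ChainedHashTable:
    """Minimal separate-chaining table illustrating O(1)-average lookup."""

    def __init__(self, size=16):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        """Insert or overwrite a (key, value) pair in its bucket's chain."""
        bucket = self.buckets[hash(key) % self.size]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))

    def get(self, key):
        """Walk the chain for this key's bucket; None if absent."""
        bucket = self.buckets[hash(key) % self.size]
        for k, v in bucket:
            if k == key:
                return v
        return None
```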
6. Graph Search Algorithms¶
For graphs, search locates nodes or paths; instead of array indexing, it traverses adjacency structures using queues and stacks.
- Breadth-First Search (BFS): Uses a queue; finds shortest paths in unweighted graphs.
  - Time: O(V + E); Space: O(V).
  - Derivation: Each vertex and edge is visited at most once.
- Depth-First Search (DFS): Uses a stack or recursion; finds paths, detects cycles.
  - Time: O(V + E); Space: O(V) stack.
- Variants: Iterative Deepening (DFS with an increasing depth limit; complete, O(b^d) for branching factor b and solution depth d); A* (heuristic-guided, expanding by f = g + h).
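The BFS described above can be sketched in Python over an adjacency-list dict (the graph shape in the usage below is illustrative):

```python
from collections import deque


def bfs_distances(adj, source):
    """BFS over an adjacency-list dict {node: [neighbors]}.
    Returns hop counts from source, i.e. shortest paths in an
    unweighted graph. Each vertex is enqueued at most once,
    giving the O(V + E) bound."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:  # first visit is the shortest
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```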
7. Advanced and Specialized Searches¶
- Ternary Search: Divide the interval into thirds using two midpoints; for unimodal functions. Each step keeps 2/3 of the interval → log_{3/2} n ≈ 1.71 log₂ n steps.
- Fibonacci Search: Uses Fibonacci numbers as interval bounds; O(log n) like binary search, but needs only addition and subtraction, no division (F_k ≈ φ^k / √5).
  - Derivation: Analogous to binary search, but an interval of size F_k splits into subintervals of sizes F_{k-1} and F_{k-2}.
- Exponential Search: For unbounded sorted data: double the bound i = 1, 2, 4, … until it overshoots the target, then binary search within the last interval. O(log n).
- Approximate: Locality-Sensitive Hashing (LSH) for high-dimensional data (e.g., approximate nearest neighbors in sublinear expected time).
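The exponential search listed above can be sketched in Python (doubling a bound, then delegating to a binary search over the last interval):

```python
def exponential_search(arr, target):
    """Double the bound until it overshoots, then binary search
    inside [bound // 2, bound]: O(log n) without knowing n upfront."""
    n = len(arr)
    if n == 0:
        return -1
    if arr[0] == target:
        return 0
    bound = 1
    while bound < n and arr[bound] < target:
        bound *= 2
    # Binary search within the last doubled interval
    low, high = bound // 2, min(bound, n - 1)
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1
```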
Comparative Analysis¶
| Algorithm | Prerequisite | Time (Worst/Avg) | Space | Best For |
|---|---|---|---|---|
| Linear | None | O(n)/O(n) | O(1) | Unsorted, small n |
| Binary | Sorted | O(log n)/O(log n) | O(1) | Sorted arrays |
| Interpolation | Sorted uniform | O(n)/O(log log n) | O(1) | Uniform distributions |
| Jump | Sorted | O(√n)/O(√n) | O(1) | Block-access efficient |
| Hash | Hash table | O(n)/O(1) | O(n) | Key-value lookups |
| BFS (Graph) | Graph | O(V+E)/O(V+E) | O(V) | Shortest unweighted paths |
| Ternary | Unimodal | O(log n)/O(log n) | O(1) | Optimization landscapes |
Advanced Considerations: Trade-Offs and Extensions¶
- Preprocessing: Sort + binary search costs a one-time O(n log n), then O(log n) per query; it pays off over many queries versus O(n) per linear scan.
- Parallelism: p-ary search on a PRAM: O(log n / log p); map-reduce-style traversal for distributed graphs.
- Cache Effects: Linear scans have the best locality; binary search's scattered probes motivate cache-aware layouts and B-trees, while jump search keeps accesses mostly forward-moving.
- Randomized: Skip lists (probabilistic layered lists with binary-search-like behavior: O(log n) expected).
- Quantum: Grover's algorithm searches an unstructured set in O(√n) queries (theoretical).