Python¶
Python is a high-level, interpreted programming language known for its simplicity, readability, and versatility. Created by Guido van Rossum and first released in 1991, it is widely used for web development, data analysis, automation, AI, and more. Python emphasizes code readability with its "Zen of Python" philosophy (e.g., "There should be one—and preferably only one—obvious way to do it").
Key features:
- Easy to learn: Uses indentation for code blocks instead of braces.
- Dynamically typed: No need to declare variable types.
- Interpreted: Runs code line-by-line, great for quick prototyping.
- Extensive libraries: Huge ecosystem (e.g., NumPy for math, Django for web).
- Cross-platform: Works on Windows, macOS, Linux.
Variables and Data Types¶
Variables are created on assignment. Python infers types automatically.
| Data Type | Description | Example |
|---|---|---|
| `int` | Whole numbers | `x = 5` |
| `float` | Decimals | `y = 3.14` |
| `str` | Text (strings) | `name = "Alice"` |
| `bool` | True/False | `is_true = True` |
| `list` | Ordered, mutable collection | `fruits = ["apple", "banana"]` |
| `tuple` | Ordered, immutable collection | `coords = (10, 20)` |
| `dict` | Key-value pairs | `person = {"name": "Bob", "age": 30}` |
| `set` | Unordered, unique items | `unique = {1, 2, 3}` |
Operations:
- Arithmetic: `+`, `-`, `*`, `/`, `//` (floor division), `%` (modulo), `**` (exponent).
- String concatenation: `"Hello" + " " + "World"`.
- List indexing: `fruits[0]` (first item); slicing: `fruits[1:3]`.
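Put together, the operations above run as:

```python
fruits = ["apple", "banana", "cherry"]

print(7 // 2)                    # 3 (floor division)
print(7 % 2)                     # 1 (modulo)
print(2 ** 3)                    # 8 (exponent)
print("Hello" + " " + "World")   # Hello World
print(fruits[0])                 # apple
print(fruits[1:3])               # ['banana', 'cherry']
```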
Control Structures¶
- If-Else: `if x > 0: print("Positive")`, `elif x < 0: print("Negative")`, `else: print("Zero")`
- Loops:
  - For loop: `for fruit in fruits: print(fruit)`
  - While loop: `while x < 5: x += 1`
  - Range: `for i in range(3):` iterates over 0 to 2.
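The same control structures, written out in block form:

```python
x = 3
if x > 0:
    print("Positive")
elif x < 0:
    print("Negative")
else:
    print("Zero")

for i in range(3):
    print(i)        # prints 0, 1, 2

while x < 5:
    x += 1
print(x)            # 5
```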
Functions¶
Define with def:
def greet(name):
return f"Hello, {name}!"
print(greet("World")) # Output: Hello, World!
- Parameters can have defaults: `def add(a, b=0): return a + b`
- Lambda (anonymous): `square = lambda x: x**2`
Modules and Imports¶
Use built-in modules: import math; print(math.sqrt(16)) (outputs 4.0).
Install external via pip (e.g., pip install requests).
Asynchronous Programming (asyncio)¶
Python's async/await and asyncio provide cooperative concurrency for I/O-bound work (network, file, timers) without threading. Coroutines are defined with async def and awaited with await; the event loop schedules them.
import asyncio
async def fetch(url: str) -> str:
"""Simulate an I/O-bound request."""
await asyncio.sleep(0.5)
return f"Result of {url}"
async def main():
# Run two fetches concurrently
results = await asyncio.gather(
fetch("https://api.example.com/a"),
fetch("https://api.example.com/b"),
)
print(results) # ['Result of https://api.example.com/a', 'Result of https://api.example.com/b']
asyncio.run(main())
- Use `asyncio.run(coro)` to run a top-level coroutine (Python 3.7+).
- Use `asyncio.gather(*coros)` to run many coroutines concurrently.
- I/O libraries like `aiohttp` and `asyncpg` provide async HTTP and PostgreSQL clients.
Bit Manipulation¶
Basic Bit Operators¶
Python provides several built-in operators for bit manipulation:
- & (AND): Returns 1 if both bits are 1
- | (OR): Returns 1 if at least one bit is 1
- ^ (XOR): Returns 1 if exactly one bit is 1
- ~ (NOT): Inverts all bits
- << (Left shift): Shifts bits to the left
- >> (Right shift): Shifts bits to the right
Examples¶
# Bitwise AND
a = 60 # 0011 1100
b = 13 # 0000 1101
print(a & b) # 12 (0000 1100)
# Bitwise OR
print(a | b) # 61 (0011 1101)
# Bitwise XOR
print(a ^ b) # 49 (0011 0001)
# Bitwise NOT (inverts all bits)
print(~a) # -61 (1100 0011 in 2's complement)
# Left Shift
print(a << 2) # 240 (1111 0000)
# Right Shift
print(a >> 2) # 15 (0000 1111)
Common Bit Manipulation Techniques¶
1. Check if a number is even or odd¶
def is_even(num):
return (num & 1) == 0
print(is_even(42)) # True
print(is_even(7)) # False
2. Check if the ith bit is set¶
def is_bit_set(num, i):
return (num & (1 << i)) != 0
print(is_bit_set(10, 1)) # True (10 is 1010 in binary, bit 1 is set)
print(is_bit_set(10, 0)) # False (bit 0 is not set)
3. Set the ith bit¶
def set_bit(num, i):
return num | (1 << i)
print(set_bit(10, 0)) # 11 (changes 1010 to 1011)
4. Clear the ith bit¶
def clear_bit(num, i):
return num & ~(1 << i)
print(clear_bit(10, 1)) # 8 (changes 1010 to 1000)
5. Toggle the ith bit¶
def toggle_bit(num, i):
return num ^ (1 << i)
print(toggle_bit(10, 0)) # 11 (changes 1010 to 1011)
print(toggle_bit(10, 1)) # 8 (changes 1010 to 1000)
6. Count set bits (Hamming weight)¶
def count_set_bits(num):
count = 0
while num:
count += num & 1
num >>= 1
return count
# Alternative using bin() function
def count_set_bits_alt(num):
return bin(num).count('1')
print(count_set_bits(10)) # 2 (1010 has two 1s)
Practical Applications¶
1. Bit Masking¶
# Using bit masks to store multiple boolean flags in a single integer
# Define flags
READ = 1 # 001
WRITE = 2 # 010
EXECUTE = 4 # 100
# Set permissions
permissions = 0
permissions |= READ # Add read permission
permissions |= WRITE # Add write permission
# Check permissions
has_read = permissions & READ != 0
has_write = permissions & WRITE != 0
has_execute = permissions & EXECUTE != 0
print(f"Read: {has_read}, Write: {has_write}, Execute: {has_execute}")
# Output: Read: True, Write: True, Execute: False
2. Power of Two¶
def is_power_of_two(num):
return num > 0 and (num & (num - 1)) == 0
print(is_power_of_two(16)) # True
print(is_power_of_two(18)) # False
Advanced Techniques¶
1. Swapping variables without a temporary variable¶
a = 5
b = 7
a = a ^ b
b = a ^ b
a = a ^ b
print(f"a = {a}, b = {b}") # a = 7, b = 5
2. Find the single number in an array where all other numbers appear twice¶
def find_single(nums):
result = 0
for num in nums:
result ^= num
return result
print(find_single([4, 1, 2, 1, 2])) # 4
Bit manipulation is particularly useful in algorithms requiring optimization, cryptography, low-level programming, and when working with binary data.
Mathematics and Geometry¶
Core Mathematical Libraries¶
- NumPy: The fundamental package for scientific computing in Python. It provides support for arrays, matrices, and many mathematical functions.
- SciPy: Built on NumPy, it adds more specialized math functions including optimization, linear algebra, integration, and statistics.
- SymPy: A library for symbolic mathematics, allowing you to work with algebraic expressions, perform calculus, and solve equations symbolically.
- Matplotlib: While primarily a plotting library, it's essential for visualizing mathematical functions and geometric shapes.
import numpy as np
# Basic arithmetic
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # [5, 7, 9]
# Matrix operations
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
print(np.dot(matrix_a, matrix_b)) # Matrix multiplication
Geometry in Python¶
There are several approaches to working with geometry in Python:
1. Using NumPy and SciPy for Basic Geometry¶
import numpy as np
from scipy.spatial import distance
# Calculate Euclidean distance between two points
point1 = np.array([0, 0])
point2 = np.array([3, 4])
dist = distance.euclidean(point1, point2)
print(f"Distance: {dist}") # Distance: 5.0
# Calculate the area of a triangle using cross product
def triangle_area(p1, p2, p3):
# Convert points to vectors from origin
v1 = np.array(p2) - np.array(p1)
v2 = np.array(p3) - np.array(p1)
# Area is half the magnitude of the cross product
cross = np.cross(v1, v2)
return 0.5 * np.linalg.norm(cross)
area = triangle_area([0, 0], [1, 0], [0, 2])
print(f"Triangle area: {area}") # Triangle area: 1.0
2. Shapely for 2D Computational Geometry¶
Shapely is a Python package for manipulation and analysis of 2D geometric objects.
from shapely.geometry import Point, LineString, Polygon
# Create a point (strictly inside the unit square, so the `within` check below is True;
# a point on the boundary would report False)
point = Point(0.5, 0.5)
# Create a line
line = LineString([(0, 0), (1, 1), (1, 2)])
# Create a polygon
polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
# Check if point is in polygon
is_in = point.within(polygon)
print(f"Point is within polygon: {is_in}")
# Calculate polygon area
area = polygon.area
print(f"Polygon area: {area}") # Polygon area: 1.0
3. Matplotlib for Geometric Visualization¶
import matplotlib.pyplot as plt
import numpy as np
# Plot a circle
theta = np.linspace(0, 2*np.pi, 100)
radius = 5
x = radius * np.cos(theta)
y = radius * np.sin(theta)
plt.figure(figsize=(7, 7))
plt.plot(x, y)
plt.grid(True)
plt.axis('equal')
plt.title('Circle with radius 5')
plt.xlabel('x')
plt.ylabel('y')
# plt.show() # Uncomment to display
4. PyGeometry for 3D Geometry¶
For 3D geometry, libraries like PyGeometry or PyMesh provide tools for working with meshes and geometric operations.
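Even without a dedicated mesh library, simple 3D quantities can be sketched with plain NumPy (assumed available, as in the examples above), extending the 2D triangle-area approach:

```python
import numpy as np

# Normal vector and area of a 3D triangle via the cross product
p1 = np.array([0.0, 0.0, 0.0])
p2 = np.array([1.0, 0.0, 0.0])
p3 = np.array([0.0, 1.0, 0.0])

normal = np.cross(p2 - p1, p3 - p1)
area = 0.5 * np.linalg.norm(normal)
print(normal)  # [0. 0. 1.]
print(area)    # 0.5
```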
Advanced Geometric Calculations¶
Computational Geometry Algorithms¶
- Convex Hull: SciPy provides functions to compute the convex hull of a set of points.
- Delaunay Triangulation: Useful for creating meshes and interpolating data.
- Voronoi Diagrams: For partitioning space into regions based on distance to points.
from scipy.spatial import ConvexHull, Delaunay
import numpy as np
import matplotlib.pyplot as plt
# Generate random points
points = np.random.rand(30, 2)
# Compute the convex hull
hull = ConvexHull(points)
# Compute Delaunay triangulation
tri = Delaunay(points)
# Plot
plt.figure(figsize=(10, 5))
# Plot convex hull
plt.subplot(1, 2, 1)
plt.plot(points[:,0], points[:,1], 'o')
for simplex in hull.simplices:
plt.plot(points[simplex, 0], points[simplex, 1], 'k-')
plt.title('Convex Hull')
# Plot Delaunay triangulation
plt.subplot(1, 2, 2)
plt.triplot(points[:,0], points[:,1], tri.simplices)
plt.plot(points[:,0], points[:,1], 'o')
plt.title('Delaunay Triangulation')
# plt.show() # Uncomment to display
Linear Algebra Applications¶
Linear algebra is fundamental to many geometric operations, and NumPy provides comprehensive support:
import numpy as np
# Define a 2D transformation matrix for rotation (45 degrees)
theta = np.radians(45)
rotation_matrix = np.array([
[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]
])
# Define a point
point = np.array([1, 0])
# Apply rotation
rotated_point = np.dot(rotation_matrix, point)
print(f"Original point: {point}")
print(f"Rotated point: {rotated_point}")
Advanced Python Concepts and Techniques¶
Python is known for its simplicity and readability, but it also offers powerful advanced features that can enhance your programming skills and efficiency. This page covers key advanced Python concepts that every experienced developer should master.
- Decorators
- Context Managers
- Metaclasses
- Generator Expressions and Coroutines
- Descriptors
- Multithreading and Multiprocessing
- Abstract Base Classes
- Data Classes
- Type Hints
- AsyncIO
- Magic Methods (Dunder Methods)
- Collections Module
- itertools and functools
- Regular Expressions
- Exception Handling
- Enum
- Pathlib
- Memory Management and Garbage Collection
- Import System
- Serialization
1. Decorators¶
Decorators are one of Python's most powerful and idiomatic features for metaprogramming. They allow you to modify or enhance the behavior of functions, methods, or classes without altering their source code. At their core, decorators are higher-order functions (functions that take other functions as arguments and/or return functions). They leverage Python's first-class function support, where functions are treated as objects that can be passed around, assigned, and returned.
Basic Decorator¶
A decorator is a function that accepts another function (func) as input, defines an inner wrapper function that adds behavior around func, and returns the wrapper. The @decorator syntax is syntactic sugar for func = decorator(func).
Example:
import time
from functools import wraps
def timer(func):
@wraps(func) # Preserves metadata like __name__ and docstring
def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
end = time.time()
print(f"{func.__name__} took {end - start:.4f} seconds")
return result
return wrapper
@timer
def slow_function(n):
"""A slow function for demo purposes."""
time.sleep(1)
return n * 2
# Usage
result = slow_function(5)
print(f"Result: {result}")
Output:
slow_function took 1.0012 seconds
Result: 10
Key Insights:
`*args` and `**kwargs` make the wrapper universal, handling any argument signature. The wrapper must return the result of func to preserve the original function's output. Without @wraps(func), the wrapper overwrites metadata: slow_function.__name__ would become 'wrapper', breaking introspection tools like help() or debuggers.
Parameterized Decorators (Decorator Factories)¶
Basic decorators are static. For dynamism, create a decorator factory: a function that returns a decorator. This lets the decorator accept its own arguments.
Example:
from functools import wraps
def repeat(times):
"""Factory: Returns a decorator that repeats a function N times."""
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for _ in range(times):
result = func(*args, **kwargs)
return result # Return the last invocation's result
return wrapper
return decorator
@repeat(3) # Factory call: times=3
def greet(name):
"""Greets a person."""
print(f"Hello, {name}!")
# Usage
greet("Alice")
Output:
Hello, Alice!
Hello, Alice!
Hello, Alice!
Advanced Nuance: The closure over times captures the factory's argument. This is a closure in action—decorator and wrapper access times from the outer scope. For thread-safety or reentrancy, avoid mutable state in closures.
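To make the closure mechanics concrete, here is a small stateful decorator sketch using `nonlocal` to mutate a closed-over variable (`count_calls` and `ping` are illustrative names):

```python
from functools import wraps

def count_calls(func):
    """Stateful decorator: the call counter lives in the closure."""
    calls = 0
    @wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal calls  # rebind the outer variable, not a new local
        calls += 1
        print(f"call #{calls} to {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@count_calls
def ping():
    return "pong"

ping()  # call #1 to ping
ping()  # call #2 to ping
```

Note that this shared counter is exactly the kind of mutable closure state the thread-safety caveat warns about.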
Class Decorators¶
Decorators aren't limited to functions; they can wrap classes to modify their behavior post-definition (e.g., adding methods, altering attributes). Example: A class decorator that auto-registers instances in a global registry.
class Registry:
_instances = []
@classmethod
def register(cls, class_to_decorate):
"""Class decorator: Adds instance to registry on creation."""
original_init = class_to_decorate.__init__
def new_init(self, *args, **kwargs):
original_init(self, *args, **kwargs)
cls._instances.append(self)
class_to_decorate.__init__ = new_init
return class_to_decorate
@Registry.register # Applies to the class
class MyClass:
def __init__(self, value):
self.value = value
def __repr__(self):
return f"MyClass({self.value})"
# Usage
obj1 = MyClass(42)
obj2 = MyClass(100)
print(Registry._instances) # [MyClass(42), MyClass(100)]
Deeper Dive: This monkey-patches __init__. For more complex modifications, iterate over __dict__ to add/override methods dynamically. Class decorators shine in frameworks (e.g., Flask's @app.route registers routes on a blueprint class).
Pitfall: Modifying dunder methods can break inheritance or metaclasses. Always test with issubclass and isinstance.
Advanced Techniques and Patterns¶
1. Preserving Signatures with functools.wraps and inspect
Beyond @wraps, for full signature preservation (e.g., type hints, annotations), use inspect.signature:
import functools
import inspect
def advanced_wrapper(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
sig = inspect.signature(func)
bound = sig.bind(*args, **kwargs)
bound.apply_defaults()
print(f"Called {func.__name__} with {bound.arguments}")
return func(*args, **kwargs)
# Copy full signature to wrapper
wrapper.__signature__ = inspect.signature(func)
return wrapper
@advanced_wrapper
def multiply(a: int, b: int = 2) -> int:
"""Multiplies a and b."""
return a * b
# Usage (preserves hints)
from inspect import signature
print(signature(multiply)) # (a: int, b: int = 2) -> int
print(multiply(3, b=4)) # Prints "Called multiply with {'a': 3, 'b': 4}", then 12
This ensures tools like mypy or IDEs see the original signature.
2. Stackable Decorators and Execution Order
Decorators apply from bottom to top (innermost first). For logging + timing:
def logger(func):
@wraps(func)
def wrapper(*args, **kwargs):
print(f"Calling {func.__name__}")
return func(*args, **kwargs)
return wrapper
@timer
@logger # Applied first (innermost)
def compute(x):
return x ** 2
compute(10)
Output:
Calling compute
compute took 0.0001 seconds
The timer wraps the logger-wrapped function, so the log line prints inside the timed interval.
3. Caching with lru_cache (Built-in Advanced Example)
Python's functools.lru_cache is a parameterized decorator factory for memoization. It's advanced because it handles hashing, eviction (LRU), and typed keys.
from functools import lru_cache
@lru_cache(maxsize=128) # Factory args: maxsize, typed=False
def fibonacci(n: int) -> int:
if n < 2:
return n
return fibonacci(n-1) + fibonacci(n-2)
print(fibonacci(30)) # Fast due to cache
Pro Tip: Functions wrapped by lru_cache expose cache_info() and cache_clear() for inspection and invalidation, plus __wrapped__ for the original function. For custom eviction policies, implement your own cache with a dict in a closure.
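A quick sketch of the cache-inspection hooks that `lru_cache`-wrapped functions expose (`cache_info()` and `cache_clear()` are part of `functools`):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(20)
info = fib.cache_info()
print(info.hits, info.misses)     # the recursion exercised the cache
fib.cache_clear()                 # invalidate everything
print(fib.cache_info().currsize)  # 0
```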
4. Enforcing Contracts (Design by Contract)
Decorators for preconditions/postconditions, like a simple @requires/@ensures:
def requires(condition):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
if not condition(*args, **kwargs):
raise ValueError("Precondition failed")
return func(*args, **kwargs)
return wrapper
return decorator
def ensures(predicate):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
result = func(*args, **kwargs)
if not predicate(result, *args, **kwargs):
raise ValueError("Postcondition failed")
return result
return wrapper
return decorator
@ensures(lambda res, x: res > x)
@requires(lambda x: x > 0)
def square(x: int) -> int:
return x * x
print(square(5)) # 25
# square(-1) # Raises ValueError: Precondition failed
This pattern scales to full contract programming libraries like deal.
5. Best Practices and Common Pitfalls
- Always use `@wraps`: prevents debugging headaches.
- Avoid side effects in decorators: prefer pure functions; logging/metrics are OK if idempotent.
- Performance overhead: wrappers add call-stack depth; profile with `cProfile`.
- Debugging closures: use `nonlocal` for mutable closure variables; inspect bytecode with the `dis` module.
- Metaclass synergy: for class decorators, combine with metaclasses for deeper customization (e.g., Django models).
- Pitfall, infinite recursion: a wrapper that calls the decorated name instead of the captured `func` reference recurses forever. Always call `func`.
- Typing decorators: use `typing.Protocol` or `inspect` for generic type preservation in stubs.
Real-World Applications
- Web Frameworks: Flask/Django route decorators bind URLs to views.
- Testing: `pytest` fixtures as decorators.
- Async: `@asynccontextmanager`; the older `@asyncio.coroutine` is deprecated and was removed in Python 3.11 in favor of `async def`.
- Security: `@login_required` in auth systems.
2. Context Managers¶
Context managers are a cornerstone of Python's resource management and RAII (Resource Acquisition Is Initialization) idiom. They ensure that resources (files, locks, database connections, etc.) are properly acquired and released, even in the face of exceptions. Introduced in Python 2.5 via the with statement, they're implemented using the context manager protocol: objects with __enter__() and __exit__() methods. This makes code cleaner, safer, and more readable compared to try/finally blocks.
This guide builds on decorators (which can even decorate context managers!) and assumes basic knowledge of exceptions and classes. We'll cover implementation, the contextlib module for easier creation, async variants, advanced patterns like nesting and suppression, and real-world use cases. Code snippets are executable—try them out!
1. The Core Protocol: Custom Class-Based Context Managers¶
A context manager is any object where:
- `__enter__(self)` is called on entry to the `with` block; it returns a value (often `self`) for assignment (e.g., `as var`).
- `__exit__(self, exc_type, exc_val, exc_tb)` is called on exit; it receives exception details (`None` if no exception) and can suppress the exception by returning `True`.
Example: A simple file-like resource simulator that tracks open/close state.
class ResourceManager:
def __init__(self, name):
self.name = name
self._is_open = False
def __enter__(self):
print(f"Acquiring {self.name}")
self._is_open = True
return self # Bind to 'as' variable
def __exit__(self, exc_type, exc_val, exc_tb):
print(f"Releasing {self.name}")
self._is_open = False
        if exc_type:
            print(f"Exception occurred: {exc_val}")
        return False  # Don't suppress; let any exception propagate
def use(self):
if not self._is_open:
raise RuntimeError("Resource not acquired!")
print(f"Using {self.name}")
# Usage
with ResourceManager("database") as res:
res.use()
raise ValueError("Oops!") # Simulates error
print("Block exited") # This runs anyway
Output:
Acquiring database
Using database
Exception occurred: Oops!
Releasing database
Traceback (most recent call last):
File "...", line ..., in <module>
raise ValueError("Oops!")
ValueError: Oops!
Key Insights:
- `__enter__` setup happens first; `__exit__` cleanup always follows.
- Returning `True` from `__exit__` suppresses the exception (rarely used; prefer handling inside the block).
- The `as` binding is optional; if omitted, `__enter__` still runs but its return value is ignored.
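A minimal sketch of suppression via `__exit__` (the `Suppress` class here is illustrative):

```python
class Suppress:
    """Suppresses only ValueError; everything else propagates."""
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        return exc_type is ValueError  # True => swallow the exception

with Suppress():
    raise ValueError("swallowed")
print("still running")  # reached: the ValueError was suppressed
```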
2. Generator-Based Context Managers with contextlib¶
For lighter-weight managers without full classes, use contextlib.contextmanager: a decorator that turns a generator function into a context manager. Yield once to split setup/teardown; code before yield is __enter__, after is __exit__.
Example: A timing context manager.
import time
from contextlib import contextmanager
@contextmanager
def timer(description):
start = time.time()
print(f"Starting {description}")
try:
yield # Entry point; bind to 'as timer' if needed
finally:
end = time.time()
print(f"{description} took {end - start:.4f} seconds")
# Usage
with timer("slow operation"):
time.sleep(1)
print("Inside the block")
# Advanced: Bind yield value
@contextmanager
def temp_value(value):
print(f"Setting temp value to {value}")
try:
yield value * 2 # Return augmented value
finally:
print("Resetting temp value")
with temp_value(5) as doubled:
print(f"Using {doubled}") # Outputs 10
Output (first example):
Starting slow operation
Inside the block
slow operation took 1.0005 seconds
Nuance: The try/finally ensures teardown even if the body raises: the exception is thrown into the generator at the yield point, so the finally clause runs before the exception propagates.
3. Advanced Techniques¶
a. Nested Context Managers and ExitStack¶
Multiple with statements can nest manually, but contextlib.ExitStack dynamically manages them—ideal for loops or conditionals where the number of managers varies.
from contextlib import ExitStack, contextmanager
def create_managers(n):
return [ResourceManager(f"resource_{i}") for i in range(n)]
with ExitStack() as stack:
managers = create_managers(3)
for mgr in managers:
stack.enter_context(mgr) # Dynamic entry
# All acquired; use them
for mgr in managers:
mgr.use()
# All released on exit, even if exception
This pushes __enter__ calls onto a stack and pops __exit__ in LIFO order. Pro tip: Use stack.callback(cleanup_func) for non-context-manager cleanups.
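A short sketch of `stack.callback` (the `cleanup` function is illustrative):

```python
from contextlib import ExitStack

def cleanup(label):
    print(f"cleanup: {label}")

with ExitStack() as stack:
    stack.callback(cleanup, "registered first")   # runs last (LIFO)
    stack.callback(cleanup, "registered second")  # runs first
    print("inside block")
# Output: inside block, then "cleanup: registered second",
# then "cleanup: registered first"
```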
b. Suppressing Exceptions with suppress¶
contextlib.suppress is a built-in context manager that ignores specified exceptions—great for idempotent operations.
from contextlib import suppress
import os
with suppress(FileNotFoundError):
os.remove("nonexistent.txt") # No error raised
# Custom suppression in __exit__
class SuppressErrors:
def __init__(self, *exceptions):
self.exceptions = exceptions
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
return (exc_type is not None) and issubclass(exc_type, self.exceptions)
with SuppressErrors(ValueError, TypeError):
1 / 0 # ZeroDivisionError not suppressed; raises
c. Redirecting Output with redirect_stdout¶
Another contextlib gem: Temporarily redirect sys.stdout (or stderr).
from contextlib import redirect_stdout
import io
import sys
fake_output = io.StringIO()
with redirect_stdout(fake_output):
print("This won't show in console")
sys.stdout.write("Neither will this\n")
print("Captured:", repr(fake_output.getvalue()))
# Captured: 'This won't show in console\nNeither will this\n'
This uses a file-like object for capture, perfect for testing or logging.
d. Async Context Managers (Python 3.5+)¶
For coroutines, define async with using __aenter__/__aexit__ or @asynccontextmanager.
import asyncio
import time
from contextlib import asynccontextmanager
@asynccontextmanager
async def async_timer(description):
start = time.time()
print(f"Async starting {description}")
try:
yield
finally:
print(f"Async {description} took {time.time() - start:.4f} seconds")
async def main():
async with async_timer("async op") as _:
await asyncio.sleep(1)
print("Async inside block")
asyncio.run(main())
Output:
Async starting async op
Async inside block
Async async op took 1.0003 seconds
Use in asyncio for I/O-bound resources like connections.
4. Patterns and Real-World Applications¶
- Resource Management: files (`with open('file.txt')`), locks (`with threading.Lock():`), DB sessions (SQLAlchemy's `sessionmaker`).
- Temporary State: mocking in tests (`with patch.object(mod, 'func', mock_func):` from `unittest.mock`).
- Chaining with Decorators: decorate a function to wrap it in a context manager, e.g., `@with_db_session def query(): ...`.
- Custom Protocols: for libraries, define abstract base classes via `abc.ABC` with `@abstractmethod` for `__enter__` etc.
- Error Handling: log in `__exit__`; chain multiple managers for layered cleanup (e.g., file + lock).
Example Pattern: A transaction manager for databases.
from contextlib import contextmanager
class DBConnection:
def __init__(self):
self.conn = None # Simulated
def begin(self):
self.conn = "opened"
return self
def commit(self):
print("Committed")
def rollback(self):
print("Rolled back")
@contextmanager
def transaction(db: DBConnection):
db.begin()
try:
yield db
db.commit()
except Exception:
db.rollback()
raise
# Usage
db = DBConnection()
with transaction(db):
print("Doing work")
# raise ValueError("Fail") # Uncomment to see rollback
5. Best Practices and Pitfalls¶
- Prefer `contextlib` for simplicity: use generators over classes unless you need complex state.
- Exception propagation: `__exit__` should let exceptions propagate (return `False`) unless deliberately suppressing; the `exc_type`/`exc_val`/`exc_tb` arguments carry the details.
- Reentrancy: ensure managers handle nested use (e.g., counters for locks).
- Performance: minimal overhead, but avoid in hot loops; profile with `timeit`.
- Pitfalls:
  - Forgetting `yield` in a generator-based manager raises `RuntimeError: generator didn't yield` on entry.
  - Mutable yields: don't mutate yielded objects post-yield without care.
  - Async pitfalls: remember `await` inside async managers; use `asyncio.gather` for parallelism.
- Typing: use `typing.ContextManager` (or `contextlib.AbstractContextManager`) for stubs.
3. Metaclasses¶
Metaclasses are Python's ultimate metaprogramming tool: the "class of a class." Since everything in Python is an object—including classes—classes themselves must be instances of something. That something is a metaclass, which controls how a class is created, initialized, and customized. The default metaclass is type, but you can define custom ones to enforce rules, automate boilerplate, or alter behavior at class definition time (before instantiation).
Metaclasses build on concepts like descriptors and decorators: they're invoked during class statement evaluation, not runtime. This makes them powerful for frameworks but overkill for simple scripts—use sparingly, as they can obscure code. Assume familiarity with classes and __new__/__init__. We'll cover mechanics, customization, inheritance, and patterns with executable snippets.
1. The Basics: How Metaclasses Work¶
When Python encounters class Foo: ..., it:
- Collects the class body into a namespace dict.
- Calls the metaclass's `__prepare__(name, bases, **kwargs)` (optional; it returns the namespace, default `{}`).
- Calls the metaclass's `__new__(metacls, name, bases, namespace)` to create the class object.
- Calls the metaclass's `__init__(cls, name, bases, namespace)` to initialize it (optional).
To specify: class Foo(metaclass=YourMeta): ....
Simple example: A metaclass that auto-adds a class attribute.
class AutoAttrMeta(type):
def __new__(mcs, name, bases, namespace):
# mcs is the metaclass (self-like)
namespace['auto_attr'] = f"Auto-added for {name}"
# Call type.__new__ to actually create the class
return super().__new__(mcs, name, bases, namespace)
class MyClass(metaclass=AutoAttrMeta):
pass # No explicit auto_attr here
print(MyClass.auto_attr) # Auto-added for MyClass
obj = MyClass()
print(obj.auto_attr) # Inherited, but class-level
Key Insight: __new__ returns the class object (like a constructor for classes). Overriding it lets you inspect/modify the namespace before creation. Without super().__new__, you'd break class instantiation.
2. Advanced Customization¶
a. __prepare__ for Namespace Control¶
By default, the class namespace is a plain dict, but __prepare__ can return an ordered dict (preserves definition order in Python 3.7+) or a custom mapping for validation.
Example: Enforce attribute order and validation.
from collections import OrderedDict
class OrderedMeta(type):
@classmethod
def __prepare__(mcs, name, bases):
return OrderedDict() # Preserves insertion order
def __new__(mcs, name, bases, namespace):
# Validate: Only allow certain attrs
allowed = {'__module__', '__qualname__', 'class_var'}
invalid = [k for k in namespace if k not in allowed]
if invalid:
raise TypeError(f"Invalid attrs in {name}: {invalid}")
return super().__new__(mcs, name, bases, namespace)
class ValidClass(metaclass=OrderedMeta):
class_var = 42 # OK
# class Invalid(metaclass=OrderedMeta):
# bad_var = "oops" # Raises TypeError
print(list(ValidClass.__dict__.keys())) # Ordered: ['__module__', '__qualname__', 'class_var', ...]
Nuance: Since Python 3.7, plain dicts preserve insertion order by language guarantee, but OrderedDict makes the intent explicit and works on older versions. Use for ABCs or when order matters (e.g., serialization).
b. __init__ for Post-Creation Hooks¶
After __new__, __init__ runs on the new class (like an initializer).
Example: Auto-register subclasses in a base.
class RegistryMeta(type):
def __init__(cls, name, bases, namespace):
super().__init__(name, bases, namespace)
if not hasattr(cls, '_registry'):
cls._registry = [] # Only on base
else:
cls._registry.append(cls) # Register subclasses
class Base(metaclass=RegistryMeta):
pass
class Child1(Base):
pass
class Child2(Base):
pass
print(Base._registry) # [<class '__main__.Child1'>, <class '__main__.Child2'>]
This is a common pattern for plugin systems: subclasses auto-register without explicit calls.
3. Inheritance and Metaclass Conflicts¶
Metaclasses must be compatible across inheritance. If class Child(Base1, Base2): where Base1 uses Meta1 and Base2 uses Meta2:
- Python picks the most derived metaclass among the candidates; it must be a (non-strict) subclass of the metaclasses of all bases.
- If no such metaclass exists, class creation fails with `TypeError: metaclass conflict`.
To handle: Define a cooperative metaclass.
class CompatibleMeta(type):
pass # Placeholder
class MetaA(type):
pass
class MetaB(CompatibleMeta, MetaA): # Inherits from both
pass
class BaseA(metaclass=MetaA):
pass
class BaseB(metaclass=MetaB): # Compatible with MetaA
pass
class Mixed(BaseA, BaseB): # No conflict
pass
print(Mixed.__class__) # <class '__main__.MetaB'>
Pro Tip: For multiple mixins, use abc.ABCMeta as a base—it's designed to cooperate.
4. Advanced Techniques and Patterns¶
a. Enforcing Interfaces (Abstract Base Classes)¶
abc.ABCMeta is a built-in metaclass for abstract classes, raising errors on incomplete implementations.
from abc import ABCMeta, abstractmethod
class AbstractShape(metaclass=ABCMeta):
@abstractmethod
def area(self):
pass
class Circle(AbstractShape):
def __init__(self, radius):
self.radius = radius
def area(self): # Must implement
return 3.14 * self.radius ** 2
# c = AbstractShape() # TypeError: Can't instantiate abstract class
c = Circle(5)
print(c.area()) # 78.5
Extend it: Override __subclasshook__ for virtual subclasses (e.g., register types dynamically).
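A minimal sketch of virtual subclassing via `ABCMeta`'s `register` (`Drawable` and `LegacyCircle` are illustrative names):

```python
from abc import ABC

class Drawable(ABC):
    pass

class LegacyCircle:  # unrelated class, no inheritance from Drawable
    def draw(self):
        return "circle"

Drawable.register(LegacyCircle)  # declare it a virtual subclass
print(issubclass(LegacyCircle, Drawable))    # True
print(isinstance(LegacyCircle(), Drawable))  # True
```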
b. Singleton Metaclass¶
Force a class to have one instance via metaclass (cleaner than decorator for classes).
class SingletonMeta(type):
_instances = {}
def __call__(cls, *args, **kwargs):
if cls not in cls._instances:
cls._instances[cls] = super().__call__(*args, **kwargs)
return cls._instances[cls]
class Logger(metaclass=SingletonMeta):
def __init__(self):
self.log = []
# Usage
log1 = Logger()
log2 = Logger()
print(log1 is log2) # True
Advanced Twist: Thread-safe with threading.Lock; or per-thread instances with threading.local().
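A minimal sketch of the thread-safe variant, using double-checked locking around first-time creation (class names here are illustrative):

```python
import threading

class ThreadSafeSingletonMeta(type):
    _instances = {}
    _lock = threading.Lock()  # Guards first-time creation

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:          # Fast path: no lock once created
            with cls._lock:
                if cls not in cls._instances:  # Double-check inside the lock
                    cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Config(metaclass=ThreadSafeSingletonMeta):
    pass

instances = []
def make():
    instances.append(Config())

threads = [threading.Thread(target=make) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(i is instances[0] for i in instances))  # True: one shared instance
```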
c. Validation and Auto-Slotting¶
Metaclass for data classes: Auto-add __slots__ for memory efficiency and validate types.
class ValidatedMeta(type):
    def __new__(mcs, name, bases, namespace):
        # Collect non-dunder, non-callable attributes as fields
        attrs = {k: v for k, v in namespace.items()
                 if not k.startswith('__') and not callable(v)}
        # Type validation example
        for attr, value in attrs.items():
            if not isinstance(value, (int, str, type(None))):
                raise TypeError(f"{attr} must be a basic type")
            # Must remove the class var: a name can't be both a slot and a class attribute
            del namespace[attr]
        namespace['__slots__'] = tuple(attrs.keys())
        namespace['_defaults'] = attrs  # Keep the defaults around for introspection
        return super().__new__(mcs, name, bases, namespace)
class Point(metaclass=ValidatedMeta):
    x = 0
    y = "zero"  # OK
# class Bad(metaclass=ValidatedMeta):
#     z = [1, 2]  # TypeError
p = Point()
print(p.__slots__)  # ('x', 'y')
This mimics dataclasses but with custom rules—pre-Python 3.7 style.
d. ORM-Like: Dynamic Attributes¶
Django models use metaclasses to scan fields and build querysets.
Simulated: Auto-generate __init__ from fields.
class Field:
def __init__(self, name, type_):
self.name = name
self.type_ = type_
class ModelMeta(type):
def __new__(mcs, name, bases, namespace):
fields = []
for key, val in namespace.items():
if isinstance(val, Field):
fields.append((key, val))
# Auto-init
def __init__(self, **kwargs):
for key, field in fields:
setattr(self, key, kwargs.get(key, field.type_()))
namespace['__init__'] = __init__
namespace['_fields'] = fields
return super().__new__(mcs, name, bases, namespace)
class User(metaclass=ModelMeta):
name = Field('name', str)
age = Field('age', int)
u = User(name="Alice", age=30)
print(u.name, u.age) # Alice 30
5. Best Practices and Pitfalls¶
- When to Use: For framework-level automation (e.g., enforcing APIs, registering components). Avoid for app logic—prefer descriptors or class decorators.
- Debugging: Metaclasses run at import time; trace imports with sys.meta_path hooks and inspect a class's metaclass with type(cls).
- Performance: Negligible at runtime (class creation is one-time), but a complex __prepare__ can slow imports.
- Pitfalls:
  - Infinite recursion: Don't make a metaclass inherit from a class that uses it.
  - Global side effects: Keep metaclasses stateless; use class-level registries carefully.
  - Compatibility: Test with multiple inheritance; prefer cooperative bases like ABCMeta.
- Python Versions: __prepare__ has existed since Python 3.0 (PEP 3115); since 3.6, class bodies preserve definition order by default.
- Typing: Metaclasses complicate stubs—use typing.TypeVar or targeted type: ignore comments for meta-level code.
4. Generator Expressions and Coroutines¶
Generator expressions and coroutines are intertwined in Python's ecosystem of lazy iteration and asynchronous programming. Generator expressions provide a memory-efficient way to create iterable sequences on-the-fly, building on the iterator protocol. Coroutines extend generators for cooperative multitasking, yielding control back to a scheduler—forming the foundation for async/await in Python 3.5+. Both leverage the yield keyword and are "lazy" by nature: they produce values only when needed, avoiding the memory bloat of lists.
This assumes familiarity with iterators/generators and basic async. We'll cover syntax, mechanics, advanced patterns (e.g., chaining, exception handling), and real-world uses. Snippets are executable—test them!
1. Generator Expressions: Lazy Comprehensions¶
A generator expression (genexp) is syntactic sugar for a generator function, enclosed in parentheses: (expr for item in iterable if condition). It's like a list comprehension [...] but yields values one-by-one via the iterator protocol (__iter__ and __next__), suspending state between calls.
Basic example: Summing squares lazily.
# List comp (eager, memory-intensive for large data)
squares_list = [x**2 for x in range(1000000)]
print(sum(squares_list)) # Computes all upfront
# Genexp (lazy, streams values)
squares_gen = (x**2 for x in range(1000000))
print(sum(squares_gen)) # Same result, but only computes as summed
Output (both): 333332833333500000 (the genexp computes the same total while using roughly constant memory).
Key Mechanics:
- Created with () not []; as the sole argument to a function call, the extra parentheses can be dropped (e.g., sum(x**2 for x in range(10))).
- Immediately iterable: gen = (x for x in range(10)); next(gen) yields 0.
- Nested: (f(x) for x in (g(y) for y in iterable))—the outer yields as the inner produces.
- Side effects: Avoid them in genexps (e.g., no print); they're meant for pure computation.
Advanced: Chaining and Unpacking
Genexps shine in pipelines, like itertools chains.
import itertools
# Chain genexps with map/filter equivalents
data = [1, 2, 3, 4, 5]
evens_squared = (x**2 for x in (y for y in data if y % 2 == 0))
print(list(evens_squared)) # [4, 16]
# Unpack into functions expecting iterables
def consume_iter(it):
return sum(it)
result = consume_iter((x * 2 for x in range(5))) # 0+2+4+6+8=20
print(result)
Insight: A genexp evaluates its outermost iterable immediately, but resolves free variables lazily at iteration time via closure—useful for dynamic filtering.
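This evaluation rule is easy to demonstrate: the outer iterable is consumed into the genexp right away, while a free variable like factor below is looked up only when iteration actually happens.

```python
data = [1, 2, 3]
gen = (x * factor for x in data)  # The outer iterable `data` is evaluated NOW
factor = 10                       # ...but free variables are resolved at iteration time
result = list(gen)
print(result)  # [10, 20, 30]
```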
2. Coroutines: Yielding Control¶
Coroutines are generator-based functions (def func(): ... yield ...) that can pause and receive values from the caller. Unlike plain generators (pull-based via next()), coroutines are push-based: values are injected via send() or delegated with yield from. They enable non-preemptive multitasking, where tasks voluntarily yield control.
Basic coroutine: Echoing sent values.
def echo_coroutine():
print("Coroutine started")
while True:
received = yield # Pauses here; receives sent value
print(f"Received: {received}")
# Usage: Prime with next() to reach first yield
coro = echo_coroutine()
next(coro) # Advance to first yield
coro.send("Hello") # Yields control, sends "Hello"
coro.send("World") # Continues from yield
Output:
Coroutine started
Received: Hello
Received: World
Mechanics Deep Dive:
- yield suspends the coroutine and returns control to the caller.
- send(value) resumes it, making value the result of the yield expression.
- throw(exc) injects an exception at the yield; close() terminates with GeneratorExit.
- Coroutines are one-shot: after return, they're done (further calls raise StopIteration).
Delegation with yield from (Python 3.3+): Sub-delegates to another iterable/coroutine, yielding its values transparently.
def sub_coroutine():
yield 1
yield 2
return "Sub done"
def delegating_coro():
result = yield from sub_coroutine() # Yields 1,2; gets return value
yield f"Delegated result: {result}"
c = delegating_coro()
print(next(c)) # 1
print(next(c)) # 2
print(next(c)) # Delegated result: Sub done
This chains coroutines, propagating exceptions and returns—key for async subroutines.
3. Advanced Techniques¶
a. Exception Handling in Generators/Coroutines¶
Both handle exceptions mid-iteration via try/except around yield.
def safe_gen():
yield 1
try:
yield 1 / 0 # Would raise ZeroDivisionError
except ZeroDivisionError:
yield "Handled error"
yield 3
g = safe_gen()
print([next(g) for _ in range(3)]) # [1, 'Handled error', 3]
# For coroutines: Inject via throw()
c = echo_coroutine()
next(c)
try:
c.throw(ValueError("Injected!"))
except ValueError:
print("Propagated: caught in caller")  # The coro may handle it or, as here, let it propagate
Pro Tip: Use yield from to propagate exceptions through chains.
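A small sketch of that propagation: throwing into the delegating generator routes the exception to the inner generator's paused yield, where it can be handled.

```python
def inner():
    try:
        yield 1
    except ValueError:
        yield "inner handled"

def outer():
    yield from inner()  # Exceptions thrown into outer() are routed to inner()

g = outer()
first = next(g)                 # 1
handled = g.throw(ValueError)   # inner's except clause yields the reply
print(first, handled)  # 1 inner handled
```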
b. Infinite Genexps and Coroutines¶
For streams: (random.random() for _ in count()) with itertools.count().
Coroutine for producer-consumer:
import queue
def producer(q):
while True:
item = yield # Wait for send (None to start)
q.put(item * 2)
def consumer(q):
while True:
yield q.get() # Yield consumed item
# Simulated scheduler
q = queue.Queue()
p = producer(q)
next(p)          # Prime producer to its first yield
p.send(5)        # Producer queues 10
c = consumer(q)
print(next(c))   # Consumer's q.get() returns 10 and yields it
(Note: advancing the consumer before anything is queued would block forever on q.get().)
This mimics async queues.
c. Coroutines and Async/Await (Python 3.5+)¶
Coroutines evolved into native async def functions, which are awaitables. await is syntactic sugar for yield from on a coroutine object.
import asyncio
async def ticker(name, interval):
while True:
print(f"{name}: tick")
await asyncio.sleep(interval) # Yields control to event loop
async def main():
await asyncio.gather(ticker("A", 1), ticker("B", 2))
asyncio.run(main()) # Runs until interrupted
Under the hood: async def returns a coroutine object; await drives it via send/throw. Legacy generator-based coroutines interoperated via the asyncio.coroutine decorator (deprecated, removed in Python 3.11).
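The generator heritage shows when you drive a coroutine object by hand with send(), the way an event loop does (a toy sketch; real code should use asyncio.run):

```python
import asyncio

async def add(a, b):
    await asyncio.sleep(0)  # Yields control once, like a real await point
    return a + b

coro = add(2, 3)
result = None
try:
    while True:
        coro.send(None)     # Drive manually, as the event loop would
except StopIteration as exc:
    result = exc.value      # The return value rides on StopIteration
print(result)  # 5
```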
d. Genexps in Coroutines¶
Embed for lazy data: async generator expressions and comprehensions (e.g., [y**2 async for y in source()]) are native since Python 3.6 and require an asynchronous iterable as the source.
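A concrete, runnable version using a native async generator as the source (squares_source is an illustrative name):

```python
import asyncio

async def squares_source(n):
    for i in range(n):           # A native async generator (Python 3.6+)
        await asyncio.sleep(0)   # Simulated await point
        yield i ** 2

async def main():
    return [x async for x in squares_source(4)]  # Async comprehension

result = asyncio.run(main())
print(result)  # [0, 1, 4, 9]
```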
4. Patterns and Real-World Applications¶
- Data Processing: Genexps for ETL pipelines (e.g., filtering rows streamed from pandas iterrows()).
- Concurrency: Coroutines for parsers (e.g., state machines in network protocols).
- Async I/O: asyncio websockets, where coroutines handle multiplexing.
- Tying to Prior Topics: Decorate coroutines with @contextmanager for context-managed setup/teardown; metaclasses can enforce async methods in classes.
- Greenlets/Gevent: Legacy coroutine libraries that monkey-patch blocking I/O for transparent concurrency.
5. Best Practices and Pitfalls¶
- Genexps: Use for one-pass iteration; convert to lists only if needed (e.g., list(gen) for multiple passes). Avoid complex logic—prefer generator functions for readability.
- Coroutines: Always prime with next() or await; handle StopIteration in wrappers. Use asyncio over raw coroutines for modern code.
- Memory: Both are lightweight, but infinite ones need explicit termination.
- Pitfalls:
  - Scope leaks: Free variables are resolved late—genexps created in a loop all see the loop variable's final value.
  - Send/throw mismatches: Sending a non-None value to an unprimed coroutine raises TypeError.
  - Debugging: Use asyncio debug mode; inspect state with inspect.getgeneratorstate(). (sys.set_coroutine_wrapper was deprecated and removed in Python 3.8.)
- Typing: typing.Generator[YieldType, SendType, ReturnType]; for async: AsyncGenerator[YieldType, SendType].
5. Descriptors¶
Descriptors are Python's mechanism for attribute delegation and customization, powering properties, methods, and even slots. They're the backbone of many "magic" features: when you access obj.attr, Python checks if attr is a descriptor (implements __get__, __set__, or __delete__) and delegates to it instead of direct access. This enables lazy evaluation, validation, caching, and more—without subclassing.
Descriptors are classes that define how attributes are accessed, set, or deleted on instances or classes. They're invoked during attribute lookup, making them a form of metaprogramming at the instance level (complementing metaclasses at the class level). Assume knowledge of classes and __getattribute__. We'll cover the protocol, types (data vs. non-data), advanced patterns, and integration with prior topics like decorators and metaclasses. Snippets are executable!
1. The Descriptor Protocol¶
A descriptor is any class with at least one of:
- __get__(self, instance, owner): Called on read (instance.attr or Class.attr).
- __set__(self, instance, value): Called on write (instance.attr = value).
- __delete__(self, instance): Called on del (del instance.attr).
instance is the object (or None for class access); owner is the class.
Basic read-only descriptor:
class ReadOnlyDescriptor:
def __init__(self, initial_value):
self.value = initial_value
def __get__(self, instance, owner):
if instance is None:
return self # Class-level access returns the descriptor
return self.value # Instance access returns the value
# No __set__ or __delete__ → read-only
class MyClass:
attr = ReadOnlyDescriptor("Hello") # Class attr: the descriptor instance
obj = MyClass()
print(obj.attr) # Hello (calls __get__)
print(MyClass.attr) # <ReadOnlyDescriptor object> (descriptor itself)
# obj.attr = "World" # AttributeError: no __set__
Key Insights:
- Descriptors live in the class namespace (MyClass.__dict__), not instances—shared across all instances.
- Lookup order via __getattribute__: data descriptors on the class → instance __dict__ → non-data descriptors and plain class attributes → bases → __getattr__ fallback.
- __getattribute__ triggers this automatically; override it for fully custom behavior.
2. Data vs. Non-Data Descriptors¶
- Data descriptors: Have
__set__(or__delete__); take precedence over instance dicts. - Non-data: Only
__get__; instance attrs shadow them.
Example: Overridable vs. enforced.
class OverridableDescriptor:  # Non-data: only __get__
    def __init__(self, default):
        self.default = default
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.default
class EnforcedDescriptor:  # Data: defines __set__, so it outranks the instance dict
    def __init__(self, value):
        self.value = value
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.value
    def __set__(self, instance, value):
        raise AttributeError("read-only")
class Example:
    over = OverridableDescriptor("Default Over")
    enforced = EnforcedDescriptor("Enforced Value")
obj = Example()
print(obj.over)  # Default Over
obj.over = "Overridden"  # Plain write: lands in obj.__dict__
print(obj.over)  # Overridden (instance dict shadows the non-data descriptor)
print(obj.enforced)  # Enforced Value
# obj.enforced = "Nope"  # AttributeError: read-only
# print(obj.enforced)  # Still Enforced Value (never shadowed)
Nuance: Data descriptors ensure consistency—@property is always a data descriptor (even a read-only property defines a __set__ that raises AttributeError).
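That precedence is directly observable: even a read-only property outranks an entry planted in the instance dict.

```python
class C:
    @property
    def x(self):
        return 42

c = C()
c.__dict__['x'] = 99  # Sneak an entry into the instance dict
print(c.x)            # 42 — the data descriptor wins over the instance dict
print(hasattr(type(C.x), '__set__'))  # True: property defines __set__
```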
3. Advanced Techniques¶
a. Read-Write Descriptor with Validation¶
Full CRUD-like with type checking.
class ValidatedDescriptor:
def __init__(self, name, type_):
self.name = name
self.type_ = type_
def __set_name__(self, owner, name): # Python 3.6+: Auto-sets self.name
self.name = name
    def __get__(self, instance, owner):
        if instance is None:
            return self
        try:
            return instance.__dict__[self.name]
        except KeyError:
            # Raise AttributeError so hasattr() reports False after deletion
            raise AttributeError(self.name) from None
def __set__(self, instance, value):
if not isinstance(value, self.type_):
raise TypeError(f"{self.name} must be {self.type_.__name__}")
instance.__dict__[self.name] = value
def __delete__(self, instance):
instance.__dict__.pop(self.name, None)
class Person:
name = ValidatedDescriptor("name", str)
age = ValidatedDescriptor("age", int)
p = Person()
p.name = "Alice"
p.age = 30
print(p.name, p.age) # Alice 30
# p.age = "thirty" # TypeError
del p.name
print(hasattr(p, 'name')) # False
Pro Tip: __set_name__ avoids hardcoding names—use for generic descriptors.
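A generic sketch using __set_name__ so the descriptor never needs its attribute name passed in (Typed and Vec are illustrative names):

```python
class Typed:
    def __init__(self, type_):
        self.type_ = type_
    def __set_name__(self, owner, name):
        self.name = name  # Filled in automatically at class creation (Python 3.6+)
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self.name]
    def __set__(self, instance, value):
        if not isinstance(value, self.type_):
            raise TypeError(f"{self.name} must be {self.type_.__name__}")
        instance.__dict__[self.name] = value

class Vec:
    x = Typed(int)
    y = Typed(int)

v = Vec()
v.x, v.y = 1, 2
print(v.x, v.y)  # 1 2
# v.x = "oops"  # TypeError: x must be int
```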
b. Lazy Evaluation and Caching¶
Descriptor for computed attrs, cached on first access.
import time
class LazyCached:
def __init__(self, compute_func):
self.compute = compute_func
self.cache = {}
def __get__(self, instance, owner):
if instance is None:
return self
if id(instance) not in self.cache:
print(f"Computing for {instance}")
self.cache[id(instance)] = self.compute(instance)
return self.cache[id(instance)]
def expensive_calc(obj):
time.sleep(1) # Simulate work
return obj.value ** 2
class DataHolder:
value = 5
squared = LazyCached(expensive_calc)
d1 = DataHolder()
print(d1.squared) # Computing... → 25
print(d1.squared) # Instant: 25
d2 = DataHolder()
print(d2.squared) # Re-computes for new instance
Per-instance caching via id(instance); use weakrefs for GC safety.
c. Method Descriptors (Under the Hood)¶
Functions are descriptors! def method(self): → __get__ binds self, returning a bound method.
def unbound_method(self):
return f"Hello, {self.name}"
class Greeter:
greet = unbound_method # Function object
g = Greeter()
g.name = "World"
print(g.greet()) # Hello, World (bound method)
print(Greeter.greet(g)) # Same, explicit self
staticmethod/classmethod override __get__ to skip/insert cls.
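As a sketch of how those overrides work, here is a toy classmethod-like descriptor whose __get__ binds the owning class instead of the instance (MyClassMethod is an illustrative name, not the real built-in):

```python
class MyClassMethod:
    def __init__(self, func):
        self.func = func
    def __get__(self, instance, owner):
        # Bind the owner class and ignore the instance—like classmethod
        def bound(*args, **kwargs):
            return self.func(owner, *args, **kwargs)
        return bound

class C:
    @MyClassMethod
    def describe(cls):
        return f"class {cls.__name__}"

print(C.describe())    # class C
print(C().describe())  # class C (works through instances too)
```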
4. Patterns and Real-World Applications¶
- Properties: @property is a descriptor factory: @property def x(self): ... creates a read-only data descriptor.
- Slots: __slots__ uses hidden descriptors for memory-efficient instances (ties to metaclasses).
- ORM Fields: SQLAlchemy's Column works descriptor-style: reads build queries, writes queue updates.
- Validation Layers: Chain descriptors (e.g., via __get__ delegation) for multi-step checks.
- Tying to Prior Topics:
  - Decorators: Decorate descriptor methods (e.g., add caching inside __get__).
  - Metaclasses: Auto-apply descriptors: scan the namespace in __new__ and wrap attrs.
  - Context Managers: A descriptor that acquires resources on access: __get__ returns a context manager.
  - Generators/Coroutines: A lazy descriptor can return a generator: def __get__(self, ...): return compute_gen().
Example: Descriptor with context manager integration.
from contextlib import contextmanager
class ManagedDescriptor:
def __get__(self, instance, owner):
if instance is None:
return self
@contextmanager
def managed():
print("Acquiring resource")
try:
yield instance._resource
finally:
print("Releasing")
instance._resource = "data"
return managed() # Returns context manager
class SecureClass:
secret = ManagedDescriptor()
obj = SecureClass()
with obj.secret as res:
print(res) # Acquiring... data ... Releasing
5. Best Practices and Pitfalls¶
- Use When: For reusable attribute behaviors (e.g., library code). Prefer @property for simple cases.
- Sharing: Descriptors are class-level; keep per-instance state carefully (e.g., in the instance __dict__ or weak dicts).
- Performance: __get__/__set__ add overhead—profile hot paths.
- Pitfalls:
  - Shadowing: Non-data descriptors can be accidentally shadowed; use data descriptors for enforcement.
  - Infinite recursion: If __get__ reads the managed attribute through normal lookup, it re-enters itself—go through instance.__dict__ or object.__getattribute__.
  - Pickling: Descriptors don't auto-pickle instance state; implement __reduce__ if needed.
  - Typing: Use typing.Protocol for descriptor stubs: class Desc(Protocol): def __get__(self, ...) -> Any: ....
- Debugging: vars(obj) skips descriptors; use object.__dict__ or inspect.getattr_static.
6. Multithreading and Multiprocessing¶
Python's concurrency model addresses CPU-bound and I/O-bound tasks differently due to the Global Interpreter Lock (GIL)—a mutex that serializes execution in CPython, limiting true parallelism in threads. Multithreading (via threading) excels for I/O-bound work (e.g., network calls) where threads block, releasing the GIL. Multiprocessing (via multiprocessing) bypasses the GIL by using separate processes, enabling true parallelism for CPU-bound tasks (e.g., computations). Both use high-level primitives like locks and queues for synchronization.
This builds on coroutines (async for cooperative multitasking) and context managers (e.g., with lock:). Assume basic threading knowledge. We'll cover mechanics, advanced patterns (pools, shared state), and trade-offs with code snippets—executable in Python 3.12+.
1. Multithreading: Cooperative Concurrency¶
Threads share memory, making them lightweight but prone to race conditions. The GIL means only one thread executes Python bytecode at a time, but C extensions (e.g., NumPy) can release it.
Basic example: Parallel I/O simulation.
import threading
import time
import requests # For real I/O
def fetch_url(url, delay):
time.sleep(delay) # Simulate I/O wait (GIL released)
response = requests.get(url)
print(f"{threading.current_thread().name}: Fetched {url} in {response.elapsed.total_seconds():.2f}s")
threads = []
urls = ["https://httpbin.org/delay/1", "https://httpbin.org/delay/2", "https://httpbin.org/delay/1.5"]
for url in urls:
t = threading.Thread(target=fetch_url, args=(url, 0.1), name=f"Fetcher-{len(threads)}")
t.start()
threads.append(t)
for t in threads:
t.join() # Wait for all
Output (approximate; concurrent due to I/O):
Fetcher-0: Fetched https://httpbin.org/delay/1 in 1.02s
Fetcher-2: Fetched https://httpbin.org/delay/1.5 in 1.52s
Fetcher-1: Fetched https://httpbin.org/delay/2 in 2.01s
Key Mechanics:
- Thread(target=func, args=...): Creates a thread; start() begins execution.
- join(): Blocks until the thread finishes; use daemon=True for background threads that die with the main thread.
- Thread-local storage: threading.local() for per-thread data.
Advanced: Synchronization Primitives
- Locks (threading.Lock()): Prevent races.
- RLocks (reentrant): Allow same-thread re-acquisition.
- Semaphores/Events/Barriers: For signaling/coordination.
Example: Thread-safe counter.
import threading
class ThreadSafeCounter:
def __init__(self):
self._value = 0
self._lock = threading.Lock()
def increment(self):
with self._lock: # Context manager for RAII
self._value += 1
def value(self):
with self._lock:
return self._value
def worker(counter, n):
for _ in range(n):
counter.increment()
counter = ThreadSafeCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(5)]
for t in threads:
t.start()
for t in threads:
t.join()
print(counter.value()) # 5000 (without lock: unpredictable)
Without with lock:, you'd get lost updates due to non-atomic +=.
Pitfall: Deadlocks—acquire locks in a consistent order, or use lock.acquire(timeout=...) / acquire(blocking=False) to back off instead of blocking forever.
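A timeout-based back-off can be sketched with a single plain Lock: since Lock is not reentrant, the second acquire below would block forever without the timeout.

```python
import threading

lock = threading.Lock()
lock.acquire()  # Simulate another owner holding the lock

# A plain Lock is not reentrant: a blocking re-acquire would deadlock this thread.
got_it = lock.acquire(timeout=0.1)  # Back off after 100 ms instead
print(got_it)  # False — we timed out rather than deadlocking

lock.release()
```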
2. Multiprocessing: True Parallelism¶
Processes have separate memory, avoiding GIL but with higher overhead (forking, IPC). Use multiprocessing for CPU-bound work.
Basic example: Parallel computation.
from multiprocessing import Process, cpu_count, current_process
import time
def compute_squares(n):
    start = time.time()
    squares = [i**2 for i in range(n)]
    print(f"{current_process().name}: Computed {len(squares)} squares in {time.time() - start:.2f}s")
if __name__ == '__main__': # Protect against recursive forking
num_processes = min(4, cpu_count())
n = 10**6
processes = [Process(target=compute_squares, args=(n // num_processes,))
for _ in range(num_processes)]
start_total = time.time()
for p in processes:
p.start()
for p in processes:
p.join()
print(f"Total time: {time.time() - start_total:.2f}s")
Output (varies by CPU; faster than serial for multi-core):
ForkProcess-1: Computed 250000 squares in 0.05s
ForkProcess-2: Computed 250000 squares in 0.05s
...
Total time: 0.06s
Key Mechanics:
- Process(target=...): Spawns an OS process.
- if __name__ == '__main__': Essential under the spawn start method (Windows, and the macOS default), where children re-import the main module; good practice everywhere to prevent recursive process creation.
- Start methods: multiprocessing.set_start_method('spawn') for isolation (e.g., in Jupyter).
Advanced: Pools and Shared State
- Pool: Reuses processes for task batches (apply, map, starmap).
- Shared memory: Value/Array for scalars/arrays; Manager for dicts/lists (proxied).
Example: Parallel map with pool.
from multiprocessing import Pool
import math
def prime_factors(n):
factors = []
while n % 2 == 0:
factors.append(2)
n //= 2
for i in range(3, int(math.sqrt(n)) + 1, 2):
while n % i == 0:
factors.append(i)
n //= i
if n > 2:
factors.append(n)
return factors
if __name__ == '__main__':
numbers = [i for i in range(100, 200)]
with Pool(processes=4) as pool: # Context manager auto-closes/joins
results = pool.map(prime_factors, numbers)
print(results[:3]) # [[2, 2, 5, 5], [101], [2, 3, 17]]
For shared state:
from multiprocessing import Process, Value, Lock
def update_shared(v, lock, increments):
for _ in range(increments):
with lock:
v.value += 1
if __name__ == '__main__':
shared_value = Value('i', 0) # Shared int
lock = Lock()
procs = [Process(target=update_shared, args=(shared_value, lock, 1000))
for _ in range(5)]
for p in procs:
p.start()
for p in procs:
p.join()
print(shared_value.value) # 5000
Manager for complex types: manager = multiprocessing.Manager(); d = manager.dict().
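A minimal Manager sketch (the record helper is illustrative): the proxied dict lets separate processes mutate one logical mapping through the manager's server process.

```python
from multiprocessing import Manager, Process

def record(shared, key):
    shared[key] = key * 2  # Proxied dict: the write is relayed to the manager process

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.dict()
        procs = [Process(target=record, args=(shared, i)) for i in range(3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        snapshot = dict(shared)  # Copy out before the manager shuts down
    print(snapshot)  # {0: 0, 1: 2, 2: 4}
```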
3. Comparison and Hybrids¶
| Aspect | Multithreading | Multiprocessing |
|---|---|---|
| GIL Impact | Serial bytecode; good for I/O | Bypassed; good for CPU |
| Memory | Shared (fast, but races) | Isolated (safe, but copy-on-write) |
| Overhead | Low (context switch) | High (fork/IPC) |
| Use Cases | Web scraping, GUI events | ML training, simulations |
| Primitives | threading.Lock, queue.Queue | multiprocessing.Lock, multiprocessing.Queue |
| Scalability | Limited by GIL (hundreds to ~thousands of threads) | Limited by cores and OS process limits |
Hybrid: concurrent.futures: Unified API for both.
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed
import time
def slow_task(n):
time.sleep(0.1) # I/O-like
return n**2
# ThreadPool for I/O
with ThreadPoolExecutor(max_workers=3) as executor:
futures = [executor.submit(slow_task, i) for i in range(5)]
for future in as_completed(futures):
print(future.result())
# ProcessPool for CPU (swap ThreadPoolExecutor → ProcessPoolExecutor)
submit() returns Future; result() blocks; cancel() for control. Integrates with async via asyncio wrappers.
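A short sketch of the callback mechanism (function names are illustrative); the executor's shutdown on with-exit guarantees the callback has run before the final print:

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    return n * n

messages = []
with ThreadPoolExecutor(max_workers=2) as ex:
    fut = ex.submit(work, 7)
    fut.add_done_callback(lambda f: messages.append(f.result()))  # Non-blocking notification
    blocking_result = fut.result()  # Or block explicitly for the value

print(blocking_result, messages)  # 49 [49]
```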
4. Advanced Patterns¶
- Queues for IPC: Thread-safe (queue.Queue) or process-safe (multiprocessing.Queue); use for producer-consumer.
- Futures and Callbacks: future.add_done_callback(func) for non-blocking notification.
- Avoiding the GIL in Threads: Use multiprocessing for pure-Python CPU work; or numba/Cython for speedups.
- Tying to Prior Topics:
  - Coroutines: Prefer asyncio for I/O (single-threaded, scalable); threads for blocking libraries.
  - Context Managers: Most primitives support with (e.g., with Pool():, with lock:).
  - Descriptors/Metaclasses: Thread-safe descriptors with locks in __get__.
  - Generators: Yield from queues in threaded producers.
Example: Threaded generator with queue.
import queue
import threading
def producer(q, items):
for item in items:
q.put(item ** 2)
q.put(None) # Sentinel
def consumer(q):
while True:
item = q.get()
if item is None:
break
yield item # Generator yields from queue
if __name__ == '__main__':
q = queue.Queue()
items = range(5)
prod_thread = threading.Thread(target=producer, args=(q, items))
prod_thread.start()
gen = consumer(q)
print(list(gen)) # [0, 1, 4, 9, 16]
prod_thread.join()
5. Best Practices and Pitfalls¶
- Choose Wisely: I/O → threads/asyncio; CPU → multiprocessing. Profile with cProfile and timing runs before committing.
- Synchronization: Always lock shared state; queue.Queue does its own internal locking, so items can be passed between threads without extra locks.
- Debugging: Set threading.settrace(); use faulthandler for segfaults. For processes: multiprocessing.set_start_method('spawn', force=True).
- Pitfalls:
  - GIL surprises: Pure-Python loops don't parallelize in threads.
  - Fork bombs: Limit process counts; handle SIGINT with signal.
  - Data serialization: Pickling fails on lambdas and other unpicklables in multiprocessing.
  - Resource leaks: Always join() or use with; daemon threads/processes exit with the main process.
- Performance: Thread context switches are cheap but add up; process startup costs tens of milliseconds. Scale workers with cpu_count().
- Typing: concurrent.futures.Future is generic in stubs (e.g., Future[int]; subscriptable at runtime since Python 3.9).
7. Abstract Base Classes¶
Abstract Base Classes are Python's way to define interfaces and enforce contracts in a duck-typed world. Part of the abc module (since Python 2.6), ABCs let you specify what a class should do (abstract methods) without dictating how. They use a metaclass (ABCMeta) to prevent instantiation of incomplete subclasses and support runtime registration for virtual subclasses—blending structural typing with nominal checks.
ABCs shine for libraries (e.g., collections.abc for iterables) and large codebases, promoting polymorphism without inheritance hierarchies. They tie into metaclasses (ABCs use one) and descriptors (abstract properties). Assume class basics; we'll dive into syntax, enforcement, advanced hooks, and patterns with snippets.
1. The Basics: Defining and Enforcing Abstracts¶
An ABC is a class inheriting from abc.ABC (or using metaclass=ABCMeta). Mark methods as abstract with @abstractmethod; subclasses must implement them or remain abstract.
Simple example: Shape interface.
from abc import ABC, abstractmethod
class Shape(ABC):
@abstractmethod
def area(self):
"""Compute area; must be implemented."""
pass
@abstractmethod
def perimeter(self):
pass
# Concrete method: Shared implementation
def describe(self):
return f"A shape with area {self.area()}"
class Rectangle(Shape):
def __init__(self, width, height):
self.width = width
self.height = height
def area(self):
return self.width * self.height
def perimeter(self):
return 2 * (self.width + self.height)
# s = Shape() # TypeError: Can't instantiate abstract class Shape
r = Rectangle(3, 4)
print(r.area()) # 12
print(r.describe()) # A shape with area 12
Key Mechanics:
- Abstract methods may have bodies (a subclass can still call them via super()), but subclasses must override them regardless.
- Subclasses remain abstract until all abstract methods are implemented—issubclass(Rectangle, Shape) → True.
- Instantiation fails with TypeError while any abstract method remains unimplemented.
- @abstractmethod must be the innermost decorator; use it on __init__ sparingly (it forces every subclass to define its own).
2. Abstract Properties and Class Methods¶
ABCs extend to properties and class/static methods via descriptors.
Example: Enforced property with abstract class method.
from abc import ABC, abstractmethod  # abstractproperty is deprecated; stack @property with @abstractmethod
class Vehicle(ABC):
@property
@abstractmethod
def wheels(self):
"""Number of wheels; abstract property."""
pass
@classmethod
@abstractmethod
def category(cls):
"""Vehicle category."""
pass
class Car(Vehicle):
@property
def wheels(self):
return 4
@classmethod
def category(cls):
return "Land"
print(Car().wheels) # 4
print(Car.category()) # Land
# Incomplete subclass
class Incomplete(Vehicle):
pass  # Defining it is fine; Incomplete() raises TypeError at instantiation
Nuance: Stack @property on top of @abstractmethod for read-only abstracts. For read-write abstracts, add an abstract @wheels.setter as well.
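A sketch of an abstract read-write property (Tank and WaterTank are illustrative names): the abstract setter is stacked onto the same property object, and the subclass replaces the whole property.

```python
from abc import ABC, abstractmethod

class Tank(ABC):
    @property
    @abstractmethod
    def level(self):
        """Current fill level."""

    @level.setter
    @abstractmethod
    def level(self, value):
        """Abstract setter stacked onto the same property."""

class WaterTank(Tank):
    def __init__(self):
        self._level = 0
    @property
    def level(self):
        return self._level
    @level.setter
    def level(self, value):
        self._level = value

t = WaterTank()
t.level = 5
print(t.level)  # 5
# Tank() raises TypeError: the abstract property is unimplemented
```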
3. Advanced Techniques¶
a. Virtual Subclasses with __subclasshook__¶
ABCs support "structural" subtyping: Register classes at runtime without inheritance via ABC.register(subclass). Customize checks with __subclasshook__ for duck-typing.
Example: Registering a non-inheriting class.
from abc import ABC, abstractmethod
class Iterable(ABC):
@abstractmethod
def __iter__(self):
pass
@classmethod
def __subclasshook__(cls, subclass):
# Duck-type: If subclass has __iter__, consider it a virtual subclass
if cls is Iterable:
return any('__iter__' in sc.__dict__ for sc in subclass.__mro__)
return NotImplemented
class MyList:
def __init__(self, items):
self.items = items
def __iter__(self):
return iter(self.items)
Iterable.register(MyList)  # Explicit registration (redundant here—the hook already matches)
print(issubclass(MyList, Iterable)) # True (via hook)
print(isinstance(MyList([1,2]), Iterable)) # True
Deeper Dive: __subclasshook__ runs on issubclass/isinstance; return True/False/NotImplemented. Built-in ABCs like collections.abc.Sequence use this for list, tuple, etc.
b. Multiple Inheritance and Coexistence¶
ABCs cooperate with other metaclasses via ABCMeta's design.
from abc import ABCMeta, abstractmethod
class OtherMeta(ABCMeta):  # Must derive from ABCMeta, or class creation raises a metaclass conflict
    def __new__(mcs, name, bases, namespace):
        namespace['extra'] = "From OtherMeta"
        return super().__new__(mcs, name, bases, namespace)
class CompatibleABC(metaclass=OtherMeta):  # ABCMeta machinery still enforces abstracts
    @abstractmethod
    def method(self):
        pass
class Impl(CompatibleABC):
    def method(self):
        return "Implemented"
print(Impl().extra)  # From OtherMeta
4. Patterns and Real-World Applications¶
- Interface Definition: Use for plugins: Plugin(ABC) with @abstractmethod def run(self):.
- Collections ABCs: collections.abc provides Mapping, MutableSequence—extend or register for custom types.
- Callback Systems: Abstract event handlers in frameworks (e.g., Django signals).
- Tying to Prior Topics:
  - Metaclasses: ABCs are built on ABCMeta; extend ABCMeta.__new__ for custom enforcement.
  - Descriptors: Abstract properties use descriptors; combine with validated descriptors for typed abstracts.
  - Context Managers: Abstract __enter__/__exit__ in a ContextManager(ABC).
  - Generators/Coroutines: @abstractmethod works on async def methods too.
  - Concurrency: Thread-safe ABCs with locks in concrete implementations.
Example: Abstract async iterator.
from abc import ABC, abstractmethod
import asyncio
class AsyncIterable(ABC):
    @abstractmethod
    def __aiter__(self):  # Must return the async iterator directly (not an async def)
        pass
    @abstractmethod
    async def __anext__(self):
        pass
class AsyncCounter(AsyncIterable):
    def __init__(self, stop):
        self.count = 0
        self.stop = stop
    def __aiter__(self):
        return self
async def __anext__(self):
if self.count >= self.stop:
raise StopAsyncIteration
self.count += 1
await asyncio.sleep(0.1) # Simulate async work
return self.count
async def main():
async for num in AsyncCounter(3):
print(num) # 1 2 3
asyncio.run(main())
5. Best Practices and Pitfalls¶
- When to Use: For public APIs or when isinstance checks matter. Prefer duck typing otherwise.
- Minimalism: Only abstract what's essential; provide concrete helpers.
- Testing: pytest with mock subclasses; check issubclass in tests.
- Pitfalls:
  - Over-abstracting: Leads to boilerplate; start concrete, abstract later.
  - MRO surprises: A concrete implementation anywhere in the MRO satisfies an abstract; inspect cls.__abstractmethods__ when debugging.
  - Async abstracts: Ensure @abstractmethod is the innermost decorator on the async def.
  - Performance: Negligible; isinstance adds a cached check.
- Typing: typing.Protocol for structural typing (no enforcement by default); ABCs for nominal checks plus virtual registration.
8. Data Classes¶
Data classes, introduced in Python 3.7 via the dataclasses module, are a declarative way to create classes that primarily store data. They reduce boilerplate by auto-generating special methods like __init__, __repr__, and __eq__ (plus ordering methods with order=True, __hash__ with frozen=True or unsafe_hash=True, and __slots__ with slots=True). Under the hood there is no metaclass: the @dataclass decorator inspects the class's __annotations__ (the fields) and synthesizes these methods—complementing the descriptor and metaclass machinery from our prior discussions.
Data classes shine for DTOs (data transfer objects), configs, or simple models, promoting readability over manual __init__ drudgery. They're customizable via parameters and hooks, but not a full ORM replacement (use with Pydantic for validation/serialization). Assume basic class knowledge; we'll cover syntax, advanced customization, inheritance pitfalls, and patterns with executable snippets.
1. The Basics: Defining and Using Data Classes¶
Import from the dataclasses module and decorate your class with @dataclass. Fields are type-annotated class-level variables; the decorator auto-generates methods based on them.
Simple example: A point class.
from dataclasses import dataclass
@dataclass
class Point:
x: float = 0.0
y: float = 0.0
# No __init__ needed!
# Usage
p1 = Point(1.5, 2.5)
p2 = Point(y=3.0, x=4.0) # Kwargs work
print(p1) # Point(x=1.5, y=2.5) # Auto __repr__
print(p1 == Point(1.5, 2.5)) # True # Auto __eq__
# hash(p1) # TypeError: eq=True without frozen=True sets __hash__ to None
Key Mechanics:
- Fields: Annotated attrs (e.g., `x: int`); defaults go right of `=`.
- Generated methods:
  - `__init__`: Takes fields as args (required first, then optionals).
  - `__repr__`: `ClassName(field=val, ...)`—great for debugging.
  - `__eq__`: Compares all fields by `==`.
- Order: Fields keep definition order in the `__init__` signature and `__repr__`.
- No annotations? Unannotated attributes are not fields; use `dataclasses.make_dataclass()` to build classes dynamically.
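For the dynamic case mentioned above, a short sketch using dataclasses.make_dataclass (the DynPoint name is illustrative):

```python
from dataclasses import make_dataclass, field, asdict

# Build a data class at runtime, when field names aren't known statically
DynPoint = make_dataclass(
    "DynPoint",
    [("x", float), ("y", float, field(default=0.0))],
)
p = DynPoint(1.5)
print(asdict(p))  # {'x': 1.5, 'y': 0.0}
```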
2. Customization Options¶
The @dataclass decorator accepts params: init=True (default), repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False (3.10+), slots=False (3.10+).
Example: Frozen (immutable) config with ordering.
from dataclasses import dataclass
from typing import ClassVar # For class-level, non-field vars
@dataclass(frozen=True, order=True) # Immutable; supports < > etc. via generated __lt__
class Config:
host: str = "localhost"
port: int = 8080
debug: bool = False
_internal: ClassVar[str] = "ignored" # Not a field; no init/repr
c1 = Config("example.com", 443)
c2 = Config("localhost", 80)
print(c1 < c2) # True (orders by fields)
# c1.host = "new" # dataclasses.FrozenInstanceError: cannot assign to field 'host'
print(c1) # Config(host='example.com', port=443, debug=False)
Nuance: frozen=True sets __setattr__ to raise dataclasses.FrozenInstanceError; order=True generates comparison methods (use sparingly, as it assumes total order).
3. Advanced Techniques¶
a. __post_init__ for Custom Initialization¶
Hook after auto-__init__ for validation or computed fields.
from dataclasses import dataclass, field
import re
@dataclass
class Email:
address: str
validated: bool = field(init=False, default=False) # Not in __init__
def __post_init__(self):
if not re.match(r"[^@]+@[^@]+\.[^@]+", self.address):
raise ValueError(f"Invalid email: {self.address}")
self.validated = True # Computed post-init
e = Email("user@example.com")
print(e.validated) # True
# Email("invalid") # Raises ValueError
field() customizes: default=..., default_factory=list (callable for mutables), init=False (exclude from __init__), repr=False, hash=False, compare=False, metadata={...} (arbitrary data).
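A small sketch exercising several of these field() options at once (Record and its field names are illustrative):

```python
from dataclasses import dataclass, field, fields

@dataclass
class Record:
    name: str
    tags: list = field(default_factory=list)     # Safe mutable default
    secret: str = field(default="", repr=False)  # Hidden from __repr__
    meta: dict = field(default_factory=dict, metadata={"doc": "free-form"})

r = Record("a")
print(repr(r))                            # secret is omitted from the repr
print(fields(Record)[3].metadata["doc"])  # free-form
```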
b. Inheritance and Field Overrides¶
Data classes support inheritance, but fields from base are included unless shadowed.
@dataclass
class Base:
name: str
id: int = field(default=0, metadata={"pk": True})
@dataclass
class Derived(Base):
age: int = 0 # Inherits name/id
# Can override: id: int = field(init=False)
d = Derived("Alice", 123, 30)
print(d) # Derived(name='Alice', id=123, age=30)
print(d.id) # 123
Pitfall: Mutable defaults (e.g., default=[]) raise ValueError at class definition time—use default_factory=list instead.
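This pitfall is cheap to demonstrate; a sketch showing both the rejection and the fix (Bad/Good are illustrative names):

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class Bad:
        items: list = []  # Rejected at class-creation time
except ValueError as exc:
    print(f"ValueError: {exc}")

@dataclass
class Good:
    items: list = field(default_factory=list)  # Fresh list per instance

a, b = Good(), Good()
a.items.append(1)
print(b.items)  # [] -- not shared between instances
```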
c. asdict/astuple and replace¶
Utilities for serialization and immutability.
from dataclasses import dataclass, asdict, astuple, replace
@dataclass
class Inventory:
item: str
quantity: int = 1
inv = Inventory("apple", 5)
print(asdict(inv)) # {'item': 'apple', 'quantity': 5} # Dict
print(astuple(inv)) # ('apple', 5) # Tuple
new_inv = replace(inv, quantity=10) # Immutable update
print(new_inv) # Inventory(item='apple', quantity=10)
replace is the data class analogue of namedtuple._replace.
d. Slots for Efficiency (Python 3.10+)¶
slots=True generates __slots__, reducing memory use (often 20-50% per instance) and speeding up attribute access.
@dataclass(slots=True)
class EfficientPoint:
x: float
y: float
p = EfficientPoint(1.0, 2.0)
# p.__dict__ # AttributeError: 'EfficientPoint' object has no attribute '__dict__'
print(p.x) # Fast lookup
Combines with frozen for tuple-like perf.
4. Patterns and Real-World Applications¶
- Config Objects: Frozen data classes for app settings (immutable, hashable for caching).
- DTOs in APIs: With `asdict` for JSON serialization (e.g., FastAPI models).
- Named Tuples Alternative: More flexible (defaults, post-init); use data classes when you need mutability.
- Tying to Prior Topics:
  - Descriptors: Fields interact with the descriptor protocol; customize with `field()` plus custom descriptors.
  - Metaclasses: Data classes are built by a decorator helper (`dataclasses._process_class`), not a metaclass; combine with metaclasses for auto-field registration.
  - ABCs: Combine `@dataclass` with ABC: `@dataclass class MyModel(ABC): @abstractmethod def validate(self): ...`.
  - Context Managers: Data class holding resources: `@dataclass class Session: conn: Connection; def __enter__(self): ...`.
  - Generators/Coroutines: Yield data class instances in generators for structured iteration.
  - Concurrency: Frozen data classes as thread-safe shared data (immutable).
  - Decorators: Decorate `__post_init__` for logging/validation.
Example: Data class with generator integration.
@dataclass
class Result:
value: int
status: str = "ok"
def process_data(items: list[int]) -> list[Result]:
return [Result(i**2) for i in items] # Structured output
# Or generator:
def stream_results(items):
for i in items:
yield Result(i)
print(list(stream_results([1,2,3]))) # [Result(value=1, status='ok'), ...]
5. Best Practices and Pitfalls¶
- Annotations Always: Required for fields; use `InitVar` for `__init__`-only vars.
- Immutability: Prefer `frozen=True` for hashability; use `replace` over mutation.
- Validation: `__post_init__` or external libs like `pydantic` (data classes + validators).
- Performance: `slots=True` for large collections; avoid deep nesting.
- Pitfalls:
  - Inheritance order: Base data classes first; shadowed fields can confuse `__eq__`.
  - Mutables: `default_factory` mandatory for lists/dicts.
  - Pickling: Works, but slotted classes may need `__getstate__`/`__setstate__` for customization.
  - Typing: Use `typing.dataclass_transform` (3.11+; `typing_extensions` earlier) for custom decorator typing.
- When Not to Use: Complex logic—stick to regular classes; or ultra-perf (namedtuples).
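InitVar, mentioned in the first bullet, deserves a quick sketch (User and the toy hash are illustrative; don't use this for real passwords):

```python
from dataclasses import dataclass, field, InitVar

@dataclass
class User:
    name: str
    password: InitVar[str]            # __init__-only; never stored as a field
    pw_hash: str = field(init=False)  # Derived in __post_init__

    def __post_init__(self, password: str) -> None:
        self.pw_hash = f"hash::{password[::-1]}"  # Toy stand-in for real hashing

u = User("alice", "s3cret")
print(u)                       # User(name='alice', pw_hash='hash::terc3s')
print(hasattr(u, "password"))  # False -- InitVar left no attribute behind
```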
9. Type Hints¶
Type hints (introduced in Python 3.5 via PEP 484) are optional annotations for function signatures, variables, and classes that describe expected types—without enforcing them at runtime (Python remains dynamically typed). They're for static analysis (e.g., mypy, Pyright), IDEs (autocompletion, refactoring), and self-documentation. Advanced usage involves generics, protocols (structural typing), literals, unions, and runtime introspection via typing/typing_extensions. They integrate with prior topics: descriptors for typed properties, data classes for auto-typed fields, ABCs for protocols, and metaclasses for type-checked class creation.
This assumes basic hints (e.g., def func(x: int) -> str: ...); we'll focus on sophisticated features, runtime uses, and patterns. Snippets use from typing import ...; test with mypy for validation.
1. Core Mechanics: Beyond Basics¶
Hints use the typing module (stdlib) or typing_extensions (backports). Key: Forward references (strings for recursive types), Annotated for metadata.
Example: Recursive type with annotations.
from typing import Annotated, Generic, List, Optional, TypeVar
T = TypeVar('T') # Generic type variable
class Node(Generic[T]):  # Generic, so Node[int] is valid at runtime too
    def __init__(self, value: T, children: Optional[List['Node[T]']] = None):
        self.value = value
        self.children = children or []
def process_nodes(nodes: List[Annotated[Node[int], "root nodes"]]) -> int:
"""Annotated for extra info (e.g., tools like FastAPI use metadata)."""
return sum(n.value for n in nodes)
# Usage
root = Node(1, [Node(2), Node(3)])
print(process_nodes([root])) # 1
Insights:
- `'Node[T]'` as a string avoids NameError for forward refs.
- `Annotated[T, metadata]` attaches extras (e.g., validators, descriptions); type checkers read it as `T`, frameworks read the metadata.
- Type vars (`TypeVar`) parameterize generics (next section).
2. Generics: Parameterized Types¶
Generics (Generic, TypeVar) make reusable types (e.g., List[T]) concrete. Define with Generic[T], use in ABCs or classes.
Example: Typed stack class.
from typing import Generic, TypeVar, List
T = TypeVar('T')
class Stack(Generic[T]):
def __init__(self) -> None:
self._items: List[T] = []
def push(self, item: T) -> None:
self._items.append(item)
def pop(self) -> T:
return self._items.pop()
def __len__(self) -> int:
return len(self._items)
# Usage: mypy infers IntStack ~ Stack[int]
int_stack: Stack[int] = Stack()
int_stack.push(42)
print(int_stack.pop()) # 42
# int_stack.push("hello") # mypy: incompatible type "str"; expected "int"
Advanced: Bounded and Multi-Param Generics
Use bound=Base for constraints; multiple type vars for multi-parameter generics.
from abc import ABC, abstractmethod
from typing import List, TypeVar
class Shape(ABC): # From prior ABCs
@abstractmethod
def area(self) -> float: ...
S = TypeVar('S', bound=Shape) # Only Shape or subclass
def draw_shapes(shapes: List[S]) -> List[S]: # Preserves type
return [s for s in shapes if s.area() > 0]
# Multi: Dict[K, V] with K hashable
K = TypeVar('K')
V = TypeVar('V')
typing_extensions backports TypeVarTuple (stdlib in 3.11+) for variadic generics: def func(*args: *Ts) -> tuple[*Ts]: ....
3. Protocols: Structural Typing¶
Protocols (PEP 544) enable structural ("duck") typing: a type satisfies a protocol if it implements the required methods/attrs—no inheritance needed. Define by subclassing Protocol; add @runtime_checkable to also allow isinstance checks.
Example: Drawable protocol.
from typing import Protocol, runtime_checkable
@runtime_checkable  # Needed for the isinstance() check below
class Drawable(Protocol):
    def draw(self) -> None: ...
    width: int  # Attribute requirement, not a method
class Circle:
width: int = 10 # Satisfies protocol
def draw(self) -> None:
print("Drawing circle")
def render(drawable: Drawable) -> None:
drawable.draw()
c = Circle()
render(c) # OK; isinstance(c, Drawable) → True (structural)
Nuance: Runtime isinstance works only on @runtime_checkable protocols (via __instancecheck__ on the metaclass) and checks member presence, not signatures; static mypy verifies structurally. Extend ABCs with protocols for hybrid nominal/structural typing.
Advanced: TypedDict for Structured Dicts
Use TypedDict for dicts with a fixed key schema; NotRequired (3.11+) marks individual keys as optional.
from typing import TypedDict, NotRequired
class User(TypedDict):
name: str
age: NotRequired[int] # Optional in 3.11+
user: User = {"name": "Alice"} # OK; age not required
4. Literals, Unions, and Narrowing¶
- Literal: Exact values (`Literal['a', 'b', 1]`).
- Union: `int | str` (`|` syntax in 3.10+; `Union[int, str]` earlier).
- Narrowing: `TypeGuard` (3.10+) for user-defined narrowing functions. (Python has no intersection type operator.)
Example: Literal union with guards.
from typing import Literal, Union, TypeGuard
Action = Literal['add', 'remove', 'query']
def is_add_action(action: Union[Action, str]) -> TypeGuard[Literal['add']]:
return action == 'add'
def handle_action(action: Union[Action, str]) -> str:
if is_add_action(action):
        # reveal_type(action)  # mypy: Literal['add'] (static-only; NameError at runtime before 3.11)
return "Added"
return "Other"
print(handle_action('add')) # Added
Runtime Introspection: Use typing.get_type_hints(func) or inspect.signature for dynamic checks.
import inspect
def typed_func(x: int, y: str) -> bool:
return len(y) > x
sig = inspect.signature(typed_func)
ann = sig.parameters['x'].annotation # <class 'int'>
print(ann) # <class 'int'>
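typing.get_type_hints, mentioned above, resolves all annotations (including string forward references) in one call; a minimal sketch:

```python
from typing import get_type_hints

def typed_add(x: int, y: int) -> int:
    return x + y

hints = get_type_hints(typed_add)
print(hints)  # {'x': <class 'int'>, 'y': <class 'int'>, 'return': <class 'int'>}
```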
5. Advanced Techniques and Patterns¶
a. Callables and Overloads¶
Callable[[Arg1, Arg2], Return] for functions; @overload for ambiguous signatures.
from typing import Callable, Union, overload
@overload
def add(a: int, b: int) -> int: ...
@overload
def add(a: str, b: str) -> str: ...
def add(a: Union[int, str], b: Union[int, str]) -> Union[int, str]:
    if isinstance(a, int) and isinstance(b, int):
        return a + b  # Both operands narrowed to int
    assert isinstance(a, str) and isinstance(b, str)
    return a + b
add_handler: Callable[[int, int], int] = add # OK
b. TypeVars in Context Managers/Generators¶
Tie to prior: Typed yields/sends.
from contextlib import contextmanager
from typing import Generator, Iterator
def typed_gen(x: int) -> Generator[int, None, None]:
    yield x ** 2
@contextmanager
def typed_cm() -> Iterator[str]:  # The decorated generator is annotated as an Iterator
    yield "resource"
c. Final and LiteralString¶
Final (3.8+) marks names as non-reassignable, and the @final decorator forbids subclassing/overriding; LiteralString (3.11+) accepts only literal-derived strings (e.g., for SQL injection safety).
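A quick sketch of both markers (names are illustrative; neither is enforced at runtime—they are static-analysis signals only):

```python
from typing import Final, final

MAX_RETRIES: Final = 3   # mypy flags any reassignment of MAX_RETRIES

@final
class Settings:          # mypy flags any attempt to subclass Settings
    debug: bool = False

print(MAX_RETRIES)  # 3 -- runs fine; only type checkers enforce these
```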
d. Integration Patterns¶
- Descriptors: `@property def x(self) -> int: ...` types the getter.
- Data Classes: Auto-hints fields; `@dataclass` preserves annotations.
- Metaclasses: Validate hints in `__prepare__`: `if not isinstance(ann, type): raise TypeError`.
- ABCs/Protocols: `@runtime_checkable` on protocols for runtime `isinstance`.
- Concurrency: `Awaitable[T]` for async funcs; `concurrent.futures.Future[T]`.
- Generators/Coroutines: `Generator[Yield, Send, Return]`; `Coroutine[Any, Any, T]`.
Example: Protocol for async context.
from typing import Protocol
class AsyncResource(Protocol):
async def __aenter__(self) -> str: ...
async def __aexit__(self, *args) -> None: ...
async def use_resource(res: AsyncResource) -> None:
async with res as data:
print(data)
6. Best Practices and Pitfalls¶
- Tools: Run `mypy --strict` or Pyright; Pyright is faster for large codebases.
- Consistency: Annotate everything; use `Any` sparingly (it propagates unknowns).
- Runtime: No enforcement—use `pydantic` or `typeguard` for optional runtime checks.
- Pitfalls:
  - Circular imports: Use string forward references (`'Module.Class'`).
  - Over-specification: `Union` explosion—prefer protocols for flexibility.
  - Versioning: `typing_extensions` backports newer features (e.g., `TypeIs` from 3.13).
  - Performance: Annotations add no runtime cost; introspection is cheap.
- Typing the Typing: `typing_extensions.dataclass_transform` for custom `@dataclass`-like decorators.
10. AsyncIO¶
AsyncIO (introduced in Python 3.4, stabilized in 3.5) is Python's standard library for asynchronous I/O programming, enabling concurrent execution of I/O-bound tasks (e.g., network requests, file reads) in a single thread via cooperative multitasking. It builds directly on coroutines (from our generator/coroutine discussion): async def defines awaitable functions, and await yields control to the event loop. Unlike multithreading (which uses OS threads and the GIL), AsyncIO is lightweight and scalable (thousands of tasks), and control switches only at explicit await points—eliminating most preemption-style races on shared state.
AsyncIO shines for servers (e.g., FastAPI, aiohttp), but requires explicit yielding—it's not "set it and forget it." It ties into context managers (async versions), type hints (Awaitable[T]), data classes (for task results), and ABCs/protocols (e.g., AsyncIterable). Assume coroutine basics; we'll cover the event loop, tasks/futures, advanced patterns, and pitfalls with executable snippets (Python 3.10+ for modern syntax).
1. Core Concepts: Coroutines, Await, and the Event Loop¶
- Coroutines: `async def` returns a coroutine object; `await` suspends until complete.
- Event Loop: The scheduler (`asyncio.get_running_loop()` inside coroutines) runs coroutines, switching at `await` points.
- Run with `asyncio.run()` (3.7+), which creates and manages the loop for you.
Basic example: Concurrent delays.
import asyncio
async def say_after(delay, what):
await asyncio.sleep(delay) # Yields to loop (I/O simulation)
print(what)
async def main():
    loop = asyncio.get_running_loop()  # Preferred over get_event_loop() in a coroutine
    start = loop.time()
    # Schedule coroutines
    task1 = asyncio.create_task(say_after(1, "hello"))
    task2 = asyncio.create_task(say_after(2, "world"))
    print(f"Started at {loop.time() - start:.1f}")
    await task1 # Suspend until done
    await task2
    print(f"Finished at {loop.time() - start:.1f}")
# Run the async main
asyncio.run(main())
Output (approximate):
Started at 0.0
hello
world
Finished at 2.0
Key Insights:
- `asyncio.sleep(0)` yields control without delay—lets other ready tasks run.
- `create_task()` schedules on the loop; `await` waits (but allows switching).
- Single-threaded: All runs in one OS thread, no GIL issues for I/O.
2. Tasks and Futures: Concurrency Primitives¶
- Future: Abstract result holder (`asyncio.Future`); resolved later.
- Task: A Future subclass that wraps and auto-schedules a coroutine (`asyncio.Task`).
Gather multiple tasks with asyncio.gather() for parallel execution.
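A bare Future is worth seeing in isolation first; a minimal sketch where one task resolves a future that another coroutine is awaiting (names are illustrative):

```python
import asyncio

async def set_later(fut: asyncio.Future) -> None:
    await asyncio.sleep(0.1)
    fut.set_result("ready")  # Resolves the future

async def main() -> None:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()           # Bare Future, no coroutine attached
    asyncio.create_task(set_later(fut))  # A task will resolve it later
    print(await fut)                     # Suspends until set_result -- prints: ready

asyncio.run(main())
```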
Example: Parallel web fetches (using aiohttp—install if needed, but assume available).
import asyncio
import aiohttp # pip install aiohttp
async def fetch(session, url):
async with session.get(url) as response:
return await response.text()
async def main():
urls = [
"https://httpbin.org/delay/1",
"https://httpbin.org/delay/2",
"https://httpbin.org/delay/1.5"
]
async with aiohttp.ClientSession() as session:
tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks) # Concurrent; results keep input order
print(f"Fetched {len(results)} pages")
# results: [text1, text2, text3]
asyncio.run(main())
Nuance: gather() propagates exceptions from any task (use return_exceptions=True to collect them instead) and returns results in input order; to process results as each task finishes, use asyncio.as_completed().
3. Advanced Techniques¶
a. Async Context Managers and Iterators¶
Tie to context managers: Define with __aenter__/__aexit__ or @asynccontextmanager.
Example: Async database session.
from contextlib import asynccontextmanager
import asyncio
class MockDB:
async def query(self, sql):
await asyncio.sleep(0.1)
return f"Results for {sql}"
@asynccontextmanager
async def db_session():
db = MockDB()
print("Opening DB session")
try:
yield db
finally:
print("Closing DB session")
async def main():
async with db_session() as db:
result = await db.query("SELECT * FROM users")
print(result)
asyncio.run(main())
For iteration: async for on AsyncIterable (ABC from collections.abc).
class AsyncCounter:
def __init__(self, stop):
self.current = 0
self.stop = stop
def __aiter__(self):
return self
async def __anext__(self):
if self.current >= self.stop:
raise StopAsyncIteration
await asyncio.sleep(0.1)
self.current += 1
return self.current
async def main():
async for num in AsyncCounter(3):
print(num) # 1 2 3 (with delays)
asyncio.run(main())
b. Timeouts, Cancellation, and Shields¶
asyncio.wait_for(coro, timeout) raises TimeoutError on timeout; asyncio.shield(task) protects the inner task when the outer await is cancelled.
async def long_task():
await asyncio.sleep(5)
return "Done"
async def main():
try:
result = await asyncio.wait_for(long_task(), timeout=2)
except asyncio.TimeoutError:
print("Timed out")
    # Shield: protects the inner coroutine if the outer await is cancelled or times out
    shielded = asyncio.shield(long_task())
    try:
        await asyncio.wait_for(shielded, timeout=1)
    except asyncio.TimeoutError:
        print("Outer wait timed out; the shielded task itself was not cancelled")
asyncio.run(main())
Cancellation: task.cancel() raises asyncio.CancelledError inside the task at its next await point; catch it to clean up, then re-raise.
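A minimal sketch of cooperative cancellation with cleanup (worker is an illustrative name):

```python
import asyncio

async def worker() -> None:
    try:
        while True:
            await asyncio.sleep(0.1)  # Cancellation is delivered at await points
    except asyncio.CancelledError:
        print("worker: cleaning up")
        raise  # Re-raise so the task is marked as cancelled

async def main() -> None:
    task = asyncio.create_task(worker())
    await asyncio.sleep(0.25)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        print("main: worker cancelled")

asyncio.run(main())
```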
c. Queues and Semaphores for Coordination¶
asyncio.Queue for producer-consumer; Semaphore limits concurrency (e.g., rate limiting).
async def producer(queue, items):
for item in items:
await queue.put(item)
print(f"Produced {item}")
async def consumer(queue, name):
while True:
item = await queue.get()
if item is None: # Sentinel
break
print(f"{name} consumed {item}")
queue.task_done()
async def main():
queue = asyncio.Queue()
producers = [asyncio.create_task(producer(queue, range(3)))]
consumers = [asyncio.create_task(consumer(queue, f"Consumer-{i}")) for i in range(2)]
await asyncio.gather(*producers)
for c in consumers:
await queue.put(None) # End consumers
await asyncio.gather(*consumers)
asyncio.run(main())
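The Semaphore half of this section can be sketched just as briefly: limit five jobs to two at a time (names and delays are illustrative):

```python
import asyncio

async def limited_fetch(sem: asyncio.Semaphore, i: int) -> int:
    async with sem:               # At most 2 coroutines inside this block at once
        await asyncio.sleep(0.1)  # Simulated I/O
        return i

async def main() -> None:
    sem = asyncio.Semaphore(2)    # Concurrency limit of 2
    results = await asyncio.gather(*(limited_fetch(sem, i) for i in range(5)))
    print(results)                # [0, 1, 2, 3, 4] -- gather keeps input order

asyncio.run(main())
```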
4. Patterns and Real-World Applications¶
- Web Servers: `asyncio` + `aiohttp` for non-blocking APIs; integrates with data classes for typed responses.
- Streams: Async iterators for real-time data (e.g., WebSockets).
- Hybrid Concurrency: Run blocking code in a thread pool: `loop.run_in_executor(None, blocking_func)`.
- Tying to Prior Topics:
  - Coroutines: AsyncIO is coroutine-based; `yield from` evolved into `await`.
  - Context Managers: Async versions for resources (e.g., connections).
  - Multithreading: AsyncIO for I/O; processes for CPU (use `ProcessPoolExecutor` with `run_in_executor`).
  - ABCs/Protocols: `AsyncIterable`, `Awaitable` for typed async.
  - Data Classes: `@dataclass class Response: data: str; status: int` for structured yields.
  - Type Hints: Annotate `async def func() -> str`; the function's type is `Callable[[], Awaitable[str]]`. `AsyncGenerator[Yield, Send]` for async generators.
  - Descriptors: Async properties: a `@property` returning an awaitable that the caller must `await`.
  - Metaclasses: Enforce async methods in ABCs.
Example: Typed async generator.
from typing import AsyncGenerator
import asyncio
async def async_stream(items: list[int]) -> AsyncGenerator[int, None]:
for item in items:
await asyncio.sleep(0.1)
yield item ** 2
async def main():
async for squared in async_stream([1, 2, 3]):
print(squared)
asyncio.run(main())
5. Best Practices and Pitfalls¶
- Prefer `asyncio.run(main())`: Top-level entry; nest loops sparingly.
- Debug Mode: `asyncio.run(main(), debug=True)` logs slow callbacks; use `nest_asyncio` in Jupyter.
- Exceptions: Await all tasks; use `add_done_callback` on futures.
- Performance: Minimal overhead; scales to 100k+ connections. Profile with asyncio debug events.
- Pitfalls:
  - Blocking Calls: Wrap sync I/O in `run_in_executor` (e.g., `requests.get`).
  - Nested Loops: Avoid; use `nest_asyncio.apply()` for REPLs.
  - Cancellation Races: Always await tasks; handle `CancelledError`.
  - GIL Irrelevant: Pure I/O, but CPU-bound loops starve the loop—yield often.
  - Typing: Use `mypy` with asyncio stubs; `TypedDict` for event data.
- Libraries: `aiofiles` for async files, `aioredis` (now part of `redis-py`) for Redis.
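The run_in_executor advice above, as a minimal sketch (blocking_io stands in for any synchronous call such as requests.get):

```python
import asyncio
import time

def blocking_io() -> str:
    time.sleep(0.2)  # Stands in for requests.get, file I/O, etc.
    return "done"

async def main() -> None:
    loop = asyncio.get_running_loop()
    # None = default ThreadPoolExecutor; the event loop stays responsive meanwhile
    result = await loop.run_in_executor(None, blocking_io)
    print(result)  # done

asyncio.run(main())
```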
11. Magic Methods (Dunder Methods)¶
Magic methods (also called dunder methods, short for "double underscore") are special methods that define how objects behave with built-in operations. They're the foundation of Python's object model, enabling operator overloading, context managers, iteration, and more. Understanding them is crucial for creating Pythonic, intuitive classes.
Magic methods are invoked implicitly by Python's interpreter—you rarely call them directly. They follow the pattern __methodname__ and integrate deeply with Python's data model. This builds on descriptors (which power some magic methods) and metaclasses (which can enforce magic method contracts).
1. Object Creation and Representation¶
__init__, __new__, and __del__¶
class Person:
def __new__(cls, name, age):
"""Called before __init__; returns the instance."""
print(f"Creating {name}")
return super().__new__(cls) # Must return instance
def __init__(self, name, age):
"""Initializes the instance after creation."""
self.name = name
self.age = age
def __del__(self):
"""Called when object is garbage collected (unreliable timing)."""
print(f"Deleting {self.name}")
p = Person("Alice", 30) # Creating Alice
# p goes out of scope → Deleting Alice (eventually)
Key Insights:
- `__new__` is effectively a static method (special-cased by Python; it receives `cls`); `__init__` is an instance method.
- `__new__` is useful for singletons, immutable types, or subclassing built-ins.
- `__del__` is unreliable—use context managers for cleanup.
__repr__ and __str__¶
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def __repr__(self):
"""Unambiguous representation for developers."""
return f"Point({self.x}, {self.y})"
def __str__(self):
"""Human-readable representation."""
return f"({self.x}, {self.y})"
p = Point(3, 4)
print(repr(p)) # Point(3, 4) # Used by debuggers
print(str(p)) # (3, 4) # Used by print()
print(p) # (3, 4) # print() calls __str__
Best Practice: __repr__ should be unambiguous; ideally, eval(repr(obj)) == obj.
2. Comparison and Ordering¶
class Version:
def __init__(self, major, minor, patch):
self.major = major
self.minor = minor
self.patch = patch
def __eq__(self, other):
"""Equality: == and != (if __ne__ not defined)."""
if not isinstance(other, Version):
return NotImplemented
return (self.major, self.minor, self.patch) == (other.major, other.minor, other.patch)
    def __ne__(self, other):
        """Inequality: != (usually unnecessary; Python 3 derives it from __eq__)."""
        result = self.__eq__(other)
        return NotImplemented if result is NotImplemented else not result
def __lt__(self, other):
"""Less than: <."""
if not isinstance(other, Version):
return NotImplemented
return (self.major, self.minor, self.patch) < (other.major, other.minor, other.patch)
def __le__(self, other):
"""Less than or equal: <=."""
return self.__lt__(other) or self.__eq__(other)
# __gt__ and __ge__ can be auto-generated from __lt__ with functools.total_ordering
v1 = Version(1, 2, 3)
v2 = Version(1, 2, 4)
print(v1 < v2) # True
print(v1 >= v2) # False
Pro Tip: Use @functools.total_ordering to auto-generate missing comparison methods from __eq__ and one of __lt__, __le__, __gt__, or __ge__.
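Applying that tip to the Version class above (a sketch; collapsing the fields into one tuple key is a simplification):

```python
from functools import total_ordering

@total_ordering  # Generates __le__, __gt__, __ge__ from __eq__ and __lt__
class Version:
    def __init__(self, major, minor, patch):
        self.key = (major, minor, patch)
    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.key == other.key
    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.key < other.key

print(Version(1, 2, 3) >= Version(1, 2, 4))  # False -- __ge__ was generated
```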
__hash__ for Hashable Objects¶
from dataclasses import dataclass

@dataclass(frozen=True) # Frozen makes it hashable
class Point:
x: int
y: int
def __hash__(self):
"""Must be consistent with __eq__; immutable objects should be hashable."""
return hash((self.x, self.y))
p1 = Point(1, 2)
p2 = Point(1, 2)
print(hash(p1) == hash(p2)) # True
print({p1, p2}) # {Point(x=1, y=2)} # Only one in set
Rule: If __eq__ is defined, __hash__ must be defined (or set to None to make unhashable).
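That rule is easy to verify (EqOnly is an illustrative name):

```python
class EqOnly:
    def __eq__(self, other):
        return isinstance(other, EqOnly)

# Defining __eq__ without __hash__ makes the class unhashable:
print(EqOnly.__hash__ is None)  # True
# hash(EqOnly())  # TypeError: unhashable type: 'EqOnly'
```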
3. Attribute Access¶
__getattr__, __setattr__, __delattr__, and __getattribute__¶
class DynamicAttributes:
def __init__(self):
self._data = {}
def __getattr__(self, name):
"""Called when attribute not found in normal lookup."""
if name.startswith('_'):
raise AttributeError(f"'{type(self).__name__}' has no attribute '{name}'")
return self._data.get(name, f"Default for {name}")
def __setattr__(self, name, value):
"""Called on all attribute assignments."""
if name.startswith('_'):
super().__setattr__(name, value) # Normal assignment
else:
if not hasattr(self, '_data'):
super().__setattr__('_data', {})
self._data[name] = value
def __delattr__(self, name):
"""Called on del obj.attr."""
if name in self._data:
del self._data[name]
else:
super().__delattr__(name)
obj = DynamicAttributes()
obj.name = "Alice"
print(obj.name) # Alice
print(obj.age) # Default for age
del obj.name
print(obj.name) # Default for name
Advanced: __getattribute__ (use with caution—intercepts all attribute access):
class LoggingAccess:
def __getattribute__(self, name):
print(f"Accessing {name}")
return super().__getattribute__(name) # Must call super to avoid recursion
obj = LoggingAccess()
obj.x = 5
print(obj.x) # Accessing x\n5
Pitfall: Infinite recursion if __getattribute__ accesses self.attr without super().
4. Container Protocols¶
__len__, __getitem__, __setitem__, __delitem__, __contains__¶
class CustomList:
def __init__(self, items):
self._items = list(items)
def __len__(self):
"""Called by len()."""
return len(self._items)
def __getitem__(self, key):
"""Called by obj[key] and slicing."""
if isinstance(key, slice):
return CustomList(self._items[key])
return self._items[key]
def __setitem__(self, key, value):
"""Called by obj[key] = value."""
self._items[key] = value
def __delitem__(self, key):
"""Called by del obj[key]."""
del self._items[key]
def __contains__(self, item):
"""Called by 'in' operator."""
return item in self._items
    def __iter__(self):
        """Called by iter() and for loops."""
        return iter(self._items)
    def __repr__(self):
        """Readable repr so slices print as shown below."""
        return f"CustomList({self._items})"
cl = CustomList([1, 2, 3, 4, 5])
print(len(cl)) # 5
print(cl[1:3]) # CustomList([2, 3])
print(3 in cl) # True
cl[0] = 10
print(cl[0]) # 10
5. Callable Objects¶
__call__¶
class Counter:
def __init__(self):
self.count = 0
def __call__(self, *args, **kwargs):
"""Makes instance callable like a function."""
self.count += 1
return self.count
counter = Counter()
print(counter()) # 1
print(counter()) # 2
print(counter()) # 3
Useful for function-like objects, decorators, or stateful callables.
6. Arithmetic Operations¶
class Vector:
def __init__(self, x, y):
self.x = x
self.y = y
def __add__(self, other):
"""+ operator."""
if not isinstance(other, Vector):
return NotImplemented
return Vector(self.x + other.x, self.y + other.y)
def __sub__(self, other):
"""- operator."""
return Vector(self.x - other.x, self.y - other.y)
def __mul__(self, scalar):
"""* operator."""
if not isinstance(scalar, (int, float)):
return NotImplemented
return Vector(self.x * scalar, self.y * scalar)
def __rmul__(self, scalar):
"""Right multiplication: scalar * vector."""
return self.__mul__(scalar)
def __truediv__(self, scalar):
"""/ operator (true division)."""
return Vector(self.x / scalar, self.y / scalar)
def __repr__(self):
return f"Vector({self.x}, {self.y})"
v1 = Vector(1, 2)
v2 = Vector(3, 4)
print(v1 + v2) # Vector(4, 6)
print(v1 * 3) # Vector(3, 6)
print(2 * v1) # Vector(2, 4) # Uses __rmul__
Reflected Operations: __radd__, __rsub__, etc., handle cases where the left operand doesn't support the operation.
7. Context Manager Protocol¶
__enter__ and __exit__¶
The context manager protocol is implemented via __enter__ and __exit__ methods. For a comprehensive guide, see Section 2: Context Managers.
class FileManager:
def __init__(self, filename, mode):
self.filename = filename
self.mode = mode
self.file = None
def __enter__(self):
"""Called on entry to 'with' block."""
self.file = open(self.filename, self.mode)
return self.file
def __exit__(self, exc_type, exc_val, exc_tb):
"""Called on exit; can suppress exceptions by returning True."""
if self.file:
self.file.close()
return False # Don't suppress exceptions
with FileManager("test.txt", "w") as f:
f.write("Hello")
# File automatically closed
Note: See Section 2: Context Managers for detailed coverage of context managers, contextlib, async context managers, and advanced patterns.
8. Iterator Protocol¶
__iter__ and __next__¶
The iterator protocol enables objects to be iterated over in for loops. For comprehensive coverage of generators and iterators, see Section 4: Generator Expressions and Coroutines.
class Countdown:
def __init__(self, start):
self.current = start
def __iter__(self):
"""Returns iterator (often self)."""
return self
def __next__(self):
"""Returns next value or raises StopIteration."""
if self.current <= 0:
raise StopIteration
self.current -= 1
return self.current + 1
for num in Countdown(5):
print(num) # 5 4 3 2 1
Note: See Section 4: Generator Expressions and Coroutines for detailed coverage of generators, generator expressions, coroutines, and the iterator protocol.
9. Advanced Patterns¶
Custom Descriptors for Attribute Caching¶
class CachedAttribute:
def __init__(self, compute_func):
self.compute = compute_func
self.cache_name = f"_cached_{id(compute_func)}"
def __get__(self, instance, owner):
if instance is None:
return self
if not hasattr(instance, self.cache_name):
setattr(instance, self.cache_name, self.compute(instance))
return getattr(instance, self.cache_name)
class Expensive:
@CachedAttribute
def result(self):
print("Computing...")
return sum(range(1000000))
e = Expensive()
print(e.result) # Computing... → 499999500000
print(e.result) # 499999500000 (cached)
__slots__ for Memory Efficiency¶
class Point:
__slots__ = ('x', 'y') # Prevents __dict__ creation
def __init__(self, x, y):
self.x = x
self.y = y
p = Point(1, 2)
# p.z = 3 # AttributeError: 'Point' object has no attribute 'z'
Trade-off: Saves memory (~40% for many instances) but prevents dynamic attributes.
10. Best Practices and Pitfalls¶
- Return `NotImplemented` for unsupported operations (allows Python to try reflected ops).
- Consistency: `__hash__` must match `__eq__`; hashable objects should be immutable.
- Performance: Magic methods add minimal overhead; profile before optimizing.
- Pitfalls:
  - Infinite recursion in `__getattribute__`—always use `super()`.
  - `__del__` timing is unreliable—use context managers.
  - `__slots__` loses its memory savings if a base class still has `__dict__`.
  - `__getattr__` vs `__getattribute__`: the former runs only for missing attrs; the latter intercepts all access.
- Integration: Magic methods work with descriptors, metaclasses, and ABCs for powerful abstractions.
12. Collections Module¶
The collections module provides specialized container datatypes that extend built-in types (list, dict, tuple, set) with additional functionality. These are essential for efficient, Pythonic code in many scenarios—from counting items to managing ordered mappings.
1. Counter: Counting Hashable Objects¶
Counter is a dict subclass for counting hashable objects. Like all dicts since Python 3.7 it preserves insertion order, and its repr lists elements from most to least common.
from collections import Counter
# Basic counting
words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
counter = Counter(words)
print(counter) # Counter({'apple': 3, 'banana': 2, 'orange': 1})
# Access counts
print(counter['apple']) # 3
print(counter['grape']) # 0 (doesn't raise KeyError)
# Update with another iterable
counter.update(['apple', 'grape', 'grape'])
print(counter) # Counter({'apple': 4, 'banana': 2, 'grape': 2, 'orange': 1})
# Most common
print(counter.most_common(2)) # [('apple', 4), ('banana', 2)]
# Arithmetic operations
c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)
print(c1 + c2) # Counter({'a': 4, 'b': 3})
print(c1 - c2) # Counter({'a': 2}) # Negative counts removed
print(c1 & c2) # Counter({'a': 1, 'b': 1}) # Intersection (min)
print(c1 | c2) # Counter({'a': 3, 'b': 2}) # Union (max)
Use Cases: Word frequency, inventory tracking, voting systems.
2. defaultdict: Dictionary with Default Factory¶
defaultdict automatically creates entries for missing keys using a factory function.
from collections import defaultdict
# Default to empty list
dd = defaultdict(list)
dd['fruits'].append('apple')
dd['fruits'].append('banana')
print(dd['fruits']) # ['apple', 'banana']
print(dd['vegetables']) # [] # Auto-created
# Default to 0 (for counting)
counts = defaultdict(int)
for word in ['apple', 'banana', 'apple']:
counts[word] += 1
print(counts) # defaultdict(<class 'int'>, {'apple': 2, 'banana': 1})
# Custom factory
def default_factory():
return {'count': 0, 'items': []}
dd = defaultdict(default_factory)
dd['group1']['count'] = 5
print(dd['group1']) # {'count': 5, 'items': []}
Use Cases: Grouping, graph adjacency lists, nested structures.
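For instance, the adjacency-list use case mentioned above (a small sketch with made-up edges):

```python
from collections import defaultdict

edges = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'a')]
graph = defaultdict(set)  # each node maps to a set of neighbors
for src, dst in edges:
    graph[src].add(dst)

print(sorted(graph['a']))  # ['b', 'c']
print(graph['z'])          # set() (auto-created for unseen nodes)
```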
3. OrderedDict: Remember Insertion Order¶
OrderedDict maintains insertion order (now standard in Python 3.7+ dicts, but useful for compatibility and explicit ordering).
from collections import OrderedDict
od = OrderedDict()
od['first'] = 1
od['second'] = 2
od['third'] = 3
print(list(od.keys())) # ['first', 'second', 'third']
# Move to end
od.move_to_end('first')
print(list(od.keys())) # ['second', 'third', 'first']
# Pop last item
last = od.popitem(last=True)
print(last) # ('first', 1)
# Pop first item
first = od.popitem(last=False)
print(first) # ('second', 2)
Use Cases: LRU caches, maintaining order in configs, ordered mappings.
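One behavioral difference worth knowing: unlike plain dicts, OrderedDict equality is order-sensitive.

```python
from collections import OrderedDict

od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])
print(od1 == od2)  # False: order matters between OrderedDicts

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print(d1 == d2)  # True: plain dicts compare by contents only
```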
4. deque: Double-Ended Queue¶
deque (pronounced "deck") is a memory-efficient double-ended queue with O(1) appends and pops from both ends; those individual operations are also thread-safe.
from collections import deque
# Create deque
d = deque([1, 2, 3])
print(d) # deque([1, 2, 3])
# Append/pop from right
d.append(4)
d.appendleft(0)
print(d) # deque([0, 1, 2, 3, 4])
# Pop from both ends
right = d.pop() # 4
left = d.popleft() # 0
print(d) # deque([1, 2, 3])
# Rotate
d.rotate(1) # Rotate right by 1
print(d) # deque([3, 1, 2])
d.rotate(-1) # Rotate left by 1
print(d) # deque([1, 2, 3])
# Max length (bounded deque)
bounded = deque(maxlen=3)
bounded.extend([1, 2, 3])
bounded.append(4) # Removes 1
print(bounded) # deque([2, 3, 4], maxlen=3)
Use Cases: Queues, stacks, sliding window algorithms, rotating buffers.
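The sliding-window use case above, sketched as a moving average built on a bounded deque:

```python
from collections import deque

def moving_average(values, n):
    """Yield the running mean over the last n values."""
    window = deque(maxlen=n)  # old values fall off the left automatically
    for v in values:
        window.append(v)
        if len(window) == n:
            yield sum(window) / n

print(list(moving_average([1, 2, 3, 4, 5], 3)))  # [2.0, 3.0, 4.0]
```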
5. ChainMap: Combine Multiple Mappings¶
ChainMap groups multiple dicts into a single view, searching them in order.
from collections import ChainMap
defaults = {'color': 'red', 'size': 'medium'}
user_prefs = {'color': 'blue'}
env_vars = {'size': 'large'}
# Chain: user_prefs → env_vars → defaults
config = ChainMap(user_prefs, env_vars, defaults)
print(config['color']) # 'blue' (from user_prefs)
print(config['size']) # 'large' (from env_vars)
print(config.get('theme', 'default')) # 'default' (not found)
# New child map (prepended)
config = config.new_child({'theme': 'dark'})
print(config['theme']) # 'dark'
# Parents
print(config.parents) # ChainMap({'color': 'blue'}, {'size': 'large'}, {'color': 'red', 'size': 'medium'})
Use Cases: Configuration precedence, variable scoping, layered settings.
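The variable-scoping use case can be sketched with new_child() and parents (hypothetical scope contents):

```python
from collections import ChainMap

# Simulate lexical scoping: each new_child() is an inner scope
outer = {'x': 1, 'y': 2}
scope = ChainMap(outer)
inner = scope.new_child({'x': 10})  # shadows the outer x

print(inner['x'])  # 10 (inner scope wins)
print(inner['y'])  # 2 (falls through to the outer scope)

inner['z'] = 99    # writes go to the first (innermost) map only
print('z' in outer)        # False
print(inner.parents['x'])  # 1 (lookup that skips the innermost scope)
```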
6. namedtuple: Tuple Subclass with Named Fields¶
namedtuple creates tuple subclasses with named fields, improving readability.
from collections import namedtuple
# Define named tuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x, p.y) # 1 2
print(p[0], p[1]) # 1 2 # Still indexable
# Unpacking
x, y = p
print(x, y) # 1 2
# _replace (creates new instance)
p2 = p._replace(x=10)
print(p2) # Point(x=10, y=2)
# _asdict
print(p._asdict()) # {'x': 1, 'y': 2}
# _fields
print(Point._fields) # ('x', 'y')
# Default values (Python 3.7+)
Person = namedtuple('Person', ['name', 'age', 'city'], defaults=['Unknown'])
p1 = Person('Alice', 30)
print(p1) # Person(name='Alice', age=30, city='Unknown')
Use Cases: Records, data structures, return values from functions.
7. Advanced Patterns¶
LRU Cache with OrderedDict¶
from collections import OrderedDict
class LRUCache:
def __init__(self, capacity):
self.cache = OrderedDict()
self.capacity = capacity
def get(self, key):
if key not in self.cache:
return -1
# Move to end (most recently used)
self.cache.move_to_end(key)
return self.cache[key]
def put(self, key, value):
if key in self.cache:
self.cache.move_to_end(key)
self.cache[key] = value
if len(self.cache) > self.capacity:
# Remove least recently used (first item)
self.cache.popitem(last=False)
cache = LRUCache(2)
cache.put(1, 'a')
cache.put(2, 'b')
print(cache.get(1)) # 'a'
cache.put(3, 'c') # Evicts 2
print(cache.get(2)) # -1
Grouping with defaultdict¶
from collections import defaultdict
# Group by key
data = [('fruit', 'apple'), ('fruit', 'banana'), ('vegetable', 'carrot')]
grouped = defaultdict(list)
for category, item in data:
grouped[category].append(item)
print(dict(grouped)) # {'fruit': ['apple', 'banana'], 'vegetable': ['carrot']}
8. Best Practices and Pitfalls¶
- Counter: Use for frequency counting; most_common() is efficient.
- defaultdict: Prevents KeyError; choose an appropriate factory (list, int, set).
- deque: Use for queues/stacks; maxlen for bounded buffers.
- OrderedDict: Explicit ordering; move_to_end() for LRU.
- ChainMap: Multiple lookup chains; updates affect the first mapping only.
- namedtuple: Immutable records; use dataclasses for mutability needs.
- Pitfalls:
  - Counter arithmetic removes zero/negative counts.
  - The defaultdict factory is called for missing keys—can have side effects.
  - deque is thread-safe for appends/pops, but not for iteration.
  - ChainMap updates only the first mapping; use new_child() for scoping.
- Performance: All are optimized C implementations; use when appropriate.
13. itertools and functools¶
The itertools and functools modules provide powerful tools for functional programming, iteration, and function manipulation. They enable elegant, efficient solutions to common problems without explicit loops or boilerplate.
1. itertools: Iterator Building Blocks¶
itertools provides functions for creating and manipulating iterators efficiently.
Infinite Iterators¶
import itertools
# count: Infinite counter
counter = itertools.count(start=10, step=2)
print([next(counter) for _ in range(5)]) # [10, 12, 14, 16, 18]
# cycle: Cycle through iterable
cycle_iter = itertools.cycle(['A', 'B', 'C'])
print([next(cycle_iter) for _ in range(7)]) # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
# repeat: Repeat value
repeat_iter = itertools.repeat(5, times=3)
print(list(repeat_iter)) # [5, 5, 5]
Finite Iterators¶
import operator
# accumulate: Running totals
nums = [1, 2, 3, 4, 5]
print(list(itertools.accumulate(nums))) # [1, 3, 6, 10, 15]
print(list(itertools.accumulate(nums, operator.mul))) # [1, 2, 6, 24, 120]
# chain: Chain iterables
print(list(itertools.chain([1, 2], [3, 4], [5]))) # [1, 2, 3, 4, 5]
# chain.from_iterable: Chain from iterable of iterables
print(list(itertools.chain.from_iterable([[1, 2], [3, 4]]))) # [1, 2, 3, 4]
# compress: Filter by boolean mask
data = ['A', 'B', 'C', 'D']
selectors = [True, False, True, False]
print(list(itertools.compress(data, selectors))) # ['A', 'C']
# dropwhile/takewhile: Drop/take while condition
nums = [1, 4, 6, 4, 1]
print(list(itertools.dropwhile(lambda x: x < 5, nums))) # [6, 4, 1]
print(list(itertools.takewhile(lambda x: x < 5, nums))) # [1, 4]
# filterfalse: Filter false values
print(list(itertools.filterfalse(lambda x: x % 2 == 0, range(10)))) # [1, 3, 5, 7, 9]
# groupby: Group consecutive elements
data = [1, 1, 2, 2, 2, 3, 3]
for key, group in itertools.groupby(data):
print(f"{key}: {list(group)}")
# 1: [1, 1]
# 2: [2, 2, 2]
# 3: [3, 3]
# islice: Slice iterator
print(list(itertools.islice(range(10), 2, 8, 2))) # [2, 4, 6]
# starmap: Map with unpacked args
points = [(1, 2), (3, 4), (5, 6)]
print(list(itertools.starmap(lambda x, y: x + y, points))) # [3, 7, 11]
# tee: Split iterator into n independent iterators
it1, it2 = itertools.tee(range(5), 2)
print(list(it1)) # [0, 1, 2, 3, 4]
print(list(it2)) # [0, 1, 2, 3, 4]
Combinatoric Iterators¶
# product: Cartesian product
print(list(itertools.product([1, 2], ['a', 'b'])))
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
# permutations: All permutations
print(list(itertools.permutations([1, 2, 3], 2)))
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
# combinations: All combinations (order doesn't matter)
print(list(itertools.combinations([1, 2, 3], 2)))
# [(1, 2), (1, 3), (2, 3)]
# combinations_with_replacement: Combinations with repetition
print(list(itertools.combinations_with_replacement([1, 2], 2)))
# [(1, 1), (1, 2), (2, 2)]
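product is handy for flattening nested loops, e.g. sweeping a (hypothetical) parameter grid:

```python
import itertools

rates = [0.1, 0.01]
sizes = [32, 64]
grid = list(itertools.product(rates, sizes))
print(grid)  # [(0.1, 32), (0.1, 64), (0.01, 32), (0.01, 64)]

# Equivalent nested-loop version, for comparison
nested = [(r, s) for r in rates for s in sizes]
print(grid == nested)  # True
```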
2. functools: Higher-Order Functions¶
functools provides tools for working with functions and callable objects.
functools.partial: Partial Application¶
from functools import partial
def multiply(x, y, z):
return x * y * z
# Fix first argument
multiply_by_2 = partial(multiply, 2)
print(multiply_by_2(3, 4)) # 24 (2 * 3 * 4)
# Fix multiple arguments
multiply_2_3 = partial(multiply, 2, 3)
print(multiply_2_3(4)) # 24
# With keyword arguments
def power(base, exponent):
return base ** exponent
square = partial(power, exponent=2)
print(square(5)) # 25
functools.reduce: Functional Reduction¶
from functools import reduce
import operator
# Sum
nums = [1, 2, 3, 4, 5]
print(reduce(operator.add, nums)) # 15
# Product
print(reduce(operator.mul, nums)) # 120
# Custom function
print(reduce(lambda acc, x: acc + x**2, nums, 0)) # 55 (sum of squares)
# Maximum with initializer
print(reduce(max, [3, 1, 4, 1, 5], 0)) # 5
functools.wraps: Preserve Function Metadata¶
from functools import wraps
def my_decorator(func):
@wraps(func) # Preserves __name__, __doc__, etc.
def wrapper(*args, **kwargs):
"""Wrapper docstring."""
return func(*args, **kwargs)
return wrapper
@my_decorator
def greet(name):
"""Greet someone."""
return f"Hello, {name}!"
print(greet.__name__) # 'greet' (not 'wrapper')
print(greet.__doc__) # 'Greet someone.' (not 'Wrapper docstring.')
functools.lru_cache: Memoization¶
from functools import lru_cache
import time
@lru_cache(maxsize=128)
def fibonacci(n):
if n < 2:
return n
return fibonacci(n-1) + fibonacci(n-2)
# First call (computes)
start = time.time()
result = fibonacci(30)
print(f"Time: {time.time() - start:.4f}s") # fast even on the first call (memoized recursion)
# Second call (from cache)
start = time.time()
result = fibonacci(30)
print(f"Time: {time.time() - start:.4f}s") # ~0.0000s
# Cache info
print(fibonacci.cache_info()) # CacheInfo(hits=28, misses=31, maxsize=128, currsize=31)
functools.total_ordering: Auto-Generate Comparison Methods¶
from functools import total_ordering
@total_ordering
class Student:
def __init__(self, name, grade):
self.name = name
self.grade = grade
def __eq__(self, other):
return self.grade == other.grade
def __lt__(self, other):
return self.grade < other.grade
s1 = Student("Alice", 85)
s2 = Student("Bob", 90)
print(s1 <= s2) # True (auto-generated)
print(s1 >= s2) # False (auto-generated)
functools.singledispatch: Function Overloading¶
from functools import singledispatch
@singledispatch
def process(data):
"""Default implementation."""
return f"Processing {type(data).__name__}"
@process.register
def _(data: int):
return f"Processing integer: {data}"
@process.register
def _(data: str):
return f"Processing string: {data}"
@process.register(list)
def _(data):
return f"Processing list with {len(data)} items"
print(process(42)) # Processing integer: 42
print(process("hello")) # Processing string: hello
print(process([1, 2, 3])) # Processing list with 3 items
print(process(3.14)) # Processing float
functools.cached_property: Cached Property (Python 3.8+)¶
from functools import cached_property
class DataProcessor:
def __init__(self, data):
self.data = data
@cached_property
def expensive_computation(self):
"""Computed once, cached thereafter."""
print("Computing...")
return sum(x**2 for x in self.data)
dp = DataProcessor([1, 2, 3, 4, 5])
print(dp.expensive_computation) # Computing... → 55
print(dp.expensive_computation) # 55 (cached)
3. Advanced Patterns¶
Chaining Iterators¶
import itertools
# Process data in chunks
def process_in_chunks(data, chunk_size):
it = iter(data)
while True:
chunk = list(itertools.islice(it, chunk_size))
if not chunk:
break
yield sum(chunk)
data = range(10)
print(list(process_in_chunks(data, 3))) # [3, 12, 21, 9]
Windowed Iteration¶
def sliding_window(iterable, n):
"""Yield sliding windows of size n."""
it = iter(iterable)
window = list(itertools.islice(it, n))
if len(window) == n:
yield tuple(window)
for item in it:
window = window[1:] + [item]
yield tuple(window)
print(list(sliding_window([1, 2, 3, 4, 5], 3)))
# [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
Functional Pipelines¶
import itertools
# Pipeline: concatenate a filtered stream with a mapped stream
data = range(10)
pipeline = itertools.chain(
itertools.filterfalse(lambda x: x % 2 == 0, data),
itertools.starmap(lambda x: x**2, [(x,) for x in range(5)])
)
print(list(pipeline)) # [1, 3, 5, 7, 9, 0, 1, 4, 9, 16]
4. Best Practices and Pitfalls¶
- itertools: Use for memory-efficient iteration; lazy evaluation saves memory.
- functools.partial: Reduces argument repetition; useful for callbacks.
- lru_cache: Use for expensive, pure functions; watch memory usage.
- total_ordering: Simplifies comparison classes; requires __eq__ and one ordering method.
- Pitfalls:
  - Infinite iterators can cause infinite loops—bound them with islice or takewhile.
  - groupby only groups consecutive elements; use sorted() first for full grouping.
  - lru_cache doesn't work with unhashable arguments; use functools.cache (3.9+) for simple cases.
  - partial doesn't preserve the function signature for type checkers.
- Performance: All are optimized; prefer over manual loops for readability and speed.
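The groupby pitfall is easy to demonstrate: unsorted input fragments the groups, so sort by the same key first.

```python
import itertools

data = ['apple', 'banana', 'avocado', 'cherry', 'apricot']
keyfunc = lambda w: w[0]

# Unsorted input: a new group starts every time the key changes
frag = [(k, list(g)) for k, g in itertools.groupby(data, key=keyfunc)]
print(frag)
# [('a', ['apple']), ('b', ['banana']), ('a', ['avocado']), ('c', ['cherry']), ('a', ['apricot'])]

# Sorted by the same key first: one group per key
full = [(k, list(g))
        for k, g in itertools.groupby(sorted(data, key=keyfunc), key=keyfunc)]
print(full)
# [('a', ['apple', 'avocado', 'apricot']), ('b', ['banana']), ('c', ['cherry'])]
```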
14. Regular Expressions¶
Regular expressions (regex) are powerful pattern-matching tools for text processing. Python's re module provides comprehensive regex support, enabling search, replace, split, and validation operations on strings.
1. Basic Pattern Matching¶
import re
# Search for pattern
text = "The quick brown fox jumps over the lazy dog"
match = re.search(r'fox', text)
if match:
print(f"Found at index {match.start()}") # Found at index 16
# Match at start
match = re.match(r'The', text)
print(match is not None) # True
# Find all occurrences
matches = re.findall(r'\b\w{4}\b', text) # 4-letter words
print(matches) # ['over', 'lazy']
# Find all with positions
for match in re.finditer(r'\b\w{4}\b', text):
print(f"{match.group()} at {match.start()}-{match.end()}")
2. Common Patterns¶
# Character classes
re.findall(r'[aeiou]', 'hello') # ['e', 'o']
re.findall(r'[^aeiou]', 'hello') # ['h', 'l', 'l'] (negation)
re.findall(r'[a-z]', 'Hello123') # ['e', 'l', 'l', 'o']
# Quantifiers
re.findall(r'\d+', 'I have 5 apples and 10 oranges') # ['5', '10']
re.findall(r'\d*', 'abc123') # ['', '', '', '123', '']
re.findall(r'\d?', 'abc123') # ['', '', '', '1', '2', '3', '']
re.findall(r'\d{2,4}', '123456789') # ['1234', '5678'] (2-4 digits)
# Anchors
re.findall(r'^\d+', '123abc') # ['123'] (start)
re.findall(r'\d+$', 'abc123') # ['123'] (end)
re.findall(r'\b\w+', 'hello world') # ['hello', 'world'] (word boundary)
# Groups
match = re.search(r'(\d{3})-(\d{3})-(\d{4})', 'Phone: 555-123-4567')
if match:
print(match.groups()) # ('555', '123', '4567')
print(match.group(1)) # '555'
3. Advanced Patterns¶
Named Groups¶
pattern = r'(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<number>\d{4})'
match = re.search(pattern, 'Phone: 555-123-4567')
if match:
print(match.group('area')) # '555'
print(match.groupdict()) # {'area': '555', 'exchange': '123', 'number': '4567'}
Non-Capturing Groups¶
# (?:...) non-capturing group
match = re.search(r'(?:Mr|Mrs|Ms)\. (\w+)', 'Hello Mr. Smith')
if match:
print(match.group(1)) # 'Smith' (only one group)
# Lookahead/lookbehind
re.findall(r'\w+(?=\.)', 'cat. dog. bird') # ['cat', 'dog'] (positive lookahead; 'bird' has no dot)
re.findall(r'(?<=Mr\. )\w+', 'Mr. Smith and Mrs. Jones') # ['Smith'] (positive lookbehind)
Alternation and Flags¶
# Alternation
re.findall(r'cat|dog', 'I have a cat and a dog') # ['cat', 'dog']
# Flags
text = 'Hello\nWorld\nPython'
re.findall(r'^[a-z]+', text, re.MULTILINE | re.IGNORECASE) # ['Hello', 'World', 'Python']
# Common flags:
# re.IGNORECASE (re.I): Case-insensitive
# re.MULTILINE (re.M): ^ and $ match line boundaries
# re.DOTALL (re.S): . matches newline
# re.VERBOSE (re.X): Allow comments and whitespace
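The re.VERBOSE flag listed above lets you annotate a pattern inline; here is the earlier phone-number pattern written readably:

```python
import re

phone = re.compile(r"""
    (\d{3})   # area code
    -
    (\d{3})   # exchange
    -
    (\d{4})   # line number
""", re.VERBOSE)

m = phone.search('Call 555-123-4567 today')
print(m.groups())  # ('555', '123', '4567')
```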
4. Substitution and Splitting¶
# Replace
text = "Hello, world! Hello, Python!"
result = re.sub(r'Hello', 'Hi', text)
print(result) # "Hi, world! Hi, Python!"
# Replace with function
def replacer(match):
return match.group(0).upper()
result = re.sub(r'\b\w{5}\b', replacer, text)
print(result) # "HELLO, WORLD! HELLO, Python!"
# Replace with backreferences
text = "2023-12-25"
result = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', text)
print(result) # "25/12/2023"
# Split
text = "apple,banana;cherry:date"
result = re.split(r'[,;:]', text)
print(result) # ['apple', 'banana', 'cherry', 'date']
5. Compiled Patterns¶
For repeated use, compile patterns for better performance:
pattern = re.compile(r'\b\w+\b')
text = "Hello world Python"
# Use compiled pattern
matches = pattern.findall(text)
print(matches) # ['Hello', 'world', 'Python']
# Methods available on compiled pattern
pattern.search(text)
pattern.match(text)
pattern.findall(text)
pattern.sub('WORD', text)
6. Practical Examples¶
Email Validation¶
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
emails = ['user@example.com', 'invalid.email', 'test@domain.co.uk']
for email in emails:
if re.match(email_pattern, email):
print(f"{email} is valid")
else:
print(f"{email} is invalid")
Extracting Data¶
log_line = "2023-12-25 10:30:45 ERROR Database connection failed"
pattern = r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.+)'
match = re.match(pattern, log_line)
if match:
date, time, level, message = match.groups()
print(f"Date: {date}, Time: {time}, Level: {level}, Message: {message}")
Text Cleaning¶
def clean_text(text):
# Remove extra whitespace
text = re.sub(r'\s+', ' ', text)
# Remove special characters (keep alphanumeric and spaces)
text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
return text.strip()
dirty = "Hello!!! World... Python???"
print(clean_text(dirty)) # "Hello World Python"
7. Best Practices and Pitfalls¶
- Raw Strings: Always use r'...' for patterns to avoid escape issues.
- Compile Patterns: Use re.compile() for repeated patterns.
- Greedy vs Non-Greedy: .* is greedy; .*? is non-greedy.
- Pitfalls:
  - Catastrophic backtracking: (a+)+b on 'a'*100 + 'c' can be slow.
  - Over-matching: .* matches everything—be specific.
  - Escaping: Special chars need escaping: \. for a literal dot.
  - Groups: Use (?:...) when you don't need to capture.
- Performance: Simple patterns are fast; complex ones can be slow—profile if needed.
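The greedy vs non-greedy distinction from the best practices above, shown on a small HTML snippet:

```python
import re

html = '<b>bold</b> and <i>italic</i>'
# Greedy .* grabs from the first '<' to the last '>'
print(re.findall(r'<.*>', html))   # ['<b>bold</b> and <i>italic</i>']
# Non-greedy .*? stops at the first closing '>'
print(re.findall(r'<.*?>', html))  # ['<b>', '</b>', '<i>', '</i>']
```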
15. Exception Handling¶
Exception handling in Python allows graceful error recovery and proper resource cleanup. Beyond basic try/except, Python provides advanced features like exception chaining, context managers, custom exceptions, and exception hierarchies.
1. Basic Exception Handling¶
try:
result = 10 / 0
except ZeroDivisionError:
print("Cannot divide by zero")
except ValueError as e:
print(f"Value error: {e}")
except Exception as e:
print(f"Unexpected error: {e}")
else:
print("No exception occurred")
finally:
print("This always runs")
2. Exception Chaining and Context¶
Exception Chaining (Python 3+)¶
def process_data(data):
try:
return int(data)
except ValueError as e:
raise TypeError(f"Invalid data type: {data}") from e
try:
process_data("abc")
except TypeError as e:
print(f"Error: {e}")
print(f"Caused by: {e.__cause__}") # Original ValueError
Exception Context (Python 3.3+)¶
def inner():
try:
1 / 0
except ZeroDivisionError:
raise ValueError("Inner error")
def outer():
try:
inner()
except ValueError as e:
raise RuntimeError("Outer error") from None # Suppress chain
try:
outer()
except RuntimeError as e:
print(f"Error: {e}")
print(f"Context: {e.__context__}") # None (suppressed)
3. Custom Exceptions¶
class ValidationError(Exception):
"""Base exception for validation errors."""
pass
class EmailValidationError(ValidationError):
"""Raised when email validation fails."""
def __init__(self, email, message="Invalid email format"):
self.email = email
self.message = message
super().__init__(f"{message}: {email}")
class AgeValidationError(ValidationError):
"""Raised when age validation fails."""
pass
def validate_user(email, age):
if '@' not in email:
raise EmailValidationError(email)
if age < 0 or age > 150:
raise AgeValidationError(f"Invalid age: {age}")
try:
validate_user("invalid", 200)
# Catch the more specific subclass first; a ValidationError clause placed
# first would also catch EmailValidationError and shadow it
except EmailValidationError as e:
print(f"Email error: {e.email}")
except ValidationError as e:
print(f"Validation failed: {e}")
4. Exception Hierarchies¶
class BaseAPIError(Exception):
"""Base exception for API errors."""
def __init__(self, message, status_code=500):
self.message = message
self.status_code = status_code
super().__init__(message)
class ClientError(BaseAPIError):
"""4xx errors."""
def __init__(self, message, status_code=400):
super().__init__(message, status_code)
class ServerError(BaseAPIError):
"""5xx errors."""
def __init__(self, message, status_code=500):
super().__init__(message, status_code)
class NotFoundError(ClientError):
"""404 Not Found."""
def __init__(self, resource):
super().__init__(f"Resource not found: {resource}", 404)
self.resource = resource
# Catching base class catches all subclasses
try:
raise NotFoundError("user/123")
except BaseAPIError as e:
print(f"API Error {e.status_code}: {e.message}")
5. Advanced Patterns¶
Retry with Exponential Backoff¶
import time
import random
def retry_with_backoff(func, max_retries=3, base_delay=1):
"""Retry function with exponential backoff."""
for attempt in range(max_retries):
try:
return func()
except Exception as e:
if attempt == max_retries - 1:
raise
delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.2f}s...")
time.sleep(delay)
def unreliable_function():
if random.random() < 0.7:
raise ConnectionError("Connection failed")
return "Success"
result = retry_with_backoff(unreliable_function)
print(result)
Exception Suppression¶
from contextlib import suppress
# Suppress specific exceptions
with suppress(FileNotFoundError, PermissionError):
with open("nonexistent.txt") as f:
print(f.read())
# Custom suppression
class SuppressExceptions:
def __init__(self, *exceptions):
self.exceptions = exceptions
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
return exc_type is not None and issubclass(exc_type, self.exceptions)
with SuppressExceptions(ValueError, TypeError):
int("not a number") # Suppressed
print("Continuing...")
Exception Logging¶
import logging
logging.basicConfig(level=logging.ERROR)
def log_exceptions(func):
"""Decorator to log exceptions."""
def wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except Exception as e:
logging.exception(f"Exception in {func.__name__}: {e}")
raise
return wrapper
@log_exceptions
def risky_function():
raise ValueError("Something went wrong")
try:
risky_function()
except ValueError:
pass # Already logged
6. Best Practices and Pitfalls¶
- Specific Exceptions: Catch specific exceptions, not bare except:.
- Exception Hierarchy: Create custom exceptions for your domain.
- Resource Cleanup: Use finally or context managers for cleanup.
- Exception Chaining: Use from to preserve exception context.
- Pitfalls:
  - Catching Exception too broadly hides bugs.
  - Swallowing exceptions silently makes debugging hard.
  - except: without an exception type catches SystemExit and KeyboardInterrupt.
  - Re-raising without context loses the original traceback.
- Performance: Exception handling is fast; don't use it for control flow.
16. Enum¶
The enum module provides enumeration types for creating sets of named constants. Enums improve code readability, type safety, and prevent magic numbers/strings.
1. Basic Enum Usage¶
from enum import Enum
class Color(Enum):
RED = 1
GREEN = 2
BLUE = 3
# Access by name
print(Color.RED) # Color.RED
print(Color.RED.name) # 'RED'
print(Color.RED.value) # 1
# Comparison
print(Color.RED == Color.GREEN) # False
print(Color.RED is Color.RED) # True
# Iteration
for color in Color:
print(color) # Color.RED, Color.GREEN, Color.BLUE
# Membership
print(Color.RED in Color) # True
2. Enum Variants¶
IntEnum: Integer Enums¶
from enum import IntEnum
class Priority(IntEnum):
LOW = 1
MEDIUM = 2
HIGH = 3
# Can compare with integers
print(Priority.HIGH > 2) # True
print(Priority.LOW + 1) # 2
Flag: Bitwise Flags¶
from enum import Flag, auto
class Permission(Flag):
READ = auto()
WRITE = auto()
EXECUTE = auto()
# Combine flags
rw = Permission.READ | Permission.WRITE
print(rw) # Permission.READ|WRITE
# Check flags
print(Permission.READ in rw) # True
print(Permission.EXECUTE in rw) # False
# All combinations
all_perms = Permission.READ | Permission.WRITE | Permission.EXECUTE
StrEnum: String Enums (Python 3.11+)¶
from enum import StrEnum
class Status(StrEnum):
PENDING = "pending"
ACTIVE = "active"
INACTIVE = "inactive"
# Can use as strings
print(f"Status: {Status.PENDING}") # Status: pending
print(Status.PENDING.upper()) # 'PENDING'
3. Advanced Features¶
Auto Values¶
from enum import Enum, auto
class Direction(Enum):
NORTH = auto()
SOUTH = auto()
EAST = auto()
WEST = auto()
print(Direction.NORTH.value) # 1
print(Direction.SOUTH.value) # 2
Custom Methods¶
class Planet(Enum):
MERCURY = (3.303e+23, 2.4397e6)
VENUS = (4.869e+24, 6.0518e6)
EARTH = (5.976e+24, 6.37814e6)
def __init__(self, mass, radius):
self.mass = mass
self.radius = radius
@property
def surface_gravity(self):
G = 6.67300E-11
return G * self.mass / (self.radius * self.radius)
print(Planet.EARTH.surface_gravity) # 9.802652743337129
Unique Decorator¶
from enum import Enum, unique
@unique
class Status(Enum):
ACTIVE = 1
INACTIVE = 2
# PENDING = 1 # Would raise ValueError: duplicate values
# Without @unique, duplicates are allowed (aliases)
4. Functional API¶
from enum import Enum
# Create enum from names
Color = Enum('Color', ['RED', 'GREEN', 'BLUE'])
print(Color.RED) # Color.RED
# Create enum from name-value pairs
Status = Enum('Status', [('PENDING', 1), ('ACTIVE', 2), ('INACTIVE', 3)])
print(Status.PENDING.value) # 1
# From dictionary
Priority = Enum('Priority', {'LOW': 1, 'MEDIUM': 2, 'HIGH': 3})
5. Best Practices and Pitfalls¶
- Use Enums: Replace magic numbers/strings with enums for clarity.
- IntEnum: Use when you need integer comparison/arithmetic.
- Flag: Use for bitwise combinations (permissions, options).
- Pitfalls:
  - Enum values must be hashable (no mutable types).
  - Enum members are singletons—compare with is for identity.
  - Don't modify enum values after creation.
- Integration: Enums work with type hints, dataclasses, and serialization.
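The serialization integration mentioned above usually goes through the value: look members up by value with Color(...) or by name with Color[...] (sketch with a throwaway Color enum):

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

# Round-trip through a serialized value
stored = Color.GREEN.value   # 2 (e.g. what you'd write to JSON or a DB)
print(Color(stored))         # Color.GREEN (lookup by value)
print(Color['BLUE'])         # Color.BLUE (lookup by name)

# Unknown values raise ValueError, so bad data fails loudly
try:
    Color(99)
except ValueError as e:
    print(e)  # 99 is not a valid Color
```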
17. Pathlib¶
pathlib (introduced in Python 3.4) provides an object-oriented interface for filesystem paths, replacing the older os.path module with a more intuitive, cross-platform API.
1. Basic Path Operations¶
from pathlib import Path
# Create Path objects
p = Path('/home/user/documents/file.txt')
p = Path.cwd() / 'documents' / 'file.txt' # Using / operator
p = Path('documents', 'file.txt') # Cross-platform joining (used by the examples below)
# Path components
print(p.name) # 'file.txt'
print(p.stem) # 'file'
print(p.suffix) # '.txt'
print(p.suffixes) # ['.txt']
print(p.parent) # Path('documents')
print(p.parts) # ('documents', 'file.txt')
# Absolute vs relative
print(p.is_absolute()) # False
abs_p = p.resolve() # Absolute path
print(abs_p.is_absolute()) # True
2. File and Directory Operations¶
from pathlib import Path
# Check existence
p = Path('file.txt')
print(p.exists()) # True/False
print(p.is_file()) # True if file
print(p.is_dir()) # True if directory
# Create directories
dir_path = Path('new_dir')
dir_path.mkdir(exist_ok=True) # Create if doesn't exist
dir_path.mkdir(parents=True, exist_ok=True) # Create parents too
# Read/write files
p = Path('data.txt')
p.write_text('Hello, World!') # Write text
content = p.read_text() # Read text
p.write_bytes(b'Binary data') # Write bytes
data = p.read_bytes() # Read bytes
# Open file (context manager)
with p.open('r') as f:
content = f.read()
3. Path Manipulation¶
from pathlib import Path
# Join paths
base = Path('/home/user')
file = base / 'documents' / 'file.txt'
print(file) # /home/user/documents/file.txt
# Change parts
p = Path('/home/user/documents/file.txt')
new_p = p.with_name('newfile.txt') # Change filename
new_p = p.with_suffix('.pdf') # Change extension
new_p = p.with_stem('newfile') # Change stem (Python 3.9+)
# Relative paths
base = Path('/home/user')
target = Path('/home/user/documents/file.txt')
relative = target.relative_to(base) # documents/file.txt
print(relative) # documents/file.txt
4. Directory Traversal¶
from pathlib import Path
# Iterate directory
dir_path = Path('.')
for item in dir_path.iterdir():
print(item)
# Find files by pattern
for py_file in dir_path.glob('*.py'):
print(py_file)
# Recursive glob
for py_file in dir_path.rglob('*.py'):
print(py_file)
# List directory contents
files = list(dir_path.iterdir())
py_files = list(dir_path.glob('*.py'))
5. Advanced Operations¶
from pathlib import Path
import shutil
# File operations
p = Path('source.txt')
p = p.rename('target.txt') # Rename; returns the new path (Python 3.8+)
p.replace('final.txt') # Move, overwriting any existing target (atomic on POSIX)
# Copy (uses shutil, imported above)
shutil.copy(p, 'backup.txt')
# Remove
p.unlink() # Remove file
dir_path.rmdir() # Remove empty directory
shutil.rmtree(dir_path) # Remove directory tree
# File stats
p = Path('file.txt')
print(p.stat().st_size) # File size
print(p.stat().st_mtime) # Modification time
# Touch (create/update timestamp)
p.touch()
6. Path Comparison and Matching¶
from pathlib import Path
# Comparison
p1 = Path('/home/user/file.txt')
p2 = Path('/home/user/file.txt')
print(p1 == p2) # True (normalized comparison)
# Match patterns
p = Path('document.txt')
print(p.match('*.txt')) # True
print(p.match('doc*.txt')) # True
# Pure paths (no filesystem access)
from pathlib import PurePath, PurePosixPath, PureWindowsPath
pure = PurePath('/home/user/file.txt')
print(pure.parts) # ('/', 'home', 'user', 'file.txt')
7. Best Practices and Pitfalls¶
- Use Pathlib: Prefer it over os.path for new code (Python 3.6+).
- Path Operations: Use the / operator for joining paths.
- Cross-Platform: Pathlib handles Windows/Unix differences automatically.
- Pitfalls:
  - Path objects are not strings—convert with str() when needed.
  - glob() returns an iterator—convert to a list if you need it more than once.
  - resolve() follows symlinks; use absolute() if you don't want that.
- Integration: Works with open(), shutil, and most file operations.
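A small end-to-end sketch combining several of the operations above; it builds a throwaway tree under a temp directory and sums file sizes with rglob() and stat():

```python
import tempfile
from pathlib import Path

# Hypothetical layout: two small .txt files in a temp tree
root = Path(tempfile.mkdtemp())
(root / 'sub').mkdir()
(root / 'a.txt').write_text('hello')            # 5 bytes
(root / 'sub' / 'b.txt').write_text('world!')   # 6 bytes

total = sum(p.stat().st_size for p in root.rglob('*.txt'))
print(total)  # 11
```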
18. Memory Management and Garbage Collection¶
Understanding Python's memory management is crucial for writing efficient code and debugging memory issues. Python uses automatic memory management via reference counting and a cyclic garbage collector.
1. Reference Counting¶
Python primarily uses reference counting to manage memory:
import sys
# Reference count
x = [1, 2, 3]
print(sys.getrefcount(x)) # Usually 2 (x + temporary in getrefcount)
# Increase references
y = x
print(sys.getrefcount(x)) # 3
# Decrease references
del y
print(sys.getrefcount(x)) # 2
# Object deleted when refcount reaches 0
del x
# Object is garbage collected
2. Garbage Collection¶
Cyclic Garbage Collection¶
import gc
# Circular references
class Node:
def __init__(self, value):
self.value = value
self.next = None
# Create cycle
a = Node(1)
b = Node(2)
a.next = b
b.next = a # Cycle!
# Reference counting can't handle cycles
del a, b
# gc.collect() will clean up cycles
# Manual collection
gc.collect() # Returns number of objects collected
# GC statistics
print(gc.get_stats()) # Collection statistics
GC Configuration¶
import gc
# Get thresholds
print(gc.get_threshold()) # (700, 10, 10) - generation thresholds
# Set thresholds
gc.set_threshold(500, 5, 5) # More aggressive collection
# Disable/enable
gc.disable()
gc.enable()
# Debug
gc.set_debug(gc.DEBUG_LEAK) # Debug mode
3. Weak References¶
Weak references don't prevent garbage collection:
import weakref
class Data:
def __init__(self, value):
self.value = value
obj = Data(42)
weak_ref = weakref.ref(obj)
print(weak_ref()) # <Data object> (alive)
del obj
print(weak_ref()) # None (collected)
# WeakValueDictionary
weak_dict = weakref.WeakValueDictionary()
obj = Data(100)
weak_dict['key'] = obj
print(weak_dict['key']) # <Data object>
del obj
print('key' in weak_dict) # False (removed automatically)
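A weak reference can also carry a callback that runs when its referent is collected; a minimal sketch (the immediate collection here relies on CPython's reference counting):

```python
import weakref

class Resource:
    pass

collected = []

def on_collect(ref):
    # called with the now-dead weak reference once the object is collected
    collected.append(True)

r = Resource()
ref = weakref.ref(r, on_collect)
del r             # in CPython the refcount hits 0 and the callback fires
print(ref())      # None
print(collected)  # [True]
```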
4. Memory Profiling¶
import sys
import tracemalloc
# Start tracing
tracemalloc.start()
# Your code
data = [i for i in range(1000000)]
# Get snapshot
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
# Print top 10
for stat in top_stats[:10]:
print(stat)
# Get current size
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")
tracemalloc.stop()
5. Memory Optimization Techniques¶
__slots__ for Memory Efficiency¶
class Point:
__slots__ = ('x', 'y') # Prevents __dict__ creation
def __init__(self, x, y):
self.x = x
self.y = y
# Saves ~20-50% memory for many instances (varies by use case)
points = [Point(i, i*2) for i in range(1000000)]
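The saving can be measured with `sys.getsizeof`; a rough sketch (exact numbers vary by Python version and platform, so only the inequality is shown):

```python
import sys

class Plain:
    def __init__(self, x, y):
        self.x, self.y = x, y

class Slotted:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x, self.y = x, y

p, s = Plain(1, 2), Slotted(1, 2)
# a plain instance also carries a per-instance __dict__; a slotted one does not
plain_size = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
print(plain_size, sys.getsizeof(s))
print(sys.getsizeof(s) < plain_size)  # True
```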
Generators for Large Data¶
# Bad: Creates list in memory
def squares_list(n):
return [x**2 for x in range(n)]
# Good: Generator (lazy)
def squares_gen(n):
return (x**2 for x in range(n))
# Use generator
for square in squares_gen(1000000):
if square > 100:
break # Only computes what's needed
Object Pooling¶
class ObjectPool:
def __init__(self, factory, max_size=100):
self.factory = factory
self.pool = []
self.max_size = max_size
def get(self):
if self.pool:
return self.pool.pop()
return self.factory()
def put(self, obj):
if len(self.pool) < self.max_size:
self.pool.append(obj)
# Reuse objects instead of creating new ones
pool = ObjectPool(lambda: [])
obj = pool.get()
pool.put(obj) # Reuse
6. Best Practices and Pitfalls¶
- Use Generators: For large datasets, use generators instead of lists.
- Weak References: Use for caches and observer patterns.
- `__slots__`: Use for classes with many instances.
- Pitfalls:
  - Circular references prevent automatic cleanup—use weak references.
  - Large objects in global scope persist—clean up explicitly.
  - `sys.getrefcount()` includes its own temporary reference—subtract 1.
  - GC can cause pauses—tune thresholds for real-time systems.
- Profiling: Use `tracemalloc` or `memory_profiler` to find leaks.
19. Import System¶
Python's import system is powerful and extensible. Understanding it enables dynamic imports, plugin systems, and custom import behaviors.
1. Basic Import Mechanisms¶
# Standard imports
import math
from math import sqrt
from math import pi as PI
# Import all (not recommended)
from math import *
# Conditional imports
try:
import optional_module
except ImportError:
optional_module = None
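The conditional-import pattern usually pairs with a runtime check at the call site; a sketch using `numpy` as the hypothetical optional dependency:

```python
# Fall back to pure Python when an optional dependency is missing
try:
    import numpy as np
except ImportError:
    np = None

def mean(values):
    if np is not None:
        return float(np.mean(values))  # fast path when numpy is installed
    return sum(values) / len(values)   # pure-Python fallback

print(mean([1, 2, 3]))  # 2.0 either way
```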
2. Dynamic Imports¶
# Using __import__
module_name = 'math'
math_module = __import__(module_name)
print(math_module.sqrt(16)) # 4.0
# Using importlib (preferred)
import importlib
module = importlib.import_module('math')
print(module.sqrt(16)) # 4.0
# Reload module
importlib.reload(module) # Reloads module (useful for development)
3. Import Hooks and Finders¶
Custom Import Finder¶
import sys
import importlib.abc
import importlib.util
class StringLoader(importlib.abc.Loader):
    """Populates a module by executing a source string."""
    def __init__(self, code):
        self.code = code
    def create_module(self, spec):
        return None  # defer to the default module creation
    def exec_module(self, module):
        exec(self.code, module.__dict__)
class CustomFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, name, path, target=None):
        if name == 'my_custom_module':
            code = '''
def hello():
    return "Hello from custom module!"
'''
            return importlib.util.spec_from_loader(name, StringLoader(code))
        return None
# Register finder
sys.meta_path.insert(0, CustomFinder())
# Now can import
import my_custom_module
print(my_custom_module.hello()) # Hello from custom module!
4. Module Loading¶
import importlib.util
# Load from file
spec = importlib.util.spec_from_file_location("mymodule", "/path/to/mymodule.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Load from string
code = '''
def func():
return "Loaded from string"
'''
module = importlib.util.module_from_spec(importlib.util.spec_from_loader("temp", loader=None))
exec(code, module.__dict__)
print(module.func()) # Loaded from string
5. Package Imports¶
# Package structure
# mypackage/
# __init__.py
# module1.py
# module2.py
# Import from package
from mypackage import module1
from mypackage.module2 import function
# Relative imports (within package)
# In module1.py:
# from . import module2
# from .module2 import function
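The layout above can be exercised end-to-end by writing the package into a temporary directory and importing from it (the file names follow the sketch; `greet` is a made-up function):

```python
import sys
import tempfile
from pathlib import Path

# build mypackage/ with __init__.py and module1.py on disk
root = Path(tempfile.mkdtemp())
pkg = root / 'mypackage'
pkg.mkdir()
(pkg / '__init__.py').write_text('')
(pkg / 'module1.py').write_text('def greet():\n    return "hi from module1"\n')

sys.path.insert(0, str(root))  # make the temp dir importable
from mypackage import module1
print(module1.greet())  # hi from module1
```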
6. __import__ and importlib¶
import importlib
# Import module
module = importlib.import_module('os.path')
# Get module from sys.modules
import sys
if 'mymodule' in sys.modules:
module = sys.modules['mymodule']
# Check if a module is a package (packages have submodule_search_locations set)
import importlib.util  # find_spec lives in the util submodule
print(importlib.util.find_spec('json').submodule_search_locations is not None)  # True: 'json' is a package
print(importlib.util.find_spec('math').submodule_search_locations)  # None: 'math' is a plain module
7. Best Practices and Pitfalls¶
- Use `importlib`: Prefer it over `__import__` for dynamic imports.
- Avoid `import *`: It pollutes the namespace and makes code unclear.
- Relative Imports: Use them within packages; use absolute imports elsewhere.
- Pitfalls:
  - Circular imports cause `AttributeError`—restructure the code.
  - The `sys.modules` cache can serve stale modules—use `importlib.reload()`.
  - Import hooks are advanced—use them sparingly.
  - `__init__.py` is required for regular packages (only namespace packages, Python 3.3+, can omit it).
- Performance: Imports are cached—only the first import pays the loading cost.
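The caching is easy to observe: a repeated import is just a `sys.modules` lookup that returns the same module object, as this quick sketch shows:

```python
import sys
import math

# the first import loaded and cached the module; this one is a dict lookup
import math as math_again
print(math_again is sys.modules['math'])  # True
print(math_again is math)                 # True: one module object, two names
```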
20. Serialization¶
Serialization converts Python objects to formats suitable for storage or transmission. Python provides several serialization mechanisms, each with different trade-offs.
1. Pickle: Python Native Serialization¶
pickle serializes Python objects to binary format:
import pickle
# Serialize
data = {'name': 'Alice', 'age': 30, 'scores': [85, 90, 88]}
with open('data.pkl', 'wb') as f:
pickle.dump(data, f)
# Deserialize
with open('data.pkl', 'rb') as f:
loaded_data = pickle.load(f)
print(loaded_data) # {'name': 'Alice', 'age': 30, 'scores': [85, 90, 88]}
# In-memory serialization
serialized = pickle.dumps(data)
loaded = pickle.loads(serialized)
Security Warning: Only unpickle data from trusted sources—pickle can execute arbitrary code!
2. JSON: Text-Based Serialization¶
JSON is human-readable and language-agnostic:
import json
# Serialize
data = {'name': 'Alice', 'age': 30, 'scores': [85, 90, 88]}
json_str = json.dumps(data)
print(json_str) # {"name": "Alice", "age": 30, "scores": [85, 90, 88]}
# Deserialize
loaded = json.loads(json_str)
print(loaded) # {'name': 'Alice', 'age': 30, 'scores': [85, 90, 88]}
# With files
with open('data.json', 'w') as f:
json.dump(data, f, indent=2) # Pretty print
with open('data.json', 'r') as f:
loaded = json.load(f)
Custom JSON Encoders¶
import json
from datetime import datetime
class DateTimeEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, datetime):
return obj.isoformat()
return super().default(obj)
data = {'timestamp': datetime.now()}
json_str = json.dumps(data, cls=DateTimeEncoder)
print(json_str) # {"timestamp": "2023-12-25T10:30:45.123456"}
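Decoding needs the reverse step; a sketch using `object_hook`, with the made-up convention that any `timestamp` key holds an ISO-8601 string:

```python
import json
from datetime import datetime

def decode_datetimes(d):
    # hypothetical convention: a 'timestamp' key always holds an ISO-8601 string
    if 'timestamp' in d:
        d['timestamp'] = datetime.fromisoformat(d['timestamp'])
    return d

json_str = '{"timestamp": "2023-12-25T10:30:45"}'
loaded = json.loads(json_str, object_hook=decode_datetimes)
print(type(loaded['timestamp']).__name__)  # datetime
```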
3. Advanced Serialization¶
Protocol Buffers (protobuf)¶
# Requires: pip install protobuf
# Define schema in .proto file, then:
# protoc --python_out=. schema.proto
# import schema_pb2
# message = schema_pb2.Person()
# message.name = "Alice"
# message.age = 30
# serialized = message.SerializeToString()
MessagePack: Binary JSON¶
# Requires: pip install msgpack
import msgpack
data = {'name': 'Alice', 'age': 30}
packed = msgpack.packb(data)
unpacked = msgpack.unpackb(packed)
print(unpacked) # {'name': 'Alice', 'age': 30}
4. Serialization with Dataclasses¶
from dataclasses import dataclass, asdict
import json
@dataclass
class Person:
name: str
age: int
scores: list
p = Person('Alice', 30, [85, 90, 88])
# Convert to dict
data = asdict(p)
json_str = json.dumps(data)
# From JSON
loaded_dict = json.loads(json_str)
p2 = Person(**loaded_dict)
5. Best Practices and Pitfalls¶
- JSON: Use for human-readable, cross-language data.
- Pickle: Use for Python-only, trusted data.
- Security: Never unpickle untrusted data.
- Pitfalls:
  - Pickle is sensitive to Python and protocol versions—use a compatible protocol on both ends.
  - JSON doesn't support all Python types—use custom encoders.
  - Large objects—consider streaming or chunking.
  - Circular references—handle with custom serializers.
- Performance: Pickle is usually faster; JSON is more portable.
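A rough way to check the speed claim for your own payloads (results vary by data shape and Python version, so no numbers are promised):

```python
import json
import pickle
import timeit

data = {'name': 'Alice', 'scores': list(range(100))}

t_pickle = timeit.timeit(lambda: pickle.dumps(data), number=10_000)
t_json = timeit.timeit(lambda: json.dumps(data), number=10_000)
print(f"pickle: {t_pickle:.3f}s  json: {t_json:.3f}s")

# both formats round-trip this payload losslessly
print(pickle.loads(pickle.dumps(data)) == data)  # True
print(json.loads(json.dumps(data)) == data)      # True
```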