diff --git "a/Python/app.js" "b/Python/app.js"
--- "a/Python/app.js"
+++ "b/Python/app.js"
@@ -83,7 +83,7 @@ const MODULE_CONTENT = {
| Type | Mutable | Ordered | Hashable | Use Case |
|---|---|---|---|---|
| list | ✓ | ✓ | ✗ | Sequential data, time series, feature lists |
| bytearray | ✓ | ✓ | ✗ | Mutable binary buffers |
| Operation | list | dict | set |
|---|---|---|---|
| Lookup by index/key | O(1) | O(1) | — |
| Search (x in ...) | O(n) | O(1) | O(1) |
| Insert/Append | O(1) end, O(n) middle | O(1) | O(1) |
| Delete | O(n) | O(1) | O(1) |
| Sort | O(n log n) | — | — |
| Iteration | O(n) | O(n) | O(n) |
Real-world impact: Checking if an item exists in a list of 1M elements = ~50ms. In a set = ~0.00005ms. That's 1,000,000x faster. Always use sets/dicts for membership testing.
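A minimal sketch of the membership-test gap, using only the standard library (the exact timings depend on the machine, but the ordering does not):

```python
import timeit

n = 1_000_000
data_list = list(range(n))
data_set = set(data_list)

# Worst case for the list: the target is the last element (full O(n) scan).
target = n - 1

list_time = timeit.timeit(lambda: target in data_list, number=10)
set_time = timeit.timeit(lambda: target in data_set, number=10)

print(f"list: {list_time:.4f}s  set: {set_time:.8f}s")
```

On any CPython build the set lookup is orders of magnitude faster, because it hashes the value instead of scanning.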
a = [1, 2, 3] creates a list on the heap; a is a name that points to it. b = a makes both names point to the same list — no copy is made. This is aliasing — the #1 source of bugs in beginner Python code.
Reference Counting: Python combines reference counting with a cyclic garbage collector. Each object tracks how many names point to it; when the count hits 0, its memory is freed immediately. del doesn't always free memory — it just decrements the reference count.
Integer Interning: Python caches integers from -5 to 256 and short strings. So a = 100; b = 100; a is b is True, but a = 1000; b = 1000; a is b may be False. Never use is for value comparison — always use ==.
Garbage Collection Generations: CPython has 3 generations (gen0, gen1, gen2). New objects start in gen0; objects that survive a collection are promoted, and long-lived objects (gen2) are collected less frequently. Use gc.get_stats() to monitor, and gc.collect() to force a collection after deleting large ML models.
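A small demonstration of aliasing versus copying, using only built-ins (note the sketch deliberately compares values with ==, never with is, per the interning caveat above):

```python
a = [1, 2, 3]
b = a          # aliasing: b is the SAME list object, not a copy
b.append(4)
print(a)       # [1, 2, 3, 4] — the mutation is visible through both names

c = a.copy()   # shallow copy: a new outer list (inner objects still shared)
c.append(5)
print(a)       # unchanged: [1, 2, 3, 4]
print(b is a)  # True — two names, one object
print(c is a)  # False — a genuinely separate list
```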
yield vs return: return terminates the function; yield suspends it, saving the entire stack frame (local variables, instruction pointer). The next next() call resumes from where it left off, so a generator consumes O(1) memory regardless of data size: a list of 1 billion items = ~8GB RAM, a generator of 1 billion items = ~100 bytes.
The Iterator Protocol: any object with __iter__ and __next__ methods. Generators are just syntactic sugar for iterators.
Generator Expressions: (x**2 for x in range(10**9)) uses O(1) memory; the list comprehension [x**2 for x in range(10**9)] tries to allocate ~8GB. Always prefer generator expressions for large data.
yield from: Delegates to a sub-generator. yield from iterable is equivalent to for item in iterable: yield item, but also forwards send() and throw() calls — essential for building composable data pipelines.
send(): Two-way communication with generators (coroutines). value = yield result — both receives and produces values.
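The memory claim above can be checked directly; `read_in_batches` below is a hypothetical helper (not a library API) showing the batch-training use case:

```python
import sys

# A list materializes every element; a generator yields them lazily.
squares_list = [x * x for x in range(100_000)]
squares_gen = (x * x for x in range(100_000))

print(sys.getsizeof(squares_list))  # hundreds of KB
print(sys.getsizeof(squares_gen))   # ~100-200 bytes, regardless of range size

def read_in_batches(items, batch_size):
    """Toy batching generator (hypothetical helper, not a library API)."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

first_batch = next(read_in_batches(range(10), 4))
print(first_batch)  # [0, 1, 2, 3]
```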
Functions in Python are first-class objects — they can be passed as arguments, returned from other functions, and assigned to variables. A closure is a function that captures variables from its enclosing scope. This is the foundation of decorators, callbacks, and functional programming in Python.
Mutable Default Arguments: def append_to(element, target=[]): — this default list is shared across ALL calls! Default arguments are evaluated ONCE at function definition time, not at call time. Fix: use target=None, then if target is None: target = [].
Late Binding Closures: [lambda: i for i in range(5)] — all lambdas return 4! Variables in closures are looked up at call time, not definition time. Fix: [lambda i=i: i for i in range(5)].
Shallow Copy: list(a) copies the outer list but shares the inner objects.
String Concatenation: s += "text" in a loop creates a new string every time — O(n²). Use ''.join(parts).
Tuple Assignment Gotcha: a = ([1,2],); a[0] += [3] raises TypeError AND modifies the list! The += first mutates the list in-place (succeeds), then tries to reassign the tuple element (fails).
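The two most common gotchas above, shown side by side with their standard fixes:

```python
def append_bad(element, target=[]):      # default list created ONCE, shared
    target.append(element)
    return target

append_bad(1)
print(append_bad(2))   # [1, 2] — state leaked between calls!

def append_good(element, target=None):   # the standard fix
    if target is None:
        target = []
    target.append(element)
    return target

append_good(1)
print(append_good(2))  # [2] — a fresh list each call

# Late binding: every lambda looks up i at CALL time, after the loop ended...
funcs_bad = [lambda: i for i in range(5)]
print([f() for f in funcs_bad])   # [4, 4, 4, 4, 4]

# ...unless i is bound as a default argument at DEFINITION time.
funcs_good = [lambda i=i: i for i in range(5)]
print([f() for f in funcs_good])  # [0, 1, 2, 3, 4]
```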
Exception hierarchy: BaseException → Exception (catch this) → ValueError, TypeError, KeyError, FileNotFoundError, ConnectionError... Best practices: (1) Never use a bare except:. (2) Catch specific exceptions. (3) Use else for the success path. (4) finally always runs. (5) Create custom exceptions for your project.
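A minimal sketch of those practices together; `DataValidationError` and `parse_score` are hypothetical names for illustration:

```python
class DataValidationError(Exception):
    """Hypothetical project-specific exception."""

def parse_score(raw):
    try:
        value = float(raw)
    except ValueError as exc:        # catch the SPECIFIC exception, never bare except:
        raise DataValidationError(f"not a number: {raw!r}") from exc
    else:                            # runs only when no exception was raised
        return value
    finally:                         # always runs — put resource cleanup here
        pass

print(parse_score("3.5"))            # 3.5

try:
    parse_score("abc")
except DataValidationError as e:
    print(f"rejected: {e}")
```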
| Class | Purpose | Why It Matters in DS |
|---|---|---|
| defaultdict | Dict with default factory | Group data without KeyError: defaultdict(list) |
| Counter | Count hashable objects | Label distribution: Counter(y_train) |
| namedtuple | Lightweight immutable class | Return multiple values with names, not indices |
| OrderedDict | Dict remembering insertion order | Legacy (dicts are ordered 3.7+), useful for move_to_end() |
| deque | Double-ended queue | Sliding window computations, BFS algorithms |
| ChainMap | Stack multiple dicts | Layer config: defaults → env → CLI overrides |
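The three collections that show up most often in DS code, in a short runnable sketch:

```python
from collections import defaultdict, Counter, deque

# defaultdict(list): group records without ever hitting a KeyError
records = [("spam", 0.9), ("ham", 0.1), ("spam", 0.8)]
by_label = defaultdict(list)
for label, score in records:
    by_label[label].append(score)
print(dict(by_label))  # {'spam': [0.9, 0.8], 'ham': [0.1]}

# Counter: label distribution in one line
y_train = ["cat", "dog", "cat", "cat"]
print(Counter(y_train).most_common(1))  # [('cat', 3)]

# deque(maxlen=N): a fixed-size sliding window — old items fall off the left
window = deque(maxlen=3)
for x in [1, 2, 3, 4]:
    window.append(x)
print(list(window))  # [2, 3, 4]
```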
| Function | What It Does | DS Use Case |
|---|---|---|
| chain() | Concatenate iterables lazily | Merge multiple data files |
| islice() | Slice any iterator | Take first N records from a generator |
| groupby() | Group consecutive elements | Process sorted log entries by date |
| product() | Cartesian product | Generate hyperparameter grid |
| combinations() | All r-length combos | Feature interaction pairs |
| starmap() | map() with unpacked args | Apply function to paired data |
| accumulate() | Running total/custom accumulator | Cumulative sums, running max |
| tee() | Clone an iterator N times | Multiple passes over a data stream |
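Two of the table's use cases in a minimal sketch (the parameter names are illustrative):

```python
from itertools import product, islice, chain

# product(): generate a hyperparameter grid lazily
learning_rates = [0.1, 0.01]
depths = [3, 5]
grid = list(product(learning_rates, depths))
print(grid)  # [(0.1, 3), (0.1, 5), (0.01, 3), (0.01, 5)]

# chain() + islice(): merge sources lazily, then take only the first N
merged = chain([1, 2], [3, 4], [5])
first_three = list(islice(merged, 3))
print(first_three)  # [1, 2, 3]
```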
f-strings (3.6+) are the fastest formatting method. They support expressions: f"{accuracy:.2%}" → "95.23%", f"{x=}" (3.8+) → "x=42" for debugging. Interning: Python interns string literals and identifiers. 'hello' is 'hello' is True because both point to the same interned object.
| Format | Read | Write | Best For |
|---|---|---|---|
| JSON | json.load(f) | json.dump(obj, f) | Configs, API responses |
| CSV | csv.DictReader(f) | csv.DictWriter(f) | Tabular data (small) |
| YAML | yaml.safe_load(f) | yaml.dump(obj, f) | Config files |
| Pickle | pickle.load(f) | pickle.dump(obj, f) | Python objects, models |
| Parquet | pd.read_parquet() | df.to_parquet() | Large DataFrames (fast) |
| SQLite | sqlite3.connect() | SQL queries | Local database |
Stop using os.path.join(). Use pathlib.Path — object-oriented, cross-platform, reads like English. Path('data') / 'train' / 'images' builds paths. Key methods: .glob('*.csv') finds files, .read_text() reads without open(), .mkdir(parents=True), .exists(), plus the .suffix and .stem attributes.
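A self-contained sketch of the pathlib workflow (it writes into a throwaway temp directory so it can run anywhere; the file names are illustrative):

```python
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp())          # throwaway directory for the demo
data_dir = root / "data" / "train"       # '/' builds paths on any OS
data_dir.mkdir(parents=True, exist_ok=True)

csv_path = data_dir / "sample.csv"
csv_path.write_text("a,b\n1,2\n")        # no open() needed

print(csv_path.suffix)                   # .csv
print(csv_path.stem)                     # sample
print([p.name for p in data_dir.glob("*.csv")])  # ['sample.csv']
print(csv_path.read_text().splitlines()[0])      # a,b
```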
| Tool | Best For | Key Feature |
|---|---|---|
| venv | Simple projects | Built-in, lightweight |
| conda | DS/ML (C dependencies) | Handles CUDA, MKL, OpenCV |
| poetry | Modern packaging | Lock files, deterministic builds |
| uv | Speed | 10-100x faster than pip (Rust-based) |
| pip-tools | Requirements pinning | pip-compile for lock files |
f-strings (3.6+): f"{accuracy:.2%}" → "95.23%". f"{x=}" (3.8+) → "x=42" for debugging. f"{name!r}" → shows repr. regex: re.compile(pattern) for repeated use. re.sub() for cleaning. re.findall() for extraction. Always compile patterns used in loops.
argparse: Built-in CLI parsing. click: Decorator-based, more Pythonic. typer: Modern, uses type hints. Every production project needs a CLI for: training, evaluation, data processing, deployment scripts.
Answer: Lists are mutable, tuples immutable. Deeper: tuples are hashable (can be dict keys), use less memory (no over-allocation), and signal intent ("this shouldn't change"). Use tuples for (lat, lon) pairs, function return values, dict keys. Use lists for collections that grow.
Answer: The GIL prevents true multi-threading for CPU-bound tasks. But NumPy, Pandas, and scikit-learn release the GIL during C-level computations. So vectorized operations ARE parallel at the C level. For pure Python CPU work, use multiprocessing. For I/O, threading works fine.
Answer: copy.copy() copies outer container but shares inner objects. copy.deepcopy() recursively copies everything. Real scenario: list of dicts (configs). Shallow copy means modifying one config modifies all. Pandas .copy() is deep by default — but df2 = df is NOT a copy.
Answer: def f(x, lst=[]): — the default list is created ONCE at function definition and shared across all calls. So f(1); f(2) gives [1, 2] not [2]. Fix: use lst=None then if lst is None: lst = []. This is the #1 Python gotcha in interviews.
Answer: Generators yield values one at a time using yield, consuming O(1) memory. A list of 1B items = ~8GB. A generator = ~100 bytes. Critical for: reading large files, streaming data, batch training. yield from delegates to sub-generators. Generator expressions: (x for x in data).
Answer: Python resolves names in order: Local → Enclosing → Global → Built-in. This is why list = [1,2] breaks list(). Use nonlocal for enclosing scope, global for module scope.
Answer: (1) pd.read_csv(chunksize=50000), (2) usecols=['needed'], (3) dtype={'col': 'int32'}, (4) Dask for lazy Pandas, (5) DuckDB for SQL on CSV with zero overhead, (6) Polars for fast out-of-core processing.
Answer: Dict: O(1) via hash tables (open addressing). List: O(n) linear scan. Dict hashes the key to compute slot index, handles collisions via probing. Sets use the same mechanism. x in my_set is O(1) but x in my_list is O(n).
Answer: Two mechanisms: (1) Reference counting — freed when count hits 0. (2) Cyclic GC — detects reference cycles (A→B→A). Runs on 3 generations. Long-lived objects collected less often. gc.collect() forces collection — useful after deleting large ML models.
Answer: By default, Python objects store attributes in a __dict__ (one dict per instance). __slots__ replaces this with a fixed-size array, saving ~40% memory per instance. Use when creating millions of small objects (data points, nodes). Trade-off: can't add attributes dynamically.
Answer: src/package/ layout. pyproject.toml for config. tests/ with pytest. configs/ for YAML. Makefile for common commands. Separate data, models, training, serving.
Answer: == checks value equality. is checks identity (same object in memory). Use is only for singletons: x is None, x is True. Integer interning makes 256 is 256 True, but 1000 is 1000 may be False.
| Feature | Python List | NumPy ndarray |
|---|---|---|
| Storage | Array of pointers to objects | Contiguous block of raw typed data |
| Type | Each element can differ | Homogeneous — all same dtype |
| Operations | Python loop (bytecode) | Compiled C/Fortran loops |
| Memory | ~28 bytes per int + pointer | 8 bytes per int64 (no overhead) |
| SIMD | Not possible | Uses CPU vector instructions |
Every ndarray has a strides tuple — the number of bytes to jump in each dimension. For a (3,4) float64 array: strides = (32, 8). Slicing creates views (no copy) by adjusting strides: arr[::2] doubles the row stride. C-order (row-major) stores rows contiguously; Fortran-order stores columns contiguously. Iterate along the last axis for best performance.
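A short sketch of strides and the view/copy distinction (assumes NumPy is installed; the stride values hold for a default C-ordered float64 array):

```python
import numpy as np

arr = np.zeros((3, 4), dtype=np.float64)
print(arr.strides)          # (32, 8): 4 floats x 8 bytes per row, 8 per column

view = arr[::2]             # slicing adjusts strides — no data is copied
print(view.strides)         # (64, 8): row stride doubled
print(np.shares_memory(arr, view))   # True — a view, not a copy

fancy = arr[[0, 2]]         # fancy indexing always copies
print(np.shares_memory(arr, fancy))  # False
```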
Broadcasting: X - X.mean(axis=0) → (1000,5) - (5,) works!
Ufuncs are vectorized functions that operate element-wise. Advanced methods: .reduce() (fold along an axis), .accumulate() (running total), .outer() (outer product), .at() (unbuffered in-place). Example: np.add.reduce(arr) equals arr.sum() but works with custom ufuncs too. Create custom ufuncs with np.frompyfunc().
| dtype | Bytes | When to Use |
|---|---|---|
| float64 | 8 | Default. Scientific computing, high-precision stats |
| float32 | 4 | Deep learning (GPU prefers this), 50% less memory |
| float16 | 2 | Mixed-precision training, inference |
| int32 | 4 | Indices, counts, most integer data |
| int8 | 1 | Quantized models |
| bool | 1 | Masks for filtering |
np.einsum (Einstein summation) can express any tensor operation: matrix multiply 'ik,kj->ij', batch matmul 'bij,bjk->bik', trace 'ii->'. Often faster than chaining NumPy functions because it avoids intermediate arrays.
Linear algebra for DS:
X.T @ X → Gram matrix (basis of linear regression)
U, S, Vt = np.linalg.svd(X) → PCA, dimensionality reduction
np.linalg.eigh(cov) → covariance eigenvectors
np.linalg.norm(X, axis=1) → L2 norms for distances
np.linalg.lstsq(X, y) → stable linear regression (preferred over inv)
np.linalg.inv() → AVOID! Use solve() instead (numerically stable)
Random numbers: np.random.default_rng(42) is the modern way (NumPy 1.17+). It uses the PCG64 algorithm — better statistical properties, thread-safe. The old np.random.seed(42) is global state and not thread-safe. Always use default_rng() in new code.
Images are just 3D arrays: (height, width, channels). Crop: img[100:200, 50:150]. Resize: scipy. Normalize: img / 255.0. Augment: flip img[:, ::-1], rotate with scipy.ndimage. Foundation of all computer vision.
Answer: (1) Contiguous memory — cache-friendly. (2) Compiled C loops. (3) SIMD instructions — 4-8 floats simultaneously. Together: 50-100x speedup.
Answer: Views share data (slicing creates views). Copies duplicate. arr[::2] = view, arr[[0,2,4]] (fancy indexing) = copy. Check with np.shares_memory(a, b).
Answer: Compare shapes right-to-left. Dims must be equal or one must be 1. (3,1) + (1,4) → (3,4). No memory copied — strides adjusted internally. Gotcha: (3,) + (3,4) fails — reshape to (3,1) first.
Answer: axis=0 = operate down rows (collapses rows). axis=1 = across columns (collapses columns). For (100,5): mean(axis=0) → (5,) per feature. mean(axis=1) → (100,) per sample.
Answer: Center: X_c = X - X.mean(0). Covariance: cov = X_c.T @ X_c / (n-1). Eigendecompose: vals, vecs = np.linalg.eigh(cov). Project: X_pca = X_c @ vecs[:,-k:]. Or use SVD directly.
Answer: np.dot: confusing for 3D+. @: clean matrix multiply, broadcasts. einsum: most flexible. Use @ for readability, einsum for complex ops.
Answer: np.isnan(arr) detects. np.nanmean(arr) — nan-safe aggregation. Gotcha: np.nan == np.nan is False! IEEE 754 standard.
Answer: C-order stores rows contiguously. Iterating along last axis is fastest (cache-friendly). For column-heavy ops, Fortran can be faster. NumPy defaults to C. Convert with np.asfortranarray().
Pandas gotchas: df.iterrows() is 100x slower than vectorized ops. .loc indexes by label, .iloc by position — df.loc[0:5] includes row 5, df.iloc[0:5] excludes it. Never chain indexing: df[mask]['col'] = x assigns to a copy; use df.loc[mask, 'col'] = x. df2 = df is NOT a copy — use df2 = df.copy(). Always check df.dtypes and df.isna().sum() first.
Chained indexing (df[df.x > 0]['y'] = 5) may create a temporary copy. Fix: df.loc[df.x > 0, 'y'] = 5. In Pandas 2.0+, Copy-on-Write mode eliminates this entirely.
Most powerful Pandas operation. (1) Split → (2) Apply function → (3) Combine results. GroupBy is lazy — no computation until aggregation. Key methods:
| Method | Output Shape | Use Case |
|---|---|---|
| agg() | Reduced (one row/group) | Sum, mean, count per group |
| transform() | Same as input | Fill with group mean, normalize within group |
| filter() | Subset of groups | Keep groups with N > 100 |
| apply() | Flexible | Custom function per group |
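The agg/transform distinction in a minimal sketch (assumes pandas is installed; the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 10.0, None],
})

# agg(): reduces — one row per group
means = df.groupby("group")["value"].agg("mean")
print(means)  # a: 2.0, b: 10.0

# transform(): broadcasts the group statistic back to the original shape,
# e.g. the classic "fill missing values with the group mean" pattern
df["filled"] = df.groupby("group")["value"].transform(lambda s: s.fillna(s.mean()))
print(df["filled"].tolist())  # [1.0, 3.0, 10.0, 10.0]
```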
| Feature | Before (1.x) | After (2.0+) |
|---|---|---|
| Backend | NumPy only | Apache Arrow backend option |
| Copy semantics | Confusing | Copy-on-Write (explicit) |
| String dtype | object | string[pyarrow] (faster) |
| Nullable types | NaN for everything | pd.NA (proper null) |
| Index dtypes | int64 default | Matches data dtype |
| Feature | Pandas | Polars |
|---|---|---|
| Speed | 1x | 5-50x faster (Rust) |
| Memory | Higher | Lower (Arrow-native) |
| Parallelism | Single-threaded | Multi-threaded by default |
| API | Eager | Lazy + Eager |
| Ecosystem | Massive | Growing fast |
| When to use | EDA, legacy projects | Large data, production pipelines |
Fluent API style. More readable, no intermediate variables. Use .assign() instead of df['col'] = .... Use .pipe() for custom functions. Use .query() for readable filtering.
| Method | How | When |
|---|---|---|
| merge() | SQL-style joins on columns | Combine tables on shared keys |
| join() | Joins on index | Index-based combining |
| concat() | Stack along axis | Append rows/columns |
Common pitfall: Merge produces more rows than expected = many-to-many join. Always check: len(merged) vs len(left).
| Strategy | Savings | When to Use |
|---|---|---|
| Category dtype | 90%+ | Columns with few unique strings |
| Downcast numerics | 50-75% | int64 → int32/int16 |
| Sparse arrays | 80%+ | Columns mostly zeros/NaN |
| PyArrow backend | 30-50% | String-heavy DataFrames |
| Read only needed columns | Variable | usecols=['a','b'] |
.rolling(N) — fixed-size sliding window. .expanding() — cumulative from start. .ewm(span=N) — exponentially weighted. All support .mean(), .std(), .apply(func). Essential for time series feature engineering: lag features, moving averages, volatility, Bollinger bands.
df.pivot_table(values, index, columns, aggfunc) — summarize data by two categorical dimensions. pd.crosstab() — frequency table of two categorical columns. Essential for EDA and business reporting.
Answer: Chained indexing may modify a copy. Fix: df.loc[mask, 'col'] = val. Pandas 2.0+ Copy-on-Write: pd.options.mode.copy_on_write = True.
Answer: merge(): SQL joins on columns. join(): joins on index. concat(): stack along axis. Use merge for column joins, concat for stacking.
Answer: map(): Series element-wise. apply(): rows/columns. transform(): same shape output. All are slow — prefer vectorized operations.
Answer: agg() reduces — one value per group. transform() broadcasts — same shape as input. Use transform for "fill with group mean" patterns.
Answer: Hierarchical indexing — multiple levels. Use for pivot tables, panel data (entity + time). Access with .xs() or tuple: df.loc[('A', 2023)]. Convert back with .reset_index().
Answer: Pandas: mature ecosystem, EDA, small-medium data. Polars: 5-50x faster (Rust), multi-threaded, lazy evaluation, better for large data and production pipelines. Polars for new projects with big data.
Answer: (1) dropna(thresh=N), (2) df.ffill() for time series (fillna(method='ffill') is deprecated in pandas 2.x), (3) fillna(df.median()) for ML, (4) interpolate(method='time'). Always check df.isna().sum() first.
Answer: (1) Read only needed columns. (2) Downcast dtypes. (3) Category for strings. (4) Sparse for zeros. (5) PyArrow backend. (6) Process in chunks. Can reduce 5GB to 1GB.
Three layers: Backend (rendering), Artist (everything drawn), Scripting (pyplot). Figure contains Axes (subplots). Each Axes has Axis objects. Always prefer OO API (fig, ax = plt.subplots()) over pyplot for production.
rcParams: Control global defaults. Set plt.rcParams['font.size'] = 14 once. Create a style file for consistency across all project figures. Use plt.style.use('seaborn-v0_8-whitegrid') for clean defaults.
Three API levels: Figure-level (relplot, catplot, displot — own figure), Axes-level (scatterplot, boxplot — on existing axes), Objects API (0.12+, composable). Seaborn auto-computes statistics (regression lines, confidence intervals, density estimates).
JavaScript-powered charts with hover, zoom, selection. plotly.express for quick plots, plotly.graph_objects for full control. Integrates with Dash for production dashboards. Supports 3D, maps, and animations. Export to HTML for sharing.
| What to Visualize | Chart | Why |
|---|---|---|
| Class distribution | Bar chart | Detect imbalance |
| Feature distributions | Histogram/KDE grid | Find skew, outliers |
| Feature correlations | Heatmap (triangular) | Multicollinearity |
| Training curves | Line plot (loss/acc vs epoch) | Detect overfit/underfit |
| Model comparison | Box plot of CV scores | Compare variance |
| Confusion matrix | Annotated heatmap | Error analysis |
| ROC curve | Line plot + AUC | Threshold selection |
| Feature importance | Horizontal bar | Model interpretation |
| SHAP values | Beeswarm/waterfall | Individual predictions |
Answer: Matplotlib: full control, publication figures. Seaborn: statistical EDA, beautiful defaults. Plotly: interactive dashboards. Rule: Seaborn for EDA, Matplotlib for papers, Plotly for stakeholders.
Answer: (1) PCA/t-SNE/UMAP to 2D, (2) Pair plots, (3) Parallel coordinates, (4) Correlation heatmap, (5) SHAP summary plots.
Answer: (1) alpha transparency, (2) hexbin, (3) 2D KDE, (4) random sampling, (5) Datashader for millions of points.
Answer: Clear title stating conclusion, minimal chart junk, annotate key points, consistent color, one insight per chart. Tell a story — what action should they take?
Answer: Figure = entire canvas. Axes = single plot area. fig, axes = plt.subplots(2,2) = 4 plots. Always use OO API: ax.plot() not plt.plot().
Answer: Colorblind-safe palettes (viridis), don't rely on color alone, add shapes/patterns, sufficient contrast, alt text, large fonts (12pt+).
Answer: Training curves (loss/acc vs epoch), confusion matrix (heatmap), ROC/AUC, feature importance (horizontal bars), SHAP for interpretability.
Decorators wrap a function to add behavior without changing its code. Always use functools.wraps to preserve the wrapped function's metadata (name, docstring, signature). Common patterns: retry with exponential backoff, caching, rate limiting, authentication, input validation, deprecation warnings.
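A minimal retry decorator sketch (fixed delay rather than full exponential backoff, to keep it short; `flaky` is a hypothetical function simulating transient failures):

```python
import functools
import time

def retry(times=3, delay=0.01):
    """Sketch of a retry decorator; real backoff would grow the delay."""
    def decorator(func):
        @functools.wraps(func)           # preserve name, docstring, signature
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == times - 1:
                        raise            # out of retries — re-raise
                    time.sleep(delay)
        return wrapper
    return decorator

calls = {"n": 0}

@retry(times=3)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = flaky()
print(result)           # ok — succeeded on the third attempt
print(flaky.__name__)   # 'flaky', thanks to functools.wraps
```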
Context managers manage resources reliably: with blocks guarantee cleanup even on errors. Two approaches: (1) class-based (__enter__/__exit__), (2) @contextlib.contextmanager with yield. Use for: file handles, DB connections, GPU locks, temporary settings, timers.
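A minimal timer built with the generator approach; the try/finally around yield is what guarantees cleanup even if the body raises (`timer` is an illustrative helper, not a library API):

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    """Yield a dict that is filled in when the block exits — even on errors."""
    stats = {}
    start = time.perf_counter()
    try:
        yield stats
    finally:
        stats["elapsed"] = time.perf_counter() - start
        print(f"{label}: {stats['elapsed']:.4f}s")

with timer("sleep") as stats:
    time.sleep(0.01)
```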
| Feature | namedtuple | dataclass | Pydantic | attrs |
|---|---|---|---|---|
| Mutable | ✗ | ✓ (default) | ✓ (v2) | ✓ |
| Validation | ✗ | ✗ (manual) | ✓ (automatic) | ✓ (validators) |
| Default values | Limited | ✓ | ✓ | ✓ |
| Inheritance | ✗ | ✓ | ✓ | ✓ |
| JSON serialization | Manual | Manual | Built-in | via cattrs |
| Performance | Fastest | Fast | Slower (validation) | Fast |
| Use case | Immutable records | Data containers | API models, configs | Complex classes |
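A dataclass sketch covering the two features people reach for most — default_factory (the safe way to default a mutable field) and frozen=True for namedtuple-style immutability. The class names are hypothetical:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ExperimentConfig:
    """Hypothetical config container for illustration."""
    model: str
    lr: float = 0.001
    # Never write tags: list = [] — use default_factory for mutable defaults.
    tags: list = field(default_factory=list)

cfg = ExperimentConfig(model="rf", tags=["baseline"])
print(asdict(cfg))  # {'model': 'rf', 'lr': 0.001, 'tags': ['baseline']}

# frozen=True gives immutability and hashability (usable as a dict key)
@dataclass(frozen=True)
class Point:
    x: float
    y: float

origin_map = {Point(0, 0): "origin"}
print(origin_map[Point(0, 0)])  # origin — equal frozen instances hash alike
```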
| Hint | Meaning | Example |
|---|---|---|
| int, str, float | Basic types | def f(x: int) -> str: |
| list[int] | List of ints (3.9+) | scores: list[int] = [] |
| dict[str, Any] | Dict with str keys | config: dict[str, Any] |
| int \| None | Optional (3.10+) | x: int \| None = None |
| Union[int, str] | int or str | id: int \| str |
| Callable[[int], str] | Function signature | Callbacks, decorators |
| TypeVar('T') | Generic type | Generic containers |
| Literal | Exact values | Literal['train','test'] |
| TypedDict | Dict with typed keys | JSON schemas |
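The last two rows of the table in use, describing a JSON-like config (the `RunConfig` schema is a made-up example):

```python
from typing import Literal, TypedDict

class RunConfig(TypedDict):
    """Typed schema for a JSON-like config dict."""
    split: Literal["train", "test"]  # only these two exact values type-check
    lr: float

def describe(cfg: RunConfig) -> str:
    return f"{cfg['split']} @ lr={cfg['lr']}"

cfg: RunConfig = {"split": "train", "lr": 0.01}
```

At runtime this is just a dict; the value comes from mypy/pyright catching typos and wrong values statically.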
Async is for I/O-bound tasks (API calls, DB queries, file reads), NOT for CPU-bound work (use multiprocessing). The event loop manages coroutines cooperatively. asyncio.gather() runs multiple coroutines concurrently: 100 API calls finish in ~1s instead of ~100s sequentially. aiohttp for async HTTP, asyncpg for async PostgreSQL.
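A sketch of gather() concurrency, with asyncio.sleep standing in for a real API call:

```python
import asyncio
import time

async def fetch(i):
    """Stand-in for an API call: awaiting yields control to the event loop."""
    await asyncio.sleep(0.05)
    return i * 2

async def main():
    # gather() schedules all coroutines concurrently, so total time is
    # ~0.05s, not 10 * 0.05s sequentially.
    return await asyncio.gather(*(fetch(i) for i in range(10)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
```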
A descriptor is any object implementing __get__, __set__, or __delete__. @property is a descriptor. They control attribute access at the class level. Used in Django ORM fields, SQLAlchemy columns, and dataclass fields.
| Pattern | Use Case | Python Implementation |
|---|---|---|
| Strategy | Swap algorithms | Pass function/class as argument |
| Factory | Create objects by name | Registry dict: models['rf'] |
| Observer | Training callbacks | Event system with hooks |
| Pipeline | Data transformations | Chain of fit→transform |
| Singleton | Model cache, DB pool | Module-level or metaclass |
| Template | Training loop | ABC with abstract methods |
| Registry | Auto-register models | Class decorator + dict |
Classes are objects too. Metaclasses define how classes behave. type is the default metaclass. Use for: auto-registering subclasses (model registry), enforcing interface standards, singleton pattern. Most developers should use class decorators instead — metaclasses are a last resort.
By default, instances store attributes in __dict__. __slots__ replaces with a fixed tuple. Saves ~40% memory per instance. Use when creating millions of objects. Trade-off: can't add dynamic attributes. Especially useful for data-heavy classes.
multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor. Each process has its own GIL. Share data via: multiprocessing.Queue, shared_memory, or serialize (pickle). Overhead: process creation ~100ms. Only use for expensive computations.
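A minimal ProcessPoolExecutor sketch (the `parallel_factorials` helper is illustrative):

```python
import math
from concurrent.futures import ProcessPoolExecutor

def parallel_factorials(ns):
    """Run CPU-bound work in separate processes; each has its own GIL."""
    # math.factorial lives at module level in the stdlib, so it pickles
    # cleanly when sent to the worker processes.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(math.factorial, ns))

if __name__ == "__main__":  # required under the spawn start method
    print(parallel_factorials([5, 6, 7]))  # [120, 720, 5040]
```

The __main__ guard matters: under spawn (Windows/macOS default), workers re-import the main module, and unguarded pool creation errors out.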
Answer: C3 Linearization algorithm for multiple inheritance. Access via ClassName.mro(). Ensures bases searched after subclasses, preserving definition order.
Answer: namedtuple: immutable, fastest. dataclass: mutable, flexible, no validation. Pydantic: auto-validation, JSON serialization, API models. Choose based on whether you need validation.
Answer: async: I/O-bound, many connections (1000s of API calls). threading: I/O-bound, simpler code. multiprocessing: CPU-bound (bypasses GIL). NumPy already releases GIL internally.
Answer: It's a descriptor — implements __get__, __set__, __delete__. When you access obj.x, Python's attribute lookup finds the descriptor on the class and calls __get__.
Answer: Three nested functions: (1) Factory takes params, returns decorator. (2) Decorator takes function, returns wrapper. (3) Wrapper executes logic. Use @wraps(func) always.
Answer: Replaces __dict__ with fixed-size array. Saves ~40% memory per instance. Can't add dynamic attributes. Use for millions of small objects.
Answer: A function that captures variables from enclosing scope. The captured variables survive after the enclosing function returns. Use case: factory functions, decorators, callbacks. Example: make_multiplier(3) returns a function that multiplies by 3.
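The make_multiplier example from the answer, spelled out:

```python
def make_multiplier(factor):
    """Factory: returns a closure that remembers `factor` after we return."""
    def multiply(x):
        return x * factor  # `factor` is captured from the enclosing scope
    return multiply

triple = make_multiplier(3)
```

The captured variable lives in the closure cell: triple.__closure__[0].cell_contents is 3.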
Answer: Python makes many patterns trivial: Strategy = pass a function. Singleton = module variable. Factory = dict of classes. Observer = list of callables. Python prefers simplicity.
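Two of those trivial patterns sketched (the model class names are placeholders):

```python
class RandomForest:
    name = "rf"

class LogisticRegression:
    name = "logreg"

# Factory pattern: just a registry dict mapping names to classes
MODELS = {cls.name: cls for cls in (RandomForest, LogisticRegression)}

def build(name):
    return MODELS[name]()

# Strategy pattern: the "algorithm" is just a callable argument
def evaluate(values, strategy=max):
    return strategy(values)
```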
All estimators implement fit(X, y). Transformers have transform(X). Predictors have predict(X). This consistency allows seamless swapping and composition via Pipelines.

Real data has mixed types. ColumnTransformer applies different transformations to different column sets: StandardScaler for numerics, OneHotEncoder for categoricals, TfidfVectorizer for text. All in one pipeline.
Inherit from BaseEstimator + TransformerMixin. Implement fit(X, y) and transform(X). TransformerMixin gives you fit_transform() for free. Use check_is_fitted(self) to validate state.
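A dependency-free sketch of the fit/transform contract (a real transformer would inherit BaseEstimator + TransformerMixin; `MeanCenterer` is a made-up name):

```python
class MeanCenterer:
    """Follows the sklearn transformer contract: fit learns state,
    transform applies it, fit returns self so calls can chain."""

    def fit(self, X, y=None):
        # learned state uses the trailing-underscore convention
        self.means_ = [sum(col) / len(col) for col in zip(*X)]
        return self  # enables MeanCenterer().fit(X).transform(X)

    def transform(self, X):
        if not hasattr(self, "means_"):  # poor man's check_is_fitted
            raise RuntimeError("call fit() first")
        return [[v - m for v, m in zip(row, self.means_)] for row in X]

X = [[1.0, 10.0], [3.0, 30.0]]
centered = MeanCenterer().fit(X).transform(X)
```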
| Strategy | When to Use | Key Point |
|---|---|---|
| KFold | General purpose | Doesn't preserve class ratios |
| StratifiedKFold | Imbalanced classification | Preserves class distribution |
| TimeSeriesSplit | Time-ordered data | Train always before test |
| GroupKFold | Grouped data (patients) | Same group never in train+test |
| LeaveOneOut | Very small datasets | N fits (very slow) |
| RepeatedStratifiedKFold | Robust estimation | Multiple random splits |
| Method | Pros | Cons |
|---|---|---|
| GridSearchCV | Exhaustive, simple | Exponential with params |
| RandomizedSearchCV | Faster, samples continuous distributions | May miss optimal |
| Optuna/BayesianOpt | Smart search, pruning/early stopping | More setup, extra dependency |
| Halving*SearchCV | Successive halving, fast | Newer, less documented |
| Transformer | Purpose |
|---|---|
| PolynomialFeatures | Interaction & polynomial terms |
| FunctionTransformer | Apply any function (log, sqrt) |
| SplineTransformer | Non-linear feature basis |
| KBinsDiscretizer | Bin continuous into categories |
| TargetEncoder | Encode categoricals by target mean |
PolynomialFeatures, FunctionTransformer, SplineTransformer, KBinsDiscretizer. Chain with Pipeline for clean, leak-free preprocessing. Use make_column_selector to auto-select column types.
| Data Size | Model | Why |
|---|---|---|
| <1K rows | Logistic/SVM/KNN | Simple, less overfitting |
| 1K-100K | Random Forest, XGBoost | Best accuracy/speed tradeoff |
| 100K+ | XGBoost, LightGBM | Handles large data efficiently |
| Very large | SGDClassifier/online | Incremental learning |
| Tabular | Gradient Boosting | Almost always best for tabular |
Train/Val/Test split → Cross-validate multiple models → Select best → Tune hyperparameters → Final evaluation on test set. Never tune on test data. Use cross_val_score for quick comparison, cross_validate for detailed metrics.
| Strategy | How |
|---|---|
| class_weight='balanced' | Built-in for most models |
| SMOTE | Synthetic oversampling (imblearn) |
| Threshold tuning | Adjust decision threshold from 0.5 |
| Metrics | Use F1, Precision-Recall AUC (not accuracy) |
| Ensemble | BalancedRandomForest |
joblib.dump(model, 'model.pkl') — faster than pickle for NumPy arrays. model = joblib.load('model.pkl'). Always save the entire pipeline (not just model) to include preprocessing. Version your models with timestamps.
Answer: Info from test set influencing training. Common cause: fitting scaler on full data before split. Fix: put all preprocessing inside a Pipeline which ensures fit only on train folds during cross-validation.
Answer: Pipeline: sequential steps (A→B→C). ColumnTransformer: parallel branches (different processing for different column types). Typically ColumnTransformer inside Pipeline.
Answer: KFold: general. StratifiedKFold: imbalanced classes. TimeSeriesSplit: temporal. GroupKFold: grouped data (same patient never in both).
Answer: Grid: exhaustive but exponential. Random: better for many params, samples continuous distributions. Bayesian (Optuna): learns from previous trials, most efficient for expensive models.
Answer: Inherit BaseEstimator + TransformerMixin. Implement fit(X, y) (learn params, return self) and transform(X) (apply). TransformerMixin gives fit_transform() free.
Answer: fit(): learn parameters from data. transform(): apply learned params to transform data. predict(): generate predictions. fit() is always on train, transform/predict on train+test.
Answer: (1) class_weight='balanced'. (2) SMOTE oversampling. (3) Adjust threshold. (4) Use F1/AUC not accuracy. (5) BalancedRandomForest.
Answer: Tabular: gradient boosting (XGBoost/LightGBM). Small data: Logistic/SVM. Interpretability: Logistic/trees. Speed: LightGBM. Baseline: Random Forest.
| Concept | What It Is | Key Point |
|---|---|---|
| Tensor | N-dimensional array | Like NumPy ndarray but GPU-capable |
| requires_grad | Track operations for autograd | Only enable for learnable parameters |
| device | CPU or CUDA | .to('cuda') moves to GPU |
| .detach() | Stop gradient tracking | Use for inference/metrics |
| .item() | Extract scalar value | Use for logging loss values |
| .contiguous() | Ensure contiguous memory | Required after transpose/permute |
With requires_grad=True, PyTorch records every operation in a directed acyclic graph (DAG). Each tensor stores its grad_fn, the function that created it. .backward() traverses this graph in reverse, computing gradients via the chain rule. The graph is destroyed after backward() unless retain_graph=True.

Gradient accumulation: by default, .backward() ACCUMULATES gradients, so you must call optimizer.zero_grad() before each backward pass. This is intentional: it enables gradient accumulation for larger effective batch sizes.
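A pure-Python toy (not PyTorch) showing why backward passes accumulate: gradients are added with +=, so a second backward() without zeroing doubles them. The `Scalar` class is a sketch of reverse-mode autodiff under that one assumption:

```python
class Scalar:
    """Toy reverse-mode autodiff node; illustrates grad accumulation."""

    def __init__(self, value, parents=()):
        self.value, self.grad, self.parents = value, 0.0, parents

    def __mul__(self, other):
        # record (parent, local gradient) pairs: d(a*b)/da = b, d(a*b)/db = a
        return Scalar(self.value * other.value,
                      parents=[(self, other.value), (other, self.value)])

    def backward(self, upstream=1.0):
        self.grad += upstream  # += is the accumulation zero_grad() resets
        for parent, local_grad in self.parents:
            parent.backward(upstream * local_grad)  # chain rule

x = Scalar(3.0)
y = x * x        # dy/dx = 2x = 6
y.backward()
first = x.grad   # 6.0
y.backward()     # no "zero_grad": gradients pile up
second = x.grad  # 12.0
```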
Every model inherits nn.Module. Define layers in __init__, computation in forward(). model.parameters() returns all learnable weights for the optimizer. model.train() and model.eval() toggle BatchNorm/Dropout behavior. model.state_dict() saves/loads weights. Use nn.Sequential for simple stacks, nn.ModuleList/nn.ModuleDict for dynamic architectures.
Every PyTorch training loop follows: (1) forward pass, (2) compute loss, (3) optimizer.zero_grad(), (4) loss.backward(), (5) optimizer.step(). No magic: you write it explicitly, which gives full control over gradient clipping, LR scheduling, mixed precision, logging, and checkpointing.
Dataset: override __len__ and __getitem__. DataLoader: wraps a Dataset with batching, shuffling, and multi-worker loading. Use num_workers > 0 for parallel data loading and pin_memory=True to speed up CPU→GPU transfer. Use collate_fn for variable-length sequences.
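The map-style protocol a torch Dataset must satisfy, sketched without the torch dependency (`WindowDataset` is a made-up example for time-series windows):

```python
class WindowDataset:
    """Map-style dataset protocol: __len__ gives the sample count,
    __getitem__ returns one (input, target) example by index."""

    def __init__(self, series, window=3):
        self.series, self.window = series, window

    def __len__(self):
        return len(self.series) - self.window

    def __getitem__(self, i):
        # (input window, next-value target) pair
        return self.series[i:i + self.window], self.series[i + self.window]

ds = WindowDataset([0, 1, 2, 3, 4, 5], window=3)
```

A DataLoader only needs these two methods to batch, shuffle, and parallelize over the dataset.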
Use torch.cuda.amp for automatic mixed precision. Forward pass in float16 (2x faster on modern GPUs), gradients in float32 (numerical stability). GradScaler prevents underflow. Up to 2-3x speedup with minimal accuracy loss.
| Scheduler | Strategy | When |
|---|---|---|
| StepLR | Decay every N epochs | Simple baseline |
| CosineAnnealingLR | Cosine decay | Standard for vision |
| OneCycleLR | Warmup + decay | Best for fast training |
| ReduceLROnPlateau | Decay on stall | When loss plateaus |
| LinearLR | Linear warmup | Transformer models |
Load pretrained model → Freeze base layers → Replace final layer → Fine-tune with a smaller learning rate for pretrained layers. model.requires_grad_(False) freezes all; then unfreeze the last N layers. Discriminative LRs (lower for earlier layers) and progressive unfreezing (one layer group at a time) both work better than fine-tuning everything at once.
Register hooks on modules: register_forward_hook, register_backward_hook. View intermediate activations, gradient magnitudes, feature maps. Essential for debugging vanishing/exploding gradients.
DistributedDataParallel is the standard for multi-GPU training. Each GPU runs a copy of the model; gradients are averaged across GPUs via all-reduce. Near-linear scaling. Use torchrun to launch and DistributedSampler to split data across processes.
| Tool | Purpose |
|---|---|
| register_forward_hook | View intermediate activations |
| register_backward_hook | Monitor gradient magnitudes |
| torch.profiler | GPU/CPU profiling |
| torch.cuda.memory_summary() | GPU memory debugging |
| detect_anomaly() | Find NaN/Inf sources |
JIT compiles model for 30-60% speedup. model = torch.compile(model). Uses TorchDynamo + Triton. Works on existing code. The future of PyTorch performance.
Answer: PyTorch records operations in a DAG when requires_grad=True. .backward() traverses the graph in reverse, computing gradients via chain rule. Graph is destroyed after backward (dynamic graph).
Answer: PyTorch accumulates gradients by default. Without zeroing, gradients from previous batch add to current. This is intentional — allows gradient accumulation for larger effective batches.
Answer: train(): BatchNorm uses batch stats, Dropout is active. eval(): BatchNorm uses running stats, Dropout disabled. Always switch before training/inference.
Answer: .detach(): creates a tensor that shares data but doesn't track gradients (single tensor). torch.no_grad(): context manager disabling gradient computation for all operations inside (saves memory during inference).
Answer: (1) Register backward hooks to monitor gradient magnitudes. (2) Use torch.nn.utils.clip_grad_norm_. (3) Gradient histograms in TensorBoard. (4) Check if BatchNorm/LayerNorm is applied. (5) Try skip connections (ResNet idea).
Answer: Rule of thumb: num_workers = 4 * num_gpus. Too many = CPU overhead, too few = GPU starved. Use pin_memory=True for faster CPU→GPU transfer. Profile to find sweet spot.
Answer: compile JITs model via TorchDynamo+Triton. 30-60% faster. One line change. The future of PyTorch performance.
Answer: state_dict (weights only) vs full checkpoint (weights + optimizer + epoch). Use state_dict for inference, checkpoint for resuming.
Answer: autocast(fp16 forward) + GradScaler(fp32 grads). 2-3x speedup. Minimal accuracy loss. Standard for GPU training.
@tf.function compiles Python code into a static graph for production speed. Keras is the official high-level API. TF handles the full ML lifecycle: training → saving → serving → monitoring.

| API | Use Case | Flexibility |
|---|---|---|
| Sequential | Simple linear stack of layers | Low |
| Functional | Multi-input/output, branching | Medium (recommended) |
| Subclassing | Custom forward logic | High (most flexible) |
Build efficient input pipelines: tf.data.Dataset chains transformations lazily. Key methods: .map(), .batch(), .shuffle(), .prefetch(tf.data.AUTOTUNE). Prefetching overlaps data loading with model execution. Use .cache() for small datasets, .interleave() to read multiple files concurrently, and TFRecord format for large datasets.
| Callback | Purpose |
|---|---|
| ModelCheckpoint | Save best model (monitor val_loss) |
| EarlyStopping | Stop when metric plateaus |
| ReduceLROnPlateau | Reduce LR when stuck |
| TensorBoard | Visualize training metrics |
| CSVLogger | Log metrics to CSV |
| LambdaCallback | Custom logic per epoch |
For full control: tf.GradientTape() records operations, then tape.gradient(loss, model.trainable_variables) computes gradients. Same pattern as PyTorch's manual loop. Use for: GANs, RL, custom losses, gradient penalty, multi-loss weighting.
Decorating with @tf.function traces Python code into a TF graph. Benefits: optimized execution, XLA compilation, deployment. Gotchas: Python side effects only run during tracing, use tf.print() instead of print().
model.save('path') exports architecture + weights + computation. Ready for: TF Serving (production), TF Lite (mobile), TF.js (browser). One model, any platform.
Build model function → Tuner searches space. Strategies: Random, Hyperband, Bayesian. Integrates with TensorBoard. Alternative to Optuna for Keras models.
| Aspect | TensorFlow | PyTorch |
|---|---|---|
| Deployment | TF Serving, TFLite, TF.js | TorchServe, ONNX |
| Research | Less common now | Dominant in papers |
| Production | Mature ecosystem | Catching up fast |
| Mobile | TFLite (mature) | PyTorch Mobile |
| Debugging | Harder (graph mode) | Easier (eager by default) |

| Choose TF When | Choose PyTorch When |
|---|---|
| Production deployment at scale | Research & prototyping |
| Mobile (TFLite mature) | Hugging Face ecosystem |
| TPU training | GPU research |
| Edge devices | Custom architectures |
| Browser (TF.js) | Academic papers |
Answer: Sequential: linear stack. Functional: multi-input/output, shared layers. Subclassing: full Python control, custom forward. Use Functional for most real projects.
Answer: Compiles Python function into a TF graph. Faster execution, enables XLA optimization, required for SavedModel export. Gotcha: Python code only runs during tracing — side effects behave differently.
Answer: Chains transformations lazily. .prefetch(AUTOTUNE) overlaps data loading with GPU computation. .cache() stores in memory after first epoch. .interleave() reads multiple files concurrently.
Answer: Usually val_loss. Set patience=5-10 (epochs without improvement). restore_best_weights=True reverts to best epoch. Combine with ReduceLROnPlateau for better convergence.
Answer: When Keras .fit() is too restrictive: GANs (two optimizers), RL (custom gradients), multi-loss weighting, gradient penalty, research experiments needing full control.
Answer: TF: production deployment (TF Serving, TFLite), mobile apps, TPU training. PyTorch: research, prototyping, Hugging Face ecosystem. Both are converging in features.
Answer: SavedModel → TF Serving (REST/gRPC), TFLite (mobile), TF.js (browser). Docker + TF Serving for production.
| Feature | Purpose | Example |
|---|---|---|
| fixtures | Reusable test setup | @pytest.fixture for test data |
| parametrize | Run same test with many inputs | @pytest.mark.parametrize |
| conftest.py | Shared fixtures across tests | DB connections, mock data |
| monkeypatch | Override functions/env vars | Mock API calls |
| tmp_path | Temporary directory | Test file I/O without cleanup |
| markers | Tag tests (slow, gpu, integration) | pytest -m "not slow" |
| coverage | Measure test coverage | pytest --cov |
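A small parametrize sketch, assuming pytest is installed (the `minmax_scale` function under test is a made-up example):

```python
import pytest

def minmax_scale(xs):
    """Scale values to [0, 1]; the function under test."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

# One test function, two cases: pytest reports each case separately
@pytest.mark.parametrize("xs, expected", [
    ([0, 5, 10], [0.0, 0.5, 1.0]),
    ([2, 4], [0.0, 1.0]),
])
def test_minmax_scale(xs, expected):
    assert minmax_scale(xs) == expected
```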
Never use print() in production. Use the logging module: configurable levels (DEBUG/INFO/WARNING/ERROR), output to files, structured format, no performance cost when disabled.
| Level | When to Use |
|---|---|
| DEBUG | Detailed diagnostics (tensor shapes, intermediate values) |
| INFO | Normal events (training started, epoch complete) |
| WARNING | Something unexpected but handled (missing feature, fallback) |
| ERROR | Something failed (model load error, API failure) |
| CRITICAL | System-level failure (out of memory, GPU crash) |
Never use print(). Use structured logging (JSON format) for production — parseable by log aggregators (ELK, Datadog).
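A minimal logger setup matching the level table above (the "trainer" logger name is illustrative):

```python
import logging

logger = logging.getLogger("trainer")
logger.setLevel(logging.INFO)  # threshold: DEBUG messages are dropped
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

logger.debug("tensor shape: (32, 128)")  # suppressed: below INFO
logger.info("epoch complete")            # emitted
logger.warning("feature 'age' missing, using fallback")
```

For JSON-structured output, swap the Formatter for a JSON formatter so aggregators can parse fields.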
Modern async web framework. Auto-generates OpenAPI docs. Type-validated requests via Pydantic. Use for: model inference APIs, data pipelines, webhook handlers. Deploy with Uvicorn + Docker. Add: health checks, input validation, error handling, rate limiting, request logging.
Containerize your entire environment: Python version, CUDA drivers, dependencies. Multi-stage builds: builder stage (install deps) → runtime stage (slim image). Pin all dependency versions. Use NVIDIA Container Toolkit for GPU access, and docker compose for multi-service setups (API + Redis + DB).
pyproject.toml replaces setup.py/setup.cfg: one file for project metadata, dependencies, build system, and tool configs (pytest, mypy, ruff). Use [project.optional-dependencies] for dev/test extras and pip install -e ".[dev]" for editable installs.
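A hypothetical minimal layout tying those pieces together (the project name, version pins, and tool settings are placeholders):

```toml
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "my-ml-project"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = ["numpy>=1.26,<2.0"]

[project.optional-dependencies]
dev = ["pytest", "mypy", "ruff"]

[tool.pytest.ini_options]
addopts = "--cov"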
| Tool | Best For | Key Feature |
|---|---|---|
| Hydra | ML experiments | YAML configs, CLI overrides, multi-run |
| Pydantic Settings | App config | Env var loading, validation |
| python-dotenv | Simple projects | .env file loading |
| dynaconf | Multi-environment | dev/staging/prod configs |
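For simple projects, a stdlib stand-in for Pydantic Settings: typed config loaded from environment variables (the `Settings` class and APP_* variable names are made up):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """App config with typed defaults, overridable via env vars."""
    db_url: str = "sqlite:///local.db"
    batch_size: int = 32

    @classmethod
    def from_env(cls, env=os.environ):
        return cls(
            db_url=env.get("APP_DB_URL", cls.db_url),
            batch_size=int(env.get("APP_BATCH_SIZE", cls.batch_size)),
        )

# Passing a dict here stands in for the real environment
settings = Settings.from_env({"APP_BATCH_SIZE": "64"})
```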
Automate: linting (ruff/flake8) → type checking (mypy) → testing (pytest) → building (Docker) → deploying. Use GitHub Actions or GitLab CI. Add a model validation gate: a new model must beat the baseline on test metrics before deployment.
| Tool | Purpose |
|---|---|
| ruff | Fast linter + formatter (replaces black, isort, flake8) |
| mypy | Static type checking |
| pre-commit | Git hooks for auto-formatting |
| pytest-cov | Test coverage measurement |
| bandit | Security linting |
| Tool | Purpose |
|---|---|
| MLflow | Experiment tracking, model registry |
| DVC | Data versioning (like Git for data) |
| Weights & Biases | Experiment tracking, visualization |
| Evidently | Data drift & model monitoring |
| Great Expectations | Data validation |
| DB | Use Case | Python Library |
|---|---|---|
| SQLite | Local, small data, prototyping | sqlite3 (built-in) |
| PostgreSQL | Production, ACID, JSON | psycopg2, SQLAlchemy |
| Redis | Caching, queues, sessions | redis-py |
| MongoDB | Flexible schema, documents | pymongo |
| Pinecone/Weaviate | Vector search (embeddings) | Official SDKs |
Answer: (1) Unit tests: data transformations, feature engineering functions. (2) Integration tests: full pipeline end-to-end. (3) Model tests: output shape, range, determinism with seeds. (4) Data tests: schema validation, distribution checks. Use pytest fixtures for reusable test data.
Answer: Logging: configurable levels, file output, structured format, zero cost when disabled, thread-safe. Print: none of these. Production code must use logging for observability and debugging.
Answer: FastAPI/Flask for REST API. Docker for containerization. Load model at startup (not per request). Add health checks, input validation, error handling, logging, metrics. Use async for high throughput. Consider model registries (MLflow) for versioning.
Answer: Project metadata, dependencies, build system, tool configs (pytest, mypy, ruff). Replaced setup.py/setup.cfg. Pin dependency versions for reproducibility. Use [project.optional-dependencies] for dev/test extras.
Answer: Hydra: YAML configs with CLI overrides, multi-run sweeps. Store configs in version control. Never hardcode hyperparameters. Use config groups for model/data/training combos.
Answer: Automate: lint → type-check → test → build → deploy. Add model validation gate: new model must beat baseline on test metrics. Use GitHub Actions. Include data validation (Great Expectations) in pipeline.
Answer: MLflow model registry. DVC for data. Git for code. timestamp + metrics in model filename. A/B testing for rollout.
Answer: Input distribution changes post-deployment. Detect: Evidently, statistical tests. Monitor: feature distributions, prediction distributions. Retrain trigger.
| Tool | Type | When to Use | Overhead |
|---|---|---|---|
| cProfile | Function-level | Find slow functions | ~2x slowdown |
| line_profiler | Line-by-line | Find slow lines in a function | Higher |
| Py-Spy | Sampling profiler | Production profiling | Near zero |
| tracemalloc | Memory allocation | Find memory leaks | Low |
| memory_profiler | Line-by-line memory | Find memory-heavy lines | High |
| scalene | CPU + Memory + GPU | Comprehensive profiling | Low |
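A cProfile sketch against a deliberately slow function (`slow_concat` is a made-up example):

```python
import cProfile
import io
import pstats

def slow_concat(n):
    s = ""
    for i in range(n):
        s += str(i)  # repeated string building: shows up in the profile
    return s

profiler = cProfile.Profile()
profiler.enable()
slow_concat(10_000)
profiler.disable()

# Top 5 entries by cumulative time, captured as text
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

For line-level detail inside slow_concat you would switch to line_profiler; for production, Py-Spy attaches to a running process with near-zero overhead.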
GIL prevents true multi-threading for CPU-bound Python code. But: NumPy, Pandas, and scikit-learn release the GIL during C operations. Solutions for parallelism:
+| Tool | Best For | How |
|---|---|---|
| threading | I/O-bound (API calls, disk) | GIL released during I/O waits |
| multiprocessing | CPU-bound Python | Separate processes, separate GIL |
| concurrent.futures | Simple parallel patterns | ThreadPool/ProcessPool executors |
| asyncio | Many I/O operations | Event loop, cooperative multitasking |
| joblib | sklearn parallel | n_jobs parameter |
| Task Type | Solution | Why |
|---|---|---|
| I/O-bound | asyncio / threading | GIL released during I/O |
| CPU-bound Python | multiprocessing | Separate processes, separate GIL |
| CPU-bound NumPy | threading OK | NumPy releases GIL |
| Many tasks | concurrent.futures | Simple Pool interface |
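The I/O-bound row can be sketched with `concurrent.futures` (a minimal sketch; `fetch` is a made-up stand-in for a network call):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for an I/O-bound call (network, disk): the GIL is
    # released while the thread waits, so the waits overlap.
    time.sleep(0.05)
    return f"data from {url}"

urls = [f"https://example.com/{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start
# Eight overlapping 50 ms waits finish in well under the 400 ms serial time.
```

For CPU-bound work, swap in `ProcessPoolExecutor` with the same `map` interface: one process and one GIL per worker.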
@numba.jit(nopython=True) compiles Python functions to machine code. Supports NumPy arrays and most math operations. 10-100x speedup for loops that can't be vectorized. @numba.vectorize creates custom ufuncs. @numba.cuda.jit runs on GPU.
@numba.jit(nopython=True): compile to machine code. 10-100x speedup for loops. Supports NumPy, math. @numba.vectorize: custom ufuncs. @cuda.jit: GPU kernels. Best for: tight loops that can't be vectorized.
Compiles Python to C extension modules. Add type declarations for massive speedups. Best for: tight loops, calling C libraries, CPython extensions. More setup than Numba but more control.
Pandas/NumPy API for data bigger than memory. dask.dataframe, dask.array, dask.delayed. Lazy execution. Task graph scheduler. Scales from laptop to cluster. Alternative: Polars for single-machine parallel.
Pandas-like API for datasets larger than memory. Key abstractions: dask.dataframe (parallel Pandas), dask.array (parallel NumPy), dask.delayed (custom parallelism). Uses a task scheduler to execute lazily. Scales from laptop to cluster.
General-purpose distributed framework. Ray Tune (hyperparameter tuning), Ray Serve (model serving), Ray Data. Easier than Dask for ML. Used by OpenAI, Uber.
General-purpose distributed framework. Ray Tune for hyperparameter tuning, Ray Serve for model serving, Ray Data for data processing. Easier than Dask for ML-specific workloads. Used by OpenAI, Uber, Ant Group.
| Tool | Scope | Use Case |
|---|---|---|
| @functools.lru_cache | In-memory, function | Expensive computations |
| @functools.cache | Unbounded cache | Pure functions |
| joblib.Memory | Disk cache | Data processing pipelines |
| Redis | External cache | Multi-process, API responses |
| diskcache | Pure Python disk | Simple persistent cache |
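The first two rows of the table can be sketched with `functools.lru_cache` on a pure function (naive Fibonacci as the classic example):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded: equivalent to functools.cache
def fib(n):
    # Pure function: same input always yields the same output,
    # so memoizing by argument is safe.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

value = fib(30)          # exponential recursion collapses to linear
info = fib.cache_info()  # hits/misses counters for tuning maxsize
```

`joblib.Memory` offers the same decorator pattern but persists results to disk, which suits data-processing pipelines that outlive the process.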
3.12: Faster interpreter (5-15% overall), better error messages, per-interpreter GIL (experimental). 3.13: Free-threaded CPython (no-GIL mode experimental), JIT compiler (experimental). The future of Python performance is exciting.
3.12: 5-15% faster, better errors, per-interpreter GIL. 3.13: Free-threaded (no-GIL experimental), JIT compiler (experimental). The future of Python performance is exciting.
| Anti-Pattern | Fix | Speedup |
|---|---|---|
| for row in df.iterrows() | Vectorized ops | 100-1000x |
| s += "text" in loop | ''.join(parts) | 100x |
| x in big_list | x in big_set | 1000x |
| Python list of floats | NumPy array | 50-100x |
| Imports inside hot functions | Import once at top | Variable |
| Not using built-ins | sum(), min() | 5-10x |
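The `x in big_list` vs `x in big_set` row can be checked directly with `timeit` (illustrative sizes; absolute timings vary by machine):

```python
import timeit

n = 100_000
big_list = list(range(n))
big_set = set(big_list)
missing = -1  # worst case for the list: a full O(n) scan

list_time = timeit.timeit(lambda: missing in big_list, number=100)
set_time = timeit.timeit(lambda: missing in big_set, number=100)
# The O(1) hash lookup beats the O(n) scan by orders of magnitude.
```

The same hash-based lookup is why `dict` key membership is also O(1): whenever code does repeated `in` checks, build a set or dict once up front.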
Answer: Simplifies reference counting (thread-safe without granular locks). Makes single-threaded code faster. Makes C extension integration easier. Python 3.13 has experimental free-threaded mode (no-GIL).
Answer: (1) Vectorize with NumPy (broadcast). (2) If too complex, use Numba JIT. (3) Cython for C-level types. (4) multiprocessing if iterations are independent.
Answer: Threading: I/O-bound (shared memory, low overhead). Multiprocessing: CPU-bound (separate memory, bypasses GIL). For downloading 1000 images → threads. For computing 1000 matrix operations → processes.
Answer: JIT compiler that translates Python/NumPy to machine code using LLVM. @jit(nopython=True) for 10-100x speedup. Works best with: NumPy arrays, math operations, loops. Doesn't support: Pandas, string manipulation, most Python objects.
Answer: cProfile: function-level (find slow functions). line_profiler: line-by-line. Py-Spy: sampling (production-safe). tracemalloc: memory. scalene: CPU+memory+GPU all-in-one. Always profile before optimizing.
Answer: Dask: familiar Pandas/NumPy API, Python-native, scales well. Ray: ML-focused (tune, serve), lower-level control. Spark: JVM-based, best for very large (TB+) data, enterprise. For Python ML: Dask or Ray. For big data ETL: Spark.
Answer: Simplifies reference counting. Makes single-threaded faster. Easier C extensions. Python 3.13 has experimental no-GIL mode.
Answer: (1) NumPy vectorize. (2) Numba JIT. (3) Cython. (4) multiprocessing if independent.
Answer: Threading: I/O-bound (shared memory). Multiprocessing: CPU-bound (bypasses GIL). Downloads→threads. Matrix math→processes.
Answer: JIT compiler: Python→machine code via LLVM. @jit(nopython=True). 10-100x for NumPy loops. No Pandas/strings.
Answer: cProfile: functions. line_profiler: lines. Py-Spy: production. tracemalloc: memory. scalene: all-in-one. Profile FIRST, optimize second.
Answer: Dask: Pandas API, Python-native. Ray: ML-focused. Spark: JVM, TB+ data. Python ML: Dask/Ray. Big data ETL: Spark.
Answer: (1) Use sets not lists for lookups. (2) NumPy not Python loops. (3) Generator expressions for memory. Bonus: lru_cache for expensive functions.
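The generator-expression point in the answer above can be sketched with `sys.getsizeof` (sizes are illustrative and platform-dependent):

```python
import sys

n = 1_000_000
squares_list = [i * i for i in range(n)]  # materializes all 1M items
squares_gen = (i * i for i in range(n))   # lazy: produces one item at a time

list_size = sys.getsizeof(squares_list)  # megabytes for 1M entries
gen_size = sys.getsizeof(squares_gen)    # tiny fixed-size generator object

total = sum(squares_gen)  # generators still feed aggregations like sum()
```

The trade-off: a generator can only be consumed once and has no `len()`, so use it for streaming aggregation, not random access.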
Answer: Hash-based memoization. Args must be hashable. maxsize=None for unlimited. cache_info() shows hits/misses. Perfect for pure functions.