title: TorchCode
emoji: 🔥
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false
🔥 TorchCode
Crack the PyTorch interview.
Practice implementing operators and architectures from scratch: the exact skills top ML teams test for.
Like LeetCode, but for tensors. Self-hosted. Jupyter-based. Instant feedback.
🎯 Why TorchCode?
Top companies (Meta, Google DeepMind, OpenAI, etc.) expect ML engineers to implement core operations from memory on a whiteboard. Reading papers isn't enough: you need to be able to write softmax, LayerNorm, MultiHeadAttention, and full Transformer blocks in code.
TorchCode gives you a structured practice environment for exactly that.
No cloud. No signup. No GPU needed. Just `make run`, or try it instantly on Hugging Face.
Quick Start
Option 0: Try it online (zero install)
Launch on Hugging Face Spaces to open a full JupyterLab environment in your browser. Nothing to install.
Or open any problem directly in Google Colab: every notebook has an Open in Colab badge.
Option 0b: Use the judge in Colab (pip)
In Google Colab, install the judge from PyPI so you can run check(...) without cloning the repo:
!pip install torch-judge
Then in a notebook cell:
from torch_judge import check, status, hint, reset_progress
status() # list all problems and your progress
check("relu") # run tests for the "relu" task
hint("relu") # show a hint
Option 1: Pull the pre-built image (fastest)
docker run -p 8888:8888 -e PORT=8888 ghcr.io/duoan/torchcode:latest
Option 2: Build locally
make run
Open http://localhost:8888 and that's it. Works with both Docker and Podman (auto-detected).
Problem Set
Frequency: 🔥 = very likely in interviews, ⭐ = commonly asked, 💡 = emerging / differentiator
🧱 Fundamentals: "Implement X from scratch"
The bread and butter of ML coding interviews. You'll be asked to write these without torch.nn.
| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 1 | ReLU | `relu(x)` | 🔥 | Activation functions, element-wise ops |
| 2 | Softmax | `my_softmax(x, dim)` | 🔥 | Numerical stability, exp/log tricks |
| 16 | Cross-Entropy Loss | `cross_entropy_loss(logits, targets)` | 🔥 | Log-softmax, logsumexp trick |
| 17 | Dropout | `MyDropout` (nn.Module) | 🔥 | Train/eval mode, inverted scaling |
| 18 | Embedding | `MyEmbedding` (nn.Module) | 🔥 | Lookup table, `weight[indices]` |
| 19 | GELU | `my_gelu(x)` | ⭐ | Gaussian error linear unit, `torch.erf` |
| 20 | Kaiming Init | `kaiming_init(weight)` | ⭐ | `std = sqrt(2/fan_in)`, variance scaling |
| 21 | Gradient Clipping | `clip_grad_norm(params, max_norm)` | ⭐ | Norm-based clipping, direction preservation |
| 31 | Gradient Accumulation | `accumulated_step(model, opt, ...)` | 💡 | Micro-batching, loss scaling |
| 40 | Linear Regression | `LinearRegression` (3 methods) | 🔥 | Normal equation, GD from scratch, `nn.Linear` |
| 3 | Linear Layer | `SimpleLinear` (nn.Module) | 🔥 | `y = xW^T + b`, Kaiming init, `nn.Parameter` |
| 4 | LayerNorm | `my_layer_norm(x, γ, β)` | 🔥 | Normalization, running stats, affine transform |
| 7 | BatchNorm | `my_batch_norm(x, γ, β)` | ⭐ | Batch vs layer statistics, train/eval behavior |
| 8 | RMSNorm | `rms_norm(x, weight)` | ⭐ | LLaMA-style norm, simpler than LayerNorm |
| 15 | SwiGLU MLP | `SwiGLUMLP` (nn.Module) | ⭐ | Gated FFN, `SiLU(gate) * up`, LLaMA/Mistral-style |
| 22 | Conv2d | `my_conv2d(x, weight, ...)` | 🔥 | Convolution, unfold, stride/padding |
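The "numerical stability" and "logsumexp trick" entries above can be illustrated with a small framework-free sketch. It uses plain Python lists so the arithmetic is visible; the function names `stable_softmax` and `cross_entropy` are illustrative and are not the notebook signatures. In the notebooks the same idea would be expressed with tensor ops.

```python
import math

def stable_softmax(logits):
    """Softmax over a list of floats, shifted by the max to avoid overflow."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # largest exponent is exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability of `target`, computed via logsumexp."""
    m = max(logits)
    logsumexp = m + math.log(sum(math.exp(v - m) for v in logits))
    return logsumexp - logits[target]

# A naive math.exp(1000.0) would raise OverflowError; the shifted version is fine.
big_logits = [1000.0, 1001.0, 1002.0]
probs = stable_softmax(big_logits)
loss = cross_entropy(big_logits, target=2)
```

Subtracting the maximum does not change the result, because softmax is invariant to adding a constant to every logit.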
🧠 Attention Mechanisms: the heart of modern ML interviews
If you're interviewing for any role touching LLMs or Transformers, expect at least one of these.
| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 23 | Cross-Attention | `MultiHeadCrossAttention` (nn.Module) | ⭐ | Encoder-decoder, Q from decoder, K/V from encoder |
| 5 | Scaled Dot-Product Attention | `scaled_dot_product_attention(Q, K, V)` | 🔥 | `softmax(QK^T/√d_k)V`, the foundation of everything |
| 6 | Multi-Head Attention | `MultiHeadAttention` (nn.Module) | 🔥 | Parallel heads, split/concat, projection matrices |
| 9 | Causal Self-Attention | `causal_attention(Q, K, V)` | 🔥 | Autoregressive masking with `-inf`, GPT-style |
| 10 | Grouped Query Attention | `GroupQueryAttention` (nn.Module) | ⭐ | GQA (LLaMA 2), KV sharing across heads |
| 11 | Sliding Window Attention | `sliding_window_attention(Q, K, V, w)` | ⭐ | Mistral-style local attention, O(n·w) complexity |
| 12 | Linear Attention | `linear_attention(Q, K, V)` | 💡 | Kernel trick, `φ(Q)(φ(K)^TV)`, O(n·d²) |
| 14 | KV Cache Attention | `KVCacheAttention` (nn.Module) | 🔥 | Incremental decoding, cache K/V, prefill vs decode |
| 24 | RoPE | `apply_rope(q, k)` | 🔥 | Rotary position embedding, relative position via rotation |
| 25 | Flash Attention | `flash_attention(Q, K, V, block_size)` | 💡 | Tiled attention, online softmax, memory-efficient |
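The "online softmax" idea behind the Flash Attention problem can be sketched for a single query with scalar values. This is a simplified illustration in plain Python, not the notebook's tensor implementation; `attention_full` and `attention_online` are hypothetical names chosen for this sketch.

```python
import math

def attention_full(scores, values):
    """Reference: softmax(scores)-weighted sum of scalar values."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def attention_online(scores, values, block_size):
    """Same result, visiting the scores one block at a time."""
    running_max = float("-inf")
    denom = 0.0  # running sum of exp(score - running_max)
    acc = 0.0    # running sum of exp(score - running_max) * value
    for start in range(0, len(scores), block_size):
        block_s = scores[start:start + block_size]
        block_v = values[start:start + block_size]
        new_max = max(running_max, max(block_s))
        # Rescale everything accumulated under the old max (exp(-inf) is 0.0).
        scale = math.exp(running_max - new_max)
        denom *= scale
        acc *= scale
        for s, v in zip(block_s, block_v):
            w = math.exp(s - new_max)
            denom += w
            acc += w * v
        running_max = new_max
    return acc / denom

scores = [0.5, 2.0, -1.0, 3.0, 0.0]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
full = attention_full(scores, values)
tiled = attention_online(scores, values, block_size=2)
```

The tiled version never holds all the scores at once, which is the property that makes the block-wise approach memory-efficient.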
Architecture & Adaptation: put it all together
| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 26 | LoRA | `LoRALinear` (nn.Module) | ⭐ | Low-rank adaptation, frozen base + `BA` update |
| 27 | ViT Patch Embedding | `PatchEmbedding` (nn.Module) | 💡 | Image → patches → linear projection |
| 13 | GPT-2 Block | `GPT2Block` (nn.Module) | ⭐ | Pre-norm, causal MHA + MLP (4x, GELU), residual connections |
| 28 | Mixture of Experts | `MixtureOfExperts` (nn.Module) | ⭐ | Mixtral-style, top-k routing, expert MLPs |
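As a rough illustration of "frozen base + BA update", the sketch below computes a LoRA-style forward pass on plain Python lists. The helper names and the `alpha / rank` scaling are assumptions based on the common LoRA convention; the `LoRALinear` notebook may define things differently.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(x, W, A, B, alpha):
    """y = W x + (alpha / rank) * B (A x), with W treated as frozen.

    Shapes: W is out x in, A is rank x in, B is out x rank.
    """
    rank = len(A)
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # low-rank trainable path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

x = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]                      # rank 1
B_zero = [[0.0], [0.0]]               # zero-initialised B leaves the base output unchanged
B_trained = [[1.0], [2.0]]
y_init = lora_forward(x, W, A, B_zero, alpha=2.0)
y_adapted = lora_forward(x, W, A, B_trained, alpha=2.0)
```

Starting with B at zero means the adapted layer initially behaves exactly like the frozen base layer.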
Training & Optimization
| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 29 | Adam Optimizer | `MyAdam` | ⭐ | Momentum + RMSProp, bias correction |
| 30 | Cosine LR Scheduler | `cosine_lr_schedule(step, ...)` | ⭐ | Linear warmup + cosine annealing |
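A schedule combining linear warmup with cosine annealing might look like the following sketch. The full signature of `cosine_lr_schedule` is not shown in this README, so the function name `cosine_lr_sketch` and every parameter other than `step` are hypothetical.

```python
import math

def cosine_lr_sketch(step, max_lr, min_lr, warmup_steps, total_steps):
    """Learning rate at `step` (0-indexed); assumes total_steps > warmup_steps."""
    if step < warmup_steps:
        # Linear warmup: reaches max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    # Cosine annealing from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

lrs = [cosine_lr_sketch(s, max_lr=1.0, min_lr=0.1, warmup_steps=10, total_steps=100)
       for s in range(101)]
```

Whether warmup starts from zero or from `max_lr / warmup_steps` varies between implementations; this sketch uses the latter so the first step is never a no-op.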
🎯 Inference & Decoding
| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 32 | Top-k / Top-p Sampling | `sample_top_k_top_p(logits, ...)` | 🔥 | Nucleus sampling, temperature scaling |
| 33 | Beam Search | `beam_search(log_prob_fn, ...)` | 🔥 | Hypothesis expansion, pruning, eos handling |
| 34 | Speculative Decoding | `speculative_decode(target, draft, ...)` | 💡 | Accept/reject, draft model acceleration |
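The filtering half of top-k / top-p sampling can be sketched deterministically, leaving the random draw as a final step. This is one possible ordering (temperature, then top-k, then nucleus cut-off on the original probabilities); the name `filter_top_k_top_p` and its parameters are illustrative and are not taken from the notebook.

```python
import math

def filter_top_k_top_p(logits, top_k, top_p, temperature=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank tokens by probability, highest first, and keep at most top_k of them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Nucleus: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalise the surviving tokens so they sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
dist = filter_top_k_top_p(logits, top_k=3, top_p=0.7)
```

A token could then be drawn with `random.choices(list(dist), weights=list(dist.values()))`.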
🔬 Advanced: Differentiators
| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 35 | BPE Tokenizer | `SimpleBPE` | 💡 | Byte-pair encoding, merge rules, subword splits |
| 36 | INT8 Quantization | `Int8Linear` (nn.Module) | 💡 | Per-channel quantize, scale/zero-point, buffer vs param |
| 37 | DPO Loss | `dpo_loss(chosen, rejected, ...)` | 💡 | Direct preference optimization, alignment training |
| 38 | GRPO Loss | `grpo_loss(logps, rewards, group_ids, eps)` | 💡 | Group relative policy optimization, RLAIF, within-group normalized advantages |
| 39 | PPO Loss | `ppo_loss(new_logps, old_logps, advantages, clip_ratio)` | 💡 | PPO clipped surrogate loss, policy gradient, trust region |
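The PPO clipped surrogate objective can be written out in a few lines. The sketch below reuses the argument names from the `ppo_loss` row but works on plain Python lists and averages over samples; the reduction and any extra terms in the notebook version are not specified here, so treat those details as assumptions.

```python
import math

def ppo_clipped_loss(new_logps, old_logps, advantages, clip_ratio):
    """Mean of -min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r = exp(new - old)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_ratio), 1.0 + clip_ratio)
        # Taking the minimum makes the objective pessimistic: gains from
        # moving the ratio outside the clip range are not rewarded.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)

# The ratio is 1.5 in both cases; only the sign of the advantage differs.
loss_pos = ppo_clipped_loss([math.log(1.5)], [0.0], [1.0], clip_ratio=0.2)
loss_neg = ppo_clipped_loss([math.log(1.5)], [0.0], [-1.0], clip_ratio=0.2)
```

With a positive advantage the ratio is capped at 1.2, while with a negative advantage the unclipped term is kept because it is the smaller of the two.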
How It Works
Each problem has two notebooks:
| File | Purpose |
|---|---|
| `01_relu.ipynb` | Blank template: write your code here |
| `01_relu_solution.ipynb` | Reference solution: check when stuck |
Workflow
1. Open a blank notebook → Read the problem description
2. Implement your solution → Use only basic PyTorch ops
3. Debug freely → `print(x.shape)`, check gradients, etc.
4. Run the judge cell → `check("relu")`
5. See instant colored feedback → ✅ pass / ❌ fail per test case
6. Stuck? Get a nudge → `hint("relu")`
7. Review the reference solution → `01_relu_solution.ipynb`
8. Click Reset in the toolbar → Blank slate, practice again!
In-Notebook API
from torch_judge import check, hint, status
check("relu") # Judge your implementation
hint("causal_attention") # Get a hint without full spoiler
status() # Progress dashboard: solved / attempted / todo
Suggested Study Plan
Total: ~12–16 hours spread across 3–4 weeks. Perfect for interview prep on a deadline.
| Week | Focus | Problems | Time |
|---|---|---|---|
| 1 | 🧱 Foundations | ReLU → Softmax → CE Loss → Dropout → Embedding → GELU → Linear → LayerNorm → BatchNorm → RMSNorm → SwiGLU MLP → Conv2d | 2–3 hrs |
| 2 | 🧠 Attention Deep Dive | SDPA → MHA → Cross-Attn → Causal → GQA → KV Cache → Sliding Window → RoPE → Linear Attn → Flash Attn | 3–4 hrs |
| 3 | Architecture + Training | GPT-2 Block → LoRA → MoE → ViT Patch → Adam → Cosine LR → Grad Clip → Grad Accumulation → Kaiming Init | 3–4 hrs |
| 4 | 🎯 Inference + Advanced | Top-k/p Sampling → Beam Search → Speculative Decoding → BPE → INT8 Quant → DPO Loss → GRPO Loss → PPO Loss + speed run | 3–4 hrs |
Architecture
┌──────────────────────────────────────────┐
│         Docker / Podman Container        │
│                                          │
│  JupyterLab (:8888)                      │
│    ├── templates/  (reset on each run)   │
│    ├── solutions/  (reference impl)      │
│    ├── torch_judge/ (auto-grading)       │
│    ├── torchcode-labext (JLab plugin)    │
│    │     Reset → restore template        │
│    │     Colab → open in Colab           │
│    └── PyTorch (CPU), NumPy              │
│                                          │
│  Judge checks:                           │
│    ✓ Output correctness (allclose)       │
│    ✓ Gradient flow (autograd)            │
│    ✓ Shape consistency                   │
│    ✓ Edge cases & numerical stability    │
└──────────────────────────────────────────┘
Single container. Single port. No database. No frontend framework. No GPU.
🛠️ Commands
make run # Build & start (http://localhost:8888)
make stop # Stop the container
make clean # Stop + remove volumes + reset all progress
🧩 Adding Your Own Problems
TorchCode uses auto-discovery, so you only need to drop a new file in `torch_judge/tasks/`:
TASK = {
    "id": "my_task",
    "title": "My Custom Problem",
    "difficulty": "medium",
    "function_name": "my_function",
    "hint": "Think about broadcasting...",
    "tests": [ ... ],
}
No registration needed. The judge picks it up automatically.
📦 Publishing torch-judge to PyPI (maintainers)
The judge is published as a separate package so Colab/users can pip install torch-judge without cloning the repo.
Automatic (GitHub Action)
Pushing to `master` after changing the package version triggers `.github/workflows/pypi-publish.yml`, which builds and uploads to PyPI. No git tag is required.
- Bump version in `torch_judge/_version.py` (e.g. `__version__ = "0.1.1"`).
- Configure PyPI Trusted Publisher (one-time):
  - PyPI → Your project torch-judge → Publishing → Add a new pending publisher
  - Owner: `duoan`, Repository: `TorchCode`, Workflow: `pypi-publish.yml`, Environment: (leave empty)
  - Run the workflow once (push a version bump to `master` or Actions → Publish torch-judge to PyPI → Run workflow); PyPI will then link the publisher.
- Release: commit the version bump and `git push origin master`.
Alternatively, use an API token: add repository secret PYPI_API_TOKEN (value = pypi-... from PyPI) and set TWINE_USERNAME=__token__ and TWINE_PASSWORD from that secret in the workflow if you prefer not to use Trusted Publishing.
Manual
pip install build twine
python -m build
twine upload dist/*
Version is in torch_judge/_version.py; bump it before each release.
FAQ
Do I need a GPU?
No. Everything runs on CPU. The problems test correctness and understanding, not throughput.
Can I keep my solutions between runs?
Blank templates reset on every `make run` so you practice from scratch. Save your work under a different filename if you want to keep it. You can also click the Reset button in the notebook toolbar at any time to restore the blank template without restarting.
Can I use Google Colab instead?
Yes! Every notebook has an Open in Colab badge at the top. Click it to open the problem directly in Google Colab, with no Docker or local setup needed. You can also use the Colab toolbar button inside JupyterLab.
How are solutions graded?
The judge runs your function against multiple test cases using `torch.allclose` for numerical correctness, verifies gradients flow properly via autograd, and checks edge cases specific to each operation.
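To make the tolerance check concrete, here is a plain-Python sketch of the element-wise rule that `torch.allclose` documents, `|actual - expected| <= atol + rtol * |expected|`, using that function's default tolerances. The name `allclose_sketch` is illustrative, and the tolerances the judge actually passes are not stated here.

```python
def allclose_sketch(actual, expected, rtol=1e-5, atol=1e-8):
    """True if every pair satisfies |a - e| <= atol + rtol * |e| (flat lists only)."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )

close = allclose_sketch([1.0, 2.0], [1.0, 2.0 + 1e-6])  # within relative tolerance
far = allclose_sketch([1.0], [1.001])                   # off by about 0.1%
near_zero = allclose_sketch([0.0], [1e-7])              # rtol barely helps near zero
```

Near zero the relative term vanishes, so only `atol` applies; this is why outputs that should be exactly zero are a common source of failed comparisons.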
Who is this for?
Anyone preparing for ML/AI engineering interviews at top tech companies, or anyone who wants to deeply understand how PyTorch operations work under the hood.
🤝 Contributors
Thanks to everyone who has contributed to TorchCode.
Auto-generated from the GitHub contributors graph with avatars and GitHub usernames.