# Sparse Transformer: Experiment Suite + Triton Kernels
|
|
Comprehensive experiment infrastructure for the Chunked Sparse Backward Pass paper.
|
|
## Files
|
|
| | File | Description | |
| |------|-------------| |
| | `triton_sparse.py` | Triton-fused sparse backward kernels (dW, dX, dBias) + Python-loop baseline + correctness tests + microbenchmark | |
| | `e2e_full.py` | End-to-end training benchmark: Dense vs PyLoop vs Triton at d_model ∈ {512, 1024, 2048} | |
| | `full_experiments.py` | 7-experiment ablation suite (baselines, predictor accuracy, chunk ablation, compute-matched, exploration, attention sparsification, sparsity sweep) | |
| | `analyze_results.py` | Publication figure generator (matplotlib) | |
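
For orientation, here is a minimal sketch (not the repo's actual code; the function name and signature are illustrative) of what a Python-loop-style sparse backward computes: for a linear layer `Y = X @ W.T + b` where only a subset of output rows is active, the three gradients dW, dX, and dBias can be formed from gathered rows instead of full dense matmuls.

```python
import torch

def sparse_linear_backward(dY, X, W, active_idx):
    """Backward for Y = X @ W.T + b restricted to active output rows.

    dY:         (batch, n_active) upstream gradient, already gathered
    X:          (batch, d_in) layer input
    W:          (d_out, d_in) weight matrix
    active_idx: (n_active,) indices of the rows of W that were used
    """
    dW = torch.zeros_like(W)
    dW[active_idx] = dY.T @ X          # scatter (n_active, d_in) into full dW
    dX = dY @ W[active_idx]            # (batch, d_in), touches only active rows
    dBias = torch.zeros(W.shape[0], dtype=W.dtype)
    dBias[active_idx] = dY.sum(dim=0)
    return dW, dX, dBias

# Sanity check against dense autograd on a toy problem
torch.manual_seed(0)
X = torch.randn(4, 8, requires_grad=True)
W = torch.randn(16, 8, requires_grad=True)
b = torch.zeros(16, requires_grad=True)
idx = torch.tensor([1, 5, 9])
Y = (X @ W.T + b)[:, idx]
Y.backward(torch.ones(4, 3))
dW, dX, dBias = sparse_linear_backward(torch.ones(4, 3), X.detach(), W.detach(), idx)
assert torch.allclose(dW, W.grad) and torch.allclose(dX, X.grad)
```

The Triton kernels fuse the gather and the matmuls so the zero rows of `dW` are never materialized in the hot loop; the loop baseline above is what they are checked against for correctness.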
|
|
## Quick Start
|
|
```bash
pip install torch triton tiktoken matplotlib numpy

# Correctness test + microbenchmark
python triton_sparse.py

# End-to-end training (needs ≥24GB GPU for d=2048)
python e2e_full.py

# Full ablation suite (7 experiments, ~4-6 hours on A10G)
python full_experiments.py --experiment all --device cuda --steps 2000 --seeds "42,123,456"

# Single experiment
python full_experiments.py --experiment baselines --device cuda
```
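
The microbenchmark in `triton_sparse.py` compares dense and sparse backward timings; the sketch below (illustrative sizes and helper names, not the script's actual ones) shows the shape of such a comparison: a dense dW matmul versus one restricted to the active output rows.

```python
import time
import torch

def bench(fn, iters=50):
    """Average wall-clock seconds per call after one warm-up run."""
    fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters

batch, d_in, d_out, n_active = 64, 1024, 4096, 256
X = torch.randn(batch, d_in)
dY_dense = torch.randn(batch, d_out)
dY_sparse = dY_dense[:, :n_active]  # pretend the first 256 rows are active

dense_ms = bench(lambda: dY_dense.T @ X) * 1e3
sparse_ms = bench(lambda: dY_sparse.T @ X) * 1e3
print(f"dense dW: {dense_ms:.3f} ms   sparse dW: {sparse_ms:.3f} ms")
```

On GPU the real script would synchronize around timing (e.g. CUDA events); the CPU version above only illustrates the work ratio between the two matmuls.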
|
|
## Results
|
|
See `RESULTS.md` for collected tables.
|
|