Upload README.md
README.md
ADDED
@@ -0,0 +1,34 @@
# Sparse Transformer: Experiment Suite + Triton Kernels

Comprehensive experiment infrastructure for the Chunked Sparse Backward Pass paper.

## Files

| File | Description |
|------|-------------|
| `triton_sparse.py` | Triton-fused sparse backward kernels (dW, dX, dBias) + Python-loop baseline + correctness tests + microbenchmark |
| `e2e_full.py` | End-to-end training benchmark: Dense vs PyLoop vs Triton at d_model ∈ {512, 1024, 2048} |
| `full_experiments.py` | 7-experiment ablation suite (baselines, predictor accuracy, chunk ablation, compute-matched, exploration, attention sparsification, sparsity sweep) |
| `analyze_results.py` | Publication figure generator (matplotlib) |
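
For orientation, here is a minimal NumPy sketch of what a Python-loop sparse backward for a linear layer could look like. It is not taken from `triton_sparse.py`. It assumes, purely for illustration, that sparsity is applied over contiguous chunks of output features and that only the "active" chunks contribute to dW, dX, and dBias; the function name, arguments, and chunking scheme are all hypothetical and may differ from the paper's method.

```python
import numpy as np


def chunked_sparse_linear_backward(x, w, grad_out, active_chunks, chunk_size):
    """Backward of y = x @ w.T + b, restricted to selected output-feature chunks.

    Shapes: x (N, d_in), w (d_out, d_in), grad_out (N, d_out).
    active_chunks lists the chunk indices to keep (an illustrative assumption).
    """
    d_w = np.zeros_like(w)
    d_x = np.zeros_like(x)
    d_bias = np.zeros(w.shape[0], dtype=w.dtype)
    for c in active_chunks:
        rows = slice(c * chunk_size, (c + 1) * chunk_size)
        g = grad_out[:, rows]        # (N, chunk_size) slice of the upstream gradient
        d_w[rows] = g.T @ x          # weight gradient for this chunk's rows only
        d_bias[rows] = g.sum(axis=0) # bias gradient for this chunk
        d_x += g @ w[rows]           # accumulate this chunk's input gradient
    return d_w, d_x, d_bias


# Tiny example: 8 output features split into 4 chunks of 2, keeping chunks 0 and 3.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6))
w = rng.standard_normal((8, 6))
grad_out = rng.standard_normal((4, 8))
d_w, d_x, d_bias = chunked_sparse_linear_backward(x, w, grad_out, [0, 3], 2)
```

Under these assumptions the result equals a dense backward pass whose upstream gradient has been zeroed outside the active chunks, which is one simple way to check a fused kernel against a reference.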
## Quick Start

```bash
pip install torch triton tiktoken matplotlib numpy

# Correctness test + microbenchmark
python triton_sparse.py

# End-to-end training (needs a GPU with ≥24 GB of memory for d_model=2048)
python e2e_full.py

# Full ablation suite (7 experiments, ~4-6 hours on A10G)
python full_experiments.py --experiment all --device cuda --steps 2000 --seeds "42,123,456"

# Single experiment
python full_experiments.py --experiment baselines --device cuda
```
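
The flags in the commands above, including the comma-separated `--seeds` string, could be parsed roughly as follows. This is a hypothetical sketch using the standard `argparse` module, not the actual parser in `full_experiments.py`; the defaults and help strings are assumptions.

```python
import argparse


def parse_seeds(text):
    """Turn a string such as "42,123,456" into a list of ints."""
    return [int(tok) for tok in text.split(",") if tok.strip()]


def parse_args(argv=None):
    # Hypothetical parser mirroring the flags shown in Quick Start;
    # all default values here are illustrative assumptions.
    p = argparse.ArgumentParser(description="Ablation suite runner (sketch)")
    p.add_argument("--experiment", default="all", help="experiment name, or 'all'")
    p.add_argument("--device", default="cuda")
    p.add_argument("--steps", type=int, default=2000)
    p.add_argument("--seeds", type=parse_seeds, default="42")
    return p.parse_args(argv)


args = parse_args(["--experiment", "all", "--device", "cuda",
                   "--steps", "2000", "--seeds", "42,123,456"])
print(args.seeds)  # [42, 123, 456]
```

Passing the seeds as a single quoted string keeps the shell invocation short; a runner would then typically loop over `args.seeds` and repeat each experiment once per seed.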
## Results

See `RESULTS.md` for collected tables.