theapemachine/sparse-transformer-experiments


theapemachine committed 29 days ago

Commit 7cf627f · verified · 1 Parent(s): e5e3719

Upload README.md

Files changed (1)

README.md +34 -0

README.md ADDED

	@@ -0,0 +1,34 @@

+# Sparse Transformer: Experiment Suite + Triton Kernels
+Comprehensive experiment infrastructure for the Chunked Sparse Backward Pass paper.
+## Files
+| File | Description |
+|------|-------------|
+| `triton_sparse.py` | Triton-fused sparse backward kernels (dW, dX, dBias) + Python-loop baseline + correctness tests + microbenchmark |
+| `e2e_full.py` | End-to-end training benchmark: Dense vs PyLoop vs Triton at d_model ∈ {512, 1024, 2048} |
+| `full_experiments.py` | 7-experiment ablation suite (baselines, predictor accuracy, chunk ablation, compute-matched, exploration, attention sparsification, sparsity sweep) |
+| `analyze_results.py` | Publication figure generator (matplotlib) |
+## Quick Start
+```bash
+pip install torch triton tiktoken matplotlib numpy
+# Correctness test + microbenchmark
+python triton_sparse.py
+# End-to-end training (needs a GPU with ≥24 GB of memory for d_model=2048)
+python e2e_full.py
+# Full ablation suite (7 experiments, ~4-6 hours on A10G)
+python full_experiments.py --experiment all --device cuda --steps 2000 --seeds "42,123,456"
+# Single experiment
+python full_experiments.py --experiment baselines --device cuda
+```
+## Results
+See `RESULTS.md` for collected tables.
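
The README lists sparse backward kernels for dW, dX, and dBias alongside a Python-loop baseline and correctness tests, but does not show the computation itself. As a rough illustration only, the sketch below assumes a linear layer `y = x @ w.T + b` whose output features are split into contiguous chunks, with gradients computed only for a selected set of chunks. The function names, the `active_chunks` and `chunk_size` parameters, and the chunking scheme are all hypothetical and are not taken from `triton_sparse.py`. NumPy is used so the example runs without a GPU.

```python
import numpy as np


def dense_backward(x, w, grad_y):
    """Reference backward pass for y = x @ w.T + b."""
    grad_x = grad_y @ w          # (batch, d_in)
    grad_w = grad_y.T @ x        # (d_out, d_in)
    grad_b = grad_y.sum(axis=0)  # (d_out,)
    return grad_x, grad_w, grad_b


def chunked_sparse_backward(x, w, grad_y, active_chunks, chunk_size):
    """Python-loop backward that visits only the selected output chunks.

    Assumption (not from the source): output features are grouped into
    contiguous chunks of `chunk_size`, and chunks outside `active_chunks`
    contribute no gradient.
    """
    grad_x = np.zeros_like(x)
    grad_w = np.zeros_like(w)
    grad_b = np.zeros(w.shape[0], dtype=w.dtype)
    for c in active_chunks:
        rows = slice(c * chunk_size, (c + 1) * chunk_size)
        g = grad_y[:, rows]           # (batch, chunk_size)
        grad_x += g @ w[rows]         # accumulate dX over active chunks
        grad_w[rows] = g.T @ x        # dW rows for this chunk
        grad_b[rows] = g.sum(axis=0)  # dBias entries for this chunk
    return grad_x, grad_w, grad_b


rng = np.random.default_rng(0)
batch, d_in, d_out, chunk_size = 4, 8, 16, 4
x = rng.standard_normal((batch, d_in))
w = rng.standard_normal((d_out, d_in))
grad_y = rng.standard_normal((batch, d_out))
active_chunks = [0, 2]

# Correctness check: the sparse result should equal a dense backward pass
# run on grad_y with the inactive chunks zeroed out.
mask = np.zeros(d_out, dtype=bool)
for c in active_chunks:
    mask[c * chunk_size:(c + 1) * chunk_size] = True
ref = dense_backward(x, w, grad_y * mask)
out = chunked_sparse_backward(x, w, grad_y, active_chunks, chunk_size)

max_err = max(np.abs(r - o).max() for r, o in zip(ref, out))
print(f"max abs error vs masked dense reference: {max_err:.2e}")
```

Comparing against a masked dense pass is one simple way to validate a sparse kernel. A fused GPU kernel would presumably replace the Python loop with a single launch over the active chunks, which is where any speedup over this loop would come from.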