# Sparse Transformer: Experiment Suite + Triton Kernels
|
|
Comprehensive experiment infrastructure for the Chunked Sparse Backward Pass paper.
|
|
## Files
|
|
| | File | Description | |
| |------|-------------| |
| | `triton_sparse.py` | Triton-fused sparse backward kernels (dW, dX, dBias) + Python-loop baseline + correctness tests + microbenchmark | |
| | `e2e_full.py` | End-to-end training benchmark: Dense vs PyLoop vs Triton at d_model ∈ {512, 1024, 2048} | |
| | `full_experiments.py` | 7-experiment ablation suite (baselines, predictor accuracy, chunk ablation, compute-matched, exploration, attention sparsification, sparsity sweep) | |
| | `analyze_results.py` | Publication figure generator (matplotlib) | |
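
For orientation, here is a minimal sketch (not the repo's actual code; the function name and signature are illustrative) of what a Python-loop-style sparse backward computes: for a linear layer `Y = X @ W.T + b` where only a subset of output rows is active, the three gradients dW, dX, and dBias can be formed from gathered rows instead of full dense matmuls.

```python
import torch

def sparse_linear_backward(dY, X, W, active_idx):
    """Backward for Y = X @ W.T + b restricted to active output rows.

    dY:         (batch, n_active) upstream gradient, already gathered
    X:          (batch, d_in) layer input
    W:          (d_out, d_in) weight matrix
    active_idx: (n_active,) indices of the rows of W that were used
    """
    dW = torch.zeros_like(W)
    dW[active_idx] = dY.T @ X          # scatter (n_active, d_in) into full dW
    dX = dY @ W[active_idx]            # (batch, d_in), touches only active rows
    dBias = torch.zeros(W.shape[0], dtype=W.dtype)
    dBias[active_idx] = dY.sum(dim=0)
    return dW, dX, dBias

# Sanity check against dense autograd on a toy problem
torch.manual_seed(0)
X = torch.randn(4, 8, requires_grad=True)
W = torch.randn(16, 8, requires_grad=True)
b = torch.zeros(16, requires_grad=True)
idx = torch.tensor([1, 5, 9])
Y = (X @ W.T + b)[:, idx]
Y.backward(torch.ones(4, 3))
dW, dX, dBias = sparse_linear_backward(torch.ones(4, 3), X.detach(), W.detach(), idx)
assert torch.allclose(dW, W.grad) and torch.allclose(dX, X.grad)
```

The Triton kernels fuse the gather and the matmuls so the zero rows of `dW` are never materialized in the hot loop; the loop baseline above is what they are checked against for correctness.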
|
|
## Quick Start
|
|
```bash
pip install torch triton tiktoken matplotlib numpy

# Correctness test + microbenchmark
python triton_sparse.py

# End-to-end training (needs ≥24GB GPU for d=2048)
python e2e_full.py

# Full ablation suite (7 experiments, ~4-6 hours on A10G)
python full_experiments.py --experiment all --device cuda --steps 2000 --seeds "42,123,456"

# Single experiment
python full_experiments.py --experiment baselines --device cuda
```
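
The microbenchmark in `triton_sparse.py` compares dense and sparse backward timings; the sketch below (illustrative sizes and helper names, not the script's actual ones) shows the shape of such a comparison: a dense dW matmul versus one restricted to the active output rows.

```python
import time
import torch

def bench(fn, iters=50):
    """Average wall-clock seconds per call after one warm-up run."""
    fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters

batch, d_in, d_out, n_active = 64, 1024, 4096, 256
X = torch.randn(batch, d_in)
dY_dense = torch.randn(batch, d_out)
dY_sparse = dY_dense[:, :n_active]  # pretend the first 256 rows are active

dense_ms = bench(lambda: dY_dense.T @ X) * 1e3
sparse_ms = bench(lambda: dY_sparse.T @ X) * 1e3
print(f"dense dW: {dense_ms:.3f} ms   sparse dW: {sparse_ms:.3f} ms")
```

On GPU the real script would synchronize around timing (e.g. CUDA events); the CPU version above only illustrates the work ratio between the two matmuls.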
|
|
## Results
|
|
See `RESULTS.md` for collected tables.
|
|