---
license: mit
tags:
- sparseflow
- sparse-attention
- efficient-nlp
datasets:
- gsm8k
- lighteval/MATH
- allenai/ai2_arc
- tau/commonsense_qa
- piqa
- allenai/sciq
- trivia_qa
- nq_open
- wikitext
---

# SparseFlow v8

An efficient language model with **sparse attention** and **persistent memory**.

## Measured Metrics

| Metric | Value |
|--------|-------|
| Parameters | 71,359,746 |
| Perplexity | 14.77 |
| Attention Sparsity | 87.5% |
| Channel Sparsity | 75.0% |
| Peak Memory | 3.67 GB |

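The two sparsity figures can be cross-checked against the top-64 and top-128 settings listed under Architecture. The sketch below shows one reading that reproduces them. It assumes a 512-token context and a 512-wide channel dimension, neither of which is stated in this card, and the `sparsity` helper is a hypothetical name used only for illustration.

```python
def sparsity(active: int, total: int) -> float:
    """Percentage of units that are skipped when only `active` of `total` are used."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("need 0 <= active <= total and total > 0")
    return 100.0 * (1 - active / total)


# Assumptions, not stated in the card: context length and channel width.
ASSUMED_CONTEXT_LENGTH = 512
ASSUMED_CHANNEL_WIDTH = 512

attention_sparsity = sparsity(active=64, total=ASSUMED_CONTEXT_LENGTH)
channel_sparsity = sparsity(active=128, total=ASSUMED_CHANNEL_WIDTH)

print(f"attention: {attention_sparsity:.1f}%  channels: {channel_sparsity:.1f}%")
# prints "attention: 87.5%  channels: 75.0%"
```

Under other context lengths or channel widths the percentages would differ, so treat this as a consistency check rather than a description of how the figures were measured.
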
## Architecture

- **Sparse Token Attention**: Each position attends to its top-64 tokens
- **Sparse Channel FFN**: Activates the top-128 channels
- **Persistent Memory**: 20,000 memory vectors
- **8 Transformer layers** with a hidden dimension of 512

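The two top-k mechanisms in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the SparseFlow implementation: the function names are hypothetical, attention is shown for a single query, and selecting channels by absolute magnitude is an assumption (the card does not say how the top-128 channels are chosen).

```python
import math


def top_k_indices(values, k):
    """Indices of the k largest entries (all indices if k >= len(values))."""
    k = min(k, len(values))
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def sparse_attention(query, keys, values, k):
    """Single-query attention restricted to the k highest-scoring keys."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    kept = top_k_indices(scores, k)
    # Softmax over the kept scores only; every other position gets zero weight.
    peak = max(scores[i] for i in kept)
    exps = {i: math.exp(scores[i] - peak) for i in kept}
    total = sum(exps.values())
    dim = len(values[0])
    output = [sum(exps[i] / total * values[i][d] for i in kept) for d in range(dim)]
    return output, sorted(kept)


def sparse_channels(activations, k):
    """Keep the k largest-magnitude channels and zero the rest (assumed rule)."""
    kept = set(top_k_indices([abs(a) for a in activations], k))
    return [a if i in kept else 0.0 for i, a in enumerate(activations)]


# Toy sizes: k=2 instead of 64 or 128, so the selection is easy to follow.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
vals = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [5.0, 5.0]]

output, kept = sparse_attention(query, keys, vals, k=2)
print(kept)                                         # prints [0, 2]
print(sparse_channels([0.5, -3.0, 0.1, 2.0], k=2))  # prints [0.0, -3.0, 0.0, 2.0]
```

With the settings from the list, `k` would be 64 for attention and 128 for the FFN channels. A real implementation would batch these operations on tensors; the loops here only make the selection step explicit.
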
## Training Data

Open-source datasets only:

- GSM8K, MATH (mathematics)
- ARC, OpenBookQA, SciQ (science & reasoning)
- CommonsenseQA, PIQA (common sense)
- TriviaQA, Natural Questions (factual)
- WikiText-103 (language modeling)

## Author

**Logo (Mike Amega)** - Ame Web Studio