---
license: mit
tags:
- sparseflow
- sparse-attention
- efficient-nlp
datasets:
- gsm8k
- lighteval/MATH
- allenai/ai2_arc
- tau/commonsense_qa
- piqa
- allenai/sciq
- trivia_qa
- nq_open
- wikitext
---

# SparseFlow v8

Efficient language model with **sparse attention** and **persistent memory**.

## 📊 REAL Measured Metrics

| Metric | Value |
|--------|-------|
| Parameters | 71,359,746 |
| Perplexity | 14.77 |
| Attention Sparsity | 87.5% |
| Channel Sparsity | 75.0% |
| Peak Memory | 3.67 GB |

## 🏗️ Architecture

- **Sparse Token Attention**: Attends to top-64 tokens per position
- **Sparse Channel FFN**: Activates top-128 channels
- **Persistent Memory**: 20,000 memory vectors
- **8 Transformer layers** with 512 dim

## 📚 Training Data

Open source datasets only:

- GSM8K, MATH (mathematics)
- ARC, OpenBookQA, SciQ (science & reasoning)
- CommonsenseQA, PIQA (common sense)
- TriviaQA, Natural Questions (factual)
- WikiText-103 (language modeling)

## 👨‍💻 Author

**Logo (Mike Amega)** — Ame Web Studio
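
## 🧪 Illustrative Sketch: Top-k Sparsity

The architecture notes describe attention that keeps only the top-64 tokens per position and an FFN that activates only the top-128 channels. The snippet below is a minimal, dependency-free sketch of that general top-k idea for a single query. It is not the SparseFlow implementation. The helper names (`top_k_indices`, `sparse_attention_row`, `sparsity`) are hypothetical, and the figure of 512 candidates used in the sparsity arithmetic is an assumption: it is consistent with the reported 87.5% and 75.0%, but the card states neither the context length nor the FFN width.

```python
import math


def top_k_indices(scores, k):
    """Return indices of the k largest scores (ties go to the lower index)."""
    k = min(k, len(scores))
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def sparse_attention_row(query, keys, values, k):
    """Attend from one query to its top-k keys only (hypothetical helper).

    Returns the output vector and the full weight row, where every
    position outside the top-k has weight exactly 0.0.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    kept = top_k_indices(scores, k)

    # Softmax restricted to the kept positions (max-subtracted for stability).
    peak = max(scores[i] for i in kept)
    exps = {i: math.exp(scores[i] - peak) for i in kept}
    total = sum(exps.values())
    weights = [exps.get(i, 0.0) / total for i in range(len(keys))]

    dim = len(values[0])
    output = [sum(weights[i] * values[i][d] for i in kept) for d in range(dim)]
    return output, weights


def sparsity(kept, total):
    """Fraction of candidates that are skipped."""
    return 1.0 - kept / total


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
    values = [[1.0], [2.0], [3.0], [4.0]]
    output, weights = sparse_attention_row([1.0, 0.0], keys, values, k=2)
    print(output, weights)

    # Assumed 512 candidates in both cases (not stated in the card).
    print(sparsity(64, 512))   # 0.875
    print(sparsity(128, 512))  # 0.75
```

The same selection step applies to channels: score every FFN channel, keep the `k` largest, and zero the rest. A production implementation would typically do this with batched tensor operations rather than Python lists.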