---
license: mit
tags:
- sparseflow
- sparse-attention
- efficient-nlp
datasets:
- gsm8k
- lighteval/MATH
- allenai/ai2_arc
- tau/commonsense_qa
- piqa
- allenai/sciq
- trivia_qa
- nq_open
- wikitext
---
# SparseFlow v8
Efficient language model with sparse attention and persistent memory.
## Measured Metrics
| Metric | Value |
|---|---|
| Parameters | 71,359,746 |
| Perplexity | 14.77 |
| Attention Sparsity | 87.5% |
| Channel Sparsity | 75.0% |
| Peak Memory | 3.67 GB |
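A perplexity figure like the one above is conventionally the exponential of the mean per-token cross-entropy on held-out text (here, WikiText-103). The sketch below shows that generic recipe; it is not SparseFlow's evaluation script, and `model` and `loader` are hypothetical placeholders.

```python
import math
import torch

@torch.no_grad()
def perplexity(model, loader, device="cuda"):
    """Generic perplexity measurement: exp(mean next-token NLL)."""
    total_nll, total_tokens = 0.0, 0
    for batch in loader:                      # batch: (B, T) token ids
        ids = batch.to(device)
        logits = model(ids[:, :-1])           # predict each next token
        nll = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            ids[:, 1:].reshape(-1),
            reduction="sum",                  # sum NLL over all tokens
        )
        total_nll += nll.item()
        total_tokens += ids[:, 1:].numel()
    return math.exp(total_nll / total_tokens)
```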
## Architecture
- Sparse Token Attention: each position attends only to its top-64 highest-scoring tokens (sketched below)
- Sparse Channel FFN: activates only the top-128 hidden channels per token (sketched below)
- Persistent Memory: a fixed bank of 20,000 learned memory vectors (see the second sketch below)
- 8 Transformer layers with a hidden dimension of 512
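A minimal sketch of how top-k token attention and top-k channel selection could be implemented. Shapes, names, and details (e.g., causal masking is omitted) are assumptions, not the released implementation. Note that if the context were 512 tokens, keeping the top 64 would give exactly the 87.5% attention sparsity reported above, and keeping 128 of 512 hidden channels would give the 75.0% channel sparsity.

```python
import torch
import torch.nn.functional as F

def sparse_token_attention(q, k, v, top_k=64):
    """Each query attends only to its top-k highest-scoring keys."""
    # q, k, v: (batch, seq_len, dim)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (B, T, T)
    top_k = min(top_k, scores.size(-1))
    # Keep only the top-k scores per query row; mask the rest to -inf.
    kth = scores.topk(top_k, dim=-1).values[..., -1:]      # k-th largest score
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

def sparse_channel_ffn(x, w_in, w_out, top_c=128):
    """Keep only the top-c activated hidden channels per token."""
    h = F.relu(x @ w_in)                                   # (B, T, hidden)
    kth = h.topk(min(top_c, h.size(-1)), dim=-1).values[..., -1:]
    h = h * (h >= kth)                                     # zero all other channels
    return h @ w_out
```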
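One plausible reading of the persistent memory component is a fixed bank of learned vectors that tokens retrieve from at inference time. The sketch below illustrates that pattern under stated assumptions (the retrieval `top_k=32` and the residual read are hypothetical, only the bank size of 20,000 comes from the card).

```python
import torch
import torch.nn as nn

class PersistentMemory(nn.Module):
    """Hypothetical persistent-memory read: soft top-k lookup into a
    fixed bank of learned vectors, added back to the token stream."""

    def __init__(self, n_mem=20_000, dim=512, top_k=32):
        super().__init__()
        self.mem = nn.Parameter(torch.randn(n_mem, dim) * dim ** -0.5)
        self.top_k = top_k

    def forward(self, x):                        # x: (B, T, dim)
        scores = x @ self.mem.t()                # (B, T, n_mem)
        vals, idx = scores.topk(self.top_k, -1)  # k best memories per token
        w = vals.softmax(-1).unsqueeze(-1)       # (B, T, k, 1) retrieval weights
        mems = self.mem[idx]                     # (B, T, k, dim)
        return x + (w * mems).sum(-2)            # residual memory read
```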
## Training Data
Trained on open-source datasets only:
- GSM8K, MATH (mathematics)
- ARC, OpenBookQA, SciQ (science & reasoning)
- CommonsenseQA, PIQA (common sense)
- TriviaQA, Natural Questions (factual)
- WikiText-103 (language modeling)
## Author
Logo (Mike Amega) - Ame Web Studio