metadata
license: mit
tags:
  - sparseflow
  - sparse-attention
  - efficient-nlp
datasets:
  - gsm8k
  - lighteval/MATH
  - allenai/ai2_arc
  - tau/commonsense_qa
  - piqa
  - allenai/sciq
  - trivia_qa
  - nq_open
  - wikitext

SparseFlow v8

Efficient language model with sparse attention and persistent memory.

📊 REAL Measured Metrics

| Metric | Value |
|---|---|
| Parameters | 71,359,746 |
| Perplexity | 14.77 |
| Attention Sparsity | 87.5% |
| Channel Sparsity | 75.0% |
| Peak Memory | 3.67 GB |
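
The two sparsity figures can be related to the top-k settings listed under Architecture with a little arithmetic. The sketch below assumes that sparsity means the fraction of entries that are not selected, i.e. `sparsity = 1 - k / n`; the card does not state this definition, and `implied_width` is a hypothetical helper name used only for illustration.

```python
def implied_width(top_k: int, sparsity: float) -> float:
    """Return n such that keeping top_k of n entries yields the given sparsity.

    Assumes sparsity = 1 - top_k / n, which is an assumption about how
    the card's figures were computed, not something the card states.
    """
    return top_k / (1.0 - sparsity)


# Figures taken from this card: top-64 attention at 87.5% sparsity,
# top-128 FFN channels at 75.0% sparsity.
attn_width = implied_width(64, 0.875)
ffn_width = implied_width(128, 0.750)
print(attn_width, ffn_width)
```

Under that assumption, both figures would correspond to selecting from 512 candidates: 512 positions for attention and 512 channels for the FFN. These are inferences from the reported numbers, not values reported by the author.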

🏗️ Architecture

  • Sparse Token Attention: each position attends to its top-64 tokens
  • Sparse Channel FFN: activates the top-128 channels
  • Persistent Memory: 20,000 memory vectors
  • 8 Transformer layers with a hidden size of 512
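
To make the two top-k mechanisms above concrete, here is a minimal pure-Python sketch for a single query position. It is not the SparseFlow implementation: the function names, the scaled dot-product scoring, the magnitude-based channel selection, and the tie-breaking rule are all assumptions made for illustration, and causal masking and the persistent memory are left out.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_indices(xs, k):
    """Indices of the k largest values; ties go to the lower index."""
    k = min(k, len(xs))
    return sorted(range(len(xs)), key=lambda i: (-xs[i], i))[:k]


def sparse_attention(query, keys, values, k):
    """Attend only to the k keys with the highest scaled dot-product score."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    keep = topk_indices(scores, k)
    # Softmax is taken over the kept scores only; all others get weight 0.
    weights = softmax([scores[i] for i in keep])
    out = [0.0] * len(values[0])
    for w, i in zip(weights, keep):
        for d, v in enumerate(values[i]):
            out[d] += w * v
    return out, keep


def sparse_channels(hidden, k):
    """Keep the k largest-magnitude channels and zero out the rest."""
    keep = set(topk_indices([abs(h) for h in hidden], k))
    return [h if i in keep else 0.0 for i, h in enumerate(hidden)]


# Toy example with k=2 instead of the card's top-64 / top-128.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
out, kept = sparse_attention([1.0, 0.0], keys, values, k=2)
print(kept, out)
print(sparse_channels([0.1, -2.0, 0.5, 3.0], k=2))
```

In the toy example, only keys 0 and 2 receive attention weight, and only the two largest-magnitude channels survive. A real model would do this on batched tensors, typically with a library top-k routine.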

📚 Training Data

Open-source datasets only:

  • GSM8K, MATH (mathematics)
  • ARC, OpenBookQA, SciQ (science & reasoning)
  • CommonsenseQA, PIQA (common sense)
  • TriviaQA, Natural Questions (factual)
  • WikiText-103 (language modeling)

👨‍💻 Author

Logo (Mike Amega), Ame Web Studio