	---
	license: mit
	tags:
	- sparseflow
	- sparse-attention
	- efficient-nlp
	datasets:
	- gsm8k
	- lighteval/MATH
	- allenai/ai2_arc
	- tau/commonsense_qa
	- piqa
	- allenai/sciq
	- trivia_qa
	- nq_open
	- wikitext
	---

	# SparseFlow v8

	Efficient language model with sparse attention and persistent memory.

	## 📊 Measured Metrics

	| Metric | Value |
	|--------|-------|
	| Parameters | 71,359,746 |
	| Perplexity | 14.77 |
	| Attention Sparsity | 87.5% |
	| Channel Sparsity | 75.0% |
	| Peak Memory | 3.67 GB |
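
The two sparsity figures are consistent with the top-64 tokens and top-128 channels listed under Architecture if sparsity is read as `1 - k/n`. The quick check below is only a sketch: the values `n = 512` for the attention window and for the FFN width are assumptions chosen to reproduce the table, not numbers stated in this card.

```python
def sparsity(k: int, n: int) -> float:
    """Fraction of units left inactive when only k of n are kept."""
    return 1.0 - k / n

# Assumed sizes (not stated in the card): n = 512 for both the
# attention window and the FFN channel count.
attention_sparsity = sparsity(k=64, n=512)
channel_sparsity = sparsity(k=128, n=512)

print(f"{attention_sparsity:.1%}")  # 87.5%
print(f"{channel_sparsity:.1%}")    # 75.0%
```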

	## 🏗️ Architecture

	- Sparse Token Attention: Attends to top-64 tokens per position
	- Sparse Channel FFN: Activates top-128 channels
	- Persistent Memory: 20,000 memory vectors
	- 8 Transformer layers with a hidden dimension of 512
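
The components above can be illustrated with a small NumPy sketch. It is not the SparseFlow implementation. The function names are hypothetical, persistent memory is modeled as extra key/value rows shared by every position (an assumption), causal masking and multiple heads are omitted, and the FFN width of 512 is assumed.

```python
import numpy as np

def topk_mask(scores, k):
    """Boolean mask keeping the k largest entries along the last axis."""
    k = min(k, scores.shape[-1])
    # argpartition puts the k largest entries in the last k slots (unordered).
    idx = np.argpartition(scores, -k, axis=-1)[..., -k:]
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return mask

def sparse_token_attention(q, k, v, mem_k, mem_v, top_k=64):
    """Each position attends only to its top_k highest-scoring keys."""
    # Assumption: memory vectors are prepended as additional keys/values.
    keys = np.concatenate([mem_k, k], axis=0)
    values = np.concatenate([mem_v, v], axis=0)
    scores = q @ keys.T / np.sqrt(q.shape[-1])
    # Entries outside the top_k get -inf, so their softmax weight is zero.
    scores = np.where(topk_mask(scores, top_k), scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

def sparse_channel_ffn(x, w_in, w_out, top_k=128):
    """Feed-forward layer that keeps only the top_k hidden channels."""
    hidden = np.maximum(x @ w_in, 0.0)  # ReLU activation
    mask = topk_mask(hidden, top_k)
    return (hidden * mask) @ w_out, mask

# Toy sizes; n_mem is far below the card's 20,000 to keep the demo light.
rng = np.random.default_rng(0)
seq_len, dim, n_mem = 128, 512, 256
q, k, v = (rng.standard_normal((seq_len, dim)) for _ in range(3))
mem_k, mem_v = (rng.standard_normal((n_mem, dim)) for _ in range(2))
w_in = rng.standard_normal((dim, dim)) / np.sqrt(dim)
w_out = rng.standard_normal((dim, dim)) / np.sqrt(dim)

attn_out, weights = sparse_token_attention(q, k, v, mem_k, mem_v)
ffn_out, channel_mask = sparse_channel_ffn(attn_out, w_in, w_out)

max_attended = int((weights > 0).sum(axis=-1).max())
active_channels = int(channel_mask.sum(axis=-1).max())
print(attn_out.shape, ffn_out.shape, max_attended, active_channels)
```

Note that this sketch still computes every score before masking, so it shows the selection rule rather than any speed or memory savings. An efficient version would need to avoid materializing the full score matrix.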

	## 📚 Training Data

	Open-source datasets only:
	- GSM8K, MATH (mathematics)
	- ARC, OpenBookQA, SciQ (science & reasoning)
	- CommonsenseQA, PIQA (common sense)
	- TriviaQA, Natural Questions (factual)
	- WikiText-103 (language modeling)

	## 👨‍💻 Author

	Logo (Mike Amega) — Ame Web Studio