|
|
---
license: mit
language:
- en
tags:
- text-generation
- story-generation
- tiny-model
- efficient-attention
- unified-attention
library_name: pytorch
pipeline_tag: text-generation
model-index:
- name: yocto
  results:
  - task:
      type: text-generation
    dataset:
      name: TinyStories
      type: roneneldan/TinyStories
    metrics:
    - name: Perplexity
      type: perplexity
      value: 9.58
---

# YOCTO — World's Smallest Language Model

<p align="center">
  <img src="https://img.shields.io/badge/Parameters-484K-blue" alt="Parameters">
  <img src="https://img.shields.io/badge/Size-946KB-green" alt="Size">
  <img src="https://img.shields.io/badge/Speed-700%2B%20tok%2Fs-orange" alt="Speed">
  <img src="https://img.shields.io/badge/Perplexity-9.58-purple" alt="Perplexity">
</p>

Yocto is a 484K-parameter language model that tells children's stories. It achieves a perplexity of 9.58 on TinyStories, matching models 2-4× its size.

## Key Innovation: Unified Attention

Standard transformers compute attention with three separate projections (Q, K, V). Yocto instead uses a single unified projection whose output is split into [seeking|offering|content] bands:

```
Standard: Q = W_Q·x, K = W_K·x, V = W_V·x      [3d² params]
Unified:  u = W·x → [seeking|offering|content]  [d² params]
```

Result: **67% fewer attention-projection parameters** (d² instead of 3d²), with better perplexity.
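To make the idea concrete, here is a minimal PyTorch sketch of a unified attention layer: one `d×d` projection whose output is split into three equal bands that play the roles of query (seeking), key (offering), and value (content). This is an illustration under assumptions, not Yocto's actual implementation — the real band widths, multi-head layout, and output handling are not specified in this card.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedAttention(nn.Module):
    """Causal attention from a single projection split into
    [seeking|offering|content] bands. Illustrative sketch only:
    equal band widths and the absence of an output projection
    are assumptions, not documented Yocto details.
    """
    def __init__(self, d_model: int):
        super().__init__()
        assert d_model % 3 == 0, "d_model must split into 3 equal bands"
        self.band = d_model // 3
        # One d×d matrix (d² params) replaces three d×d matrices (3d²).
        self.proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        u = self.proj(x)
        q, k, v = u.split(self.band, dim=-1)  # seeking, offering, content
        scores = (q @ k.transpose(-2, -1)) / self.band ** 0.5
        # Causal mask: each position attends only to itself and the past.
        seq = x.size(1)
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=x.device), 1)
        scores = scores.masked_fill(mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v  # (batch, seq, band)
```

With `d_model = 72` (Yocto's embedding dim), this layer holds 72² = 5,184 projection parameters versus 3 × 5,184 = 15,552 for separate Q/K/V — the 67% saving quoted above.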
|
|
|
|
|
## Quick Start

```python
import torch
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(repo_id="Reinforce-ai/yocto", filename="model.pt")
tokenizer_path = hf_hub_download(repo_id="Reinforce-ai/yocto", filename="tokenizer.json")

# Load and generate (see GitHub for full code)
```

## Performance

| Metric | Value |
|--------|-------|
| Parameters | 484,272 |
| Size (fp16) | 946 KB |
| Attention share of parameters | 5.7% |
| Perplexity (TinyStories) | 9.58 |
| Speed (CPU) | **700+ tok/s** |
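As a quick sanity check, the fp16 size in the table follows directly from the parameter count, since fp16 stores each weight in 2 bytes:

```python
params = 484_272             # parameter count from the table above
size_bytes = params * 2      # 2 bytes per fp16 weight
size_kb = size_bytes / 1024
print(f"{size_kb:.0f} KB")   # prints "946 KB", matching the reported size
```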
|
|
|
|
|
## Example Output

**Prompt:** "Once upon a time"

> Once upon a time, there was a little girl named Lily. She loved to play with her toys all day long. One day, she found a shiny thing on the shelf. The little girl said, "Look, mommy, look!" Her mommy explained that it's very cool, so Lily and her mommy went to the store to buy some tasty food.

## Architecture

| Component | Value |
|-----------|-------|
| Embedding dim | 72 |
| Layers | 4 |
| Attention heads | 3 |
| FFN dim | 288 |
| Vocab size | 4,000 |
| Context length | 512 |
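A rough back-of-envelope from these numbers shows where the 484,272 parameters live. This sketch assumes tied input/output embeddings, no learned positional table, a single unified d×d attention projection per layer, and bias-free linear maps — none of which are confirmed by this card — so treat the split as an estimate, not the actual layout.

```python
d, layers, ffn_dim, vocab = 72, 4, 288, 4000  # from the architecture table

emb   = vocab * d                 # 288,000  token embeddings (tied output head assumed)
ffn   = layers * 2 * d * ffn_dim  # 165,888  two linear maps per FFN block
attn  = layers * d * d            #  20,736  one unified d×d projection per layer (assumption)
norms = (2 * layers + 1) * 2 * d  #   1,296  two norms per layer plus a final norm (weight + bias)

total = emb + ffn + attn + norms  # 475,920
```

This lands within ~2% of the reported 484,272; the remaining ~8K parameters (biases, output handling inside the attention block, etc.) are not documented here.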
|
|
|
|
|
## Live Demo

Try Yocto in your browser: [HuggingFace Space](https://huggingface.co/spaces/Reinforce-ai/yocto-demo)

## Links

- 🌐 **Website**: [reinforceai.com/yocto](https://www.reinforceai.com/yocto)
- 💻 **GitHub**: [github.com/reinforceai/yocto](https://github.com/reinforceai/yocto)
- 📄 **Paper**: [Attention Fields: Unified Projections for Efficient Language Models](https://github.com/reinforceai/yocto/blob/main/README.md)

## Citation

```bibtex
@misc{deshwal2026yocto,
  title={Attention Fields: Unified Projections for Efficient Language Models},
  author={Deshwal, Viraj},
  year={2026},
  url={https://www.reinforceai.com/yocto},
  howpublished={\url{https://github.com/reinforceai/yocto}}
}
```

## License

MIT