MicroMixer-1-500K is a mid-sized MicroMixer-1 model with roughly 557K parameters (557,328 in total). At this scale, basic sentence structures such as "Once upon a time there was a little boy named Sammy" begin to emerge.
graph TD
A[Byte Input] --> B[Token Embedding]
B --> C[RoPE Position Encoding]
C --> D[ImprovedMixerLayer ×3]
D --> E[LayerNorm]
E --> F[LM Head]
F --> G[Byte Output]
style A fill:#007BFF,color:#fff
style G fill:#00D620,color:#fff
style D fill:#AE00FF,color:#fff
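The graph names RoPE as the position encoding but does not show how it is computed. The sketch below illustrates the general rotary-embedding idea only: each consecutive pair of features is rotated by an angle proportional to the position. The function name `apply_rope`, the interleaved pairing, and the base of 10000 are assumptions borrowed from the common formulation, not taken from the MicroMixer-1 code, and how the model applies the rotation inside a mixer layer is not described here.

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive (even, odd) feature pairs of `vec` by position-dependent angles.

    Hypothetical helper: the pairing scheme and `base` are assumptions.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even feature dimension")
    out = [0.0] * dim
    for i in range(0, dim, 2):
        # i is already 2k, so the frequency is base^(-2k / dim).
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * cos_t - y * sin_t
        out[i + 1] = x * sin_t + y * cos_t
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.2]
# Both pairs of positions are 3 apart, so the two scores should match.
score_a = dot(apply_rope(q, 5), apply_rope(k, 2))
score_b = dot(apply_rope(q, 13), apply_rope(k, 10))
```

Because the rotation angle grows linearly with position, the dot product of two rotated vectors depends only on their position offset, which is why `score_a` and `score_b` agree up to floating-point error.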
| Parameter | Value |
|---|---|
| Total Parameters | 557,328 |
| Hidden Dimension | 176 |
| Channel MLP Dimension | 384 |
| Number of Layers | 3 |
| Max Sequence Length | 128 |
| Vocabulary Size | 256 (Byte-level) |
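A byte-level vocabulary of 256 means every UTF-8 byte value is its own token, and the maximum sequence length of 128 is therefore counted in bytes. The class below is a minimal sketch of that idea; `SimpleByteTokenizer` and its `max_len` argument are hypothetical, and the repository's own `ByteTokenizer` may differ (for example, by adding special tokens).

```python
class SimpleByteTokenizer:
    """Hypothetical stand-in: maps text to UTF-8 byte values (0-255) and back."""

    vocab_size = 256

    def encode(self, text, max_len=None):
        ids = list(text.encode("utf-8"))
        if max_len is not None and len(ids) > max_len:
            # Keep the most recent bytes so the prompt fits the context window.
            ids = ids[len(ids) - max_len:]
        return ids

    def decode(self, ids):
        # errors="replace" guards against byte sequences that are not valid UTF-8,
        # which a small model can easily emit.
        return bytes(ids).decode("utf-8", errors="replace")


tok = SimpleByteTokenizer()
ids = tok.encode("Once upon a time")
text = tok.decode(ids)
```

Non-ASCII characters occupy several tokens each under this scheme, so a 128-token window holds fewer than 128 characters of such text.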
┌─────────────────────────────────────────────┐
│             ImprovedMixerLayer              │
│  ┌───────────────────────────────────────┐  │
│  │ LayerNorm → HyperMixing → Residual    │  │  ← Token Mixing
│  ├───────────────────────────────────────┤  │
│  │ LayerNorm → MlpBlock → Residual       │  │  ← Channel Mixing
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
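The channel-mixing half of the layer (LayerNorm, then MlpBlock, then a residual add) can be sketched for a single token vector in plain Python. This is an illustration under assumptions, not the repository's code: it takes the MlpBlock to be Linear → GELU → Linear, uses a LayerNorm without learnable scale and shift, and runs at toy sizes with random weights. All function names are hypothetical.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (no learnable affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def gelu(v):
    """Exact GELU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def linear(x, weight, bias):
    # weight is laid out as [out_features][in_features].
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def channel_mixing(x, w1, b1, w2, b2):
    """LayerNorm -> Linear -> GELU -> Linear, then add the input back (residual)."""
    h = layer_norm(x)
    h = linear(h, w1, b1)
    h = [gelu(v) for v in h]
    h = linear(h, w2, b2)
    return [xi + hi for xi, hi in zip(x, h)]


def random_matrix(rows, cols, rng, scale=0.02):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden_dim, mlp_dim = 8, 16  # toy sizes; the model card lists 176 and 384
w1, b1 = random_matrix(mlp_dim, hidden_dim, rng), [0.0] * mlp_dim
w2, b2 = random_matrix(hidden_dim, mlp_dim, rng), [0.0] * hidden_dim
token = [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
mixed = channel_mixing(token, w1, b1, w2, b2)
```

The residual connection means the block only has to learn a correction to its input: if the second linear layer outputs zeros, the token passes through unchanged.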
Linear → GELU → Linear

| Metric | 300K | 500K | Change |
|---|---|---|---|
| Parameters | 331,680 | 557,328 | 1.7x |
| Hidden Dim | 128 | 176 | 1.4x |
| Channel MLP | 288 | 384 | 1.3x |
| Sequence Length | 128 | 128 | Same |
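The ratios in the Change column, and a rough idea of where the parameters sit, can be checked with a few lines of arithmetic. The breakdown below is only partial and rests on assumptions: every Linear layer in the channel MLP has a bias, and the byte embedding is a plain 256 × hidden-dimension table. The HyperMixing, LayerNorm, and LM head sizes cannot be derived from the tables, so they are lumped into a remainder.

```python
small = {"params": 331_680, "hidden_dim": 128, "channel_mlp_dim": 288}
large = {"params": 557_328, "hidden_dim": 176, "channel_mlp_dim": 384}

# Scale-up factor per metric, rounded to one decimal as in the table.
ratios = {key: round(large[key] / small[key], 1) for key in small}


def channel_mlp_params(hidden_dim, mlp_dim):
    """Linear(hidden -> mlp) plus Linear(mlp -> hidden), each assumed to carry a bias."""
    return (hidden_dim * mlp_dim + mlp_dim) + (mlp_dim * hidden_dim + hidden_dim)


VOCAB_SIZE, NUM_LAYERS = 256, 3  # values listed for the 500K model

embedding = VOCAB_SIZE * large["hidden_dim"]
channel_mlps = NUM_LAYERS * channel_mlp_params(
    large["hidden_dim"], large["channel_mlp_dim"]
)
# Everything not covered by the two assumed components above.
remainder = large["params"] - embedding - channel_mlps

print(ratios)
print(embedding, channel_mlps, remainder)
```

Under these assumptions the three channel MLPs hold 407,184 parameters and the embedding 45,056, leaving 105,088 for the remaining components of the 500K model.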
Prompt: "Once upon a time"
Output: "Once upon a time there was a little boy named Sammy. Limmy love..."
Prompt: "The weather is"
Output: "The weather isy any ve vea laund ye Tha veaverd p shewave cand"
| Limitation | Description |
|---|---|
| Grammatical Errors | Fully grammatical sentences are still difficult to produce |
| Unstable Names | Names such as "Sammy" and "Lily" are generated inconsistently |
| Short Prompt Issues | Short prompts such as "Hello" produce near-random output |
| Overfitting | The model overfits to specific TinyStories phrases |
Dataset: TinyStories (roneneldan/TinyStories)
# Clone the repository first so that the `src` package is importable:
#   git clone https://github.com/llaa33219/MicroMixer-1.git
#   cd MicroMixer-1
import torch
from huggingface_hub import hf_hub_download

from src.model import MicroMixerV2, MicroMixerV2Config
from src.tokenizer import ByteTokenizer

# Configuration matching the 500K checkpoint.
config = MicroMixerV2Config(
    max_seq_len=128,
    hidden_dim=176,
    channel_mlp_dim=384,
    num_layers=3,
    use_hyper=True,
)
model = MicroMixerV2(config)

# Download the weights from the Hugging Face Hub and load them on CPU.
weights_path = hf_hub_download("llaa33219/MicroMixer-1-500K-TinyStories", "model.pt")
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# Encode the prompt as bytes and generate a continuation.
tokenizer = ByteTokenizer()
input_ids = torch.tensor([tokenizer.encode("Once upon a time")])
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64, temperature=0.8, top_k=40)
print(tokenizer.decode(output[0].tolist()))
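The `temperature=0.8` and `top_k=40` arguments control how the next byte is drawn from the model's output scores. The internals of `model.generate` are not shown here, so the following is a sketch of the usual technique rather than the repository's implementation: keep the `top_k` highest-scoring tokens, divide their logits by the temperature, and sample from the resulting softmax. The function name `sample_top_k` and the toy logits are hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Pick one token id from `logits` using top-k filtering and temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = max(1, min(top_k, len(logits)))
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights, so no explicit softmax division is needed.
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [0.0] * 256           # one score per byte value
logits[ord("e")] = 5.0
logits[ord("a")] = 4.0
draws = [sample_top_k(logits, top_k=2, rng=rng) for _ in range(200)]
```

A temperature below 1 sharpens the distribution toward the highest-scoring bytes, while a smaller `top_k` removes unlikely bytes entirely; with `top_k=1` the procedure reduces to greedy decoding.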
Part of the MicroMixer-1 research project