Upload folder using huggingface_hub

Files changed (4)

README.md +91 -3
config.json +16 -0
model.safetensors +3 -0
tokenizer.json +0 -0

README.md CHANGED

@@ -1,3 +1,91 @@
----
-license: mit
----

+---
+language:
+- en
+license: mit
+tags:
+- pytorch
+- language-model
+- llm
+- transformer
+- gqa
+- rope
+- swiglu
+library_name: pytorch
+---
+# Mythos-500M
+A 500M-parameter decoder-only language model built from scratch.
+## Architecture
+| Component | Value |
+|-----------|-------|
+| Parameters | ~505M |
+| Layers | 40 |
+| Hidden dim | 1024 |
+| Attention | GQA (16Q / 8KV heads) |
+| FFN | SwiGLU (dim=2816) |
+| Position | RoPE (θ=10,000) |
+| Normalization | RMSNorm |
+| Vocabulary | 32,000 BPE |
+| Context | 2048 tokens |
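The ~505M figure can be reproduced from the other rows of the table. The sketch below is a back-of-the-envelope count, not the Mythos code: it assumes bias-free linear layers, a head dimension of `d_model / n_heads`, two RMSNorm weight vectors per layer plus a final one, and tied input/output embeddings. The function name `count_params` is hypothetical.

```python
def count_params(vocab_size, d_model, n_layers, n_heads, n_kv_heads, d_ff,
                 tie_embeddings=True):
    """Estimate the parameter count of a GQA + SwiGLU decoder (assumes no biases)."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim

    # Attention: Q and output projections are d_model x d_model,
    # K and V projections are d_model x kv_dim (fewer KV heads under GQA).
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim
    # SwiGLU FFN: gate, up, and down projections.
    ffn = 3 * d_model * d_ff
    # Two RMSNorm weight vectors per layer (assumed pre-attention and pre-FFN).
    norms = 2 * d_model

    embed = vocab_size * d_model
    head = 0 if tie_embeddings else vocab_size * d_model
    final_norm = d_model
    return embed + n_layers * (attn + ffn + norms) + final_norm + head


# Values taken from the Architecture table.
total = count_params(vocab_size=32000, d_model=1024, n_layers=40,
                     n_heads=16, n_kv_heads=8, d_ff=2816)
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the count comes to 504,710,144, which agrees with the ~505M row.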
+## Key Design Choices
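+- **GQA**: 8 KV heads shared by 16 query heads, giving a 2× smaller KV cache than standard MHA
+- **SwiGLU**: roughly 10% better quality than GeLU at the same FLOP budget
+- **RoPE**: no learnable position embeddings; positions are encoded by rotating queries and keys, which can be extended to longer sequences
+- **RMSNorm**: roughly 10% faster than LayerNorm with comparable training stability
+- **Weight tying**: the embedding and output layers share the same matrix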
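The 2× KV-cache figure for GQA follows directly from the head counts. A minimal sketch of the arithmetic, assuming a head dimension of 1024 / 16 = 64, 2-byte (fp16/bf16) cache entries, batch size 1, and a full 2048-token context; `kv_cache_bytes` is a hypothetical helper, not part of the Mythos code:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Size of the K and V caches across all layers, in bytes."""
    # Factor of 2: one cached tensor for keys and one for values per layer.
    return 2 * batch * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


head_dim = 1024 // 16  # d_model / n_heads (assumed)
gqa = kv_cache_bytes(n_layers=40, n_kv_heads=8, head_dim=head_dim, seq_len=2048)
mha = kv_cache_bytes(n_layers=40, n_kv_heads=16, head_dim=head_dim, seq_len=2048)

print(f"GQA: {gqa / 2**20:.0f} MiB, MHA: {mha / 2**20:.0f} MiB, "
      f"ratio: {mha / gqa:.1f}x")
```

With these assumptions the cache is 160 MiB with 8 KV heads versus 320 MiB with 16.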
+## Usage
+```python
+import torch
+from safetensors.torch import load_file
+from tokenizers import Tokenizer
+# Project-local modules (not part of this upload)
+from src.core.transformer import Mythos, ModelConfig
+from src.inference.generate import generate
+# Load model. These values must match the checkpoint being loaded;
+# the config.json added in this commit lists different values
+# (vocab_size=3252, d_model=768, n_layers=24, n_heads=12, n_kv_heads=4, d_ff=3072).
+config = ModelConfig(
+    vocab_size=32000, d_model=1024, n_layers=40,
+    n_heads=16, n_kv_heads=8, d_ff=2816, max_seq_len=2048
+)
+model = Mythos(config)
+model.load_state_dict(load_file("model.safetensors"))
+model.eval()
+# Generate
+tokenizer = Tokenizer.from_file("tokenizer.json")
+prompt = "The key insight about transformers is"
+ids = tokenizer.encode(prompt).ids
+input_ids = torch.tensor([ids])  # shape: (1, prompt_length)
+with torch.no_grad():  # inference only, no gradients needed
+    output = generate(model, input_ids, max_new_tokens=100, temperature=0.8)
+print(tokenizer.decode(output[0].tolist()))
+```
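The implementation of `generate` is not included in this commit, so how it applies `temperature=0.8` is not shown. The sketch below illustrates the common approach, temperature-scaled softmax sampling, in plain Python; the function names and the example logits are made up for illustration and are not taken from the Mythos code.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=0.8, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-token vocabulary
p_default = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.8)
print("T=1.0:", [round(p, 3) for p in p_default])
print("T=0.8:", [round(p, 3) for p in p_sharp])
print("sampled index:", sample_token(logits))
```

A temperature below 1 shifts probability mass toward the highest-scoring token, so 0.8 yields somewhat more conservative text than sampling at 1.0.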
+## Training
+- **Data**: FineWeb-Edu (60%) + The Stack (25%) + Books (15%)
+- **Tokens**: ~26B
+- **Hardware**: Apple Silicon M2/M3 or A100
+- **Framework**: PyTorch 2.x
+## License
+MIT. Use for anything.
+## Citation
+```bibtex
+@software{graudt2026mythos,
+  author = {Graudt, Boris},
+  title  = {Mythos: A 500M Parameter Language Model from Scratch},
+  year   = {2026},
+  url    = {https://github.com/borisgraudt/mythos}
+}
+```

config.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "vocab_size": 3252,
+  "d_model": 768,
+  "n_layers": 24,
+  "n_heads": 12,
+  "n_kv_heads": 4,
+  "d_ff": 3072,
+  "max_seq_len": 2048,
+  "dropout": 0.0,
+  "norm_eps": 1e-05,
+  "rope_theta": 10000.0,
+  "model_type": "mythos",
+  "architectures": [
+    "Mythos"
+  ]
+}
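Several values in this config.json differ from the Architecture table and Usage snippet in the README changed by the same commit. The sketch below lists the differing fields; both dictionaries are copied from the diff above, and it makes no assumption about which set matches the uploaded weights.

```python
import json

# Values passed to ModelConfig in the README's Usage snippet.
readme_config = {
    "vocab_size": 32000, "d_model": 1024, "n_layers": 40,
    "n_heads": 16, "n_kv_heads": 8, "d_ff": 2816, "max_seq_len": 2048,
}

# Contents of the config.json added in this commit.
config_json = json.loads("""
{
  "vocab_size": 3252, "d_model": 768, "n_layers": 24,
  "n_heads": 12, "n_kv_heads": 4, "d_ff": 3072, "max_seq_len": 2048,
  "dropout": 0.0, "norm_eps": 1e-05, "rope_theta": 10000.0,
  "model_type": "mythos", "architectures": ["Mythos"]
}
""")

# Collect every field where the two sources disagree.
mismatches = {
    key: (readme_val, config_json.get(key))
    for key, readme_val in readme_config.items()
    if config_json.get(key) != readme_val
}

for key, (readme_val, file_val) in mismatches.items():
    print(f"{key}: README={readme_val}, config.json={file_val}")
```

Only `max_seq_len` agrees between the two. Since `load_state_dict` requires tensor shapes to match, the configuration used to build the model has to be the one the checkpoint was saved with.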

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f261ce4c928ad99303b287e428558f4af48f4ee3c76db05216d7a13657ee28b7
+size 615191384
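The three lines above are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size, while the weights live in LFS storage. A minimal sketch of checking a downloaded file against the pointer, using only the standard library; the helper names and the local path in the usage note are hypothetical.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file ('key value' per line) into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash in `pointer`."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer("""\
version https://git-lfs.github.com/spec/v1
oid sha256:f261ce4c928ad99303b287e428558f4af48f4ee3c76db05216d7a13657ee28b7
size 615191384
""")
print(f"{pointer['algo']} digest, {pointer['size'] / 1e6:.1f} MB")
```

After downloading, `matches_pointer("model.safetensors", pointer)` would confirm that the local file is the one this commit refers to.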

tokenizer.json ADDED

The diff for this file is too large to render.