---
license: mit
language:
- en
tags:
- erebus
- language-model
- causal-lm
- foundation-model
- pytorch
pipeline_tag: text-generation
---

# Erebus Tiny

**Erebus Tiny** is a decoder-only causal language model (~19M parameters)
trained from scratch as part of the [Erebus](https://github.com/m-np/erebus)
foundation-model project.

## Model architecture

| Attribute      | Value |
|----------------|-------|
| Architecture   | Decoder-only Transformer (GPT-style) |
| Parameters     | ~19M |
| `d_model`      | 256 |
| `n_heads`      | 4 |
| `n_layers`     | 6 |
| `d_ff`         | 1024 |
| `max_seq_len`  | 512 |
| Vocabulary     | 50,257 (GPT-2 BPE) |
| Positional enc | RoPE |
| FFN activation | SwiGLU |
| Normalisation  | RMSNorm (pre-norm) |
| Training steps | 10,000 |

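As a sanity check, the ~19M figure can be estimated from the table above. This is a back-of-the-envelope sketch; it assumes tied input/output embeddings, bias-free linear layers, and a three-projection SwiGLU FFN, which may not match the implementation exactly:

```python
# Rough parameter count for Erebus Tiny from the config values above.
d_model, n_layers, d_ff, vocab = 256, 6, 1024, 50_257

embed = vocab * d_model        # token embedding (assumed tied with the LM head)
attn = 4 * d_model * d_model   # Q, K, V, O projections
ffn = 3 * d_model * d_ff       # SwiGLU: gate, up, and down projections
norms = 2 * d_model            # two RMSNorm scales per block
per_layer = attn + ffn + norms

total = embed + n_layers * per_layer + d_model  # + final RMSNorm
print(f"~{total / 1e6:.1f}M parameters")        # ≈ 19.2M, consistent with ~19M
```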
## Training details

- **Dataset**: FineWeb (`sample-10BT`, ~10B tokens from CommonCrawl)
- **Tokeniser**: tiktoken `gpt2` encoding (vocab = 50,257)
- **Optimiser**: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
- **Schedule**: Cosine decay with linear warm-up
- **Precision**: bfloat16 mixed precision

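The schedule above (linear warm-up into cosine decay) can be sketched as follows. The warm-up length, peak, and floor learning rates here are illustrative assumptions, not the values used in training:

```python
import math

def lr_at(step, max_steps=10_000, warmup=200, peak=3e-4, floor=3e-5):
    """Linear warm-up to `peak`, then cosine decay down to `floor`."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = (step - warmup) / (max_steps - warmup)  # 0 → 1 after warm-up
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

# Warm-up climbs to the peak, then the cosine curve decays to the floor.
print(lr_at(0), lr_at(200), lr_at(10_000))
```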
## How to use

```python
# Install dependencies: pip install huggingface_hub safetensors tiktoken torch
import json
import sys

import tiktoken
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download model weights and config
weights_path = hf_hub_download("Rzoro/erebus-tiny", "model.safetensors")
config_path = hf_hub_download("Rzoro/erebus-tiny", "config.json")

with open(config_path) as f:
    cfg_dict = json.load(f)

# Build the model (requires the erebus repo on your Python path)
sys.path.insert(0, "/path/to/erebus")
from model import ErebusConfig, Erebus

config = ErebusConfig(**cfg_dict)
model = Erebus(config)
model.load_state_dict(load_file(weights_path))
model.eval()

# Generate text
enc = tiktoken.get_encoding("gpt2")
prompt = "The foundation of artificial intelligence is"
input_ids = torch.tensor([enc.encode(prompt)], dtype=torch.long)
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(enc.decode(output[0].tolist()))
```

## Fine-tuning

Because the weights are in standard PyTorch format and the architecture is a
plain decoder-only transformer, you can fine-tune with:

- **Full fine-tuning**: load the weights and train as usual (the small model fits on one GPU)
- **LoRA / QLoRA**: apply PEFT adapters for parameter-efficient fine-tuning
- **Instruction tuning**: format data with a `### Instruction:` / `### Response:` template

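For instruction tuning, the template mentioned above can be applied with a small formatting helper. The field names and separators below follow a common convention; they are not something the Erebus repo prescribes:

```python
def format_example(instruction: str, response: str) -> str:
    """Render one training example using the ### Instruction / ### Response template."""
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )

sample = format_example(
    "Summarise the Erebus Tiny architecture in one sentence.",
    "A ~19M-parameter decoder-only transformer with RoPE, SwiGLU, and RMSNorm.",
)
print(sample)
```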
## License

[MIT](LICENSE)