# Erebus-Medium
Erebus-Medium is a decoder-only causal language model (~454M parameters) trained from scratch as part of the Erebus foundation-model project.
## Model architecture
| Attribute | Value |
|---|---|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | ~454M |
| d_model | 1024 |
| n_heads | 16 |
| n_layers | 24 |
| d_ff | 4096 |
| max_seq_len | 1024 |
| Vocabulary | 50,257 (GPT-2 BPE) |
| Positional enc | RoPE |
| FFN activation | SwiGLU |
| Normalisation | RMSNorm (pre-norm) |
| Training steps | 20,000 |
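The ~454M figure can be sanity-checked from the hyperparameters above. The sketch below assumes tied input/output embeddings and ignores the (negligible) RMSNorm gain parameters; RoPE adds no parameters.

```python
# Back-of-the-envelope parameter count from the architecture table
# (assumes tied input/output embeddings; RMSNorm gains omitted as negligible).
d_model, n_layers, d_ff, vocab = 1024, 24, 4096, 50_257

embeddings = vocab * d_model        # token embedding matrix (tied with output head)
attention  = 4 * d_model * d_model  # Q, K, V, and output projections
ffn        = 3 * d_model * d_ff     # SwiGLU: gate, up, and down projections

total = embeddings + n_layers * (attention + ffn)
print(f"{total / 1e6:.0f}M parameters")  # → 454M parameters
```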
## Training details
- Dataset: FineWeb (`sample-10BT`, ~10B tokens from CommonCrawl)
- Tokeniser: tiktoken `gpt2` encoding (vocab = 50,257)
- Optimiser: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
- Schedule: Cosine decay with linear warm-up
- Precision: bfloat16 mixed precision
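The schedule above can be sketched as a small function. The warm-up length and learning-rate bounds here are illustrative assumptions, not values published for this run; only the 20,000-step horizon comes from the table.

```python
import math

def lr_schedule(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=1000, total_steps=20_000):
    """Linear warm-up followed by cosine decay to min_lr.

    max_lr, min_lr, and warmup_steps are illustrative assumptions.
    """
    if step < warmup_steps:
        # Linear ramp from ~0 up to max_lr over the warm-up phase
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay from max_lr down to min_lr over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```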
## How to use
```python
# Install dependencies: pip install huggingface_hub safetensors tiktoken torch
import json
import sys

import torch
import tiktoken
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download the weights and config from the Hub
weights_path = hf_hub_download("Rzoro/erebus-medium", "model.safetensors")
config_path = hf_hub_download("Rzoro/erebus-medium", "config.json")

with open(config_path) as f:
    cfg_dict = json.load(f)

# Build the model (requires the erebus repo on your Python path)
sys.path.insert(0, "/path/to/erebus")
from model import ErebusConfig, Erebus

config = ErebusConfig(**cfg_dict)
model = Erebus(config)
model.load_state_dict(load_file(weights_path))
model.eval()

# Generate text
enc = tiktoken.get_encoding("gpt2")
prompt = "The foundation of artificial intelligence is"
input_ids = torch.tensor([enc.encode(prompt)], dtype=torch.long)
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(enc.decode(output[0].tolist()))
```
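The `generate` call above uses the method shipped in the erebus repo. If you want to sample without it, a minimal temperature-sampling loop can stand in; this is a sketch assuming the model's forward pass returns logits of shape `(batch, seq, vocab)`, which is a common convention but not confirmed by the repo itself.

```python
import torch

@torch.no_grad()
def sample(model, input_ids, max_new_tokens=100, temperature=0.8, max_seq_len=1024):
    """Hypothetical temperature-sampling loop (not the repo's generate()).

    Assumes model(input_ids) returns logits of shape (batch, seq, vocab).
    """
    for _ in range(max_new_tokens):
        # Crop the context to the model's maximum sequence length
        logits = model(input_ids[:, -max_seq_len:])
        # Take the logits for the last position and apply temperature
        next_logits = logits[:, -1, :] / temperature
        probs = torch.softmax(next_logits, dim=-1)
        # Sample one token and append it to the running sequence
        next_id = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids
```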
## Fine-tuning
Because weights are in standard PyTorch format and the architecture is a plain decoder-only transformer, you can fine-tune with:
- Full fine-tuning: load weights and train as usual (small model fits on one GPU)
- LoRA / QLoRA: apply PEFT adapters for parameter-efficient fine-tuning
- Instruction tuning: format data with an `### Instruction:` / `### Response:` template
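Since the model is a plain `nn.Module`, LoRA can be applied without any framework support by wrapping individual `nn.Linear` layers. The class below is a minimal illustrative sketch of the technique, not part of the erebus repo or of PEFT.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper (illustrative sketch): freeze the base Linear
    and learn a low-rank update x @ A @ B scaled by alpha / r."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; only A and B receive gradients
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, base.out_features))  # zero init: starts as identity update
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale
```

Because `B` starts at zero, a freshly wrapped layer reproduces the base layer's output exactly, so fine-tuning starts from the pretrained behaviour.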
## License