Erebus-Small

Erebus-Small is a decoder-only causal language model (~152M parameters) trained from scratch as part of the Erebus foundation-model project.

Model architecture

  • Architecture: Decoder-only Transformer (GPT-style)
  • Parameters: ~152M
  • d_model: 768
  • n_heads: 12
  • n_layers: 12
  • d_ff: 3072
  • max_seq_len: 1024
  • Vocabulary: 50,257 (GPT-2 BPE)
  • Positional encoding: RoPE
  • FFN activation: SwiGLU
  • Normalisation: RMSNorm (pre-norm)
  • Training steps: 42,000
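The parameter count can be sanity-checked from the table above. A rough tally (assuming tied input/output embeddings, a three-matrix SwiGLU FFN, and ignoring biases and norm gains) lands close to the stated ~152M:

```python
# Rough parameter tally for Erebus-Small, using the architecture table.
# Assumptions: tied input/output embeddings, SwiGLU FFN with three
# projection matrices (gate, up, down); biases and RMSNorm gains ignored.
d_model, n_layers, d_ff, vocab = 768, 12, 3072, 50257

embedding = vocab * d_model                 # token embedding (shared with LM head)
attn_per_layer = 4 * d_model * d_model      # Q, K, V, and output projections
ffn_per_layer = 3 * d_model * d_ff          # SwiGLU: gate, up, down
total = embedding + n_layers * (attn_per_layer + ffn_per_layer)

print(f"{total / 1e6:.1f}M parameters")     # ~151.8M, consistent with ~152M
```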

Training details

  • Dataset: FineWeb (sample-10BT, ~10 B tokens from CommonCrawl)
  • Tokeniser: tiktoken gpt2 encoding (vocab = 50,257)
  • Optimiser: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
  • Schedule: Cosine decay with linear warm-up
  • Precision: bfloat16 mixed precision
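The card does not publish the learning rate or warm-up length, so the values below are illustrative assumptions; only the 42,000-step budget comes from the table above. A cosine-decay schedule with linear warm-up of the kind described can be sketched as:

```python
import math

TOTAL_STEPS = 42_000   # from the model card
WARMUP_STEPS = 2_000   # assumption: not published
PEAK_LR = 6e-4         # assumption: not published
MIN_LR = 6e-5          # assumption: not published

def lr_at(step: int) -> float:
    """Linear warm-up to PEAK_LR, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```

The same function can drive `torch.optim.lr_scheduler.LambdaLR` (by returning `lr_at(step) / PEAK_LR` as the multiplier) on an AdamW optimiser built with `betas=(0.9, 0.95)` and `weight_decay=0.1`, matching the settings listed above.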

How to use

# Install dependencies: pip install huggingface_hub safetensors tiktoken torch
import json
import sys

import torch
import tiktoken
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download model weights and config from the Hub
weights_path = hf_hub_download("Rzoro/erebus-small", "model.safetensors")
config_path  = hf_hub_download("Rzoro/erebus-small", "config.json")

with open(config_path) as f:
    cfg_dict = json.load(f)

# Build the model (requires the erebus repo on your Python path)
sys.path.insert(0, "/path/to/erebus")
from model import ErebusConfig, Erebus

config = ErebusConfig(**cfg_dict)
model = Erebus(config)
model.load_state_dict(load_file(weights_path))
model.eval()

# Generate text
enc = tiktoken.get_encoding("gpt2")
prompt = "The foundation of artificial intelligence is"
input_ids = torch.tensor([enc.encode(prompt)], dtype=torch.long)
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(enc.decode(output[0].tolist()))
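The `temperature` argument above scales the logits before sampling. The model's `generate` implementation isn't shown here, but temperature sampling in general works like this (a minimal NumPy sketch, not the model's actual code):

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float, rng=None) -> int:
    """Sample a token id from logits scaled by 1/temperature.

    temperature < 1 sharpens the distribution (more deterministic);
    temperature > 1 flattens it (more diverse).
    """
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    scaled = scaled - scaled.max()               # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(probs), p=probs))
```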

Fine-tuning

Because the weights are plain safetensors tensors and the architecture is a standard decoder-only transformer, you can fine-tune it with:

  • Full fine-tuning: load weights and train as usual (small model fits on one GPU)
  • LoRA / QLoRA: apply PEFT adapters for parameter-efficient fine-tuning
  • Instruction tuning: format data with a ### Instruction: / ### Response: template
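The exact whitespace of the instruction template isn't specified on the card, so the formatter below is one plausible rendering of the `### Instruction:` / `### Response:` layout:

```python
def format_instruction(instruction: str, response: str) -> str:
    """Render one training example in the ### Instruction / ### Response template.

    The blank line between sections is an assumption; match whatever
    layout your fine-tuning data actually uses.
    """
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )
```

During supervised fine-tuning the loss is typically masked so that only the tokens after `### Response:` contribute.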

License

MIT
