Add Erebus foundation model weights
Files changed:
- README.md (+91)
- config.json (+9)
- model.safetensors (+3)
- tokenizer.json (+3)
README.md (ADDED)
@@ -0,0 +1,91 @@
---
license: mit
language:
- en
tags:
- erebus
- language-model
- causal-lm
- foundation-model
- pytorch
pipeline_tag: text-generation
---

# Erebus Tiny

**Erebus Tiny** is a decoder-only causal language model (~19M parameters)
trained from scratch as part of the [Erebus](https://github.com/m-np/erebus)
foundation-model project.

## Model architecture

| Attribute | Value |
|----------------|-------|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | ~19M |
| `d_model` | 256 |
| `n_heads` | 4 |
| `n_layers` | 6 |
| `d_ff` | 1024 |
| `max_seq_len` | 512 |
| Vocabulary | 50,257 (GPT-2 BPE) |
| Positional enc | RoPE |
| FFN activation | SwiGLU |
| Normalisation | RMSNorm (pre-norm) |
| Training steps | 10,000 |

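For a concrete picture, the block structure implied by the table (pre-norm RMSNorm, RoPE attention, SwiGLU feed-forward) can be sketched in PyTorch roughly as follows. This is an illustration only, not the repository's actual code: the class and module names here are hypothetical, and the RoPE rotation of queries and keys is elided.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: down(SiLU(gate(x)) * up(x))."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

class DecoderBlock(nn.Module):
    """Pre-norm decoder block: x + Attn(norm(x)), then x + FFN(norm(x))."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, d_ff: int = 1024):
        super().__init__()
        self.attn_norm = nn.RMSNorm(d_model)  # nn.RMSNorm requires PyTorch >= 2.4
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn_norm = nn.RMSNorm(d_model)
        self.ffn = SwiGLU(d_model, d_ff)

    def forward(self, x: torch.Tensor, causal_mask: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)
        # In the real model, RoPE rotates q and k here before attention.
        a, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + a
        return x + self.ffn(self.ffn_norm(x))
```

Six such blocks (~1.05M parameters each) plus a 50,257 × 256 token embedding (~12.9M, assuming a tied output head) roughly account for the ~19M total.
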
## Training details

- **Dataset**: FineWeb (`sample-10BT`, ~10B tokens from CommonCrawl)
- **Tokeniser**: tiktoken `gpt2` encoding (vocab = 50,257)
- **Optimiser**: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
- **Schedule**: Cosine decay with linear warm-up (see the sketch below)
- **Precision**: bfloat16 mixed precision

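In PyTorch, those optimiser and schedule settings translate roughly as below. The β values, weight decay, and 10,000-step count come from this card; the peak learning rate and warm-up length are illustrative placeholders, since the card does not state them.

```python
import math
import torch

# Hypothetical stand-in for the Erebus model; any nn.Module works here.
model = torch.nn.Linear(256, 256)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,            # illustrative; not stated in the card
    betas=(0.9, 0.95),  # β₁, β₂ from the card
    weight_decay=0.1,   # from the card
)

warmup_steps, total_steps = 500, 10_000  # 10,000 steps from the card

def lr_lambda(step: int) -> float:
    """Linear warm-up, then cosine decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```
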
## How to use

```python
# Install dependencies first: pip install huggingface_hub safetensors tiktoken torch
import json
import sys

import tiktoken
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download the model weights and config from the Hub
weights_path = hf_hub_download("Rzoro/erebus-tiny", "model.safetensors")
config_path = hf_hub_download("Rzoro/erebus-tiny", "config.json")

with open(config_path) as f:
    cfg_dict = json.load(f)

# Build the model (requires the erebus repo on your Python path)
sys.path.insert(0, "/path/to/erebus")
from model import ErebusConfig, Erebus

config = ErebusConfig(**cfg_dict)
model = Erebus(config)
model.load_state_dict(load_file(weights_path))
model.eval()

# Tokenise a prompt and generate text
enc = tiktoken.get_encoding("gpt2")
prompt = "The foundation of artificial intelligence is"
input_ids = torch.tensor([enc.encode(prompt)], dtype=torch.long)
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(enc.decode(output[0].tolist()))
```

## Fine-tuning

Because the weights are in standard PyTorch format and the architecture is a
plain decoder-only transformer, you can fine-tune with:

- **Full fine-tuning**: load the weights and train as usual (the small model fits on one GPU); a minimal loop is sketched below
- **LoRA / QLoRA**: apply PEFT adapters for parameter-efficient fine-tuning
- **Instruction tuning**: format data with a `### Instruction:` / `### Response:` template

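As a sketch of the first option, here is a minimal full fine-tuning step, reusing the `model` built in the usage example above. It assumes (unverified) that calling the model on a `[batch, seq]` tensor of token ids returns logits of shape `[batch, seq, vocab]`; check the `forward` signature in the erebus repo before relying on this.

```python
import torch
import torch.nn.functional as F

model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)

def train_step(batch_ids: torch.Tensor) -> float:
    """One causal-LM step: predict token t+1 from tokens up to t."""
    inputs, targets = batch_ids[:, :-1], batch_ids[:, 1:]
    logits = model(inputs)  # assumed shape: [batch, seq, vocab]
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
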
## License

[MIT](LICENSE)
config.json (ADDED)
@@ -0,0 +1,9 @@
{
  "vocab_size": 50257,
  "d_model": 256,
  "n_heads": 4,
  "n_layers": 6,
  "d_ff": 1024,
  "max_seq_len": 512,
  "dropout": 0.1
}
model.safetensors (ADDED)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:749e40a94472b80ac627b25a3347cf9c669bb9edb4d4ea865f31c3923dacc45d
size 76647648
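
(Sanity check: 76,647,648 bytes at 4 bytes per float32 parameter is ~19.2M parameters, consistent with the ~19M figure in the README, assuming the weights are stored in float32.)
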
tokenizer.json (ADDED)
@@ -0,0 +1,3 @@
{
  "encoding": "gpt2"
}