Rzoro committed on
Commit 45493f0 · verified · 1 parent: 88f3199

Add Erebus foundation model weights

Files changed (4)
  1. README.md +91 -0
  2. config.json +9 -0
  3. model.safetensors +3 -0
  4. tokenizer.json +3 -0
README.md ADDED
---
license: mit
language:
- en
tags:
- erebus
- language-model
- causal-lm
- foundation-model
- pytorch
pipeline_tag: text-generation
---

# Erebus Tiny

**Erebus Tiny** is a decoder-only causal language model (~19M parameters)
trained from scratch as part of the [Erebus](https://github.com/m-np/erebus)
foundation-model project.

## Model architecture

| Attribute | Value |
|----------------|-------|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | ~19M |
| `d_model` | 256 |
| `n_heads` | 4 |
| `n_layers` | 6 |
| `d_ff` | 1024 |
| `max_seq_len` | 512 |
| Vocabulary | 50,257 (GPT-2 BPE) |
| Positional enc | RoPE |
| FFN activation | SwiGLU |
| Normalisation | RMSNorm (pre-norm) |
| Training steps | 10,000 |
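As a sanity check, the ~19M figure can be reproduced from the table above. This sketch assumes tied input/output embeddings, no learned positional parameters (RoPE), a three-matrix SwiGLU FFN, and bias-free projections; the exact breakdown depends on the actual Erebus implementation.

```python
# Rough parameter count for the configuration above.
# Assumptions (not confirmed by the repo): tied input/output embeddings,
# RoPE (no learned positional parameters), SwiGLU with three projection
# matrices, one RMSNorm scale vector per norm, no biases.
d_model, n_layers, d_ff, vocab = 256, 6, 1024, 50257

embedding = vocab * d_model          # token embeddings (tied with the LM head)
attn = 4 * d_model * d_model         # Q, K, V and output projections
ffn = 3 * d_model * d_ff             # SwiGLU: gate, up and down projections
norms = 2 * d_model                  # two RMSNorm scales per block
per_layer = attn + ffn + norms

total = embedding + n_layers * per_layer + d_model  # + final RMSNorm
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count lands near 19.2M, which is also consistent with the 76,647,648-byte `model.safetensors` file stored in this commit if the checkpoint is fp32 (4 bytes per parameter, plus a small safetensors header).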
## Training details

- **Dataset**: FineWeb (`sample-10BT`, ~10B tokens from CommonCrawl)
- **Tokeniser**: tiktoken `gpt2` encoding (vocab = 50,257)
- **Optimiser**: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
- **Schedule**: Cosine decay with linear warm-up
- **Precision**: bfloat16 mixed precision
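The schedule above can be sketched as a small function. The warm-up length, peak and minimum learning rates here are illustrative placeholders, not values taken from the actual training run.

```python
import math

def lr_at(step, max_steps=10_000, warmup=200, peak_lr=3e-4, min_lr=3e-5):
    """Linear warm-up to peak_lr, then cosine decay to min_lr.
    warmup, peak_lr and min_lr are illustrative, not the run's real values."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup           # linear warm-up
    progress = (step - warmup) / (max_steps - warmup)  # 0 -> 1 over the decay phase
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(200), lr_at(9_999))
```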
## How to use

```python
# Install dependencies: pip install huggingface_hub safetensors tiktoken torch
import json
import sys

import tiktoken
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download model weights and config from the Hub
weights_path = hf_hub_download("Rzoro/erebus-tiny", "model.safetensors")
config_path = hf_hub_download("Rzoro/erebus-tiny", "config.json")

with open(config_path) as f:
    cfg_dict = json.load(f)

# Build the model (requires the erebus repo on your Python path)
sys.path.insert(0, "/path/to/erebus")
from model import ErebusConfig, Erebus

config = ErebusConfig(**cfg_dict)
model = Erebus(config)
model.load_state_dict(load_file(weights_path))
model.eval()

# Generate text
enc = tiktoken.get_encoding("gpt2")
prompt = "The foundation of artificial intelligence is"
input_ids = torch.tensor([enc.encode(prompt)], dtype=torch.long)
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(enc.decode(output[0].tolist()))
```
## Fine-tuning

Because the weights are in standard PyTorch format and the architecture is a
plain decoder-only transformer, you can fine-tune with:

- **Full fine-tuning**: load the weights and train as usual (the small model fits on one GPU)
- **LoRA / QLoRA**: apply PEFT adapters for parameter-efficient fine-tuning
- **Instruction tuning**: format data with a `### Instruction:` / `### Response:` template
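The `### Instruction:` / `### Response:` template mentioned above might be applied like this; `format_example` is a helper written here for illustration, and the exact layout (headers, blank lines, EOS handling) is a convention choice, not something fixed by the model.

```python
def format_example(instruction: str, response: str) -> str:
    # One possible instruction-tuning template; the exact layout
    # (headers, blank lines, EOS token) is a convention choice.
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )

sample = format_example(
    "Summarise RoPE in one sentence.",
    "RoPE rotates query/key pairs by position-dependent angles.",
)
print(sample)
```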
## License

[MIT](LICENSE)
config.json ADDED
{
  "vocab_size": 50257,
  "d_model": 256,
  "n_heads": 4,
  "n_layers": 6,
  "d_ff": 1024,
  "max_seq_len": 512,
  "dropout": 0.1
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:749e40a94472b80ac627b25a3347cf9c669bb9edb4d4ea865f31c3923dacc45d
size 76647648
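The Git LFS pointer above records the blob's SHA-256 and byte size, so a downloaded `model.safetensors` can be checked against it. `verify_lfs_pointer` is a helper written here for illustration, not part of git-lfs or `huggingface_hub`; the demo below runs it on a throwaway file.

```python
import hashlib
from pathlib import Path

def verify_lfs_pointer(path: str, expected_oid: str, expected_size: int) -> bool:
    """Check a file against a Git LFS pointer's sha256 oid and size.
    (Illustrative helper, not part of git-lfs or huggingface_hub.)"""
    data = Path(path).read_bytes()
    return (len(data) == expected_size
            and hashlib.sha256(data).hexdigest() == expected_oid)

# Demo on a throwaway file; for the real checkpoint, pass the oid and
# size recorded in the pointer above.
demo = Path("demo.bin")
demo.write_bytes(b"hello")
ok = verify_lfs_pointer("demo.bin", hashlib.sha256(b"hello").hexdigest(), 5)
demo.unlink()
print(ok)  # True
```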
tokenizer.json ADDED
{
  "encoding": "gpt2"
}