Add Erebus foundation model weights
Files changed:
- README.md (+91)
- config.json (+9)
- model.safetensors (+3)
- tokenizer.json (+3)
README.md (ADDED)
@@ -0,0 +1,91 @@
---
license: mit
language:
- en
tags:
- erebus
- language-model
- causal-lm
- foundation-model
- pytorch
pipeline_tag: text-generation
---

# Erebus Tiny

**Erebus Tiny** is a decoder-only causal language model (~19M parameters)
trained from scratch as part of the [Erebus](https://github.com/m-np/erebus)
foundation-model project.

## Model architecture

| Attribute | Value |
|----------------|-------|
| Architecture | Decoder-only Transformer (GPT-style) |
| Parameters | ~19M |
| `d_model` | 256 |
| `n_heads` | 4 |
| `n_layers` | 6 |
| `d_ff` | 1024 |
| `max_seq_len` | 512 |
| Vocabulary | 50,257 (GPT-2 BPE) |
| Positional enc | RoPE |
| FFN activation | SwiGLU |
| Normalisation | RMSNorm (pre-norm) |
| Training steps | 10,000 |

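For a concrete picture, the block structure implied by the table (pre-norm RMSNorm, RoPE attention, SwiGLU feed-forward) can be sketched in PyTorch roughly as follows. This is an illustration only, not the repository's actual code: the class and module names here are hypothetical, and the RoPE rotation of queries and keys is elided.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: down(SiLU(gate(x)) * up(x))."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

class DecoderBlock(nn.Module):
    """Pre-norm decoder block: x + Attn(norm(x)), then x + FFN(norm(x))."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, d_ff: int = 1024):
        super().__init__()
        self.attn_norm = nn.RMSNorm(d_model)  # nn.RMSNorm requires PyTorch >= 2.4
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn_norm = nn.RMSNorm(d_model)
        self.ffn = SwiGLU(d_model, d_ff)

    def forward(self, x: torch.Tensor, causal_mask: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)
        # In the real model, RoPE rotates q and k here before attention.
        a, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + a
        return x + self.ffn(self.ffn_norm(x))
```

Six such blocks (~1.05M parameters each) plus a 50,257 × 256 token embedding (~12.9M, assuming a tied output head) roughly account for the ~19M total.
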
## Training details

- **Dataset**: FineWeb (`sample-10BT`, ~10B tokens from CommonCrawl)
- **Tokeniser**: tiktoken `gpt2` encoding (vocab = 50,257)
- **Optimiser**: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
- **Schedule**: Cosine decay with linear warm-up (see the sketch below)
- **Precision**: bfloat16 mixed precision

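In PyTorch, those optimiser and schedule settings translate roughly as below. The β values, weight decay, and 10,000-step count come from this card; the peak learning rate and warm-up length are illustrative placeholders, since the card does not state them.

```python
import math
import torch

# Hypothetical stand-in for the Erebus model; any nn.Module works here.
model = torch.nn.Linear(256, 256)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,            # illustrative; not stated in the card
    betas=(0.9, 0.95),  # β₁, β₂ from the card
    weight_decay=0.1,   # from the card
)

warmup_steps, total_steps = 500, 10_000  # 10,000 steps from the card

def lr_lambda(step: int) -> float:
    """Linear warm-up, then cosine decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```
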
## How to use

```python
# Install dependencies first: pip install huggingface_hub safetensors tiktoken torch
import json
import sys

import tiktoken
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download the model weights and config from the Hub
weights_path = hf_hub_download("Rzoro/erebus-tiny", "model.safetensors")
config_path = hf_hub_download("Rzoro/erebus-tiny", "config.json")

with open(config_path) as f:
    cfg_dict = json.load(f)

# Build the model (requires the erebus repo on your Python path)
sys.path.insert(0, "/path/to/erebus")
from model import ErebusConfig, Erebus

config = ErebusConfig(**cfg_dict)
model = Erebus(config)
model.load_state_dict(load_file(weights_path))
model.eval()

# Tokenise a prompt and generate text
enc = tiktoken.get_encoding("gpt2")
prompt = "The foundation of artificial intelligence is"
input_ids = torch.tensor([enc.encode(prompt)], dtype=torch.long)
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(enc.decode(output[0].tolist()))
```

## Fine-tuning

Because the weights are in standard PyTorch format and the architecture is a
plain decoder-only transformer, you can fine-tune with:

- **Full fine-tuning**: load the weights and train as usual (the small model fits on one GPU); a minimal loop is sketched below
- **LoRA / QLoRA**: apply PEFT adapters for parameter-efficient fine-tuning
- **Instruction tuning**: format data with a `### Instruction:` / `### Response:` template

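As a sketch of the first option, here is a minimal full fine-tuning step, reusing the `model` built in the usage example above. It assumes (unverified) that calling the model on a `[batch, seq]` tensor of token ids returns logits of shape `[batch, seq, vocab]`; check the `forward` signature in the erebus repo before relying on this.

```python
import torch
import torch.nn.functional as F

model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)

def train_step(batch_ids: torch.Tensor) -> float:
    """One causal-LM step: predict token t+1 from tokens up to t."""
    inputs, targets = batch_ids[:, :-1], batch_ids[:, 1:]
    logits = model(inputs)  # assumed shape: [batch, seq, vocab]
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
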
## License

[MIT](LICENSE)
config.json (ADDED)
@@ -0,0 +1,9 @@
{
  "vocab_size": 50257,
  "d_model": 256,
  "n_heads": 4,
  "n_layers": 6,
  "d_ff": 1024,
  "max_seq_len": 512,
  "dropout": 0.1
}
model.safetensors (ADDED)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:749e40a94472b80ac627b25a3347cf9c669bb9edb4d4ea865f31c3923dacc45d
size 76647648
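
(Sanity check: 76,647,648 bytes at 4 bytes per float32 parameter is ~19.2M parameters, consistent with the ~19M figure in the README, assuming the weights are stored in float32.)
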
tokenizer.json (ADDED)
@@ -0,0 +1,3 @@
{
  "encoding": "gpt2"
}