---
license: mit
language:
- en
tags:
- erebus
- language-model
- causal-lm
- foundation-model
- pytorch
pipeline_tag: text-generation
---

# Erebus Tiny

**Erebus Tiny** is a decoder-only causal language model (~19M parameters)
trained from scratch as part of the [Erebus](https://github.com/m-np/erebus)
foundation-model project.

## Model architecture

| Attribute      | Value |
|----------------|-------|
| Architecture   | Decoder-only Transformer (GPT-style) |
| Parameters     | ~19M |
| `d_model`      | 256 |
| `n_heads`      | 4 |
| `n_layers`     | 6 |
| `d_ff`         | 1024 |
| `max_seq_len`  | 512 |
| Vocabulary     | 50,257 (GPT-2 BPE) |
| Positional enc | RoPE |
| FFN activation | SwiGLU |
| Normalisation  | RMSNorm (pre-norm) |
| Training steps | 10,000 |

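As a sanity check, the ~19M figure can be estimated from the table above. This is a back-of-the-envelope sketch; it assumes tied input/output embeddings, bias-free linear layers, and a three-projection SwiGLU FFN, which may not match the implementation exactly:

```python
# Rough parameter count for Erebus Tiny from the config values above.
d_model, n_layers, d_ff, vocab = 256, 6, 1024, 50_257

embed = vocab * d_model        # token embedding (assumed tied with the LM head)
attn = 4 * d_model * d_model   # Q, K, V, O projections
ffn = 3 * d_model * d_ff       # SwiGLU: gate, up, and down projections
norms = 2 * d_model            # two RMSNorm scales per block
per_layer = attn + ffn + norms

total = embed + n_layers * per_layer + d_model  # + final RMSNorm
print(f"~{total / 1e6:.1f}M parameters")        # ≈ 19.2M, consistent with ~19M
```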
## Training details

- **Dataset**: FineWeb (`sample-10BT`, ~10B tokens from CommonCrawl)
- **Tokeniser**: tiktoken `gpt2` encoding (vocab = 50,257)
- **Optimiser**: AdamW (β₁=0.9, β₂=0.95, weight decay=0.1)
- **Schedule**: Cosine decay with linear warm-up
- **Precision**: bfloat16 mixed precision

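The schedule above (linear warm-up into cosine decay) can be sketched as follows. The warm-up length, peak, and floor learning rates here are illustrative assumptions, not the values used in training:

```python
import math

def lr_at(step, max_steps=10_000, warmup=200, peak=3e-4, floor=3e-5):
    """Linear warm-up to `peak`, then cosine decay down to `floor`."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = (step - warmup) / (max_steps - warmup)  # 0 → 1 after warm-up
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

# Warm-up climbs to the peak, then the cosine curve decays to the floor.
print(lr_at(0), lr_at(200), lr_at(10_000))
```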
## How to use

```python
# Install dependencies: pip install huggingface_hub safetensors tiktoken torch
import json
import sys

import tiktoken
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download model weights and config
weights_path = hf_hub_download("Rzoro/erebus-tiny", "model.safetensors")
config_path = hf_hub_download("Rzoro/erebus-tiny", "config.json")

with open(config_path) as f:
    cfg_dict = json.load(f)

# Build the model (requires the erebus repo on your Python path)
sys.path.insert(0, "/path/to/erebus")
from model import ErebusConfig, Erebus

config = ErebusConfig(**cfg_dict)
model = Erebus(config)
model.load_state_dict(load_file(weights_path))
model.eval()

# Generate text
enc = tiktoken.get_encoding("gpt2")
prompt = "The foundation of artificial intelligence is"
input_ids = torch.tensor([enc.encode(prompt)], dtype=torch.long)
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(enc.decode(output[0].tolist()))
```

## Fine-tuning

Because the weights are in standard PyTorch format and the architecture is a
plain decoder-only transformer, you can fine-tune with:

- **Full fine-tuning**: load the weights and train as usual (the small model fits on one GPU)
- **LoRA / QLoRA**: apply PEFT adapters for parameter-efficient fine-tuning
- **Instruction tuning**: format data with a `### Instruction:` / `### Response:` template

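For instruction tuning, the template mentioned above can be applied with a small formatting helper. The field names and separators below follow a common convention; they are not something the Erebus repo prescribes:

```python
def format_example(instruction: str, response: str) -> str:
    """Render one training example using the ### Instruction / ### Response template."""
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )

sample = format_example(
    "Summarise the Erebus Tiny architecture in one sentence.",
    "A ~19M-parameter decoder-only transformer with RoPE, SwiGLU, and RMSNorm.",
)
print(sample)
```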
## License

[MIT](LICENSE)