Alpha v0 Historic: 97M GPT Chat Model
A 97M-parameter GPT model trained from scratch using the Alpha framework with the Helios Vulkan compute backend. It was trained on dialogue data for chat-style text generation.
Model Details
| Property | Value |
|---|---|
| Parameters | ~97.7M |
| Architecture | GPT with GELU FFN |
| Dimensions | 768 |
| Layers | 12 |
| Heads | 12 |
| Context Length | 512 tokens |
| Vocabulary | 8000 (BPE-8k-chat) |
| Activation | gelu |
| Weight Tying | No (separate wte and lmHead) |
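As a sanity check on the reported ~97.7M figure, the values in the table can be plugged into a standard GPT-2-style parameter count. This is a sketch under assumptions the card does not state: learned positional embeddings, a 4x FFN expansion, biases on every linear layer except `lmHead`, and two LayerNorms per block plus a final one. The `GptConfig` type and `countParams` function are illustrative and not part of Alpha.

```typescript
// Hypothetical config shape; field names are illustrative, not Alpha's API.
interface GptConfig {
  vocabSize: number;
  contextLength: number;
  dim: number;
  layers: number;
  tiedEmbeddings: boolean;
}

// Counts parameters for an assumed GPT-2-style layout.
function countParams(c: GptConfig): number {
  const d = c.dim;
  const wte = c.vocabSize * d;       // token embedding
  const wpe = c.contextLength * d;   // learned positional embedding (assumed)
  const attn =
    3 * d * d + 3 * d +              // Q, K, V projections + biases
    d * d + d;                       // attention output projection + bias
  const ffn =
    d * 4 * d + 4 * d +              // up-projection (4x expansion assumed) + bias
    4 * d * d + d;                   // down-projection + bias
  const norms = 2 * 2 * d;           // two LayerNorms per block, gain + bias each
  const block = attn + ffn + norms;
  const finalNorm = 2 * d;           // final LayerNorm, gain + bias
  const lmHead = c.tiedEmbeddings ? 0 : c.vocabSize * d; // untied, no bias assumed
  return wte + wpe + c.layers * block + finalNorm + lmHead;
}

const total = countParams({
  vocabSize: 8000,
  contextLength: 512,
  dim: 768,
  layers: 12,
  tiedEmbeddings: false,
});
console.log(total); // 97737216
```

Under these assumptions the total is 97,737,216, which is consistent with the ~97.7M in the table. The untied `lmHead` accounts for about 6.1M of that.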
Training
- Data: super_chat.txt (94MB mixed dialogue corpus)
- Tokens: ~24.8M tokens (BPE-8k-chat)
- Steps: 2,500 (best quality checkpoint)
- Val Loss: 5.17
- LR: 3e-4 → 3e-5 (cosine decay with 500-step warmup)
- Batch: 8 × 4 accumulation steps = effective batch 32
- Gradient Clipping: 1.0
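The learning-rate schedule listed above can be sketched as a function of the step index. This is a minimal illustration assuming a linear warmup to the peak rate and a cosine decay that reaches the floor at `totalSteps`. The card states neither the warmup shape nor the total step count (2,500 is only the checkpoint step), so `totalSteps` is a hypothetical parameter and `lrAt` is not an Alpha function.

```typescript
// Linear warmup followed by cosine decay from maxLr to minLr.
// Warmup shape and totalSteps are assumptions; the card only gives
// the peak (3e-4), the floor (3e-5), and the 500-step warmup.
function lrAt(
  step: number,
  totalSteps: number,
  maxLr: number = 3e-4,
  minLr: number = 3e-5,
  warmupSteps: number = 500,
): number {
  if (step < warmupSteps) {
    return (maxLr * (step + 1)) / warmupSteps; // linear ramp up to maxLr
  }
  if (step >= totalSteps) {
    return minLr; // hold at the floor once the decay has finished
  }
  const progress = (step - warmupSteps) / (totalSteps - warmupSteps); // 0 to 1
  const cosine = 0.5 * (1 + Math.cos(Math.PI * progress));            // 1 to 0
  return minLr + (maxLr - minLr) * cosine;
}

// Effective batch from the card: 8 sequences x 4 accumulation steps.
const effectiveBatch = 8 * 4;               // 32 sequences per optimizer step
const tokensPerStep = effectiveBatch * 512; // 16384 if every sequence fills the context
```

If every sequence fills the 512-token context, 2,500 steps would process roughly 41M tokens. That figure is an inference from the settings above, not a number reported by the card.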
Training Infrastructure
- Framework: Alpha (custom TypeScript ML framework)
- Hardware: NVIDIA L4 (24GB), GCP
- Backend: Helios (custom Vulkan compute shaders)
- Precision: FP32 (fp16 disabled for stability at this scale)
Files
- `checkpoint-gpt2-97m-2500.alph`: Model checkpoint (ALPH binary format)
- `checkpoint-17000.bin`: Legacy 34M model checkpoint (deprecated)
Format
Custom ALPH binary format: 4-byte magic "ALPH" + header + float32 tensors. Load with the Alpha inference engine:
```bash
git clone https://github.com/thomasdavis/alpha
cd alpha && npm install && npm run build
node apps/cli/dist/main.js sample \
  --checkpoint=checkpoint-gpt2-97m-2500.alph \
  --backend=cpu_ref --steps=200 --temp=0.8
```
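Before handing a file to the CLI, it can be useful to confirm that it starts with the 4-byte `ALPH` magic described above. The sketch below inspects only those first four bytes. The header layout after the magic is not documented here, so it is not parsed. `hasAlphMagic` is an illustrative helper, not part of Alpha.

```typescript
// Returns true when the buffer begins with the ASCII bytes "ALPH".
// Only the magic is checked; the header that follows is not parsed.
function hasAlphMagic(bytes: Uint8Array): boolean {
  const magic = [0x41, 0x4c, 0x50, 0x48]; // "A", "L", "P", "H"
  if (bytes.length < magic.length) {
    return false;
  }
  return magic.every((b, i) => bytes[i] === b);
}

// Example with in-memory bytes; a real check would read the first
// four bytes of the checkpoint file instead.
const sample = new Uint8Array([0x41, 0x4c, 0x50, 0x48, 0x00, 0x01]);
console.log(hasAlphMagic(sample)); // true
```

In Node.js the bytes could come from `fs.readFileSync(path).subarray(0, 4)`, since a `Buffer` is a `Uint8Array`.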
Tokenizer
BPE with 8000 tokens (chat variant). Tokenizer artifacts are embedded in the checkpoint.
Special tokens: <|end_of_text|>, <|assistant|>, <|user|>
Chat Format
<|user|> What is the meaning of life? <|assistant|>
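A prompt in this format can be assembled with a small helper. The sketch follows the single-turn example above, including the spaces around the message. Whether the model expects exactly this whitespace, and how multi-turn history should be laid out, is not stated, so treat both as assumptions. `formatPrompt` is an illustrative name, not an Alpha function.

```typescript
const USER = "<|user|>";
const ASSISTANT = "<|assistant|>";

// Builds a single-turn prompt that ends with the assistant tag,
// so generation continues as the assistant's reply.
function formatPrompt(userMessage: string): string {
  return `${USER} ${userMessage.trim()} ${ASSISTANT}`;
}

console.log(formatPrompt("What is the meaning of life?"));
// <|user|> What is the meaning of life? <|assistant|>
```

Generation would presumably run until `<|end_of_text|>` is sampled or the step limit is reached, though the card does not state the stop condition.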
Limitations
- 97M parameters: coherent but limited reasoning capacity
- Trained on synthetic/curated dialogue data: may produce philosophical-style responses
- BPE-8k vocabulary has some fragmentation on common words
- Not suitable for production use: intended for research/educational purposes
License
MIT