Alpha v0 Historic — 97M GPT Chat Model

A 97M-parameter GPT model trained from scratch using the Alpha framework with the Helios Vulkan compute backend. It was trained on dialogue data for chat-style text generation.

Model Details

Property	Value
Parameters	~97.7M
Architecture	GPT with GELU FFN
Embedding Dimension	768
Layers	12
Heads	12
Context Length	512 tokens
Vocabulary	8000 (BPE-8k-chat)
Activation	gelu
Weight Tying	No (separate wte and lmHead)
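
The ~97.7M figure can be checked against the table with a short calculation. This is a sketch under assumptions the card does not state: a GPT-2-style layout with learned positional embeddings, a 4x FFN expansion, biases on the attention and FFN projections, LayerNorm with scale and shift, and no bias on the untied output head. The names below are illustrative, not identifiers from the Alpha codebase.

```python
# Values taken from the Model Details table.
d_model, n_layers, vocab, context = 768, 12, 8000, 512
d_ff = 4 * d_model  # assumption: standard 4x FFN expansion


def linear(n_in, n_out, bias=True):
    """Parameter count of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


layer_norm = 2 * d_model  # scale + shift
attention = linear(d_model, 3 * d_model) + linear(d_model, d_model)  # QKV + output projection
ffn = linear(d_model, d_ff) + linear(d_ff, d_model)
block = attention + ffn + 2 * layer_norm

embeddings = vocab * d_model + context * d_model  # wte + learned positions (assumed)
lm_head = linear(d_model, vocab, bias=False)      # separate from wte (no weight tying)
total = embeddings + n_layers * block + layer_norm + lm_head

print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# prints "97,737,216 parameters (~97.7M)"
```

Under these assumptions the untied output head accounts for about 6.1M of the total, so the same architecture with weight tying would come to roughly 91.6M.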

Training

Data: super_chat.txt (94MB mixed dialogue corpus)
Tokens: ~24.8M tokens (BPE-8k-chat)
Steps: 2,500 (best quality checkpoint)
Val Loss: 5.17
LR: 3e-4 → 3e-5 (cosine decay with a 500-step warmup)
Batch: 8 × 4 accumulation steps = effective batch 32
Gradient Clipping: 1.0
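
The learning-rate schedule and batch arithmetic above can be sketched as follows. The warmup shape (linear) and the total schedule length are assumptions: the card gives only the step of the best checkpoint, so `total_steps` is left as a parameter. The tokens-per-step figure further assumes every sequence fills the 512-token context.

```python
import math


def lr_at(step, total_steps, peak=3e-4, floor=3e-5, warmup=500):
    """Linear warmup to `peak` (assumed shape), then cosine decay to `floor`."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total_steps - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


# micro-batch x accumulation steps x context length
tokens_per_step = 8 * 4 * 512
tokens_at_checkpoint = 2500 * tokens_per_step

print(tokens_per_step)       # 16384
print(tokens_at_checkpoint)  # 40960000
```

With full-length sequences, 2,500 steps would cover about 41M tokens, or roughly 1.65 passes over the ~24.8M-token corpus.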

Training Infrastructure

Framework: Alpha (custom TypeScript ML framework)
Hardware: NVIDIA L4 (24GB), GCP
Backend: Helios (custom Vulkan compute shaders)
Precision: FP32 (fp16 disabled for stability at this scale)
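
As a rough sizing aid, FP32 storage costs 4 bytes per parameter. The sketch below estimates weight and gradient memory from the approximate parameter count. Optimizer state is left out because the card does not name the optimizer, and activation memory is not modeled.

```python
params = 97.7e6       # approximate count from the model card
bytes_per_param = 4   # FP32

weights_mb = params * bytes_per_param / 1e6
weights_plus_grads_mb = 2 * weights_mb  # one gradient value per weight

print(f"weights: ~{weights_mb:.0f} MB")
print(f"weights + gradients: ~{weights_plus_grads_mb:.0f} MB")
```

Since the checkpoint stores float32 tensors, its file size should be in the same range as the weights figure, plus the header and embedded tokenizer artifacts.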

Files

checkpoint-gpt2-97m-2500.alph — Model checkpoint (ALPH binary format)
checkpoint-17000.bin — Legacy 34M model checkpoint (deprecated)

Format

Custom ALPH binary format: 4-byte magic "ALPH" + header + float32 tensors. Load with the Alpha inference engine:

git clone https://github.com/thomasdavis/alpha
cd alpha && npm install && npm run build

node apps/cli/dist/main.js sample \
  --checkpoint=checkpoint-gpt2-97m-2500.alph \
  --backend=cpu_ref --steps=200 --temp=0.8
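
Before handing a file to the inference engine, it can be useful to confirm it is an ALPH checkpoint rather than the legacy .bin file. A minimal sketch, assuming the magic is the ASCII bytes "ALPH" at offset 0; the header layout is not described here, so nothing past the magic is parsed. `has_alph_magic` is a hypothetical helper, not part of the Alpha CLI.

```python
ALPH_MAGIC = b"ALPH"  # 4-byte magic, assumed to sit at the start of the file


def has_alph_magic(path):
    """Return True if the file begins with the 4-byte ALPH magic."""
    with open(path, "rb") as f:
        return f.read(4) == ALPH_MAGIC


if __name__ == "__main__":
    import sys

    # Usage: python check_alph.py checkpoint-gpt2-97m-2500.alph
    for name in sys.argv[1:]:
        print(name, "ALPH" if has_alph_magic(name) else "not ALPH")
```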

Tokenizer

BPE with 8000 tokens (chat variant). Tokenizer artifacts are embedded in the checkpoint.

Special tokens: <|end_of_text|>, <|assistant|>, <|user|>

Chat Format

<|user|> What is the meaning of life? <|assistant|>
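
A small helper can build single-turn prompts in this format and trim the raw completion. This is a sketch: the spacing around the special tokens follows the example above, and stopping at <|end_of_text|> or the next <|user|> tag is an assumption about how the model ends its replies. The function names are hypothetical.

```python
USER, ASSISTANT, END = "<|user|>", "<|assistant|>", "<|end_of_text|>"


def build_prompt(message):
    """Wrap a user message in the single-turn chat format."""
    return f"{USER} {message.strip()} {ASSISTANT}"


def extract_reply(completion):
    """Trim a raw completion at the first end-of-text or user tag (assumed stops)."""
    for stop in (END, USER):
        cut = completion.find(stop)
        if cut != -1:
            completion = completion[:cut]
    return completion.strip()


print(build_prompt("What is the meaning of life?"))
# prints "<|user|> What is the meaning of life? <|assistant|>"
```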

Limitations

97M parameters — coherent but limited reasoning capacity
Trained on synthetic/curated dialogue data — may produce philosophical-style responses
BPE-8k vocabulary splits some common words into multiple tokens
Not suitable for production use — research/educational purposes

License

MIT

