Alpha v0 Historic: 97M GPT Chat Model

A 97M-parameter GPT model trained from scratch with the Alpha framework on the Helios Vulkan compute backend. It was trained on dialogue data for chat-style text generation.

Model Details

Property | Value
--- | ---
Parameters | ~97.7M
Architecture | GPT with GELU FFN
Embedding dimension | 768
Layers | 12
Heads | 12
Context length | 512 tokens
Vocabulary | 8000 (BPE-8k-chat)
Activation | GELU
Weight tying | No (separate wte and lmHead)
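
As a sanity check on the ~97.7M figure, the listed dimensions can be plugged into a standard GPT-2-style parameter count. This is only a sketch: the 4x FFN width, the bias terms, the learned position table, and the bias-free output head are assumptions that the card does not state, and `gpt_param_count` is a hypothetical helper, not part of Alpha.

```python
def gpt_param_count(d_model=768, n_layers=12, vocab=8000, ctx=512,
                    ffn_mult=4, tied=False, bias=True):
    """Estimate the parameter count of a GPT-2-style decoder.

    Assumptions (not confirmed by the model card): FFN hidden size is
    ffn_mult * d_model, linear layers carry biases, positions use a
    learned table, and the output head has no bias.
    """
    b = 1 if bias else 0
    d_ff = ffn_mult * d_model

    attn = d_model * 3 * d_model + b * 3 * d_model   # fused Q, K, V projection
    attn += d_model * d_model + b * d_model          # attention output projection
    ffn = d_model * d_ff + b * d_ff                  # FFN up-projection
    ffn += d_ff * d_model + b * d_model              # FFN down-projection
    norms = 2 * 2 * d_model                          # two LayerNorms, scale + shift each
    per_layer = attn + ffn + norms

    embeddings = vocab * d_model + ctx * d_model     # wte plus position table
    head = 0 if tied else vocab * d_model            # separate lmHead when untied
    final_norm = 2 * d_model
    return n_layers * per_layer + embeddings + head + final_norm


if __name__ == "__main__":
    total = gpt_param_count()
    print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate lands at about 97.7M, in line with the table. Setting `tied=True` drops one 8000 x 768 matrix, which shows how much the separate `lmHead` contributes.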

Training

  • Data: super_chat.txt (94MB mixed dialogue corpus)
  • Tokens: ~24.8M tokens (BPE-8k-chat)
  • Steps: 2,500 (best quality checkpoint)
  • Val Loss: 5.17
  • LR: 3e-4 → 3e-5 (cosine decay with 500 step warmup)
  • Batch: 8 × 4 accumulation steps = effective batch 32
  • Gradient Clipping: 1.0
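
The schedule and batch settings above can be sketched as follows. The peak rate, floor rate, warmup length, batch size, and accumulation steps come from the list. The linear warmup shape, the `total_steps` argument, and the idea that every sequence fills the 512-token context are assumptions. The card only says that step 2,500 was the best checkpoint, not how long the schedule ran.

```python
import math

PEAK_LR = 3e-4      # from the card
FLOOR_LR = 3e-5     # from the card
WARMUP_STEPS = 500  # from the card


def lr_at(step, total_steps, peak=PEAK_LR, floor=FLOOR_LR, warmup=WARMUP_STEPS):
    """Linear warmup to `peak`, then cosine decay to `floor`.

    `total_steps` is the assumed schedule length (hypothetical parameter).
    """
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total_steps - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


def tokens_per_step(micro_batch=8, accum_steps=4, context=512):
    """Tokens per optimizer step, assuming every sequence fills the context."""
    return micro_batch * accum_steps * context


if __name__ == "__main__":
    total = 2500  # hypothetical schedule length, for illustration only
    for step in (0, 499, 1500, 2500):
        print(step, f"{lr_at(step, total):.2e}")
    seen = 2500 * tokens_per_step()
    print(f"{seen:,} tokens seen by step 2,500")
```

With full-length sequences, each optimizer step covers 8 × 4 × 512 = 16,384 tokens, so 2,500 steps see about 41M tokens, or roughly 1.65 passes over a ~24.8M-token corpus.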

Training Infrastructure

  • Framework: Alpha (custom TypeScript ML framework)
  • Hardware: NVIDIA L4 (24GB), GCP
  • Backend: Helios (custom Vulkan compute shaders)
  • Precision: FP32 (fp16 disabled for stability at this scale)
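
For a rough sense of what FP32 means at this size, the weight footprint follows directly from the parameter count. This sketch covers weights only. Gradients and optimizer state add to it during training, and the card does not name the optimizer, so those are left out. `fp32_weight_bytes` is a hypothetical helper.

```python
BYTES_PER_FP32 = 4


def fp32_weight_bytes(n_params):
    """Raw storage for n_params float32 values, excluding any file header."""
    return n_params * BYTES_PER_FP32


if __name__ == "__main__":
    n_params = 97_700_000  # approximate count from the Model Details section
    size = fp32_weight_bytes(n_params)
    print(f"{size / 1e6:.0f} MB ({size / 2**20:.0f} MiB) of raw FP32 weights")
```

At about 391 MB, the weights take up only a small fraction of the L4's 24GB.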

Files

  • checkpoint-gpt2-97m-2500.alph: Model checkpoint (ALPH binary format)
  • checkpoint-17000.bin: Legacy 34M model checkpoint (deprecated)

Format

Custom ALPH binary format: 4-byte magic "ALPH" + header + float32 tensors. Load with the Alpha inference engine:

git clone https://github.com/thomasdavis/alpha
cd alpha && npm install && npm run build

node apps/cli/dist/main.js sample \
  --checkpoint=checkpoint-gpt2-97m-2500.alph \
  --backend=cpu_ref --steps=200 --temp=0.8
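
Before handing a file to the engine, it can help to confirm that a download really is an ALPH checkpoint. The sketch below checks only the 4-byte magic described above. The header layout is not documented here, so no attempt is made to parse it, and `has_alph_magic` is a hypothetical helper rather than part of Alpha.

```python
ALPH_MAGIC = b"ALPH"  # 4-byte magic stated in the Format section


def has_alph_magic(path):
    """Return True if the file starts with the 4-byte ALPH magic."""
    with open(path, "rb") as f:
        return f.read(len(ALPH_MAGIC)) == ALPH_MAGIC


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        status = "ALPH checkpoint" if has_alph_magic(name) else "not an ALPH file"
        print(f"{name}: {status}")
```

For example, running it with `checkpoint-gpt2-97m-2500.alph` as the argument should report an ALPH checkpoint. The legacy `.bin` file may use a different layout.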

Tokenizer

BPE with 8000 tokens (chat variant). Tokenizer artifacts are embedded in the checkpoint.

Special tokens: <|end_of_text|>, <|assistant|>, <|user|>

Chat Format

<|user|> What is the meaning of life? <|assistant|>
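
A minimal sketch of building a single-turn prompt in this format and trimming the generated text. The prompt layout follows the example above. Stopping at `<|end_of_text|>` or at the next `<|user|>` marker is an assumption about how the model ends a turn, and both helper names are hypothetical.

```python
USER = "<|user|>"
ASSISTANT = "<|assistant|>"
END = "<|end_of_text|>"


def build_prompt(user_message):
    """Wrap one user message in the chat format shown in the card."""
    return f"{USER} {user_message.strip()} {ASSISTANT}"


def trim_reply(completion):
    """Cut the text generated after the prompt at the first stop marker.

    Treating END and USER as stop markers is an assumption.
    """
    cut = len(completion)
    for marker in (END, USER):
        idx = completion.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut].strip()


if __name__ == "__main__":
    print(build_prompt("What is the meaning of life?"))
```

The string returned by `build_prompt` is what would be passed to the sampler as the prompt. `trim_reply` is then applied to whatever the sampler generates after it.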

Limitations

  • 97M parameters: coherent but limited reasoning capacity
  • Trained on synthetic/curated dialogue data: may produce philosophical-style responses
  • BPE-8k vocabulary has some fragmentation on common words
  • Not suitable for production use: research/educational purposes only

License

MIT
