FlashLM v5.2 "Nova-Ignition"

A 5.0M-parameter language model designed for 2-CPU/5 GB RAM environments, trained for 2 hours on a free-tier cloud CPU. No GPU is used at any stage, neither for training nor for inference.

Model Details

  • Architecture: Standard Transformer with Rotary Positional Embeddings (RoPE)
  • Parameters: ~5.0M
  • Vocab Size: 4,096 (BPE)
  • Context Length: 128 tokens
  • d_model: 256
  • Layers: 6
  • Attention Heads: 4
  • FFN Hidden: 512
  • Activation: GELU
  • Weight Tying: Yes (embedding ↔ head)

Architecture

Embedding (4K × 256, float, weight-tied)
  → 6 × NovaBlock:
      LayerNorm → MultiHeadAttention (RoPE) + residual
      LayerNorm → FFN (GELU, 256→512→256) + residual
  → LayerNorm → Output Head (tied to embedding)
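The block layout above can be sketched in PyTorch. This is a minimal illustration under assumptions, not the repo's model.py: the class name, the use of nn.MultiheadAttention, and the omission of RoPE (which the real attention applies to queries and keys) are all simplifications.

```python
import torch
import torch.nn as nn

class NovaBlock(nn.Module):
    """Pre-norm transformer block: LN -> attention + residual, LN -> FFN + residual.
    Sketch only: RoPE is omitted; the real model applies it inside attention."""
    def __init__(self, d_model=256, n_heads=4, d_ffn=512):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ffn),
            nn.GELU(),
            nn.Linear(d_ffn, d_model),
        )

    def forward(self, x):
        # Causal mask: True entries are positions a token may NOT attend to.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a                       # attention residual
        x = x + self.ffn(self.ln2(x))   # FFN residual
        return x
```

Stacking six of these between the tied embedding and the final LayerNorm gives the shape of the network above.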

Training

  • Dataset: TinyStories V2 (validation split)
  • Training Time: 2 hours
  • Hardware: Free-tier cloud CPU (2 threads, 5GB RAM)
  • Speed: ~3,500 tokens/sec
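A quick sanity check on the token budget: at the stated throughput, the 2-hour run streams roughly 25M tokens, close to one pass over the ~20M-token dataset mentioned under Limitations.

```python
# Token budget implied by the reported throughput and training time.
tokens_per_sec = 3_500
seconds = 2 * 60 * 60          # 2-hour training run
total_tokens = tokens_per_sec * seconds

print(f"{total_tokens / 1e6:.1f}M tokens")  # 25.2M tokens
```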

Benchmark Results

Model               Params  BPC   PPL    Hardware
FlashLM v5.2        5.0M    0.78  10.56  2-thread CPU
FlashLM v4 "Bolt"   4.3M    0.88  15.05  2-thread CPU
TinyStories-1M      3.7M    0.62  6.72   V100 GPU

v5.2 beats v4 "Bolt" by 11% relative in BPC with the same 2-hour training budget.
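The relative improvement can be checked directly from the table. The second half of the sketch assumes the common relation PPL = 2^(BPC × avg chars per token) to back out the tokenizer's average compression; that relation is an assumption on my part, not something the card states.

```python
import math

bpc_v52, bpc_v4 = 0.78, 0.88
ppl_v52 = 10.56

# Relative BPC improvement of v5.2 over v4 "Bolt".
rel = (bpc_v4 - bpc_v52) / bpc_v4
print(f"{rel:.1%}")  # 11.4%

# Implied average characters per BPE token, assuming PPL = 2**(BPC * chars_per_token).
chars_per_token = math.log2(ppl_v52) / bpc_v52
print(f"{chars_per_token:.2f}")  # 4.36
```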

Usage

import torch
from tokenizers import Tokenizer

from model import NovaIgnitionLM  # architecture definition - see model.py

# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Load model
model = NovaIgnitionLM(vocab=4096, d_model=256, n_layers=6,
                       n_heads=4, d_head=64, d_ffn=512)
model.load_state_dict(torch.load("best.pt", weights_only=True))
model.eval()

# Generate
prompt = "Once upon a time"
ids = tokenizer.encode(prompt).ids
x = torch.tensor([ids])
out = model.generate(x, max_new_tokens=80, temperature=0.8, top_k=40)
text = tokenizer.decode(out[0].tolist())
print(text)
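model.generate is defined in model.py; a typical temperature + top-k sampling loop looks roughly like the sketch below. The function name `sample` and its exact signature are hypothetical, not the repo's implementation; the 128-token context window from the specs is respected by cropping the input.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, ids, max_new_tokens=80, temperature=0.8, top_k=40, ctx=128):
    """Autoregressive sampling with temperature scaling and top-k filtering.
    Assumes model(x) returns logits of shape (batch, seq, vocab)."""
    for _ in range(max_new_tokens):
        logits = model(ids[:, -ctx:])[:, -1, :] / temperature  # crop to context window
        v, _ = torch.topk(logits, top_k)
        logits[logits < v[:, [-1]]] = float("-inf")  # mask everything below the top-k
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return ids
```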

Files

  • best.pt - Best model checkpoint
  • latest.pt - Latest checkpoint
  • config.json - Training configuration

Limitations

  • Small context window (128 tokens)
  • Trained on limited data (~20M tokens)
  • Not suitable for complex reasoning tasks

License

MIT

Citation

@misc{flashlm-v52,
  author = {Chang Cheng},
  title = {FlashLM v5.2 Nova-Ignition},
  year = {2026},
  url = {https://github.com/changcheng967/FlashLM}
}