FlashLM v6 "SUPERNOVA" — P-RCSM Architecture

A 4.1M-parameter language model built on the novel P-RCSM (Parallel Reasoning via Compositional State Machine) architecture. About 81% of its weights are ternary {-1, 0, +1}. Trained entirely on CPU.

Model Details

Metric	Value
Parameters	4,141,862 (4.1M)
Ternary params	3,348,480 (80.8%)
Vocab size	4,096 (BPE)
d_model	192
Layers	6
Val PPL	14.0
Training speed	3,500 tok/s
Training time	~3 hours
Hardware	2-thread CPU (Deepnote free tier)
Model RAM	15.8 MiB (float32)
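
The ternary share and the float32 memory figure in the table can be checked from the parameter counts. The sketch below only redoes that arithmetic; the 15.8 figure matches binary megabytes (2^20 bytes). The 2-bit packed estimate is a hypothetical storage scheme added for illustration, not something the model card reports.

```python
# Figures copied from the Model Details table.
total_params = 4_141_862
ternary_params = 3_348_480

# Share of weights constrained to {-1, 0, +1}.
ternary_pct = 100 * ternary_params / total_params

# float32 storage: 4 bytes per weight, expressed in MiB (2**20 bytes).
fp32_mib = total_params * 4 / 2**20

# Hypothetical packing (an assumption, not from the model card):
# 2 bits per ternary weight, float32 for everything else.
packed_bytes = ternary_params * 2 / 8 + (total_params - ternary_params) * 4
packed_mib = packed_bytes / 2**20

print(f"ternary share: {ternary_pct:.1f}%")
print(f"float32 size:  {fp32_mib:.1f} MiB")
print(f"2-bit packed estimate: {packed_mib:.2f} MiB")
```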

Architecture

Embedding (4K × 192, weight-tied)
  → 6× SupernovaBlock:
      RMSNorm → GatedLinearMixer (ternary) + residual
      RMSNorm → P-RCSM (MultiScaleLinearBank + StateGate + SlotMemory) + residual
      RMSNorm → TernaryGLU (ternary SiLU gate/up/down) + residual
  → RMSNorm → Output Head (tied to embedding)
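
The diagram above describes a pre-norm residual stack with a weight-tied output head. A minimal structural sketch follows, assuming plain `nn.Linear` placeholders where GatedLinearMixer, P-RCSM, and TernaryGLU would sit. The class names `BlockSketch` and `StackSketch` are hypothetical, and the placeholders do no token mixing, so this shows only the wiring, not the real sublayers in train.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization with a learned per-channel gain."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x / rms

class BlockSketch(nn.Module):
    """Three 'RMSNorm -> sublayer + residual' steps, in the diagram's order."""
    def __init__(self, d_model=192):
        super().__init__()
        self.norms = nn.ModuleList([RMSNorm(d_model) for _ in range(3)])
        # Placeholders (assumption): stand-ins for mixer, P-RCSM, and GLU.
        self.sublayers = nn.ModuleList(
            [nn.Linear(d_model, d_model, bias=False) for _ in range(3)]
        )

    def forward(self, x):
        for norm, sublayer in zip(self.norms, self.sublayers):
            x = x + sublayer(norm(x))  # pre-norm residual connection
        return x

class StackSketch(nn.Module):
    def __init__(self, vocab_size=4096, d_model=192, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList([BlockSketch(d_model) for _ in range(n_layers)])
        self.final_norm = RMSNorm(d_model)

    def forward(self, ids):
        x = self.embed(ids)
        for block in self.blocks:
            x = block(x)
        x = self.final_norm(x)
        # Tied head: reuse the embedding matrix as the output projection.
        return F.linear(x, self.embed.weight)

model = StackSketch()
ids = torch.randint(0, 4096, (2, 5))
logits = model(ids)
print(logits.shape)
```

Tying the head to the embedding means the 4,096 × 192 matrix is counted once, which matters at this scale because it is a large share of the non-ternary parameters.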

No attention. No convolution. Tokens are mixed by GatedLinearMixer (shifted linear projections). Reasoning is handled by P-RCSM (multi-scale routing, hierarchical planner-executor gating, slot memory). Every op is either F.linear (BitLinear ternary) or element-wise.
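
The model card names BitLinear ternary weights but does not spell out the quantizer. The sketch below assumes the absmean rule commonly used in BitNet-style layers (scale by the mean absolute weight, round, clamp to {-1, 0, +1}) together with a straight-through estimator for training. The function names `ternarize` and `ternary_linear` are hypothetical and may differ from what train.py does.

```python
import torch
import torch.nn.functional as F

def ternarize(w, eps=1e-5):
    """Assumed absmean rule: returns weights in {-1, 0, +1} and their scale."""
    scale = w.abs().mean().clamp(min=eps)
    w_t = (w / scale).round().clamp(-1, 1)
    return w_t, scale

def ternary_linear(x, w):
    """F.linear with ternarized weights; gradients pass straight through to w."""
    w_t, scale = ternarize(w)
    # Forward value equals w_t * scale; backward treats quantization as identity.
    w_q = w + (w_t * scale - w).detach()
    return F.linear(x, w_q)

torch.manual_seed(0)
w = torch.randn(8, 192, requires_grad=True)  # latent full-precision weights
x = torch.randn(4, 192)

w_t, scale = ternarize(w.detach())
print(sorted(w_t.unique().tolist()))  # distinct values after quantization

y = ternary_linear(x, w)
y.sum().backward()  # w.grad is populated despite the rounding step
```

Keeping a full-precision latent copy of the weights is what lets an optimizer make small updates even though the forward pass only ever sees three values.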

Usage

import torch
import torch.nn.functional as F
from tokenizers import Tokenizer
from train import Config, FlashLMv6  # requires train.py on the import path

# Build the model and load the checkpoint on CPU.
config = Config()
model = FlashLMv6(config)
state = torch.load('best_model.pt', map_location='cpu')
# The checkpoint may be a bare state dict or wrap it under a 'model' key.
model.load_state_dict(state['model'] if 'model' in state else state)
model.eval()

tokenizer = Tokenizer.from_file('tokenizer_v6.json')

prompt = "Once upon a time"
ids = tokenizer.encode(prompt).ids
x = torch.tensor([ids])  # shape (1, seq_len)

# Sample 100 new tokens, one at a time.
with torch.no_grad():
    for _ in range(100):
        logits = model(x[:, -128:])  # feed only the last 128 tokens
        # Sample from the final position's distribution at temperature 0.8.
        next_id = torch.multinomial(F.softmax(logits[:, -1] / 0.8, dim=-1), 1)
        x = torch.cat([x, next_id], dim=1)

print(tokenizer.decode(x[0].tolist()))
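
The generation loop above divides the logits by 0.8 before the softmax. A small sketch of that sampling step as a standalone helper, using made-up logits rather than model output. The helper name `sample_next` and the greedy fallback at temperature 0 are illustrative assumptions, not part of the original snippet.

```python
import torch
import torch.nn.functional as F

def sample_next(logits, temperature=0.8, generator=None):
    """Pick one token id per row. logits: (batch, vocab) -> (batch, 1)."""
    if temperature <= 0:
        # Assumed convention: temperature 0 means greedy decoding.
        return logits.argmax(dim=-1, keepdim=True)
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, 1, generator=generator)

# Made-up logits for a 3-token vocabulary.
logits = torch.tensor([[2.0, 1.0, 0.0]])

# Lower temperatures concentrate probability on the top-scoring token.
for t in (1.0, 0.8, 0.5):
    print(t, F.softmax(logits / t, dim=-1))

print(sample_next(logits, temperature=0))  # greedy pick
```

In the loop above, `sample_next(logits[:, -1])` would be a drop-in replacement for the `torch.multinomial(...)` line.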

Sample Output

Once upon a time, there was a cute little girl named Lily. She loved to play with her toys and watch movies with her. One day, her mommy told her to help her fix her toy.
