FlashLM v6 "SUPERNOVA": P-RCSM Architecture

A 4.1M-parameter language model built on the novel P-RCSM (Parallel Reasoning via Compositional State Machine) architecture. About 81% of the weights are ternary {-1, 0, +1}. Trained entirely on CPU.

Model Details

Metric           Value
Parameters       4,141,862 (4.1M)
Ternary params   3,348,480 (80.8%)
Vocab size       4,096 (BPE)
d_model          192
Layers           6
Val PPL          14.0
Training speed   3,500 tok/s
Training time    ~3 hours
Hardware         2-thread CPU (Deepnote free tier)
Model RAM        15.8 MB (float32)
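
The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the table; the function name `param_budget` is hypothetical. It also assumes that the tied embedding matrix is stored in full precision and counted once, which the table does not state explicitly.

```python
# Figures copied from the Model Details table.
TOTAL_PARAMS = 4_141_862
TERNARY_PARAMS = 3_348_480
VOCAB_SIZE = 4_096
D_MODEL = 192


def param_budget():
    """Break the parameter count down (hypothetical helper, not part of train.py)."""
    # Assumption: the embedding is weight-tied with the output head, so it is counted once.
    embedding = VOCAB_SIZE * D_MODEL
    full_precision = TOTAL_PARAMS - TERNARY_PARAMS
    return {
        "ternary_fraction": TERNARY_PARAMS / TOTAL_PARAMS,
        "embedding_params": embedding,
        "full_precision_params": full_precision,
        # Assumption: the embedding is part of the non-ternary parameters.
        "other_full_precision": full_precision - embedding,
        # Size if every parameter is held as a 4-byte float32, in MiB.
        "float32_mib": TOTAL_PARAMS * 4 / 2**20,
    }


if __name__ == "__main__":
    for name, value in param_budget().items():
        print(f"{name}: {value}")
```

Under these assumptions the embedding accounts for 786,432 of the 793,382 non-ternary parameters, and storing all 4,141,862 parameters as float32 comes to about 15.8 MiB, which matches the "Model RAM" row.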

Architecture

Embedding (4K × 192, weight-tied)
  → 6× SupernovaBlock:
      RMSNorm → GatedLinearMixer (ternary) + residual
      RMSNorm → P-RCSM (MultiScaleLinearBank + StateGate + SlotMemory) + residual
      RMSNorm → TernaryGLU (ternary SiLU gate/up/down) + residual
  → RMSNorm → Output Head (tied to embedding)
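
Read literally, each line of the diagram is a pre-norm residual step: normalize, apply a sublayer, add the result back to the input. The sketch below shows only that wiring in plain Python. The names `rms_norm`, `residual_step` and `supernova_block` are hypothetical, the learned RMSNorm gain is omitted, and the three sublayers are passed in as placeholder functions because their internals are defined in train.py, not here.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Scale a vector so its root-mean-square is about 1 (learned gain omitted)."""
    mean_sq = sum(v * v for v in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in vec]


def residual_step(seq, sublayer):
    """Pre-norm residual step over a sequence of token vectors: seq + sublayer(norm(seq))."""
    normed = [rms_norm(vec) for vec in seq]
    update = sublayer(normed)  # placeholder: maps a sequence to a same-shaped sequence
    return [[a + b for a, b in zip(vec, upd)] for vec, upd in zip(seq, update)]


def supernova_block(seq, mixer, prcsm, glu):
    """Apply the three sublayers in the order given by the diagram."""
    for sublayer in (mixer, prcsm, glu):
        seq = residual_step(seq, sublayer)
    return seq


if __name__ == "__main__":
    # Placeholder sublayer that returns all zeros, so the block reduces to the identity.
    zeros = lambda s: [[0.0] * len(v) for v in s]
    tokens = [[3.0, 4.0], [1.0, -1.0]]
    print(supernova_block(tokens, zeros, zeros, zeros))
```

With zero-output sublayers the block returns its input unchanged, which is a quick way to see that the residual path carries the signal while each sublayer only adds an update.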

No attention and no convolution. Tokens are mixed by GatedLinearMixer (shifted linear projections); reasoning is handled by P-RCSM (multi-scale routing, hierarchical planner-executor gating, slot memory). Every operation is either an F.linear call (BitLinear, ternary weights) or element-wise.
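
To make "BitLinear ternary" concrete, here is a minimal sketch of one common way to obtain ternary weights: absmean quantization in the style of BitNet b1.58. This is an assumption, since the exact scheme used in train.py is not described here. The function names `ternarize` and `ternary_linear` are hypothetical, and plain Python lists stand in for tensors.

```python
def ternarize(weights, eps=1e-8):
    """Quantize a weight matrix (list of rows) to {-1, 0, +1} plus one float scale.

    Assumed scheme: divide by the mean absolute weight, round, clip to [-1, 1].
    """
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes) + eps
    quantized = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return quantized, scale


def ternary_linear(x, quantized, scale):
    """Compute y = scale * (Q @ x). With ternary Q the inner loop only adds or subtracts."""
    out = []
    for row in quantized:
        acc = 0.0
        for q, xi in zip(row, x):
            if q == 1:
                acc += xi
            elif q == -1:
                acc -= xi
            # q == 0 contributes nothing
        out.append(acc * scale)
    return out


if __name__ == "__main__":
    w = [[0.9, -0.05, -1.2], [0.3, 0.0, 0.6]]
    q, s = ternarize(w)
    print(q, s)
    print(ternary_linear([1.0, 2.0, 3.0], q, s))
```

The appeal of this representation on CPU is that the matrix product needs no multiplications inside the inner loop; a single multiply by the scale per output is enough.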

Usage

import torch
import torch.nn.functional as F
from tokenizers import Tokenizer
from train import Config, FlashLMv6

# Build the model and load the checkpoint on CPU.
config = Config()
model = FlashLMv6(config)
state = torch.load('best_model.pt', map_location='cpu')
# The checkpoint may store the weights under a 'model' key or as a bare state dict.
model.load_state_dict(state['model'] if 'model' in state else state)
model.eval()

tokenizer = Tokenizer.from_file('tokenizer_v6.json')

prompt = "Once upon a time"
ids = tokenizer.encode(prompt).ids
x = torch.tensor([ids])  # shape: (1, prompt_length)

# Sample 100 new tokens, one at a time.
with torch.no_grad():
    for _ in range(100):
        logits = model(x[:, -128:])  # feed only the most recent 128 tokens
        # Divide the last-position logits by a temperature of 0.8, then sample.
        next_id = torch.multinomial(F.softmax(logits[:, -1] / 0.8, dim=-1), 1)
        x = torch.cat([x, next_id], dim=1)

print(tokenizer.decode(x[0].tolist()))
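
The sampling line in the usage snippet divides the last-position logits by 0.8 before the softmax. The plain-Python sketch below illustrates what that temperature does to the distribution. The helper names `softmax_with_temperature` and `sample_next` are hypothetical and the logits are made-up numbers, not model output.

```python
import math
import random


def softmax_with_temperature(logits, temperature=0.8):
    """Softmax over logits / temperature; temperatures below 1 sharpen the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, temperature=0.8, rng=random):
    """Draw one token index in proportion to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # illustrative values only
    print(softmax_with_temperature(toy_logits, temperature=1.0))
    print(softmax_with_temperature(toy_logits, temperature=0.8))
    print(sample_next(toy_logits, rng=random.Random(0)))
```

At temperature 0.8 the most likely token receives more probability mass than at 1.0, so generations become somewhat more conservative while still being sampled rather than chosen greedily.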

Sample Output

Once upon a time, there was a cute little girl named Lily. She loved to play with her toys and watch movies with her. One day, her mommy told her to help her fix her toy.

Links


Dataset used to train changcheng967/flashlm-v6-supernova