# FlashLM v6 "SUPERNOVA": P-RCSM Architecture
4.1M parameter language model with novel P-RCSM (Parallel Reasoning via Compositional State Machine) architecture. 81% ternary weights {-1, 0, +1}. Trained entirely on CPU.
## Model Details
| Metric | Value |
|---|---|
| Parameters | 4,141,862 (4.1M) |
| Ternary params | 3,348,480 (80.8%) |
| Vocab size | 4,096 (BPE) |
| d_model | 192 |
| Layers | 6 |
| Val PPL | 14.0 |
| Training speed | 3,500 tok/s |
| Training time | ~3 hours |
| Hardware | 2-thread CPU (Deepnote free tier) |
| Model RAM | 15.8 MB (float32) |
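With ~81% of parameters ternary, each such weight takes one of three values {-1, 0, +1} plus a shared scale. The repo's exact quantizer isn't shown here; a common assumption is BitNet-style absmean ternarization, sketched below in plain Python (`ternary_quantize` is an illustrative helper, not from the repo):

```python
def ternary_quantize(w, eps=1e-5):
    """Assumed BitNet-style absmean scheme: scale by the mean |w|,
    then round each weight to -1, 0, or +1."""
    scale = sum(abs(x) for x in w) / max(len(w), 1) + eps
    q = [max(-1, min(1, round(x / scale))) for x in w]
    return q, scale

q, scale = ternary_quantize([0.42, -0.07, -0.9, 0.01, 0.33])
# the effective weight is recovered as q[i] * scale
```

Storing `q` instead of float weights is what makes the 15.8 MB float32 footprint shrinkable further at inference time.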
## Architecture
```
Embedding (4K × 192, weight-tied)
 → 6× SupernovaBlock:
     RMSNorm → GatedLinearMixer (ternary) + residual
     RMSNorm → P-RCSM (MultiScaleLinearBank + StateGate + SlotMemory) + residual
     RMSNorm → TernaryGLU (ternary SiLU gate/up/down) + residual
 → RMSNorm → Output Head (tied to embedding)
```
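The TernaryGLU line above names SiLU gate/up/down projections. Assuming the standard GLU wiring (SiLU(gate) ⊙ up, then a down projection), a minimal plain-Python sketch with illustrative toy ternary weights:

```python
import math

def silu(x):
    # SiLU (swish) activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def matvec(w, x):
    # plain matrix-vector product; rows of w hold ternary values {-1, 0, +1}
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def ternary_glu(x, w_gate, w_up, w_down):
    """Hypothetical TernaryGLU sketch, not the repo's implementation."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# toy 2-dim example
x = [1.0, -2.0]
y = ternary_glu(x,
                w_gate=[[1, 0], [0, -1]],
                w_up=[[1, 1], [-1, 0]],
                w_down=[[1, -1], [0, 1]])
```

Because every matrix entry is -1, 0, or +1, the multiplies reduce to adds, subtracts, and skips, which is what makes ternary layers cheap on CPU.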
No attention. No convolution. Token mixing via GatedLinearMixer (shifted linear projections). Reasoning via P-RCSM (multi-scale routing, hierarchical planner-executor gating, slot memory). All ops are F.linear (BitLinear ternary) and element-wise.
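"Shifted linear projections" presumably means a token-shift style mixer: each position blends its own vector with the previous position's, so information moves one step per layer without attention. A hedged plain-Python sketch (`gated_linear_mix` and its fixed gate are illustrative simplifications, not the repo's GatedLinearMixer):

```python
def shift_tokens(seq):
    """Shift the sequence one position forward in time:
    position t sees token t-1; position 0 sees zeros."""
    zero = [0.0] * len(seq[0])
    return [zero] + seq[:-1]

def gated_linear_mix(seq, gate):
    """Blend current and previous token element-wise with a
    per-channel gate (a learned parameter in the real model)."""
    prev = shift_tokens(seq)
    return [[g * c + (1 - g) * p for g, c, p in zip(gate, cur, pre)]
            for cur, pre in zip(seq, prev)]

seq = [[1.0, 2.0], [3.0, 4.0]]
out = gated_linear_mix(seq, gate=[0.5, 0.5])
# out[0] blends token 0 with zeros; out[1] blends tokens 1 and 0
```

Stacking six such blocks gives each position a growing causal receptive field, which stands in for attention here.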
## Usage
```python
import torch
import torch.nn.functional as F
from tokenizers import Tokenizer
from train import Config, FlashLMv6

config = Config()
model = FlashLMv6(config)
state = torch.load('best_model.pt', map_location='cpu')
model.load_state_dict(state['model'] if 'model' in state else state)
model.eval()

tokenizer = Tokenizer.from_file('tokenizer_v6.json')
prompt = "Once upon a time"
ids = tokenizer.encode(prompt).ids
x = torch.tensor([ids])

with torch.no_grad():
    for _ in range(100):
        logits = model(x[:, -128:])  # 128-token context window
        # sample from the last position at temperature 0.8
        next_id = torch.multinomial(F.softmax(logits[:, -1] / 0.8, dim=-1), 1)
        x = torch.cat([x, next_id], dim=1)

print(tokenizer.decode(x[0].tolist()))
```
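The generation loop divides the logits by 0.8 before the softmax; temperatures below 1 sharpen the distribution toward the most likely tokens. That step in isolation, as a dependency-free sketch (`softmax_with_temperature` is an illustrative helper, not from the repo):

```python
import math

def softmax_with_temperature(logits, temperature=0.8):
    """Scale logits by 1/temperature, then softmax.
    temperature < 1 sharpens the distribution; > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax_with_temperature([2.0, 1.0, 0.5])
# probs sums to 1, with most mass on the highest logit
```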
## Sample Output

> Once upon a time, there was a cute little girl named Lily. She loved to play with her toys and watch movies with her. One day, her mommy told her to help her fix her toy.
## Links
- GitHub: changcheng967/FlashLM
- v5 Thunderbolt: changcheng967/flashlm-v5-thunderbolt
- v4 Bolt: changcheng967/flashlm-v4-bolt