V12 eMoE: frozen base (190M FFT-hybrid)
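The frozen "student" model for the v12 ephemeral-MoE (eMoE) system: a 12-layer,
1024-dim, 16-head StripedHyena-style hybrid of FFT-spectral and attention layers
(attn_every=3), with block_size=2048 and the v9_chatml_enc tokenizer (vocab size
50264). It was trained with Muon + AdamW (3.0x ratio) and a WSD
(warmup-stable-decay) learning-rate schedule on ~7.5B tokens; the best checkpoint,
best.pt, reaches a validation loss of 1.66. The hypernetwork specializes this
model; the base itself is never fine-tuned.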
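To make the 12-layer / 1024-dim / 16-head configuration with attn_every=3 more concrete, here is a minimal sketch of the layer plan and of one common FFT-based token-mixing primitive. It is not the model_hybrid implementation: the HybridSpec field names are hypothetical, the rule that every third layer (1-based) is attention is an assumed reading of attn_every=3, and fft_causal_conv is the Hyena-style causal long convolution, which may differ from the spectral mixer this model actually uses.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class HybridSpec:
    # Hypothetical field names; the values are the ones stated in this card.
    n_layer: int = 12
    n_embd: int = 1024
    n_head: int = 16
    attn_every: int = 3
    block_size: int = 2048
    vocab_size: int = 50264

    @property
    def head_dim(self) -> int:
        return self.n_embd // self.n_head  # 1024 // 16 = 64


def layer_kinds(spec: HybridSpec) -> list:
    # ASSUMPTION: every attn_every-th layer (1-based) is attention, the rest FFT mixers.
    return ["attn" if (i + 1) % spec.attn_every == 0 else "fft"
            for i in range(spec.n_layer)]


def fft_causal_conv(u: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Causal long convolution y[t] = sum_{s<=t} h[s] * u[t - s] over the last axis.

    u has shape (..., L); h is a length-L filter. Zero-padding both to 2L makes the
    FFT's circular convolution equal the linear one, so y[t] never sees u[t+1:].
    """
    L = u.shape[-1]
    n = 2 * L
    spectrum = np.fft.rfft(u, n=n) * np.fft.rfft(h, n=n)
    return np.fft.irfft(spectrum, n=n)[..., :L]


spec = HybridSpec()
print(layer_kinds(spec))  # under the assumed rule: attention at layers 3, 6, 9, 12
print(spec.head_dim)      # 64

rng = np.random.default_rng(0)
u = rng.standard_normal((2, 16))
h = rng.standard_normal(16)
direct = np.stack([np.convolve(row, h)[:16] for row in u])  # O(L^2) reference
print(np.allclose(fft_causal_conv(u, h), direct))  # True
```

The FFT route costs O(L log L) per channel instead of the O(L^2) direct sum, which is what makes sequence-length filters affordable at block_size=2048.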
Load
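import torch
from dataclasses import fields
from model_hybrid import GPT, GPTConfig
import tok_v9

# weights_only=False: the checkpoint may hold non-tensor objects (load only trusted files)
ckpt = torch.load("ckpt_v12_190m_best.pt", map_location="cpu", weights_only=False)
# Rebuild the config, keeping only keys that GPTConfig defines
margs = ckpt.get("model_args") or ckpt.get("config") or {}
cfg_keys = {f.name for f in fields(GPTConfig)}
cfg = GPTConfig(**{k: v for k, v in margs.items() if k in cfg_keys})
# Locate the weights and strip the "_orig_mod." prefix added by torch.compile
sd = ckpt.get("model") or ckpt.get("state_dict") or ckpt.get("model_state_dict")
sd = {k.replace("_orig_mod.", ""): v for k, v in sd.items()}
model = GPT(cfg)
# strict=False tolerates key mismatches, so report them instead of hiding them
res = model.load_state_dict(sd, strict=False)
print("missing:", res.missing_keys, "unexpected:", res.unexpected_keys)
model.eval().requires_grad_(False)  # frozen base: inference only, no gradients
tok = tok_v9.build()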
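The two cleanup steps in the loader (dropping config keys that GPTConfig does not define, and stripping the "_orig_mod." prefix that torch.compile adds to parameter names) can be exercised without torch or the real checkpoint. This is a hedged sketch on a toy dict: ToyConfig, filter_config_kwargs, find_state_dict, strip_compile_prefix and key_mismatch are hypothetical helpers, not part of model_hybrid, and the toy keys are invented for illustration. key_mismatch mirrors the missing/unexpected report that load_state_dict returns, which strict=False does not raise on.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:
    # Hypothetical stand-in for GPTConfig; the real field names may differ.
    n_layer: int = 12
    n_embd: int = 1024
    attn_every: int = 3


def filter_config_kwargs(cls, args: dict) -> dict:
    """Keep only the keys that are fields of dataclass `cls`; extra keys are dropped."""
    names = {f.name for f in fields(cls)}
    return {k: v for k, v in args.items() if k in names}


def find_state_dict(ckpt: dict):
    """Return the first non-empty entry among the key names the loader tries, else None."""
    for key in ("model", "state_dict", "model_state_dict"):
        if ckpt.get(key):
            return ckpt[key]
    return None


def strip_compile_prefix(sd: dict) -> dict:
    """Remove the '_orig_mod.' segment that torch.compile puts in parameter names."""
    return {k.replace("_orig_mod.", ""): v for k, v in sd.items()}


def key_mismatch(expected: set, sd: dict) -> tuple:
    """(missing, unexpected) key lists, like the report load_state_dict returns."""
    missing = sorted(expected - set(sd))
    unexpected = sorted(set(sd) - expected)
    return missing, unexpected


# Toy checkpoint: "dropout" is not a ToyConfig field, and the keys carry the compile prefix.
ckpt = {
    "model_args": {"n_layer": 12, "n_embd": 1024, "attn_every": 3, "dropout": 0.0},
    "model": {"_orig_mod.wte.weight": "W", "_orig_mod.lm_head.weight": "H"},
}
cfg = ToyConfig(**filter_config_kwargs(ToyConfig, ckpt["model_args"]))
sd = strip_compile_prefix(find_state_dict(ckpt))
expected = {"wte.weight", "lm_head.weight", "ln_f.weight"}

print(cfg)                         # ToyConfig(n_layer=12, n_embd=1024, attn_every=3)
print(sorted(sd))                  # ['lm_head.weight', 'wte.weight']
print(key_mismatch(expected, sd))  # (['ln_f.weight'], [])
```

An empty missing list is the quick check that the frozen base loaded completely; a non-empty one means some weights kept their random initialization.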