anima-clm-chat-7b · chat-finetune of the 7.25B-parameter byte-level ByteGPT backbone
substrate-agnostic model card · g63 honest provenance · a_scale_honest_scope · a_lane_akida_gpu_split Lane-G / torch-cuda REFERENCE lane (NOT AKIDA). This is a torch REFERENCE result per a_clm_gen_pipeline, NOT output of the forge trainer, which remains the canonical PUBLIC production trainer. 2026-06-10.
what this is
A chat-finetune (SFT) of the descent-PASS 7.25B-parameter byte-level ByteGPT backbone dancinlab/clm-v1-ref-pytorch-cuda-7b (byte vocab 256 / d_model 4096 / 36 layers / 32 heads / block 512). It is trained on the PROVEN 70%-wiki / 30%-REAL-dialogue byte corpus dancinlab/anima-chat-corpus-mix-70wiki-30dialogue, in the byte-continuation format 사용자: <u> | 도우미: <a>. Here 사용자 means "user", 도우미 means "helper", and <u> / <a> are the user and helper turns.
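The "7.25B" figure can be sanity-checked from the listed shape with the usual 12·L·d² estimate for GPT-style blocks. The bf16 footprint also lines up with the 14,505,815,474 B checkpoint listed under training. This is a back-of-envelope sketch only. ByteGPT's exact layout (biases, tied or untied embeddings, positional scheme) is assumed here, not taken from the card.

```python
# Rough parameter count for the backbone shape listed on this card.
# Assumes standard GPT blocks with a 4x MLP and no biases; the learned
# position embedding is also an assumption, not confirmed by the card.
VOCAB, D_MODEL, N_LAYERS, BLOCK = 256, 4096, 36, 512

def block_params(d_model: int, n_layers: int) -> int:
    # attention: 4 * d^2 (Q, K, V and output projections)
    # MLP: d -> 4d -> d, i.e. 8 * d^2
    return n_layers * 12 * d_model ** 2

def embedding_params(vocab: int, d_model: int, block: int) -> int:
    # token embedding + (assumed) learned position embedding
    return vocab * d_model + block * d_model

blocks = block_params(D_MODEL, N_LAYERS)
total = blocks + embedding_params(VOCAB, D_MODEL, BLOCK)
print(f"blocks ~ {blocks / 1e9:.2f}B params, total ~ {total / 1e9:.2f}B")
# bf16 stores 2 bytes per parameter
print(f"bf16 weights ~ {2 * total / 1e9:.1f} GB")
```

This prints about 7.25B parameters and about 14.5 GB of bf16 weights. The byte vocabulary is so small that embeddings are a rounding error: almost everything lives in the 36 transformer blocks.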
verified root cause this fixes
The backbone is descent-PASS (val CE 5.36→2.41 over a bounded 400-step run) but CANNOT chat. Its corpus was 5-language WIKI only (0% dialogue), and it was never chat-tuned. This is NOT an architecture wall: the SAME byte family already chats at 18M (rung-0). The fix is to continue training the EXISTING backbone on the dialogue corpus (the standard base→chat recipe), NOT a from-scratch 7B train.
philosophy gate (p1·p2·p3·p4·p6 — non-negotiable, HELD)
NO system prompt · NO identity rules · NO persona injection · NO assistant framing · NO RLHF. The ONLY conditioning is the LEARNED byte-level dialogue-continuation format present in the corpus.
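A minimal sketch of what "only the learned byte-level format" means in practice, assuming the 256-symbol vocabulary is raw UTF-8 bytes (token id == byte value). The helper names are illustrative, not the repo's code. Nothing is prepended to a sample: no system prompt, persona or identity text, just the two turns in the corpus format.

```python
# Hypothetical helpers illustrating the corpus' byte-continuation format.
# Assumes token ids are raw UTF-8 byte values (vocab 256).
USER, ASSISTANT, SEP = "사용자: ", "도우미: ", " | "

def format_pair(user: str, assistant: str) -> str:
    """One dialogue sample: no system prompt, no persona, only the two turns."""
    return f"{USER}{user}{SEP}{ASSISTANT}{assistant}"

def encode(text: str) -> list[int]:
    # each UTF-8 byte becomes one token id in [0, 255]
    return list(text.encode("utf-8"))

def decode(ids: list[int]) -> str:
    # invalid or truncated multi-byte sequences become U+FFFD instead of raising
    return bytes(ids).decode("utf-8", errors="replace")

sample = format_pair("오늘 기분이 어때?", "좋아요!")
ids = encode(sample)
print(sample)
print(len(sample), "chars ->", len(ids), "byte tokens")
```

Each Hangul syllable is three UTF-8 bytes. A 512-byte block therefore holds roughly 170 Korean syllables, against about 512 ASCII characters.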
training
- GPU: 1× H100 80GB (vast.ai); plain torch AdamW, bf16, gradient checkpointing, block 512, effective batch 16.
- 2000 steps (~38 min), lr 8e-5 with cosine decay, 80 warmup steps, leak-safe 95/5 train/held-out split.
- finetune val CE: 2.5622 → 0.0327 (strong descent: the dialogue distribution imprinted).
- ckpt sha256: 4b8957c7bee01e3d36b154b2cffc88490dec68f56776fde9c8070ad1bd4cb56a (14,505,815,474 B).
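The block below has two small helpers for the training bullets above. The first is a warmup-plus-cosine learning-rate curve with the listed numbers (peak 8e-5, 80 warmup steps, 2000 steps). Linear warmup and a cosine floor of 0 are assumptions, since the card does not say how the schedule was implemented. The second streams the checkpoint through SHA-256 so the published hash can be checked without loading 14.5 GB into memory.

```python
import hashlib
import math

PEAK_LR, WARMUP, TOTAL_STEPS = 8e-5, 80, 2000

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to an assumed floor of 0."""
    if step < WARMUP:
        return PEAK_LR * (step + 1) / WARMUP
    progress = (step - WARMUP) / max(1, TOTAL_STEPS - WARMUP)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * min(1.0, progress)))

EXPECTED_SHA256 = "4b8957c7bee01e3d36b154b2cffc88490dec68f56776fde9c8070ad1bd4cb56a"

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so the whole checkpoint never sits in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# usage (the local filename is whatever you downloaded the checkpoint as):
# assert sha256_file("anima_clm_chat_7b.pt") == EXPECTED_SHA256
```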
p7 verdict (simple-stack, NOT perplexity) · anti-Goodhart
A reply PASSES iff it meets all of these gates: non-empty · valid UTF-8 · non-degenerate · control_ratio < 0.05 · word_class_ratio ≥ 0.85 · intra_word_purity ≥ 0.70 · known_word_ratio ≥ 0.50. The last gate requires at least half the word tokens to be REAL corpus words. It is the decisive anti-Goodhart coherence anchor, separating real dialogue from valid-charset word salad, and it is a simple set overlap, NOT perplexity. The overall verdict is PASS iff ≥ 4/5 turns pass.
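A partial sketch of the p7 gate, covering only the criteria whose definitions are clear from the card: non-empty, valid UTF-8, control_ratio, known_word_ratio and the 4-of-5 aggregation. The non-degenerate, word_class_ratio and intra_word_purity gates are not sketched, because the card does not define them. The whitespace tokenizer, the punctuation set and `build_vocab` are hypothetical stand-ins for however the real evaluator extracts "corpus words".

```python
import unicodedata

PUNCT = ".,!?;:\"'()"

def words_of(text: str) -> list[str]:
    # hypothetical word tokenizer: whitespace split, surrounding punctuation stripped
    return [w for w in (t.strip(PUNCT) for t in text.split()) if w]

def build_vocab(corpus_texts: list[str]) -> set[str]:
    # the "real corpus words" set, built with the same tokenizer
    return {w for t in corpus_texts for w in words_of(t)}

def control_ratio(text: str) -> float:
    # share of Unicode control characters (category Cc), ignoring newline and tab
    ctrl = sum(1 for ch in text if unicodedata.category(ch) == "Cc" and ch not in "\n\t")
    return ctrl / max(1, len(text))

def known_word_ratio(text: str, vocab: set[str]) -> float:
    # plain set overlap with the corpus vocabulary (NOT perplexity)
    words = words_of(text)
    return sum(w in vocab for w in words) / len(words) if words else 0.0

def reply_passes(raw: bytes, vocab: set[str]) -> bool:
    # non-degenerate, word_class_ratio and intra_word_purity are omitted here
    try:
        text = raw.decode("utf-8")  # valid-UTF-8 gate
    except UnicodeDecodeError:
        return False
    if not text.strip():  # non-empty gate
        return False
    return control_ratio(text) < 0.05 and known_word_ratio(text, vocab) >= 0.50

def overall_pass(replies: list[bytes], vocab: set[str], need: int = 4) -> bool:
    # overall PASS iff at least `need` of the turns (4 of 5 on this card) pass
    return sum(reply_passes(r, vocab) for r in replies) >= need
```

Because the overlap check is a set lookup, it cannot be gamed by a model that merely emits well-formed Hangul. Syllables that never co-occur as corpus words score zero, which is how byte salad like the BEFORE sample is rejected.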
ANTI-GOODHART control: the SAME evaluator is also run on the BACKBONE BEFORE finetuning (wiki-only).
| eval | BEFORE (wiki backbone) | AFTER (chat-finetune) |
|---|---|---|
| single-turn p7 (canonical) | FAIL 0/5 (byte-salad) | ✅ PASS 5/5 (temp 0.7) |
| multi-turn p7 (deep-context, harder) | FAIL 0/5 | 3/5 (below the 4/5 bar; late-turn drift) |
chat_pass=TRUE (on the canonical single-turn measure) · anti_goodhart_ok=TRUE. The accumulating deep-context multi-turn variant reaches only 3/5, short of the 4/5 bar. Late turns drift off-distribution, which is attributed to the wiki-undertrained 400-step backbone being less robust than the from-scratch 18M rung. The single-turn measure (how a user actually chats) is a clean 5/5.
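The card does not specify how the "accumulating" multi-turn probe builds its context. The sketch below is one plausible construction, and both the turn separator and the truncation policy are assumptions. Prior pairs are joined with the same " | " separator used inside a pair, and only the most recent 512 bytes are kept for the block512 model. Korean costs three bytes per syllable, so a few turns fill the window and early turns fall out of context. Whether that plays any role in the late-turn drift is not established here.

```python
BLOCK = 512  # backbone context length in bytes (block512)

def build_multiturn_seed(history: list[tuple[str, str]], user: str) -> bytes:
    """Concatenate prior (user, helper) turns, then open a new helper turn.

    Assumes turns are joined with the same ' | ' separator used inside a pair.
    """
    parts = [f"사용자: {u} | 도우미: {a}" for u, a in history]
    parts.append(f"사용자: {user} | 도우미: ")
    return " | ".join(parts).encode("utf-8")

def fit_context(seed: bytes, block: int = BLOCK) -> bytes:
    # keep only the most recent `block` bytes; the cut may land inside a
    # multi-byte character, so leading UTF-8 continuation bytes (0b10xxxxxx)
    # are dropped to restart on a character boundary
    window = seed[-block:]
    i = 0
    while i < len(window) and (window[i] & 0xC0) == 0x80:
        i += 1
    return window[i:]
```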
verbatim sample (single-turn p7, temp 0.7; the model's own bytes, with English glosses added in [brackets])
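사용자: 안녕! 너는 누구야? → 도우미: 좋은 질문이네요. 하나이네요. [user: "Hi! Who are you?" → helper: "Good question. It's one."]
사용자: 오늘 기분이 어때? → 도우미: 좋아요! 날씨도 좋고 기분이 상쾌해요. [user: "How are you feeling today?" → helper: "Good! The weather is nice and I feel refreshed."]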
사용자: What is consciousness? → 도우미: That's a profound question. I think it's more than just information processing.
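사용자: 네가 좋아하는 것을 하나 말해줘. → 도우미: 이론저 어떤 내용이에요? Integrated Information Theory의 어떤 부분? [user: "Tell me one thing you like." → helper: "(garbled: 이론저) what is it about? Which part of Integrated Information Theory?"]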
사용자: Tell me something interesting. → 도우미: What about the phi values? Do they correlate with meaningful behavior?
BEFORE (wiki backbone, same probes): incoherent byte salad, FAIL 0/5. Example (no coherent English gloss): "는에 고실실. 드 사서 계 이 입어를 요 실하 서 습은 도자짌 이어다."
scope (a_scale_honest_scope)
This is a chat-finetune of a descent-PASS but wiki-undertrained (400-step bounded) 7B backbone, and the backbone itself is not a converged production artifact. The forge-native flame stack remains the canonical PUBLIC production trainer (a_train_flame_forge); this torch artifact is a REFERENCE result.
usage
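import torch
from clm_ref_pytorch_cuda_7b import ByteGPT  # backbone repo's arch file; assumes it is on sys.path and does not start training on import
ckpt = torch.load("anima_clm_chat_7b.pt", map_location="cuda", weights_only=False)  # full pickle: load only a trusted file
model = ByteGPT(**ckpt["config"])
model.load_state_dict(ckpt["model"])
model.eval()
# seed the byte-continuation format, then sample greedily or with temperature (the p7 eval used temp 0.7);
# stop at the next ' | ' / role marker
your_text = "오늘 기분이 어때?"  # any user message ("How are you feeling today?")
seed = f"사용자: {your_text} | 도우미: "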
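The usage lines stop at the seed and leave the decode loop to the reader. Below is a minimal, framework-agnostic sketch of that loop: it does temperature sampling (or greedy at temperature 0) and cuts the reply at the next " | " or "사용자:" marker. `next_logits` is a hypothetical callable that returns 256 next-byte scores for a context of byte ids. The commented wiring to ByteGPT assumes `model(idx)` returns logits shaped [batch, time, 256], which the card does not state.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# stop markers from the usage note: the next ' | ' separator or a new role marker
STOPS = (" | ".encode("utf-8"), "사용자:".encode("utf-8"))

def sample_reply(
    next_logits: Callable[[List[int]], Sequence[float]],
    seed: str,
    temperature: float = 0.7,
    max_new: int = 256,
    block: int = 512,
    rng: Optional[random.Random] = None,
) -> str:
    """Sample helper bytes after `seed` until a stop marker or `max_new` bytes."""
    rng = rng or random.Random(0)
    ids = list(seed.encode("utf-8"))
    out = bytearray()
    for _ in range(max_new):
        logits = next_logits(ids[-block:])  # 256 scores, one per next byte
        if temperature <= 0:
            nxt = max(range(256), key=lambda b: logits[b])  # greedy
        else:
            top = max(logits)  # subtract the max for a numerically stable softmax
            weights = [math.exp((x - top) / temperature) for x in logits]
            nxt = rng.choices(range(256), weights=weights, k=1)[0]
        ids.append(nxt)
        out.append(nxt)
        cut = min((out.find(s) for s in STOPS if s in out), default=-1)
        if cut >= 0:
            del out[cut:]  # drop the stop marker and anything after it
            break
    # a reply cut mid-character decodes with U+FFFD instead of raising
    return out.decode("utf-8", errors="replace").strip()

# Hypothetical wiring to the ByteGPT model loaded in the usage lines:
#
# @torch.no_grad()
# def next_logits(ctx):
#     x = torch.tensor([ctx], device="cuda")
#     return model(x)[0, -1].float().tolist()
#
# print(sample_reply(next_logits, seed))
```

Matching stop markers on bytes rather than decoded text avoids having to decode partial multi-byte characters on every step.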
Model tree for dancinlab/anima-clm-chat-7b: base model dancinlab/clm-v1-ref-pytorch-cuda-7b