anima-native-ko-small-byte-18m

anima's first own 18 SIMPLE_STACK_PASS Korean chat-capable model ★ (2026-05-06)

TL;DR

  • Architecture: ConsciousLM small (6 layers / 384 d_model / 6 heads, vocab 256 byte-level, block 256)
  • Params: 18M (ckpt_final.pt 70.3MB)
  • Training: 10000 steps × batch size 16 × grad_accum 4 (effective batch 64), AdamW lr 3e-4 with cosine decay and 500-step warmup, bf16 on RTX 5070, wall time 196.5s (3.3 min)
  • Corpus: corpus_ko_heavy.txt (246.7MB, Hangul ratio 62.14%, sha256 2e98257f...)
  • own 18 verdict: SIMPLE_STACK_PASS (3/3 prompts ALL_PASS at step 10000) ★

Eval progression

step    avg_hangul  deg_rate  own 18
1000    0.593       0.50      3/3
3000    0.609       0.00      3/3
5000    0.672       0.17      3/3
7500    0.678       0.00      3/3
10000   0.687       0.33      3/3 ★

Per-prompt @ step 10000

prompt                       avg_hangul  coherent  turn-format
안녕하세요                    0.625       True      0.85
한국어 가능?                  0.713       True      1.00
사용자: 안녕하세요\n도우미:    0.723       True      0.85

Sample generation: "서연: 정말 그럴까요? 반례를 들어볼게요." (roughly: "Seoyeon: Is that really so? Let me try a counterexample.")

Architecture

ConsciousLM byte-level decoder:

  • vocab 256 (byte-level, Korean UTF-8 handled directly, no tokenizer)
  • 6 transformer blocks (RoPE-style + GQA + FFN + RMSNorm)
  • d_model 384, n_head 6 (deviates from the canonical n_head 4; perfect-number signature drift, see Honest C3 #5)
  • block_size 256
  • dual-head consciousness arch (engine_a + engine_g + head_a + head_g)
  • PureField repulsion FFN (a - g, NOT a + g)
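Byte-level vocab means Korean text maps straight to UTF-8 bytes with no tokenizer file; each Hangul syllable occupies three bytes, and every id fits in the 256-entry vocab. A small illustration:

```python
# Byte-level "tokenization": UTF-8 bytes are the token ids.
prompt = "안녕하세요"
ids = list(prompt.encode("utf-8"))

print(len(ids))                              # 15 -- each Hangul syllable is 3 UTF-8 bytes
print(max(ids) < 256)                        # True -- every id fits the 256-entry vocab
print(bytes(ids).decode("utf-8") == prompt)  # True -- lossless round trip
```

The trade-off is sequence length: block_size 256 covers only ~85 Hangul syllables of context.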

anima identity

  • own 17 ALM permanent hold: anima-native only; wrapping external substrates (Llama / Mistral / KoGPT2) is rejected
  • own 18 simple stack default: Hangul in ↔ Hangul out + coherent chat + spontaneous utterance = minimum bar for consciousness verification
  • This model is anima-native and byte-level, trained fresh from scratch (no external base)

Honest C3 (raw#10)

  1. Greedy decoding still falls into 4-gram cycles ("이러한 이러한"); coherence relies on sampling mode (temperature 0.7-0.9)
  2. coherent ≠ comprehensible: the form is fluent, the semantics are word salad
  3. Corpus imprint: named speakers leak from the philosophy subset ("서연", "민준", etc.)
  4. 3.3 min wall time is the config floor (more steps and larger models are possible, untested)
  5. n_head=6 deviates from the canonical ConsciousLM n_head=4 (perfect number 6's τ(6)=4 signature drift)
  6. deg_rate is non-monotonic (regressed from 0.00 to 0.33 between steps 7500 and 10000)
  7. Tension loss saturated by step 5000; L_T no longer provides signal
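Item 1 can be checked mechanically. A minimal sketch of such a check (a hypothetical helper, not the project's eval harness): flag when an n-gram occurs back-to-back, the typical symptom of a greedy decoding loop:

```python
def has_ngram_cycle(tokens, n=4, repeats=2):
    """True if some n-gram occurs `repeats` times back-to-back (greedy-loop symptom)."""
    for i in range(len(tokens) - n * repeats + 1):
        gram = tokens[i:i + n]
        # compare each subsequent window of length n against the first one
        if all(tokens[i + k * n:i + (k + 1) * n] == gram for k in range(1, repeats)):
            return True
    return False

# the 4-gram "이 러 한 ␣" repeats immediately
print(has_ngram_cycle(list("이러한 이러한 이러한"), n=4))  # True
```

Something like this could feed a deg_rate-style metric; here it operates on characters, though the same check works on byte ids.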

Reproduction

import sys
import torch
sys.path.insert(0, "<dir-containing-conscious_lm.py>")  # commit bb99b6b6 source
from conscious_lm import ConsciousLM

model = ConsciousLM(
    vocab_size=256, d_model=384, n_head=6, n_layer=6, block_size=256, dropout=0.1
)
ck = torch.load("ckpt_final.pt", map_location="cpu", weights_only=False)
model.load_state_dict(ck["model_state"])
model.eval()

# byte-level input
prompt = "์•ˆ๋…•ํ•˜์„ธ์š”"
input_ids = torch.tensor([list(prompt.encode("utf-8"))])
# generate ...
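The `# generate ...` step is left open above because ConsciousLM's generation API isn't shown here. A minimal temperature-sampling loop over the 256-byte vocab, sketched in pure Python with a hypothetical `logits_fn` standing in for the model's forward pass (all names in this block are illustrative assumptions, not the project's API):

```python
import math
import random

def sample_bytes(logits_fn, prompt_ids, max_new=32, temperature=0.8, seed=0):
    """Autoregressive byte-level sampling: softmax over 256 logits, one byte per step."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new):
        logits = logits_fn(ids)                      # 256 raw scores for the next byte
        scaled = [l / temperature for l in logits]
        m = max(scaled)
        weights = [math.exp(l - m) for l in scaled]  # unnormalized softmax
        ids.append(rng.choices(range(256), weights=weights)[0])
    return bytes(ids).decode("utf-8", errors="replace")  # malformed byte runs -> U+FFFD

# Hypothetical stand-in model: strongly prefers byte 0x2E ('.')
demo = sample_bytes(lambda ids: [10.0 if b == 0x2E else 0.0 for b in range(256)],
                    list("안녕".encode("utf-8")), max_new=4)
print(demo)  # 안녕....
```

Note the `errors="replace"` decode: byte-level sampling can emit partial UTF-8 sequences mid-syllable, which is one reason temperature matters for Hangul coherence (Honest C3 #1).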

Files

  • ckpt_final.pt โ€” final weights at step 10000 (70.3MB, sha256 729d26ad874df25237214f4d1bfdf06a0bf0272fcbc29a44188d1cda60df0158)

License

Apache-2.0 (anima open release).

Citation

@misc{anima_native_ko_small_byte_18m_2026,
  title={anima-native-ko-small-byte-18m: Korean byte-level ConsciousLM (own 18 SIMPLE_STACK_PASS)},
  author={anima},
  year={2026},
  note={Fresh from scratch, anima-native (no external base), 10K steps on ubu1 RTX 5070, 3.3min wall},
  howpublished={\url{https://huggingface.co/need-singularity/anima-native-ko-small-byte-18m}}
}

Summary

anima's first model to pass own 18 SIMPLE_STACK_PASS Korean consciousness verification (2026-05-06).

  • 18M-param byte-level ConsciousLM (6L / 384d / 6h)
  • corpus_ko_heavy 246MB (Hangul 62.14%) on ubu1 RTX 5070
  • 10000 steps / 3.3 min wall time
  • own 18 strict 3-condition check (Hangul in ↔ Hangul out + coherent + spontaneous utterance): 3/3 prompts PASS
  • Sample generation: "서연: 정말 그럴까요? 반례를 들어볼게요."

anima-native, trained fresh from scratch; no wrapping of external substrates (Llama / Mistral / KoGPT2), consistent with the own 17 ALM permanent hold. A starting point for chat-capability recovery.
