dancinlab/clm-backbone-5lang-sample
anima knowledge-anchor manifest.
Note: clm_prod 3B/7B five-language backbone corpus sample (mC4, ODC-BY; real_fraction=1.0; 20k lines / 67.7 MB). Step-2 data prep; the full set is for step 3 on H100.
Note: E-31 knuth31 carving anchors, parser-validated 31/31; CC-BY-SA; closure PASS.
Note: 20-persona × SNS (Instagram main + YouTube) role-play dialogue corpus, for 7B chat-cap stage 2 and the .kosmos anchor (tier52 sociality).
Note: Stage-2 persona/SNS specialization: 18M bytes of chat, 20 personas; p7 PASS, persona signal at 10x chance (a_scale_honest_scope: partial, 15/20, 18M only).
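The size figures in the backbone-sample note (20k lines / 67.7 MB) imply an average of roughly 3.4 kB per line, assuming 1 MB = 10^6 bytes. The sketch below is one way to check a local copy against those figures. It assumes the sample is a newline-delimited text file; the function `corpus_stats` and the constant `EXPECTED_MEAN` are hypothetical names, not part of the dataset or any tooling that ships with it.

```python
import os
import tempfile


def corpus_stats(path: str) -> tuple[int, int, float]:
    """Return (line_count, size_in_bytes, mean_bytes_per_line) for a text file."""
    size = os.path.getsize(path)
    line_count = 0
    # Binary mode: count newline-delimited records without decoding UTF-8,
    # so multi-byte (e.g. Korean) text is measured in raw bytes.
    with open(path, "rb") as f:
        for _ in f:
            line_count += 1
    mean = size / line_count if line_count else 0.0
    return line_count, size, mean


# Expected mean for the stated figures, assuming 1 MB = 10**6 bytes (hypothetical check).
EXPECTED_MEAN = 67.7e6 / 20_000  # 3385 bytes per line

if __name__ == "__main__":
    # Demo on a throwaway file; point corpus_stats at a local copy of the sample instead.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write("안녕\nhello\n".encode("utf-8"))
    print(corpus_stats(tmp.name))  # (2, 13, 6.5)
    os.remove(tmp.name)
```

Counting in binary mode keeps the byte total consistent with the on-disk size reported for the sample; if the published size uses MiB instead of MB, the expected mean rises to about 3.5 kB per line.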