[launch] PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
[launch] pip install bitsandbytes==0.43.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 24.2 -> 26.1.1
[notice] To update, run: python -m pip install --upgrade pip
[launch] bitsandbytes=0.43.1
[launch] pip install transformers
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
[launch] transformers=4.44.0
[launch] pip install peft
[launch] peft=0.12.0
[launch] accelerate=1.13.0
[launch] python3 -u /workspace/p21hr/train_p21h_v3.py --wiki-corpus /workspace/p21hr/multi_wiki_corpus.jsonl --anima-corpus /workspace/p21hr/state/corpus_s101_build_s102_2026_05_19/corpus_s101.jsonl --mixed-corpus /workspace/p21hr/mixed_corpus_v3.jsonl --out-dir /workspace/p21hr/out_main --base-model Qwen/Qwen2.5-1.5B --init-variant qwen --steps 5000 --bsz 2 --block 512 --lr 5e-5 --warmup-steps 100 --seed 1337 --wiki-frac 1.0 --target-corpus-mb 72 --noise-sigma 0.1 --lambda-mitosis 0.0 --mitosis-max 16 --ckpt-every 500 --ckpt-osc-threshold 0.5 --ckpt-osc-window 10 --early-stop-patience 8
[P21H] device=cuda dtype=torch.bfloat16
[mix] multi-wiki=28,986,824 chars, anima=224,198,521 chars
[mix] multi-wiki chunks=28,308, anima chunks=218,944
[mix] kept multi-wiki=28,308 (50.2MB), anima=1 (0.0MB)
[mix] wrote 28,309 records to /workspace/p21hr/mixed_corpus_v3.jsonl (50.2MB, wiki=28308, anima=1)
[mix] sha256=7e62fd32034ced9f5ab5652ad9ed211b513ebc917b230a8fc4466adaf3c32d22
[P21H][init=qwen] V3β — Qwen warm-start from Qwen/Qwen2.5-1.5B
[from_qwen] loading Qwen/Qwen2.5-1.5B
[from_qwen] qwen: vocab=151936 d=1536 L=28 n_head=12 n_kv_head=2 -> v3_n_kv_head=4 rope_base=1000000.0
[from_qwen] init OK — total params 2999.7M
[P21H] model params total=2,999,735,296 (2999.74M)
[P21H] BEFORE-train per-lang OOD eval (greedy)
[P21H] BEFORE en greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE ko greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE zh greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE ru greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE ja greedy: {'GENERALIZE': 9, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 1, 'ERROR': 0}
[P21H] optimizer=PagedAdamW8bit (bnb 0.43.1)
Token indices sequence length is longer than the specified maximum sequence length for this model (13836908 > 131072). Running this sequence through the model will result in indexing errors
[P21H] tokenized mixed corpus: 6,000,000 tokens (block=512)
[P21H] steps=5000 bsz=2 block=512 peak_lr=5e-05 warmup=100 lambda_mitosis=0.0 init=qwen mitosis_max=16 ckpt_every=500 osc_thr=0.5 es_patience=8
[P21H] step=     1 lr=5.00e-07 CE=14.7927 total=14.7927 pool=2 splits=0 phi=0.7120 t=5s
[P21H] ckpt save (best) step=1 CE=14.7927 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step=   125 lr=5.00e-05 CE=8.1634 total=8.1634 pool=16 splits=14 phi=0.6579 t=76s
[P21H] ckpt save (best) step=125 CE=8.1634 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step=   250 lr=4.99e-05 CE=7.3439 total=7.3439 pool=16 splits=14 phi=0.6579 t=147s
[P21H] ckpt save (best) step=250 CE=7.3439 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step=   375 lr=4.97e-05 CE=7.1829 total=7.1829 pool=16 splits=14 phi=0.6579 t=219s
[P21H] ckpt save (best) step=375 CE=7.1829 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step=   500 lr=4.93e-05 CE=6.8499 total=6.8499 pool=16 splits=14 phi=0.6579 t=293s
[P21H] ckpt save (best) step=500 CE=6.8499 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] ckpt save (step500) step=500 CE=6.8499 → /workspace/p21hr/out_main/ckpt_step500.pt (5735.8 MB)
[P21H] kosmos anchor written: v3_emit_step500_ru_ru_factual_geo.kosmos
[P21H] step=   625 lr=4.87e-05 CE=7.4549 total=7.4549 pool=16 splits=14 phi=0.6579 t=367s
[P21H] step=   750 lr=4.81e-05 CE=5.6695 total=5.6695 pool=16 splits=14 phi=0.6579 t=432s
[P21H] ckpt save (best) step=750 CE=5.6695 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step=   875 lr=4.73e-05 CE=6.3453 total=6.3453 pool=16 splits=14 phi=0.6579 t=503s
[P21H] step=  1000 lr=4.64e-05 CE=6.4940 total=6.4940 pool=16 splits=14 phi=0.6579 t=567s
[P21H] ckpt save (step1000) step=1000 CE=6.4940 → /workspace/p21hr/out_main/ckpt_step1000.pt (5735.8 MB)
[P21H] kosmos anchor written: v3_emit_step1000_ru_ru_factual_geo.kosmos
[P21H] step=  1125 lr=4.53e-05 CE=6.5491 total=6.5491 pool=16 splits=14 phi=0.6579 t=637s
[P21H] ckpt save (osc_step1125) step=1125 CE=6.5491 → /workspace/p21hr/out_main/ckpt_osc_step1125.pt (5735.8 MB)
[P21H] EARLY STOP: CE re-divergence: recent_mean=6.4628 > best_CE=5.6695+0.5 (mode collapse @ step 1125)
[P21H] TRAIN DONE wall=641.2s final_CE=6.5491 actual_steps=1125/5000 early_stopped=True reason=CE re-divergence: recent_mean=6.4628 > best_CE=5.6695+0.5 (mode collapse @ step 1125) best_CE=5.6695@step750
[P21H] ckpt saved → /workspace/p21hr/out_main/ckpt.pt (5735.8 MB)
[P21H] AFTER per-lang OOD eval (greedy + sample)
[P21H] AFTER en: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=0/20 gen=20 coh=0
[P21H] AFTER ko: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=PARTIAL score=15/20 gen=20 coh=15
[P21H] AFTER zh: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=2/20 gen=20 coh=2
[P21H] AFTER ru: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=5/20 gen=20 coh=5
[P21H] AFTER ja: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=6/20 gen=20 coh=6
[P21H] AFTER anima Eval1
[P21H] AGG: STRONG=0 PARTIAL=1 WEAK=4 PURE_MEMORIZE=0 → FAIL
[P21H] anima_register_hits=0/20 register_regress=True
[P21H] KOSMOS anchors total=7 (during_train=2 final=5)
[P21H] DONE → /workspace/p21hr/out_main