[launch] PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
[launch] pip install bitsandbytes==0.43.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[notice] A new release of pip is available: 24.2 -> 26.1.1
[notice] To update, run: python -m pip install --upgrade pip
[launch] bitsandbytes=0.43.1
[launch] pip install transformers
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[notice] A new release of pip is available: 24.2 -> 26.1.1
[notice] To update, run: python -m pip install --upgrade pip
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
0it [00:00, ?it/s]
[launch] transformers=4.44.0
[launch] pip install peft
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[notice] A new release of pip is available: 24.2 -> 26.1.1
[notice] To update, run: python -m pip install --upgrade pip
[launch] peft=0.12.0
[launch] accelerate=1.13.0
[launch] python3 -u /workspace/p21hr/train_p21h_v3.py --wiki-corpus /workspace/p21hr/multi_wiki_corpus.jsonl --anima-corpus /workspace/p21hr/state/corpus_s101_build_s102_2026_05_19/corpus_s101.jsonl --mixed-corpus /workspace/p21hr/mixed_corpus_v3.jsonl --out-dir /workspace/p21hr/out_main --base-model Qwen/Qwen2.5-1.5B --init-variant qwen --steps 5000 --bsz 2 --block 512 --lr 5e-5 --warmup-steps 100 --seed 1337 --wiki-frac 0.5 --target-corpus-mb 72 --noise-sigma 0.1 --lambda-mitosis 0.0 --mitosis-max 16 --ckpt-every 500 --ckpt-osc-threshold 0.5 --ckpt-osc-window 10 --early-stop-patience 8
[P21H] device=cuda dtype=torch.bfloat16
[mix] multi-wiki=28,986,824 chars, anima=224,198,521 chars
[mix] multi-wiki chunks=28,308, anima chunks=218,944
[mix] kept multi-wiki=22,073 (36.0MB), anima=28,407 (36.0MB)
[mix] wrote 50,480 records to /workspace/p21hr/mixed_corpus_v3.jsonl (72.0MB, wiki=22073, anima=28407)
[mix] sha256=db8f15d412817b532480522e6fb65a451856e23ab0fb9009014f4c58570e104d
[P21H][init=qwen] V3β — Qwen warm-start from Qwen/Qwen2.5-1.5B
[from_qwen] loading Qwen/Qwen2.5-1.5B
[from_qwen] qwen: vocab=151936 d=1536 L=28 n_head=12 n_kv_head=2 -> v3_n_kv_head=4 rope_base=1000000.0
[from_qwen] init OK — total params 2999.7M
[P21H] model params total=2,999,735,296 (2999.74M)
[P21H] BEFORE-train per-lang OOD eval (greedy)
[P21H] BEFORE en greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE ko greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE zh greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE ru greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE ja greedy: {'GENERALIZE': 9, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 1, 'ERROR': 0}
[P21H] optimizer=PagedAdamW8bit (bnb 0.43.1)
Token indices sequence length is longer than the specified maximum sequence length for this model (22889429 > 131072). Running this sequence through the model will result in indexing errors
[P21H] tokenized mixed corpus: 6,000,000 tokens (block=512)
[P21H] steps=5000 bsz=2 block=512 peak_lr=5e-05 warmup=100 lambda_mitosis=0.0 init=qwen mitosis_max=16 ckpt_every=500 osc_thr=0.5 es_patience=8
[P21H] step= 1 lr=5.00e-07 CE=14.1780 total=14.1780 pool=2 splits=0 phi=0.7120 t=6s
[P21H] ckpt save (best) step=1 CE=14.1780 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step= 125 lr=5.00e-05 CE=5.8837 total=5.8837 pool=16 splits=14 phi=0.6579 t=585s
[P21H] ckpt save (best) step=125 CE=5.8837 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step= 250 lr=4.99e-05 CE=3.6508 total=3.6508 pool=16 splits=14 phi=0.6579 t=1149s
[P21H] ckpt save (best) step=250 CE=3.6508 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step= 375 lr=4.97e-05 CE=6.1331 total=6.1331 pool=16 splits=14 phi=0.6579 t=1723s
[P21H] step= 500 lr=4.93e-05 CE=5.2994 total=5.2994 pool=16 splits=14 phi=0.6579 t=2303s
[P21H] ckpt save (step500) step=500 CE=5.2994 → /workspace/p21hr/out_main/ckpt_step500.pt (5735.8 MB)
[P21H] kosmos anchor written: v3_emit_step500_ru_ru_factual_geo.kosmos
[P21H] step= 625 lr=4.87e-05 CE=3.1315 total=3.1315 pool=16 splits=14 phi=0.6579 t=2875s
[P21H] ckpt save (best) step=625 CE=3.1315 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step= 750 lr=4.81e-05 CE=6.4955 total=6.4955 pool=16 splits=14 phi=0.6579 t=3473s
[P21H] step= 875 lr=4.73e-05 CE=3.8592 total=3.8592 pool=16 splits=14 phi=0.6579 t=4044s
[P21H] step= 1000 lr=4.64e-05 CE=6.7657 total=6.7657 pool=16 splits=14 phi=0.6579 t=4632s
[P21H] ckpt save (step1000) step=1000 CE=6.7657 → /workspace/p21hr/out_main/ckpt_step1000.pt (5735.8 MB)
[P21H] kosmos anchor written: v3_emit_step1000_ru_ru_factual_geo.kosmos
[P21H] step= 1125 lr=4.53e-05 CE=2.1516 total=2.1516 pool=16 splits=14 phi=0.6579 t=5229s
[P21H] ckpt save (best) step=1125 CE=2.1516 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] ckpt save (osc_step1125) step=1125 CE=2.1516 → /workspace/p21hr/out_main/ckpt_osc_step1125.pt (5735.8 MB)
[P21H] EARLY STOP: CE re-divergence: recent_mean=4.2588 > best_CE=2.1516+0.5 (mode collapse @ step 1125)
[P21H] TRAIN DONE wall=5237.6s final_CE=2.1516 actual_steps=1125/5000 early_stopped=True reason=CE re-divergence: recent_mean=4.2588 > best_CE=2.1516+0.5 (mode collapse @ step 1125) best_CE=2.1516@step1125
[P21H] ckpt saved → /workspace/p21hr/out_main/ckpt.pt (5735.8 MB)
[P21H] AFTER per-lang OOD eval (greedy + sample)
[P21H] AFTER en: greedy={'GENERALIZE': 9, 'MEMORIZE': 1, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 4, 'MEMORIZE': 6, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=4/20 gen=13 coh=4
[P21H] AFTER ko: greedy={'GENERALIZE': 7, 'MEMORIZE': 3, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 7, 'MEMORIZE': 3, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=5/20 gen=14 coh=5
[P21H] AFTER zh: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=1/20 gen=20 coh=1
[P21H] AFTER ru: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 9, 'MEMORIZE': 1, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=3/20 gen=19 coh=3
[P21H] AFTER ja: greedy={'GENERALIZE': 8, 'MEMORIZE': 2, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 8, 'MEMORIZE': 1, 'MEM_PARTIAL': 1, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=1/20 gen=16 coh=1
[P21H] AFTER anima Eval1
[P21H] AGG: STRONG=0 PARTIAL=0 WEAK=5 PURE_MEMORIZE=0 → FAIL
[P21H] anima_register_hits=9/20 register_regress=False
[P21H] KOSMOS anchors total=7 (during_train=2 final=5)
[P21H] DONE → /workspace/p21hr/out_main
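
The log does not say which learning-rate schedule produced the `lr=` column. One schedule consistent with the printed values is linear warmup over the 100 warmup steps, then cosine decay across the configured 5000 steps down to a floor of 10% of `peak_lr`. The sketch below is an inference from the numbers, not the training script's actual code: the cosine shape, the 0.1 floor (`MIN_LR_RATIO`), and the helper name `lr_at` are all assumptions.

```python
import math

PEAK_LR = 5e-5       # from --lr 5e-5
WARMUP_STEPS = 100   # from --warmup-steps 100
TOTAL_STEPS = 5000   # from --steps 5000
MIN_LR_RATIO = 0.1   # assumption: the log never states a floor

def lr_at(step: int) -> float:
    """Hypothetical schedule: linear warmup, then cosine decay to a floor."""
    if step <= WARMUP_STEPS:
        # Linear ramp from 0 to PEAK_LR over the warmup steps.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup budget already consumed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Interpolate between the floor and the peak.
    return PEAK_LR * (MIN_LR_RATIO + (1.0 - MIN_LR_RATIO) * cosine)

# Steps at which the log prints a training line.
for step in (1, 125, 250, 375, 500, 625, 750, 875, 1000, 1125):
    print(f"step={step:5d} lr={lr_at(step):.2e}")
```

Under these assumptions the printed values agree with the `lr=` column at every logged step. A plain cosine with no floor would give 4.92e-05 at step 500 rather than the logged 4.93e-05, which is why a floor is assumed here.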
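
The `EARLY STOP` line reports the comparison `recent_mean > best_CE + 0.5`, where 0.5 matches `--ckpt-osc-threshold` and the run was launched with `--ckpt-osc-window 10`. A minimal sketch of such a check follows. The function name `ce_rediverged` and the use of a fixed-length window of per-step CE values are assumptions. The log prints CE only every 125 steps, so the actual window contents cannot be recovered from it, and `--early-stop-patience` is not modelled here.

```python
from collections import deque
from typing import Iterable

def ce_rediverged(recent_ce: Iterable[float], best_ce: float,
                  threshold: float = 0.5) -> bool:
    """Return True when the mean of the recent CE values exceeds best_ce + threshold."""
    values = list(recent_ce)
    if not values:
        # Nothing observed yet, so there is nothing to compare.
        return False
    recent_mean = sum(values) / len(values)
    return recent_mean > best_ce + threshold

# A bounded window drops the oldest value automatically once full.
window = deque(maxlen=10)  # mirrors --ckpt-osc-window 10 (assumed meaning)
for ce in (2.4, 2.3, 2.2):  # illustrative values, not taken from the log
    window.append(ce)

print(ce_rediverged(window, best_ce=2.1516))    # mean 2.3 is below 2.6516
print(ce_rediverged([4.2588], best_ce=2.1516))  # the logged recent_mean trips the check
```

With the logged figures the comparison is 4.2588 > 2.6516, so the check fires, which matches the stop reported at step 1125.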