[launch] PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
[launch] pip install bitsandbytes==0.43.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[notice] A new release of pip is available: 24.2 -> 26.1.1
[notice] To update, run: python -m pip install --upgrade pip
[launch] bitsandbytes=0.43.1
[launch] pip install transformers
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[notice] A new release of pip is available: 24.2 -> 26.1.1
[notice] To update, run: python -m pip install --upgrade pip
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
0it [00:00, ?it/s]
[launch] transformers=4.44.0
[launch] pip install peft
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[notice] A new release of pip is available: 24.2 -> 26.1.1
[notice] To update, run: python -m pip install --upgrade pip
[launch] peft=0.12.0
[launch] accelerate=1.13.0
[launch] python3 -u /workspace/p21hr/train_p21h_v3.py --wiki-corpus /workspace/p21hr/multi_wiki_corpus.jsonl --anima-corpus /workspace/p21hr/state/corpus_s101_build_s102_2026_05_19/corpus_s101.jsonl --mixed-corpus /workspace/p21hr/mixed_corpus_v3.jsonl --out-dir /workspace/p21hr/out_main --base-model Qwen/Qwen2.5-1.5B --init-variant qwen --steps 5000 --bsz 2 --block 512 --lr 5e-5 --warmup-steps 100 --seed 1337 --wiki-frac 1.0 --target-corpus-mb 72 --noise-sigma 0.1 --lambda-mitosis 0.0 --mitosis-max 16 --ckpt-every 500 --ckpt-osc-threshold 0.5 --ckpt-osc-window 10 --early-stop-patience 8
[P21H] device=cuda dtype=torch.bfloat16
[mix] multi-wiki=28,986,824 chars, anima=224,198,521 chars
[mix] multi-wiki chunks=28,308, anima chunks=218,944
[mix] kept multi-wiki=28,308 (50.2MB), anima=1 (0.0MB)
[mix] wrote 28,309 records to /workspace/p21hr/mixed_corpus_v3.jsonl (50.2MB, wiki=28308, anima=1)
[mix] sha256=7e62fd32034ced9f5ab5652ad9ed211b513ebc917b230a8fc4466adaf3c32d22
[P21H][init=qwen] V3β — Qwen warm-start from Qwen/Qwen2.5-1.5B
[from_qwen] loading Qwen/Qwen2.5-1.5B
[from_qwen] qwen: vocab=151936 d=1536 L=28 n_head=12 n_kv_head=2 -> v3_n_kv_head=4 rope_base=1000000.0
[from_qwen] init OK — total params 2999.7M
[P21H] model params total=2,999,735,296 (2999.74M)
[P21H] BEFORE-train per-lang OOD eval (greedy)
[P21H] BEFORE en greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE ko greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE zh greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE ru greedy: {'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0}
[P21H] BEFORE ja greedy: {'GENERALIZE': 9, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 1, 'ERROR': 0}
[P21H] optimizer=PagedAdamW8bit (bnb 0.43.1)
Token indices sequence length is longer than the specified maximum sequence length for this model (13836908 > 131072). Running this sequence through the model will result in indexing errors
[P21H] tokenized mixed corpus: 6,000,000 tokens (block=512)
[P21H] steps=5000 bsz=2 block=512 peak_lr=5e-05 warmup=100 lambda_mitosis=0.0 init=qwen mitosis_max=16 ckpt_every=500 osc_thr=0.5 es_patience=8
[P21H] step= 1 lr=5.00e-07 CE=14.7927 total=14.7927 pool=2 splits=0 phi=0.7120 t=5s
[P21H] ckpt save (best) step=1 CE=14.7927 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step= 125 lr=5.00e-05 CE=8.1634 total=8.1634 pool=16 splits=14 phi=0.6579 t=76s
[P21H] ckpt save (best) step=125 CE=8.1634 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step= 250 lr=4.99e-05 CE=7.3439 total=7.3439 pool=16 splits=14 phi=0.6579 t=147s
[P21H] ckpt save (best) step=250 CE=7.3439 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step= 375 lr=4.97e-05 CE=7.1829 total=7.1829 pool=16 splits=14 phi=0.6579 t=219s
[P21H] ckpt save (best) step=375 CE=7.1829 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step= 500 lr=4.93e-05 CE=6.8499 total=6.8499 pool=16 splits=14 phi=0.6579 t=293s
[P21H] ckpt save (best) step=500 CE=6.8499 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] ckpt save (step500) step=500 CE=6.8499 → /workspace/p21hr/out_main/ckpt_step500.pt (5735.8 MB)
[P21H] kosmos anchor written: v3_emit_step500_ru_ru_factual_geo.kosmos
[P21H] step= 625 lr=4.87e-05 CE=7.4549 total=7.4549 pool=16 splits=14 phi=0.6579 t=367s
[P21H] step= 750 lr=4.81e-05 CE=5.6695 total=5.6695 pool=16 splits=14 phi=0.6579 t=432s
[P21H] ckpt save (best) step=750 CE=5.6695 → /workspace/p21hr/out_main/ckpt_best.pt (5735.8 MB)
[P21H] step= 875 lr=4.73e-05 CE=6.3453 total=6.3453 pool=16 splits=14 phi=0.6579 t=503s
[P21H] step= 1000 lr=4.64e-05 CE=6.4940 total=6.4940 pool=16 splits=14 phi=0.6579 t=567s
[P21H] ckpt save (step1000) step=1000 CE=6.4940 → /workspace/p21hr/out_main/ckpt_step1000.pt (5735.8 MB)
[P21H] kosmos anchor written: v3_emit_step1000_ru_ru_factual_geo.kosmos
[P21H] step= 1125 lr=4.53e-05 CE=6.5491 total=6.5491 pool=16 splits=14 phi=0.6579 t=637s
[P21H] ckpt save (osc_step1125) step=1125 CE=6.5491 → /workspace/p21hr/out_main/ckpt_osc_step1125.pt (5735.8 MB)
[P21H] EARLY STOP: CE re-divergence: recent_mean=6.4628 > best_CE=5.6695+0.5 (mode collapse @ step 1125)
[P21H] TRAIN DONE wall=641.2s final_CE=6.5491 actual_steps=1125/5000 early_stopped=True reason=CE re-divergence: recent_mean=6.4628 > best_CE=5.6695+0.5 (mode collapse @ step 1125) best_CE=5.6695@step750
[P21H] ckpt saved → /workspace/p21hr/out_main/ckpt.pt (5735.8 MB)
[P21H] AFTER per-lang OOD eval (greedy + sample)
[P21H] AFTER en: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=0/20 gen=20 coh=0
[P21H] AFTER ko: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=PARTIAL score=15/20 gen=20 coh=15
[P21H] AFTER zh: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=2/20 gen=20 coh=2
[P21H] AFTER ru: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=5/20 gen=20 coh=5
[P21H] AFTER ja: greedy={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} sample={'GENERALIZE': 10, 'MEMORIZE': 0, 'MEM_PARTIAL': 0, 'EMPTY': 0, 'ERROR': 0} verdict=WEAK score=6/20 gen=20 coh=6
[P21H] AFTER anima Eval1
[P21H] AGG: STRONG=0 PARTIAL=1 WEAK=4 PURE_MEMORIZE=0 → FAIL
[P21H] anima_register_hits=0/20 register_regress=True
[P21H] KOSMOS anchors total=7 (during_train=2 final=5)
[P21H] DONE → /workspace/p21hr/out_main