rl4phyx-backup/logs/eval_coldstart.log
============================================================
OPEN-ENDED EVAL: Base vs SFT (Multi-GPU)
Base model: /workspace/rl4phyx/models/Qwen2.5-VL-3B-Instruct
SFT model: /workspace/rl4phyx/RL4Phyx/SFT/checkpoints/sft_qwen25vl_3b_fullft_coldstart/final
Base GPUs: [0, 1, 2, 3]
SFT GPUs: [0, 1, 2, 3, 4, 5, 6, 7]
============================================================
Loaded 1533 test samples
Mechanics: 276
Electromagnetism: 275
Thermodynamics: 255
Waves/Acoustics: 253
Optics: 252
Modern Physics: 222
>>> Starting SFT model inference...
[sft][GPU 0] Loading model...
The tokenizer you are loading from '/workspace/rl4phyx/RL4Phyx/SFT/checkpoints/sft_qwen25vl_3b_fullft_coldstart/final' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
`torch_dtype` is deprecated! Use `dtype` instead!
[sft][GPU 1] Loading model...
[sft][GPU 2] Loading model...
[sft][GPU 3] Loading model...
[sft][GPU 4] Loading model...
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.00s/it]
[sft][GPU 1] Model loaded. Processing 192 samples.
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.34s/it]
[sft][GPU 0] Model loaded. Processing 192 samples.
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.22s/it]
[sft][GPU 5] Loading model...
[sft][GPU 2] Model loaded. Processing 192 samples.
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.06it/s]
[sft][GPU 3] Model loaded. Processing 192 samples.
[sft][GPU 6] Loading model...
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.02s/it]
[sft][GPU 4] Model loaded. Processing 192 samples.
[sft][GPU 7] Loading model...
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.05s/it]
[sft][GPU 5] Model loaded. Processing 192 samples.
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.08it/s]
[sft][GPU 6] Model loaded. Processing 192 samples.
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.08s/it]
[sft][GPU 7] Model loaded. Processing 189 samples.
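The per-worker counts reported above (seven workers with 192 samples and one with 189) are consistent with splitting the 1533 test samples into contiguous chunks of ceil(1533 / 8) = 192. The sketch below reproduces those sizes. It is an assumption about how the work was divided, since the eval script's actual sharding logic does not appear in this log, and `shard_sizes` is a hypothetical helper name.

```python
import math


def shard_sizes(num_samples: int, num_workers: int) -> list[int]:
    """Return per-worker sample counts for contiguous chunking.

    Assumption: each worker gets a chunk of ceil(n / k) samples and the
    last worker takes whatever remains. This is not taken from the eval
    script; it only matches the counts printed in the log.
    """
    chunk = math.ceil(num_samples / num_workers)
    sizes = []
    for rank in range(num_workers):
        start = min(rank * chunk, num_samples)
        end = min(start + chunk, num_samples)
        sizes.append(end - start)
    return sizes


sizes = shard_sizes(1533, 8)
print(sizes)       # per-worker counts
print(sum(sizes))  # total across workers
```

With 1533 samples and 8 workers this yields 192 for workers 0 to 6 and 189 for worker 7, matching the "Processing N samples" lines.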
[sft][GPU 5] 20/192 done
[sft][GPU 7] 20/189 done
[sft][GPU 6] 20/192 done
[sft][GPU 0] 20/192 done
[sft][GPU 3] 20/192 done
[sft][GPU 2] 20/192 done
[sft][GPU 4] 20/192 done
[sft][GPU 1] 20/192 done
[sft][GPU 6] 40/192 done
[sft][GPU 0] 40/192 done
[sft][GPU 7] 40/189 done
[sft][GPU 5] 40/192 done
[sft][GPU 3] 40/192 done
[sft][GPU 2] 40/192 done
[sft][GPU 4] 40/192 done
[sft][GPU 1] 40/192 done
[sft][GPU 0] 60/192 done
[sft][GPU 6] 60/192 done
[sft][GPU 7] 60/189 done
[sft][GPU 5] 60/192 done
[sft][GPU 2] 60/192 done
[sft][GPU 4] 60/192 done
[sft][GPU 3] 60/192 done
[sft][GPU 0] 80/192 done
[sft][GPU 1] 60/192 done
[sft][GPU 2] 80/192 done
[sft][GPU 7] 80/189 done
[sft][GPU 5] 80/192 done
[sft][GPU 4] 80/192 done
[sft][GPU 6] 80/192 done
[sft][GPU 3] 80/192 done
[sft][GPU 5] 100/192 done
[sft][GPU 0] 100/192 done
[sft][GPU 1] 80/192 done
[sft][GPU 2] 100/192 done
[sft][GPU 4] 100/192 done
[sft][GPU 6] 100/192 done
[sft][GPU 3] 100/192 done
[sft][GPU 7] 100/189 done
[sft][GPU 5] 120/192 done
[sft][GPU 1] 100/192 done
[sft][GPU 0] 120/192 done
[sft][GPU 6] 120/192 done
[sft][GPU 3] 120/192 done
[sft][GPU 2] 120/192 done
[sft][GPU 4] 120/192 done
[sft][GPU 5] 140/192 done
[sft][GPU 7] 120/189 done
[sft][GPU 1] 120/192 done
[sft][GPU 2] 140/192 done
[sft][GPU 6] 140/192 done
[sft][GPU 3] 140/192 done
[sft][GPU 0] 140/192 done
[sft][GPU 4] 140/192 done
[sft][GPU 5] 160/192 done
[sft][GPU 7] 140/189 done
[sft][GPU 1] 140/192 done
[sft][GPU 2] 160/192 done
[sft][GPU 3] 160/192 done
[sft][GPU 6] 160/192 done
[sft][GPU 0] 160/192 done
[sft][GPU 5] 180/192 done
[sft][GPU 4] 160/192 done
[sft][GPU 7] 160/189 done
[sft][GPU 2] 180/192 done
[sft][GPU 1] 160/192 done
[sft][GPU 5] 192/192 done
[sft][GPU 5] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_phyx_gpu5.jsonl
[sft][GPU 6] 180/192 done
[sft][GPU 4] 180/192 done
[sft][GPU 2] 192/192 done
[sft][GPU 2] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_phyx_gpu2.jsonl
[sft][GPU 3] 180/192 done
[sft][GPU 0] 180/192 done
[sft][GPU 7] 180/189 done
[sft][GPU 4] 192/192 done
[sft][GPU 4] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_phyx_gpu4.jsonl
[sft][GPU 6] 192/192 done
[sft][GPU 6] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_phyx_gpu6.jsonl
[sft][GPU 7] 189/189 done
[sft][GPU 7] Saved 189 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_phyx_gpu7.jsonl
[sft][GPU 3] 192/192 done
[sft][GPU 3] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_phyx_gpu3.jsonl
[sft][GPU 0] 192/192 done
[sft][GPU 0] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_phyx_gpu0.jsonl
[sft][GPU 1] 180/192 done
[sft][GPU 1] 192/192 done
[sft][GPU 1] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_phyx_gpu1.jsonl
============================================================
INFERENCE COMPLETE in 82.8 min
Base results: 0 → /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_base.jsonl
SFT results: 1533 → /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_coldstart.jsonl
============================================================
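The log shows eight per-GPU files (`inference_results_phyx_gpu0.jsonl` through `inference_results_phyx_gpu7.jsonl`) followed by a combined file with 1533 results. A minimal sketch of how such shards could be merged and the total verified is given below. The function name `merge_shards` and its parameters are hypothetical, and the eval script's own merge step is not shown in this log.

```python
import json
from pathlib import Path


def merge_shards(shard_dir, pattern, out_path, expected=None):
    """Concatenate JSONL shards matching `pattern` into a single JSONL file.

    Hypothetical helper, not the eval script's code. Shards are read in
    sorted filename order, which is correct for single-digit GPU ids
    (gpu0..gpu7) but would need a numeric sort for ten or more shards.
    """
    records = []
    for shard in sorted(Path(shard_dir).glob(pattern)):
        with shard.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))

    # Fail loudly if a worker dropped or duplicated samples.
    if expected is not None and len(records) != expected:
        raise ValueError(f"expected {expected} records, found {len(records)}")

    with Path(out_path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

For the run above, a call such as `merge_shards(".../sft_eval_footprint", "inference_results_phyx_gpu*.jsonl", ".../inference_results_coldstart.jsonl", expected=1533)` would raise if the merged count differed from the 1533 loaded test samples. The directory prefix is elided here and should be taken from the paths in the log.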