============================================================
OPEN-ENDED EVAL: Base vs SFT (Multi-GPU)
Base model: /workspace/rl4phyx/models/Qwen2.5-VL-3B-Instruct
SFT model: /workspace/rl4phyx/RL4Phyx/SFT/checkpoints/sft_qwen25vl_3b_fullft_coldstart/final
Base GPUs: [0, 1, 2, 3]
SFT GPUs: [0, 1, 2, 3, 4, 5, 6, 7]
============================================================
Loaded 1533 test samples
  Mechanics: 276
  Electromagnetism: 275
  Thermodynamics: 255
  Waves/Acoustics: 253
  Optics: 252
  Modern Physics: 222

>>> Starting SFT model inference...
[sft][GPU 0] Loading model...
The tokenizer you are loading from '/workspace/rl4phyx/RL4Phyx/SFT/checkpoints/sft_qwen25vl_3b_fullft_coldstart/final' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
`torch_dtype` is deprecated! Use `dtype` instead!
[sft][GPU 1] Loading model...
The tokenizer you are loading from '/workspace/rl4phyx/RL4Phyx/SFT/checkpoints/sft_qwen25vl_3b_fullft_coldstart/final' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
`torch_dtype` is deprecated! Use `dtype` instead!
[sft][GPU 2] Loading model...
Loading checkpoint shards:   0%|          | 0/2 [00:00
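The per-category counts in the header sum to the reported total (276 + 275 + 255 + 253 + 252 + 222 = 1533). The sketch below checks that sum and shows one possible way the 1533 samples could be divided among the four Base GPUs and eight SFT GPUs listed in the header. The log does not say how the script actually shards its data: the round-robin split and the names `shard_indices`, `BASE_GPUS` and `SFT_GPUS` are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration: the log does not show how samples are split
# across GPUs; round-robin sharding is an assumption made here.

# Per-category counts as printed in the log header.
CATEGORY_COUNTS = {
    "Mechanics": 276,
    "Electromagnetism": 275,
    "Thermodynamics": 255,
    "Waves/Acoustics": 253,
    "Optics": 252,
    "Modern Physics": 222,
}

# GPU lists as printed in the log header.
BASE_GPUS = [0, 1, 2, 3]
SFT_GPUS = list(range(8))


def shard_indices(n_samples: int, gpu_ids: list) -> dict:
    """Assign sample indices to GPUs round-robin (hypothetical scheme)."""
    shards = {gpu: [] for gpu in gpu_ids}
    for i in range(n_samples):
        # Sample i goes to the (i mod number_of_GPUs)-th GPU in the list.
        shards[gpu_ids[i % len(gpu_ids)]].append(i)
    return shards


# Should match "Loaded 1533 test samples".
n_total = sum(CATEGORY_COUNTS.values())
base_shards = shard_indices(n_total, BASE_GPUS)
sft_shards = shard_indices(n_total, SFT_GPUS)

if __name__ == "__main__":
    print("total samples:", n_total)
    print("base per-GPU:", {g: len(s) for g, s in base_shards.items()})
    print("sft per-GPU:", {g: len(s) for g, s in sft_shards.items()})
```

Under this assumed scheme each Base GPU would handle 383 or 384 samples and each SFT GPU 191 or 192. Round-robin keeps the per-GPU load within one sample of even, which matters when every worker must finish before the Base and SFT results are compared.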