| /opt/conda/lib/python3.11/site-packages/torch/utils/_pytree.py:185: FutureWarning: optree is installed but the version is too old to support PyTorch Dynamo in C++ pytree. C++ pytree support is disabled. Please consider upgrading optree using `python3 -m pip install --upgrade 'optree>=0.13.0'`. |
| warnings.warn( |
| ============================================================ |
| OPEN-ENDED EVAL: Base vs SFT (Multi-GPU) |
| Base model: /workspace/rl4phyx/models/Qwen2.5-VL-3B-Instruct |
| SFT model: /workspace/rl4phyx/RL4Phyx/SFT/checkpoints/lora_math_f/merged |
| Base GPUs: [] |
| SFT GPUs: [0, 1, 2, 3, 4, 5, 6, 7] |
| ============================================================ |
|
|
| Loaded 1533 test samples |
| Mechanics: 276 |
| Electromagnetism: 275 |
| Thermodynamics: 255 |
| Waves/Acoustics: 253 |
| Optics: 252 |
| Modern Physics: 222 |
|
|
| >>> SKIPPING BASE model (BASE_GPUS is empty) |
|
|
| >>> Starting SFT model inference... |
| [sft][GPU 2] Loading model... |
| Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. |
| [sft][GPU 4] Loading model... |
| [sft][GPU 7] Loading model... |
| [sft][GPU 5] Loading model... |
| [sft][GPU 3] Loading model... |
| [sft][GPU 0] Loading model... |
| [sft][GPU 6] Loading model... |
| [sft][GPU 1] Loading model... |
| The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. |
| Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.89s/it] |
| [sft][GPU 2] Model loaded. Processing 192 samples. |
| Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.86s/it] |
| Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.92s/it] |
| [sft][GPU 5] Model loaded. Processing 192 samples. |
| [sft][GPU 4] Model loaded. Processing 192 samples. |
| Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.93s/it] |
| Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.97s/it] |
| [sft][GPU 1] Model loaded. Processing 192 samples. |
| [sft][GPU 6] Model loaded. Processing 192 samples. |
| Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.01s/it] |
| Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.01s/it] |
| [sft][GPU 3] Model loaded. Processing 192 samples. |
| [sft][GPU 7] Model loaded. Processing 189 samples. |
| Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.18s/it] |
| [sft][GPU 0] Model loaded. Processing 192 samples. |
| [sft][GPU 6] 20/192 done |
| [sft][GPU 5] 20/192 done |
| [sft][GPU 7] 20/189 done |
| [sft][GPU 0] 20/192 done |
| [sft][GPU 4] 20/192 done |
| [sft][GPU 3] 20/192 done |
| [sft][GPU 2] 20/192 done |
| [sft][GPU 1] 20/192 done |
| [sft][GPU 6] 40/192 done |
| [sft][GPU 0] 40/192 done |
| [sft][GPU 5] 40/192 done |
| [sft][GPU 4] 40/192 done |
| [sft][GPU 7] 40/189 done |
| [sft][GPU 2] 40/192 done |
| [sft][GPU 3] 40/192 done |
| [sft][GPU 1] 40/192 done |
| [sft][GPU 0] 60/192 done |
| [sft][GPU 6] 60/192 done |
| [sft][GPU 4] 60/192 done |
| [sft][GPU 7] 60/189 done |
| [sft][GPU 5] 60/192 done |
| [sft][GPU 2] 60/192 done |
| [sft][GPU 3] 60/192 done |
| [sft][GPU 0] 80/192 done |
| [sft][GPU 1] 60/192 done |
| [sft][GPU 4] 80/192 done |
| [sft][GPU 5] 80/192 done |
| [sft][GPU 6] 80/192 done |
| [sft][GPU 7] 80/189 done |
| [sft][GPU 2] 80/192 done |
| [sft][GPU 3] 80/192 done |
| [sft][GPU 0] 100/192 done |
| [sft][GPU 1] 80/192 done |
| [sft][GPU 5] 100/192 done |
| [sft][GPU 4] 100/192 done |
| [sft][GPU 6] 100/192 done |
| [sft][GPU 2] 100/192 done |
| [sft][GPU 3] 100/192 done |
| [sft][GPU 7] 100/189 done |
| [sft][GPU 0] 120/192 done |
| [sft][GPU 1] 100/192 done |
| [sft][GPU 5] 120/192 done |
| [sft][GPU 6] 120/192 done |
| [sft][GPU 4] 120/192 done |
| [sft][GPU 3] 120/192 done |
| [sft][GPU 2] 120/192 done |
| [sft][GPU 5] 140/192 done |
| [sft][GPU 1] 120/192 done |
| [sft][GPU 7] 120/189 done |
| [sft][GPU 0] 140/192 done |
| [sft][GPU 6] 140/192 done |
| [sft][GPU 3] 140/192 done |
| [sft][GPU 4] 140/192 done |
| [sft][GPU 2] 140/192 done |
| [sft][GPU 5] 160/192 done |
| [sft][GPU 1] 140/192 done |
| [sft][GPU 0] 160/192 done |
| [sft][GPU 7] 140/189 done |
| [sft][GPU 6] 160/192 done |
| [sft][GPU 4] 160/192 done |
| [sft][GPU 2] 160/192 done |
| [sft][GPU 3] 160/192 done |
| [sft][GPU 5] 180/192 done |
| [sft][GPU 1] 160/192 done |
| [sft][GPU 5] 192/192 done |
| [sft][GPU 5] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_lora_math_f_gpu5.jsonl |
| [sft][GPU 0] 180/192 done |
| [sft][GPU 4] 180/192 done |
| [sft][GPU 7] 160/189 done |
| [sft][GPU 6] 180/192 done |
| [sft][GPU 2] 180/192 done |
| [sft][GPU 3] 180/192 done |
| [sft][GPU 1] 180/192 done |
| [sft][GPU 0] 192/192 done |
| [sft][GPU 0] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_lora_math_f_gpu0.jsonl |
| [sft][GPU 4] 192/192 done |
| [sft][GPU 4] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_lora_math_f_gpu4.jsonl |
| [sft][GPU 2] 192/192 done |
| [sft][GPU 2] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_lora_math_f_gpu2.jsonl |
| [sft][GPU 3] 192/192 done |
| [sft][GPU 3] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_lora_math_f_gpu3.jsonl |
| [sft][GPU 6] 192/192 done |
| [sft][GPU 6] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_lora_math_f_gpu6.jsonl |
| [sft][GPU 1] 192/192 done |
| [sft][GPU 1] Saved 192 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_lora_math_f_gpu1.jsonl |
| [sft][GPU 7] 180/189 done |
| [sft][GPU 7] 189/189 done |
| [sft][GPU 7] Saved 189 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_lora_math_f_gpu7.jsonl |
|
|
| ============================================================ |
| INFERENCE COMPLETE in 84.9 min |
| Base results: 0 → /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_base.jsonl |
| SFT results: 1533 → /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_lora_math_f.jsonl |
| ============================================================ |
|
|