============================================================
OPEN-ENDED EVAL: Base vs SFT (Multi-GPU)
Base model: /workspace/rl4phyx/models/Qwen2.5-VL-3B-Instruct
SFT model: /workspace/rl4phyx/RL4Phyx/SFT/checkpoints/sft_qwen25vl_3b_fullft/final
Base GPUs: [0, 1, 2, 3]
SFT GPUs: [4, 5, 6, 7]
============================================================
Loaded 1533 test samples
Mechanics: 276
Electromagnetism: 275
Thermodynamics: 255
Waves/Acoustics: 253
Optics: 252
Modern Physics: 222
>>> Starting BASE model inference...
[base][GPU 2] Loading model...
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[base][GPU 0] Loading model...
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[base][GPU 1] Loading model...
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[base][GPU 3] Loading model...
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 0%|          | 0/2 [00:00<?, ?it/s]
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 0%|          | 0/2 [00:00<?, ?it/s]
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████     | 1/2 [00:01<00:01, 1.64s/it]
Loading checkpoint shards: 50%|█████     | 1/2 [00:01<00:01, 1.70s/it]
Loading checkpoint shards: 50%|█████     | 1/2 [00:01<00:01, 1.59s/it]
Loading checkpoint shards: 50%|█████     | 1/2 [00:01<00:01, 1.71s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.11s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.19s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.11s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.20s/it]
[base][GPU 2] Model loaded. Processing 384 samples.
[base][GPU 0] Model loaded. Processing 384 samples.
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.21s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.28s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.17s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.23s/it]
[base][GPU 1] Model loaded. Processing 384 samples.
[base][GPU 3] Model loaded. Processing 381 samples.
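
The per-GPU counts above (384, 384, 384, 381) are consistent with splitting the 1533 test samples into contiguous chunks of ceil(1533 / 4) = 384, with the last worker taking the remainder. The evaluation script is not part of this log, so the following is only a sketch of that arithmetic; the function name `shard_sizes` is hypothetical.

```python
import math


def shard_sizes(n_samples: int, n_workers: int) -> list[int]:
    """Return chunk sizes when each worker takes ceil(n/k) samples in order.

    Assumption: contiguous chunking with a ceil-sized chunk. This matches the
    counts reported in the log but is not taken from the actual script.
    """
    chunk = math.ceil(n_samples / n_workers)
    sizes = []
    remaining = n_samples
    for _ in range(n_workers):
        take = min(chunk, remaining)  # the last worker may get fewer samples
        sizes.append(take)
        remaining -= take
    return sizes


print(shard_sizes(1533, 4))  # [384, 384, 384, 381]
```

A balanced round-robin split would instead give 384, 383, 383, 383, which does not match the counts logged here.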
[base][GPU 0] 20/384 done
[base][GPU 3] 20/381 done
[base][GPU 2] 20/384 done
[base][GPU 1] 20/384 done
[base][GPU 0] 40/384 done
[base][GPU 3] 40/381 done
[base][GPU 2] 40/384 done
[base][GPU 1] 40/384 done
[base][GPU 0] 60/384 done
[base][GPU 2] 60/384 done
[base][GPU 3] 60/381 done
[base][GPU 1] 60/384 done
[base][GPU 0] 80/384 done
[base][GPU 3] 80/381 done
[base][GPU 1] 80/384 done
[base][GPU 2] 80/384 done
[base][GPU 0] 100/384 done
[base][GPU 3] 100/381 done
[base][GPU 2] 100/384 done
[base][GPU 1] 100/384 done
[base][GPU 0] 120/384 done
[base][GPU 2] 120/384 done
[base][GPU 3] 120/381 done
[base][GPU 1] 120/384 done
[base][GPU 2] 140/384 done
[base][GPU 3] 140/381 done
[base][GPU 1] 140/384 done
[base][GPU 0] 140/384 done
[base][GPU 1] 160/384 done
[base][GPU 2] 160/384 done
[base][GPU 0] 160/384 done
[base][GPU 3] 160/381 done
[base][GPU 1] 180/384 done
[base][GPU 3] 180/381 done
[base][GPU 2] 180/384 done
[base][GPU 0] 180/384 done
[base][GPU 1] 200/384 done
[base][GPU 2] 200/384 done
[base][GPU 0] 200/384 done
[base][GPU 3] 200/381 done
[base][GPU 1] 220/384 done
[base][GPU 2] 220/384 done
[base][GPU 3] 220/381 done
[base][GPU 0] 220/384 done
[base][GPU 1] 240/384 done
[base][GPU 2] 240/384 done
[base][GPU 3] 240/381 done
[base][GPU 0] 240/384 done
[base][GPU 1] 260/384 done
[base][GPU 3] 260/381 done
[base][GPU 2] 260/384 done
[base][GPU 0] 260/384 done
[base][GPU 1] 280/384 done
[base][GPU 2] 280/384 done
[base][GPU 3] 280/381 done
[base][GPU 0] 280/384 done
[base][GPU 1] 300/384 done
[base][GPU 2] 300/384 done
[base][GPU 0] 300/384 done
[base][GPU 3] 300/381 done
[base][GPU 1] 320/384 done
[base][GPU 2] 320/384 done
[base][GPU 0] 320/384 done
[base][GPU 2] 340/384 done
[base][GPU 1] 340/384 done
[base][GPU 3] 320/381 done
[base][GPU 0] 340/384 done
[base][GPU 2] 360/384 done
[base][GPU 1] 360/384 done
[base][GPU 3] 340/381 done
[base][GPU 0] 360/384 done
[base][GPU 2] 380/384 done
[base][GPU 2] 384/384 done
[base][GPU 2] Saved 384 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_base_gpu2.jsonl
[base][GPU 1] 380/384 done
[base][GPU 1] 384/384 done
[base][GPU 1] Saved 384 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_base_gpu1.jsonl
[base][GPU 3] 360/381 done
[base][GPU 0] 380/384 done
[base][GPU 0] 384/384 done
[base][GPU 0] Saved 384 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_base_gpu0.jsonl
[base][GPU 3] 380/381 done
[base][GPU 3] 381/381 done
[base][GPU 3] Saved 381 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_base_gpu3.jsonl
>>> Starting SFT model inference...
[sft][GPU 6] Loading model...
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[sft][GPU 4] Loading model...
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[sft][GPU 5] Loading model...
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[sft][GPU 7] Loading model...
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 0%|          | 0/2 [00:00<?, ?it/s]
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████     | 1/2 [00:01<00:01, 1.89s/it]
Loading checkpoint shards: 50%|█████     | 1/2 [00:01<00:01, 1.90s/it]
Loading checkpoint shards: 50%|█████     | 1/2 [00:01<00:01, 1.83s/it]
Loading checkpoint shards: 50%|█████     | 1/2 [00:01<00:01, 1.84s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.24s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.34s/it]
[sft][GPU 6] Model loaded. Processing 384 samples.
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.34s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.42s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.44s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.51s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.33s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.41s/it]
[sft][GPU 7] Model loaded. Processing 381 samples.
[sft][GPU 5] Model loaded. Processing 384 samples.
[sft][GPU 4] Model loaded. Processing 384 samples.
[sft][GPU 7] 20/381 done
[sft][GPU 4] 20/384 done
[sft][GPU 5] 20/384 done
[sft][GPU 6] 20/384 done
[sft][GPU 4] 40/384 done
[sft][GPU 7] 40/381 done
[sft][GPU 5] 40/384 done
[sft][GPU 6] 40/384 done
[sft][GPU 4] 60/384 done
[sft][GPU 7] 60/381 done
[sft][GPU 6] 60/384 done
[sft][GPU 5] 60/384 done
[sft][GPU 4] 80/384 done
[sft][GPU 5] 80/384 done
[sft][GPU 6] 80/384 done
[sft][GPU 7] 80/381 done
[sft][GPU 4] 100/384 done
[sft][GPU 7] 100/381 done
[sft][GPU 6] 100/384 done
[sft][GPU 5] 100/384 done
[sft][GPU 7] 120/381 done
[sft][GPU 4] 120/384 done
[sft][GPU 6] 120/384 done
[sft][GPU 5] 120/384 done
[sft][GPU 7] 140/381 done
[sft][GPU 5] 140/384 done
[sft][GPU 6] 140/384 done
[sft][GPU 4] 140/384 done
[sft][GPU 7] 160/381 done
[sft][GPU 5] 160/384 done
[sft][GPU 6] 160/384 done
[sft][GPU 4] 160/384 done
[sft][GPU 7] 180/381 done
[sft][GPU 5] 180/384 done
[sft][GPU 6] 180/384 done
[sft][GPU 4] 180/384 done
[sft][GPU 5] 200/384 done
[sft][GPU 7] 200/381 done
[sft][GPU 6] 200/384 done
[sft][GPU 4] 200/384 done
[sft][GPU 5] 220/384 done
[sft][GPU 7] 220/381 done
[sft][GPU 6] 220/384 done
[sft][GPU 4] 220/384 done
[sft][GPU 5] 240/384 done
[sft][GPU 7] 240/381 done
[sft][GPU 6] 240/384 done
[sft][GPU 5] 260/384 done
[sft][GPU 4] 240/384 done
[sft][GPU 7] 260/381 done
[sft][GPU 6] 260/384 done
[sft][GPU 5] 280/384 done
[sft][GPU 4] 260/384 done
[sft][GPU 7] 280/381 done
[sft][GPU 6] 280/384 done
[sft][GPU 5] 300/384 done
[sft][GPU 4] 280/384 done
[sft][GPU 6] 300/384 done
[sft][GPU 7] 300/381 done
[sft][GPU 5] 320/384 done
[sft][GPU 4] 300/384 done
[sft][GPU 6] 320/384 done
[sft][GPU 5] 340/384 done
[sft][GPU 7] 320/381 done
[sft][GPU 4] 320/384 done
[sft][GPU 6] 340/384 done
[sft][GPU 5] 360/384 done
[sft][GPU 6] 360/384 done
[sft][GPU 4] 340/384 done
[sft][GPU 7] 340/381 done
[sft][GPU 6] 380/384 done
[sft][GPU 5] 380/384 done
[sft][GPU 4] 360/384 done
[sft][GPU 6] 384/384 done
[sft][GPU 6] Saved 384 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_sft_gpu6.jsonl
[sft][GPU 5] 384/384 done
[sft][GPU 5] Saved 384 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_sft_gpu5.jsonl
[sft][GPU 7] 360/381 done
[sft][GPU 4] 380/384 done
[sft][GPU 7] 380/381 done
[sft][GPU 7] 381/381 done
[sft][GPU 7] Saved 381 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_sft_gpu7.jsonl
[sft][GPU 4] 384/384 done
[sft][GPU 4] Saved 384 results to /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_sft_gpu4.jsonl
============================================================
INFERENCE COMPLETE in 252.6 min
Base results: 1533 → /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_base.jsonl
SFT results: 1533 → /workspace/rl4phyx/RL4Phyx/SFT/sft_eval_footprint/inference_results_sft.jsonl
============================================================
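
The summary reports 1533 results per model in a single JSONL file, while each GPU worker saved its own `inference_results_<model>_gpu<N>.jsonl` shard. How the script combines them is not shown in this log; the sketch below assumes plain concatenation in the order the shard paths are given, and `merge_jsonl` is a hypothetical helper name.

```python
import json


def merge_jsonl(shard_paths, out_path):
    """Concatenate JSONL shards into out_path and return the record count.

    Assumption: shards are simply appended in the order given; the real
    script may sort or deduplicate records differently.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in shard_paths:
            with open(path, encoding="utf-8") as shard:
                for line in shard:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines between records
                    json.loads(line)  # fail early on a malformed record
                    out.write(line + "\n")
                    count += 1
    return count
```

Called on the four base shards (GPUs 0 to 3), a merge like this would be expected to return 384 + 384 + 384 + 381 = 1533, matching the total in the summary.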