nie10 committed on
Commit 90b2866 · verified · 1 Parent(s): e7f2879

Delete logs

This view is limited to 50 files because it contains too many changes. See raw diff.
Files changed (50)
  1. logs/20250526_182357/process_pids.txt +0 -2
  2. logs/20250526_182357/remote_rm_qa.log +0 -0
  3. logs/20250526_182357/train.log +0 -0
  4. logs/20250526_185638/process_pids.txt +0 -2
  5. logs/20250526_185638/remote_rm_qa.log +0 -0
  6. logs/20250526_185638/train.log +0 -0
  7. logs/20250526_191827/process_pids.txt +0 -2
  8. logs/20250526_191827/remote_rm_qa.log +0 -0
  9. logs/20250526_191827/train.log +0 -0
  10. logs/20250526_193312/process_pids.txt +0 -2
  11. logs/20250526_193312/remote_rm_qa.log +0 -9
  12. logs/20250526_193312/train.log +0 -2
  13. logs/20250526_193656/process_pids.txt +0 -2
  14. logs/20250526_193656/remote_rm_qa.log +0 -9
  15. logs/20250526_193656/train.log +0 -77
  16. logs/20250526_194456/process_pids.txt +0 -2
  17. logs/20250526_194456/remote_rm_qa.log +0 -0
  18. logs/20250526_194456/train.log +0 -92
  19. logs/20250527_011343/process_pids.txt +0 -2
  20. logs/20250527_011343/remote_rm_qa.log +0 -9
  21. logs/20250527_011343/train.log +0 -45
  22. logs/20250527_095510/process_pids.txt +0 -2
  23. logs/20250527_095510/remote_rm_qa.log +0 -3
  24. logs/20250527_095510/train.log +0 -0
  25. logs/20250527_235509/process_pids.txt +0 -2
  26. logs/20250527_235509/remote_rm_qa.log +0 -0
  27. logs/20250527_235509/train.log +0 -0
  28. logs/20250528_110535/process_pids.txt +0 -2
  29. logs/20250528_110535/remote_rm_qa.log +0 -3
  30. logs/20250528_110535/train.log +0 -0
  31. logs/20250528_161139/process_pids.txt +0 -2
  32. logs/20250528_161139/remote_rm_qa.log +0 -3
  33. logs/20250528_161139/train.log +0 -0
  34. logs/20250529_214257/process_pids.txt +0 -2
  35. logs/20250529_214257/remote_rm_qa.log +0 -3
  36. logs/20250529_214257/train.log +0 -0
  37. logs/20250611_110725/process_pids.txt +0 -2
  38. logs/20250611_110725/remote_rm_qa.log +0 -3
  39. logs/20250611_110725/train.log +0 -0
  40. logs/20250611_150946/process_pids.txt +0 -2
  41. logs/20250611_150946/remote_rm_qa.log +0 -0
  42. logs/20250611_150946/train.log +0 -0
  43. logs/20250611_160325/process_pids.txt +0 -2
  44. logs/20250611_160325/remote_rm_qa.log +0 -9
  45. logs/20250611_160325/train.log +0 -0
  46. logs/20250611_161239/process_pids.txt +0 -2
  47. logs/20250611_161239/remote_rm_qa.log +0 -9
  48. logs/20250611_161239/train.log +0 -188
  49. logs/20250611_162203/process_pids.txt +0 -2
  50. logs/20250611_162203/remote_rm_qa.log +0 -0
logs/20250526_182357/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 126386
- Train PID: 126387

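Each run directory under logs/ holds a process_pids.txt like the one above plus remote_rm_qa.log and train.log, which suggests the runs were started by a small launcher that records both PIDs. A minimal sketch of such a launcher, assuming hypothetical start_remote_rm.sh / start_train.sh entry points (the actual commands are not part of this diff):

```python
# Hypothetical launcher sketch: starts the remote reward model and the trainer,
# then records their PIDs the way logs/<timestamp>/process_pids.txt does.
# The two command lines below are placeholders, not the repository's real scripts.
import subprocess
import time
from pathlib import Path

log_dir = Path("logs") / time.strftime("%Y%m%d_%H%M%S")
log_dir.mkdir(parents=True, exist_ok=True)

with open(log_dir / "remote_rm_qa.log", "w") as rm_log, \
     open(log_dir / "train.log", "w") as train_log:
    rm = subprocess.Popen(["bash", "start_remote_rm.sh"], stdout=rm_log, stderr=subprocess.STDOUT)
    train = subprocess.Popen(["bash", "start_train.sh"], stdout=train_log, stderr=subprocess.STDOUT)

(log_dir / "process_pids.txt").write_text(
    f"Remote RM PID: {rm.pid}\nTrain PID: {train.pid}\n"
)
```
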
logs/20250526_182357/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_182357/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_185638/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 435069
- Train PID: 435070

logs/20250526_185638/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_185638/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_191827/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 74501
- Train PID: 74502

logs/20250526_191827/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_191827/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_193312/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 279066
- Train PID: 279067

logs/20250526_193312/remote_rm_qa.log DELETED
@@ -1,9 +0,0 @@
- [2025-05-26 19:33:56,196] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- load dataset success
- * Serving Flask app 'math_verifier_wolatex'
- * Debug mode: off
- WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- * Running on all addresses (0.0.0.0)
- * Running on http://127.0.0.1:2394
- * Running on http://10.140.0.144:2394
- Press CTRL+C to quit

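These remote reward-model logs come from a small Flask app (math_verifier_wolatex) serving a math-answer verifier on port 2394. A minimal sketch of such a server is shown below; the route name, payload fields, and verify_answer() helper are assumptions for illustration, not the repository's actual implementation.

```python
# Minimal sketch of a remote reward-model server like the one logged above.
# The /get_reward route, the "query"/"labels" fields, and verify_answer() are
# assumptions, not the repository's real code.
from flask import Flask, jsonify, request

app = Flask("math_verifier_wolatex")

def verify_answer(response: str, reference: str) -> float:
    """Hypothetical verifier: reward 1.0 if the response ends with the reference answer."""
    return 1.0 if response.strip().endswith(reference.strip()) else 0.0

@app.route("/get_reward", methods=["POST"])
def get_reward():
    data = request.get_json()
    rewards = [verify_answer(q, a) for q, a in zip(data["query"], data["labels"])]
    return jsonify({"rewards": rewards})

if __name__ == "__main__":
    # Matches the development-server warning in the log; use a WSGI server in production.
    app.run(host="0.0.0.0", port=2394)
```
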
logs/20250526_193312/train.log DELETED
@@ -1,2 +0,0 @@
- 2025-05-26 19:33:39,938 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_321e0871e56ca1df.zip.
- 2025-05-26 19:33:39,938 INFO packaging.py:575 -- Creating a file package for local module '/mnt/petrelfs/luyiting/MultiAgentEval/lmm-r1'.

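The train.log entries show Ray's dashboard SDK packaging the local lmm-r1 working directory before a job is submitted to the job server. A minimal sketch of an equivalent submission through Ray's job API follows; the entrypoint string is only a placeholder, not the actual training command.

```python
# Sketch of a Ray job submission that would produce the dashboard_sdk/packaging
# log lines above. The entrypoint below is a placeholder, not the real command line.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://127.0.0.1:2983")  # job submission server seen in later logs
job_id = client.submit_job(
    entrypoint="python -m openrlhf.cli.train_ppo_ray",  # placeholder entrypoint
    runtime_env={"working_dir": "/mnt/petrelfs/luyiting/MultiAgentEval/lmm-r1"},
)
print(job_id)
```
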
logs/20250526_193656/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 29110
- Train PID: 29111

logs/20250526_193656/remote_rm_qa.log DELETED
@@ -1,9 +0,0 @@
- [2025-05-26 19:37:40,391] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- load dataset success
- * Serving Flask app 'math_verifier_wolatex'
- * Debug mode: off
- WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- * Running on all addresses (0.0.0.0)
- * Running on http://127.0.0.1:2394
- * Running on http://10.140.0.167:2394
- Press CTRL+C to quit

logs/20250526_193656/train.log DELETED
@@ -1,77 +0,0 @@
1
- 2025-05-26 19:37:24,046 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_321e0871e56ca1df.zip.
2
- 2025-05-26 19:37:24,047 INFO packaging.py:575 -- Creating a file package for local module '/mnt/petrelfs/luyiting/MultiAgentEval/lmm-r1'.
3
- 2025-05-26 19:37:22,460 INFO cli.py:39 -- Job submission server address: http://127.0.0.1:2983
4
- 2025-05-26 19:37:32,118 SUCC cli.py:63 -- -------------------------------------------------------
5
- 2025-05-26 19:37:32,118 SUCC cli.py:64 -- Job 'raysubmit_endd6V8YzvkhTPfY' submitted successfully
6
- 2025-05-26 19:37:32,118 SUCC cli.py:65 -- -------------------------------------------------------
7
- 2025-05-26 19:37:32,118 INFO cli.py:289 -- Next steps
8
- 2025-05-26 19:37:32,118 INFO cli.py:290 -- Query the logs of the job:
9
- 2025-05-26 19:37:32,118 INFO cli.py:292 -- ray job logs raysubmit_endd6V8YzvkhTPfY
10
- 2025-05-26 19:37:32,118 INFO cli.py:294 -- Query the status of the job:
11
- 2025-05-26 19:37:32,118 INFO cli.py:296 -- ray job status raysubmit_endd6V8YzvkhTPfY
12
- 2025-05-26 19:37:32,118 INFO cli.py:298 -- Request the job to be stopped:
13
- 2025-05-26 19:37:32,119 INFO cli.py:300 -- ray job stop raysubmit_endd6V8YzvkhTPfY
14
- 2025-05-26 19:37:32,121 INFO cli.py:307 -- Tailing logs until the job exits (disable with --no-wait):
15
- 2025-05-26 19:37:31,103 INFO job_manager.py:531 -- Runtime env is setting up.
16
- [2025-05-26 19:38:02,682] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
17
- INFO 05-26 19:38:09 [__init__.py:239] Automatically detected platform cuda.
18
- 2025-05-26 19:38:11,252 INFO worker.py:1520 -- Using address 10.140.0.167:6231 set in the environment variable RAY_ADDRESS
19
- 2025-05-26 19:38:11,253 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.140.0.167:6231...
20
- 2025-05-26 19:38:11,274 INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at 10.140.0.167:2983 
21
- (pid=42922) INFO 05-26 19:38:44 [__init__.py:239] Automatically detected platform cuda.
22
- (LLMRayActor pid=42923) INFO 05-26 19:39:22 [config.py:585] This model supports multiple tasks: {'score', 'reward', 'generate', 'embed', 'classify'}. Defaulting to 'generate'.
23
- (LLMRayActor pid=42923) WARNING 05-26 19:39:22 [arg_utils.py:1846] VLLM_ATTENTION_BACKEND=triton is not supported by the V1 Engine. Falling back to V0. We recommend to remove VLLM_ATTENTION_BACKEND=triton from your config in favor of the V1 Engine.
24
- (LLMRayActor pid=42923) WARNING 05-26 19:39:22 [arg_utils.py:1745] --enable-prefix-caching is not supported for multimodal models in V0 and has been disabled.
25
- (LLMRayActor pid=42923) INFO 05-26 19:39:22 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2.dev76+gf68cce8) with config: model='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', speculative_config=None, tokenizer='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=43, served_model_name=/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
26
- (pid=42927) INFO 05-26 19:38:44 [__init__.py:239] Automatically detected platform cuda. [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
27
- (LLMRayActor pid=42928) INFO 05-26 19:39:22 [config.py:585] This model supports multiple tasks: {'generate', 'reward', 'embed', 'classify', 'score'}. Defaulting to 'generate'.
28
- (LLMRayActor pid=42921) INFO 05-26 19:39:22 [config.py:585] This model supports multiple tasks: {'reward', 'embed', 'generate', 'classify', 'score'}. Defaulting to 'generate'.
29
- (LLMRayActor pid=42927) INFO 05-26 19:39:22 [config.py:585] This model supports multiple tasks: {'classify', 'embed', 'score', 'reward', 'generate'}. Defaulting to 'generate'.
30
- (LLMRayActor pid=42924) INFO 05-26 19:39:23 [config.py:585] This model supports multiple tasks: {'score', 'embed', 'generate', 'reward', 'classify'}. Defaulting to 'generate'.
31
- (LLMRayActor pid=42922) INFO 05-26 19:39:23 [config.py:585] This model supports multiple tasks: {'generate', 'score', 'embed', 'reward', 'classify'}. Defaulting to 'generate'.
32
- (LLMRayActor pid=42925) INFO 05-26 19:39:23 [config.py:585] This model supports multiple tasks: {'score', 'generate', 'reward', 'classify', 'embed'}. Defaulting to 'generate'.
33
- (LLMRayActor pid=42926) INFO 05-26 19:39:23 [config.py:585] This model supports multiple tasks: {'reward', 'score', 'classify', 'generate', 'embed'}. Defaulting to 'generate'.
34
- (LLMRayActor pid=42923) [2025-05-26 19:39:26,028] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
35
- (LLMRayActor pid=42923) INFO 05-26 19:39:33 [cuda.py:293] Using Flash Attention backend.
36
- (LLMRayActor pid=42925) WARNING 05-26 19:39:23 [arg_utils.py:1846] VLLM_ATTENTION_BACKEND=triton is not supported by the V1 Engine. Falling back to V0. We recommend to remove VLLM_ATTENTION_BACKEND=triton from your config in favor of the V1 Engine. [repeated 7x across cluster]
37
- (LLMRayActor pid=42925) WARNING 05-26 19:39:23 [arg_utils.py:1745] --enable-prefix-caching is not supported for multimodal models in V0 and has been disabled. [repeated 7x across cluster]
38
- (LLMRayActor pid=42925) INFO 05-26 19:39:23 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2.dev76+gf68cce8) with config: model='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', speculative_config=None, tokenizer='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=44, served_model_name=/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,  [repeated 7x across cluster]
39
- (LLMRayActor pid=42926) [2025-05-26 19:39:26,784] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect) [repeated 7x across cluster]
40
- (LLMRayActor pid=42923) INFO 05-26 19:39:36 [parallel_state.py:967] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
41
- (LLMRayActor pid=42924) INFO 05-26 19:39:36 [model_runner.py:1110] Starting to load model /mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/...
42
- (LLMRayActor pid=42922) INFO 05-26 19:39:37 [config.py:3229] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248]
43
- (LLMRayActor pid=42924)
44
- Loading safetensors checkpoint shards: 0% Completed | 0/5 [00:00<?, ?it/s]
45
- (LLMRayActor pid=42924)
46
- Loading safetensors checkpoint shards: 20% Completed | 1/5 [00:00<00:03, 1.07it/s]
47
- (LLMRayActor pid=42927)
48
- Loading safetensors checkpoint shards: 0% Completed | 0/5 [00:00<?, ?it/s] [repeated 7x across cluster]
49
- (LLMRayActor pid=42924)
50
- Loading safetensors checkpoint shards: 60% Completed | 3/5 [00:08<00:06, 3.28s/it] [repeated 16x across cluster]
51
- (LLMRayActor pid=42924) INFO 05-26 19:39:54 [loader.py:429] Loading weights took 16.67 seconds
52
- (LLMRayActor pid=42926) INFO 05-26 19:39:33 [cuda.py:293] Using Flash Attention backend. [repeated 7x across cluster]
53
- (LLMRayActor pid=42926) INFO 05-26 19:39:37 [parallel_state.py:967] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0 [repeated 7x across cluster]
54
- (LLMRayActor pid=42926) INFO 05-26 19:39:37 [model_runner.py:1110] Starting to load model /mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/... [repeated 7x across cluster]
55
- (LLMRayActor pid=42926) INFO 05-26 19:39:37 [config.py:3229] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248] [repeated 7x across cluster]
56
- (LLMRayActor pid=42924)
57
- (LLMRayActor pid=42924)
58
- Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:16<00:00, 3.30s/it] [repeated 17x across cluster]
59
- (LLMRayActor pid=42925)
60
- (LLMRayActor pid=42922)
61
- (LLMRayActor pid=42923)
62
- (LLMRayActor pid=42926)
63
- (LLMRayActor pid=42921)
64
- (LLMRayActor pid=42928)
65
- (LLMRayActor pid=42927)
66
- (LLMRayActor pid=42924) INFO 05-26 19:39:54 [model_runner.py:1146] Model loading took 15.6271 GB and 17.157056 seconds
67
- (LLMRayActor pid=42924) WARNING 05-26 19:39:55 [model_runner.py:1296] Computed max_num_seqs (min(256, 8192 // 32768)) to be less than 1. Setting it to the minimum value of 1.
68
- (LLMRayActor pid=42924) Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
69
- (LLMRayActor pid=42927) WARNING 05-26 19:40:00 [profiling.py:222] The sequence length used for profiling (max_num_batched_tokens / max_num_seqs = 8192) is too short to hold the multi-modal embeddings in the worst case (32768 tokens in total, out of which {'image': 16384, 'video': 16384} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`.
70
- (LLMRayActor pid=42927) INFO 05-26 19:39:54 [loader.py:429] Loading weights took 16.66 seconds [repeated 7x across cluster]
71
- (LLMRayActor pid=42928) INFO 05-26 19:39:54 [model_runner.py:1146] Model loading took 15.6271 GB and 17.248244 seconds [repeated 7x across cluster]
72
- (LLMRayActor pid=42928) WARNING 05-26 19:39:55 [model_runner.py:1296] Computed max_num_seqs (min(256, 8192 // 32768)) to be less than 1. Setting it to the minimum value of 1. [repeated 7x across cluster]
73
- (LLMRayActor pid=42924) INFO 05-26 19:40:02 [worker.py:267] Memory profiling takes 7.96 seconds
74
- (LLMRayActor pid=42924) INFO 05-26 19:40:02 [worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.50) = 39.66GiB
75
- (LLMRayActor pid=42924) INFO 05-26 19:40:02 [worker.py:267] model weights take 15.63GiB; non_torch_memory takes 0.21GiB; PyTorch activation peak memory takes 1.09GiB; the rest of the memory reserved for KV Cache is 22.73GiB.
76
- (LLMRayActor pid=42924) INFO 05-26 19:40:02 [executor_base.py:111] # cuda blocks: 26598, # CPU blocks: 4681
77
- (LLMRayActor pid=42924) INFO 05-26 19:40:02 [executor_base.py:116] Maximum concurrency for 8192 tokens per request: 51.95x
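
The train.log above records eight LLMRayActor workers each bringing up a vLLM V0 engine for Qwen2.5-VL-7B-Instruct with max_seq_len=8192 and gpu_memory_utilization=0.50. A standalone sketch of an engine configured the same way is given below; it is not the repository's LLMRayActor wrapper, just a plain vLLM construction with the values taken from the logged config.

```python
# Rough standalone equivalent of the engine config logged above. This is a sketch,
# not OpenRLHF's LLMRayActor; the model path and values come from the log lines.
from vllm import LLM

llm = LLM(
    model="/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/",
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=8192,            # max_seq_len=8192 in the logged config
    gpu_memory_utilization=0.5,    # 79.32 GiB x 0.50 = 39.66 GiB, as profiled in the log
    enable_prefix_caching=False,   # disabled for multimodal models in V0, per the warning
    seed=43,
)
```
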
logs/20250526_194456/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 11287
- Train PID: 11288

logs/20250526_194456/remote_rm_qa.log DELETED
File without changes
logs/20250526_194456/train.log DELETED
@@ -1,92 +0,0 @@
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
- sock = connection.create_connection(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
- raise err
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
- sock.connect(sa)
- ConnectionRefusedError: [Errno 111] Connection refused
-
- The above exception was the direct cause of the following exception:
-
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
- response = self._make_request(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connectionpool.py", line 493, in _make_request
- conn.request(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connection.py", line 445, in request
- self.endheaders()
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/http/client.py", line 1278, in endheaders
- self._send_output(message_body, encode_chunked=encode_chunked)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/http/client.py", line 1038, in _send_output
- self.send(msg)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/http/client.py", line 976, in send
- self.connect()
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connection.py", line 276, in connect
- self.sock = self._new_conn()
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connection.py", line 213, in _new_conn
- raise NewConnectionError(
- urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fabc4ee1cf0>: Failed to establish a new connection: [Errno 111] Connection refused
-
- The above exception was the direct cause of the following exception:
-
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
- resp = conn.urlopen(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connectionpool.py", line 841, in urlopen
- retries = retries.increment(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
- raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
- urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=2983): Max retries exceeded with url: /api/version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fabc4ee1cf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
-
- During handling of the above exception, another exception occurred:
-
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 262, in _check_connection_and_version_with_url
- r = self._do_request("GET", url)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 303, in _do_request
- return requests.request(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/requests/api.py", line 59, in request
- return session.request(method=method, url=url, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
- resp = self.send(prep, **send_kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
- r = adapter.send(request, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
- raise ConnectionError(e, request=request)
- requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=2983): Max retries exceeded with url: /api/version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fabc4ee1cf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
-
- During handling of the above exception, another exception occurred:
-
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/bin/ray", line 8, in <module>
- sys.exit(main())
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/scripts/scripts.py", line 2690, in main
- return cli()
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
- return self.main(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1082, in main
- rv = self.invoke(ctx)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
- return _process_result(sub_ctx.command.invoke(sub_ctx))
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
- return _process_result(sub_ctx.command.invoke(sub_ctx))
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
- return ctx.invoke(self.callback, **ctx.params)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 788, in invoke
- return __callback(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
- return func(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper
- return f(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 267, in submit
- client = _get_sdk_client(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 32, in _get_sdk_client
- client = JobSubmissionClient(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/sdk.py", line 105, in __init__
- self._check_connection_and_version(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 248, in _check_connection_and_version
- self._check_connection_and_version_with_url(min_version, version_error_message)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 278, in _check_connection_and_version_with_url
- raise ConnectionError(
- ConnectionError: Failed to connect to Ray at address: http://127.0.0.1:2983.

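This run failed because `ray job submit` could not reach the job submission server at http://127.0.0.1:2983 (connection refused on /api/version), i.e. the Ray head was not up yet or had already died. As a hedged sketch, a launcher could poll that same endpoint before submitting:

```python
# Sketch: wait until the Ray job submission server answers /api/version
# (the endpoint the traceback above was probing) before running `ray job submit`.
import time
import requests

def wait_for_ray_dashboard(url: str = "http://127.0.0.1:2983", timeout: float = 120.0) -> bool:
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(f"{url}/api/version", timeout=2).ok:
                return True
        except requests.ConnectionError:
            pass  # head node not listening yet
        time.sleep(2)
    return False

if not wait_for_ray_dashboard():
    raise SystemExit("Ray dashboard never came up; not submitting the training job.")
```
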
logs/20250527_011343/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 4686
- Train PID: 4687

logs/20250527_011343/remote_rm_qa.log DELETED
@@ -1,9 +0,0 @@
- [2025-05-27 01:14:20,482] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- load dataset success
- * Serving Flask app 'math_verifier_wolatex'
- * Debug mode: off
- WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- * Running on all addresses (0.0.0.0)
- * Running on http://127.0.0.1:2394
- * Running on http://10.140.0.151:2394
- Press CTRL+C to quit

logs/20250527_011343/train.log DELETED
@@ -1,45 +0,0 @@
- 2025-05-27 01:14:05,551 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_ebceb3f924b2a11c.zip.
- 2025-05-27 01:14:05,551 INFO packaging.py:575 -- Creating a file package for local module '/mnt/petrelfs/luyiting/MultiAgentEval/lmm-r1'.
- 2025-05-27 01:14:04,421 INFO cli.py:39 -- Job submission server address: http://127.0.0.1:2983
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/bin/ray", line 8, in <module>
- sys.exit(main())
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/scripts/scripts.py", line 2690, in main
- return cli()
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
- return self.main(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1082, in main
- rv = self.invoke(ctx)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
- return _process_result(sub_ctx.command.invoke(sub_ctx))
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
- return _process_result(sub_ctx.command.invoke(sub_ctx))
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
- return ctx.invoke(self.callback, **ctx.params)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 788, in invoke
- return __callback(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
- return func(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper
- return f(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 276, in submit
- job_id = client.submit_job(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/sdk.py", line 250, in submit_job
- self._raise_error(r)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 283, in _raise_error
- raise RuntimeError(
- RuntimeError: Request failed with status code 500: Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 388, in submit_job
- resp = await job_agent_client.submit_job_internal(submit_request)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 82, in submit_job_internal
- async with self._session.post(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/aiohttp/client.py", line 1425, in __aenter__
- self._resp: _RetType = await self._coro
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/aiohttp/client.py", line 730, in _request
- await resp.start(conn)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1059, in start
- message, payload = await protocol.read() # type: ignore[union-attr]
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/aiohttp/streams.py", line 672, in read
- await self._waiter
- aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
- .

logs/20250527_095510/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 299008
- Train PID: 299009

logs/20250527_095510/remote_rm_qa.log DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:8687dd357ffe27589cfa1830fb7e278b2da07b36db566391235527667230c5da
- size 68981940

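Several of the larger logs are stored through Git LFS, so what the diff removes here is only the three-line pointer file (version, oid, size), not the log contents themselves. A small sketch for reading such a pointer, assuming the standard LFS pointer layout shown above:

```python
# Parse a Git LFS pointer file like the one deleted above into its fields.
def parse_lfs_pointer(text: str) -> dict:
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),
        "size_bytes": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8687dd357ffe27589cfa1830fb7e278b2da07b36db566391235527667230c5da
size 68981940"""
print(parse_lfs_pointer(pointer))  # roughly a 69 MB reward-model log behind this pointer
```
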
logs/20250527_095510/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250527_235509/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 412160
- Train PID: 412161

logs/20250527_235509/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250527_235509/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250528_110535/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 268533
- Train PID: 268534

logs/20250528_110535/remote_rm_qa.log DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:34ca2314ebf41df0ef2e41309a606861f152d7d4f8c45c559336e02c69cdaea1
- size 22480261

logs/20250528_110535/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250528_161139/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 395733
- Train PID: 395734

logs/20250528_161139/remote_rm_qa.log DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1c3ce9ee9d15d3a606842f7b7fcadd18f5efe8426c15279342e86e14f021a801
- size 59636609

logs/20250528_161139/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250529_214257/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 332906
- Train PID: 332907

logs/20250529_214257/remote_rm_qa.log DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1a81eeda8d4083e2da026795759cdbf80c7be314480b3705f06293d4cd26cbb4
- size 31232415

logs/20250529_214257/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250611_110725/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 128943
- Train PID: 128944

logs/20250611_110725/remote_rm_qa.log DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:b03d175ee35078ea79f97a809654b6eb1c8f9b7944755da98a05420253009b30
- size 20260314

logs/20250611_110725/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250611_150946/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 328854
- Train PID: 328855

logs/20250611_150946/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250611_150946/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250611_160325/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 459479
- Train PID: 459480

logs/20250611_160325/remote_rm_qa.log DELETED
@@ -1,9 +0,0 @@
- [2025-06-11 16:04:02,932] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- load dataset success
- * Serving Flask app 'math_verifier_wolatex'
- * Debug mode: off
- WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- * Running on all addresses (0.0.0.0)
- * Running on http://127.0.0.1:2399
- * Running on http://10.140.1.42:2399
- Press CTRL+C to quit

logs/20250611_160325/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250611_161239/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 104691
- Train PID: 104692

logs/20250611_161239/remote_rm_qa.log DELETED
@@ -1,9 +0,0 @@
- [2025-06-11 16:13:16,143] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- load dataset success
- * Serving Flask app 'math_verifier_wolatex'
- * Debug mode: off
- WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- * Running on all addresses (0.0.0.0)
- * Running on http://127.0.0.1:2399
- * Running on http://10.140.1.48:2399
- Press CTRL+C to quit

logs/20250611_161239/train.log DELETED
@@ -1,188 +0,0 @@
1
- 2025-06-11 16:13:02,348 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_e427a4376bfc803e.zip.
2
- 2025-06-11 16:13:02,349 INFO packaging.py:575 -- Creating a file package for local module '/mnt/petrelfs/luyiting/MultiAgentEval/lmm-r1'.
3
- 2025-06-11 16:13:00,858 INFO cli.py:39 -- Job submission server address: http://127.0.0.1:2989
4
- 2025-06-11 16:13:08,059 SUCC cli.py:63 -- -------------------------------------------------------
5
- 2025-06-11 16:13:08,059 SUCC cli.py:64 -- Job 'raysubmit_aazeFP8fmRtZyntC' submitted successfully
6
- 2025-06-11 16:13:08,059 SUCC cli.py:65 -- -------------------------------------------------------
7
- 2025-06-11 16:13:08,059 INFO cli.py:289 -- Next steps
8
- 2025-06-11 16:13:08,060 INFO cli.py:290 -- Query the logs of the job:
9
- 2025-06-11 16:13:08,060 INFO cli.py:292 -- ray job logs raysubmit_aazeFP8fmRtZyntC
10
- 2025-06-11 16:13:08,060 INFO cli.py:294 -- Query the status of the job:
11
- 2025-06-11 16:13:08,060 INFO cli.py:296 -- ray job status raysubmit_aazeFP8fmRtZyntC
12
- 2025-06-11 16:13:08,060 INFO cli.py:298 -- Request the job to be stopped:
13
- 2025-06-11 16:13:08,060 INFO cli.py:300 -- ray job stop raysubmit_aazeFP8fmRtZyntC
14
- 2025-06-11 16:13:08,062 INFO cli.py:307 -- Tailing logs until the job exits (disable with --no-wait):
15
- 2025-06-11 16:13:07,524 INFO job_manager.py:531 -- Runtime env is setting up.
16
- [2025-06-11 16:13:32,045] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
17
- INFO 06-11 16:13:37 [__init__.py:239] Automatically detected platform cuda.
18
- 2025-06-11 16:13:38,183 INFO worker.py:1520 -- Using address 10.140.1.48:6239 set in the environment variable RAY_ADDRESS
19
- 2025-06-11 16:13:38,184 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.140.1.48:6239...
20
- 2025-06-11 16:13:38,204 INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at 10.140.1.48:2989 
21
- (pid=117690) INFO 06-11 16:14:04 [__init__.py:239] Automatically detected platform cuda.
22
- (LLMRayActor pid=117685) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'reward', 'score', 'embed', 'classify', 'generate'}. Defaulting to 'generate'.
23
- (LLMRayActor pid=117685) WARNING 06-11 16:14:34 [arg_utils.py:1846] VLLM_ATTENTION_BACKEND=triton is not supported by the V1 Engine. Falling back to V0. We recommend to remove VLLM_ATTENTION_BACKEND=triton from your config in favor of the V1 Engine.
24
- (LLMRayActor pid=117685) WARNING 06-11 16:14:34 [arg_utils.py:1745] --enable-prefix-caching is not supported for multimodal models in V0 and has been disabled.
25
- (LLMRayActor pid=117685) INFO 06-11 16:14:34 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2.dev76+gf68cce8) with config: model='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', speculative_config=None, tokenizer='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=42, served_model_name=/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
26
- (pid=117689) INFO 06-11 16:14:04 [__init__.py:239] Automatically detected platform cuda. [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
27
- (LLMRayActor pid=117691) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'classify', 'score', 'generate', 'embed', 'reward'}. Defaulting to 'generate'.
28
- (LLMRayActor pid=117686) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'embed', 'generate', 'score', 'classify', 'reward'}. Defaulting to 'generate'.
29
- (LLMRayActor pid=117689) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'reward', 'score', 'generate', 'classify', 'embed'}. Defaulting to 'generate'.
30
- (LLMRayActor pid=117690) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'reward', 'generate', 'score', 'embed', 'classify'}. Defaulting to 'generate'.
31
- (LLMRayActor pid=117688) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'generate', 'reward', 'embed', 'score', 'classify'}. Defaulting to 'generate'.
32
- (LLMRayActor pid=117687) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'embed', 'reward', 'score', 'generate', 'classify'}. Defaulting to 'generate'.
33
- (LLMRayActor pid=117692) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'classify', 'reward', 'generate', 'embed', 'score'}. Defaulting to 'generate'.
34
- (LLMRayActor pid=117685) [2025-06-11 16:14:37,962] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
35
- (LLMRayActor pid=117685) INFO 06-11 16:14:43 [cuda.py:293] Using Flash Attention backend.
36
- (LLMRayActor pid=117692) WARNING 06-11 16:14:34 [arg_utils.py:1846] VLLM_ATTENTION_BACKEND=triton is not supported by the V1 Engine. Falling back to V0. We recommend to remove VLLM_ATTENTION_BACKEND=triton from your config in favor of the V1 Engine. [repeated 7x across cluster]
37
- (LLMRayActor pid=117692) WARNING 06-11 16:14:34 [arg_utils.py:1745] --enable-prefix-caching is not supported for multimodal models in V0 and has been disabled. [repeated 7x across cluster]
38
- (LLMRayActor pid=117692) INFO 06-11 16:14:34 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2.dev76+gf68cce8) with config: model='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', speculative_config=None, tokenizer='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=45, served_model_name=/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,  [repeated 7x across cluster]
39
- (LLMRayActor pid=117692) [2025-06-11 16:14:37,973] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect) [repeated 7x across cluster]
40
- (LLMRayActor pid=117685) INFO 06-11 16:14:47 [parallel_state.py:967] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
41
- (LLMRayActor pid=117685) INFO 06-11 16:14:47 [model_runner.py:1110] Starting to load model /mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/...
42
- (LLMRayActor pid=117685) INFO 06-11 16:14:48 [config.py:3229] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248]
43
- (LLMRayActor pid=117692)
44
- Loading safetensors checkpoint shards: 0% Completed | 0/5 [00:00<?, ?it/s]
45
- (LLMRayActor pid=117692) INFO 06-11 16:14:43 [cuda.py:293] Using Flash Attention backend. [repeated 7x across cluster]
46
- (LLMRayActor pid=117685)
47
- Loading safetensors checkpoint shards: 20% Completed | 1/5 [00:00<00:03, 1.24it/s]
48
- (LLMRayActor pid=117689)
49
- Loading safetensors checkpoint shards: 0% Completed | 0/5 [00:00<?, ?it/s] [repeated 7x across cluster]
50
- (LLMRayActor pid=117685)
51
- Loading safetensors checkpoint shards: 60% Completed | 3/5 [00:08<00:06, 3.04s/it] [repeated 16x across cluster]
52
- (LLMRayActor pid=117685) INFO 06-11 16:15:04 [loader.py:429] Loading weights took 15.74 seconds
53
- (LLMRayActor pid=117689) INFO 06-11 16:14:47 [parallel_state.py:967] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0 [repeated 7x across cluster]
54
- (LLMRayActor pid=117689) INFO 06-11 16:14:47 [model_runner.py:1110] Starting to load model /mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/... [repeated 7x across cluster]
55
- (LLMRayActor pid=117690) INFO 06-11 16:14:48 [config.py:3229] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248] [repeated 7x across cluster]
56
- (LLMRayActor pid=117685)
57
- (LLMRayActor pid=117685)
58
- Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:15<00:00, 3.12s/it] [repeated 17x across cluster]
59
- (LLMRayActor pid=117690)
60
- (LLMRayActor pid=117687)
61
- (LLMRayActor pid=117691)
62
- (LLMRayActor pid=117689)
63
- (LLMRayActor pid=117692)
64
- (LLMRayActor pid=117686)
65
- (LLMRayActor pid=117688)
66
- (LLMRayActor pid=117692) INFO 06-11 16:15:04 [model_runner.py:1146] Model loading took 15.6271 GB and 16.982959 seconds
67
- (LLMRayActor pid=117692) Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
68
- (LLMRayActor pid=117692) WARNING 06-11 16:15:05 [model_runner.py:1296] Computed max_num_seqs (min(256, 8192 // 32768)) to be less than 1. Setting it to the minimum value of 1.
69
- (LLMRayActor pid=117685) WARNING 06-11 16:15:10 [profiling.py:222] The sequence length used for profiling (max_num_batched_tokens / max_num_seqs = 8192) is too short to hold the multi-modal embeddings in the worst case (32768 tokens in total, out of which {'image': 16384, 'video': 16384} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`.
70
- (LLMRayActor pid=117692) INFO 06-11 16:15:04 [loader.py:429] Loading weights took 15.90 seconds [repeated 7x across cluster]
71
- (LLMRayActor pid=117687) INFO 06-11 16:15:04 [model_runner.py:1146] Model loading took 15.6271 GB and 17.057258 seconds [repeated 7x across cluster]
72
- (LLMRayActor pid=117687) WARNING 06-11 16:15:05 [model_runner.py:1296] Computed max_num_seqs (min(256, 8192 // 32768)) to be less than 1. Setting it to the minimum value of 1. [repeated 7x across cluster]
73
- (LLMRayActor pid=117685) INFO 06-11 16:15:12 [worker.py:267] Memory profiling takes 8.24 seconds
74
- (LLMRayActor pid=117685) INFO 06-11 16:15:12 [worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.50) = 39.66GiB
75
- (LLMRayActor pid=117685) INFO 06-11 16:15:12 [worker.py:267] model weights take 15.63GiB; non_torch_memory takes 0.21GiB; PyTorch activation peak memory takes 1.09GiB; the rest of the memory reserved for KV Cache is 22.73GiB.
76
- (LLMRayActor pid=117685) INFO 06-11 16:15:13 [executor_base.py:111] # cuda blocks: 26598, # CPU blocks: 4681
77
- (LLMRayActor pid=117685) INFO 06-11 16:15:13 [executor_base.py:116] Maximum concurrency for 8192 tokens per request: 51.95x
78
- (LLMRayActor pid=117688) CUDA Error: out of memory at /mnt/petrelfs/luyiting/MultiAgentEval/vllm/csrc/cumem_allocator.cpp:62
79
- (LLMRayActor pid=117688)
80
- Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:15<00:00, 3.12s/it] [repeated 14x across cluster]
81
- (LLMRayActor pid=117688) Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. [repeated 7x across cluster]
82
- Traceback (most recent call last):
83
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/runpy.py", line 196, in _run_module_as_main
84
- return _run_code(code, main_globals, None,
85
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/runpy.py", line 86, in _run_code
86
- exec(code, run_globals)
87
- File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/cli/train_ppo_ray.py", line 497, in <module>
88
- train(args)
89
- File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/cli/train_ppo_ray.py", line 86, in train
90
- vllm_engines = create_vllm_engines(
91
- File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/trainer/ray/vllm_engine.py", line 189, in create_vllm_engines
92
- batch_vllm_engine_call(vllm_engines, "sleep", rank_0_only=False)
93
- File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/trainer/ray/vllm_engine.py", line 216, in batch_vllm_engine_call
94
- return ray.get(refs)
95
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
96
- return fn(*args, **kwargs)
97
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
98
- return func(*args, **kwargs)
99
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/_private/worker.py", line 2782, in get
100
- values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
101
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/_private/worker.py", line 931, in get_objects
102
- raise value
103
- ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::LLMRayActor.__init__() (pid=117688, ip=10.140.1.48, actor_id=ac9437056a85b29812c677da02000000, repr=<openrlhf.trainer.ray.vllm_engine.LLMRayActor object at 0x7f59ee72e170>)
104
- File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/trainer/ray/vllm_engine.py", line 54, in __init__
105
- self.llm = LLM(*args, **kwargs)
106
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/utils.py", line 1037, in inner
107
- return fn(*args, **kwargs)
108
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/entrypoints/llm.py", line 243, in __init__
109
- self.llm_engine = LLMEngine.from_engine_args(
110
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 520, in from_engine_args
111
- return engine_cls.from_vllm_config(
112
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 496, in from_vllm_config
113
- return cls(
114
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 283, in __init__
115
- self._initialize_kv_caches()
116
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 445, in _initialize_kv_caches
117
- self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
118
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/executor/executor_base.py", line 122, in initialize_cache
- self.collective_rpc("initialize_cache",
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
- answer = run_method(self.driver_worker, method, args, kwargs)
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/utils.py", line 2255, in run_method
- return func(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 307, in initialize_cache
- self._init_cache_engine()
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 312, in _init_cache_engine
- self.cache_engine = [
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 313, in <listcomp>
- CacheEngine(self.cache_config, self.model_config,
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/cache_engine.py", line 64, in __init__
- self.gpu_cache = self._allocate_kv_cache(
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/cache_engine.py", line 83, in _allocate_kv_cache
- layer_kv_cache = torch.zeros(kv_cache_shape,
- torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 832.00 MiB. GPU 0 has a total capacity of 79.32 GiB of which 611.56 MiB is free. Process 71866 has 48.06 GiB memory in use. Including non-PyTorch memory, this process has 30.66 GiB memory in use. Of the allocated memory 29.60 GiB is allocated by PyTorch, with 159.90 MiB allocated in private pools (e.g., CUDA Graphs), and 13.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
- (LLMRayActor pid=117688) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::LLMRayActor.__init__() (pid=117688, ip=10.140.1.48, actor_id=ac9437056a85b29812c677da02000000, repr=<openrlhf.trainer.ray.vllm_engine.LLMRayActor object at 0x7f59ee72e170>)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/trainer/ray/vllm_engine.py", line 54, in __init__
- (LLMRayActor pid=117688) self.llm = LLM(*args, **kwargs)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/utils.py", line 1037, in inner
- (LLMRayActor pid=117688) return fn(*args, **kwargs)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/entrypoints/llm.py", line 243, in __init__
- (LLMRayActor pid=117688) self.llm_engine = LLMEngine.from_engine_args(
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 520, in from_engine_args
- (LLMRayActor pid=117688) return engine_cls.from_vllm_config(
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 496, in from_vllm_config
- (LLMRayActor pid=117688) return cls(
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 283, in __init__
- (LLMRayActor pid=117688) self._initialize_kv_caches()
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 445, in _initialize_kv_caches
- (LLMRayActor pid=117688) self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/executor/executor_base.py", line 122, in initialize_cache
- (LLMRayActor pid=117688) self.collective_rpc("initialize_cache",
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
- (LLMRayActor pid=117688) answer = run_method(self.driver_worker, method, args, kwargs)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/utils.py", line 2255, in run_method
- (LLMRayActor pid=117688) return func(*args, **kwargs)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 307, in initialize_cache
- (LLMRayActor pid=117688) self._init_cache_engine()
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 312, in _init_cache_engine
- (LLMRayActor pid=117688) self.cache_engine = [
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 313, in <listcomp>
- (LLMRayActor pid=117688) CacheEngine(self.cache_config, self.model_config,
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/cache_engine.py", line 64, in __init__
- (LLMRayActor pid=117688) self.gpu_cache = self._allocate_kv_cache(
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/cache_engine.py", line 83, in _allocate_kv_cache
- (LLMRayActor pid=117688) layer_kv_cache = torch.zeros(kv_cache_shape,
- (LLMRayActor pid=117688) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 832.00 MiB. GPU 0 has a total capacity of 79.32 GiB of which 611.56 MiB is free. Process 71866 has 48.06 GiB memory in use. Including non-PyTorch memory, this process has 30.66 GiB memory in use. Of the allocated memory 29.60 GiB is allocated by PyTorch, with 159.90 MiB allocated in private pools (e.g., CUDA Graphs), and 13.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
- (LLMRayActor pid=117687) WARNING 06-11 16:15:11 [profiling.py:222] The sequence length used for profiling (max_num_batched_tokens / max_num_seqs = 8192) is too short to hold the multi-modal embeddings in the worst case (32768 tokens in total, out of which {'image': 16384, 'video': 16384} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`. [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] Memory profiling takes 8.37 seconds [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.50) = 39.66GiB [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] model weights take 15.63GiB; non_torch_memory takes 0.21GiB; PyTorch activation peak memory takes 1.09GiB; the rest of the memory reserved for KV Cache is 22.73GiB. [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [executor_base.py:111] # cuda blocks: 26598, # CPU blocks: 4681 [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [executor_base.py:116] Maximum concurrency for 8192 tokens per request: 51.95x [repeated 7x across cluster]
- (LLMRayActor pid=117689) CUDA Error: out of memory at /mnt/petrelfs/luyiting/MultiAgentEval/vllm/csrc/cumem_allocator.cpp:62 [repeated 2x across cluster]
- 2025-06-11 16:15:19,325 ERR cli.py:71 -- ---------------------------------------
- 2025-06-11 16:15:19,326 ERR cli.py:72 -- Job 'raysubmit_aazeFP8fmRtZyntC' failed
- 2025-06-11 16:15:19,326 ERR cli.py:73 -- ---------------------------------------
- 2025-06-11 16:15:19,326 INFO cli.py:86 -- Status message: Job entrypoint command failed with exit code 1, last available logs (truncated to 20,000 chars):
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/cache_engine.py", line 83, in _allocate_kv_cache
- (LLMRayActor pid=117688) layer_kv_cache = torch.zeros(kv_cache_shape,
- (LLMRayActor pid=117688) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 832.00 MiB. GPU 0 has a total capacity of 79.32 GiB of which 611.56 MiB is free. Process 71866 has 48.06 GiB memory in use. Including non-PyTorch memory, this process has 30.66 GiB memory in use. Of the allocated memory 29.60 GiB is allocated by PyTorch, with 159.90 MiB allocated in private pools (e.g., CUDA Graphs), and 13.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
- (LLMRayActor pid=117687) WARNING 06-11 16:15:11 [profiling.py:222] The sequence length used for profiling (max_num_batched_tokens / max_num_seqs = 8192) is too short to hold the multi-modal embeddings in the worst case (32768 tokens in total, out of which {'image': 16384, 'video': 16384} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`. [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] Memory profiling takes 8.37 seconds [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.50) = 39.66GiB [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] model weights take 15.63GiB; non_torch_memory takes 0.21GiB; PyTorch activation peak memory takes 1.09GiB; the rest of the memory reserved for KV Cache is 22.73GiB. [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [executor_base.py:111] # cuda blocks: 26598, # CPU blocks: 4681 [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [executor_base.py:116] Maximum concurrency for 8192 tokens per request: 51.95x [repeated 7x across cluster]
- (LLMRayActor pid=117689) CUDA Error: out of memory at /mnt/petrelfs/luyiting/MultiAgentEval/vllm/csrc/cumem_allocator.cpp:62 [repeated 2x across cluster]
-
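Editor's note: the KV-cache sizing reported by the profiler lines in this deleted log can be sanity-checked by hand. A minimal sketch follows; the GiB figures, the 0.50 utilization and the 26598-block count are taken from the log above, while the model dimensions (28 layers, 4 KV heads, head dim 128, bf16, block size 16, i.e. a Qwen2.5-VL-7B-class decoder) are an assumption, since the checkpoint is not named in this excerpt.

# Back-of-envelope check of vLLM's reported KV-cache budget (sketch, not project code).
GiB = 1024 ** 3

# Figures reported by the vLLM memory profiler in this run:
total_gpu_memory = 79.32 * GiB
gpu_memory_utilization = 0.50
weights = 15.63 * GiB
non_torch = 0.21 * GiB
activation_peak = 1.09 * GiB

budget = total_gpu_memory * gpu_memory_utilization                  # ~39.66 GiB
kv_cache_budget = budget - weights - non_torch - activation_peak    # ~22.73 GiB

# Assumed per-block KV size: 2 (K and V) * layers * block_size * kv_heads * head_dim * bytes/elem
num_layers, block_size, num_kv_heads, head_dim, dtype_bytes = 28, 16, 4, 128, 2
bytes_per_block = 2 * num_layers * block_size * num_kv_heads * head_dim * dtype_bytes  # 0.875 MiB

print(f"KV cache budget: {kv_cache_budget / GiB:.2f} GiB")
print(f"# cuda blocks:   {int(kv_cache_budget // bytes_per_block)}")  # ~26.6k, close to the 26598 logged

The arithmetic confirms the profiler's budget is internally consistent; the failure is that GPU 0 was already holding ~48 GiB from another process, so only ~611 MiB was actually free when the cache tensors were allocated.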
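As a hedged follow-up, the log messages themselves point at the relevant knobs: the PyTorch OOM text suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, and the engine is created through vllm.LLM (visible in the traceback), whose gpu_memory_utilization / max_model_len / max_num_seqs arguments control the reservation. The values below are illustrative assumptions, not settings recovered from the training script, and the model name is a placeholder.

import os

# Allocator hint quoted verbatim by the OOM message; it mitigates fragmentation but
# cannot fix a GPU that is already over-committed by a neighbouring process.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # assumption; the checkpoint is not named in this log
    gpu_memory_utilization=0.35,          # log used 0.50, which did not fit next to the 48 GiB neighbour
    max_model_len=8192,                   # matches the 8192-token budget reported by the profiler
    max_num_seqs=1,                       # smaller batch, also raises the per-sequence profiling length
)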
 
 
 
logs/20250611_162203/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 74069
- Train PID: 74070
 
 
 
logs/20250611_162203/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff