INFO 09-26 12:57:52 [__init__.py:239] Automatically detected platform cuda.
=== Processing model: /home/khanh/sla/sla_cpt/qwen2.5-0.5b_english_wiki_750M_chinese_wikipedia_corpus_2e_240925/checkpoint-2944, type: qwen ===
INFO 09-26 12:57:55 [config.py:2700] Downcasting torch.float32 to torch.float16.
INFO 09-26 12:58:10 [config.py:600] This model supports multiple tasks: {'classify', 'embed', 'reward', 'score', 'generate'}. Defaulting to 'generate'.
WARNING 09-26 12:58:10 [cuda.py:96] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
INFO 09-26 12:58:10 [llm_engine.py:242] Initializing a V0 LLM engine (v0.8.3) with config: model='/home/khanh/sla/sla_cpt/qwen2.5-0.5b_english_wiki_750M_chinese_wikipedia_corpus_2e_240925/checkpoint-2944', speculative_config=None, tokenizer='/home/khanh/sla/sla_cpt/qwen2.5-0.5b_english_wiki_750M_chinese_wikipedia_corpus_2e_240925/checkpoint-2944', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/home/khanh/sla/sla_cpt/qwen2.5-0.5b_english_wiki_750M_chinese_wikipedia_corpus_2e_240925/checkpoint-2944, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[],"max_capture_size":0}, use_cached_outputs=False,
INFO 09-26 12:58:12 [cuda.py:292] Using Flash Attention backend.
INFO 09-26 12:58:13 [parallel_state.py:957] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 09-26 12:58:13 [model_runner.py:1110] Starting to load model /home/khanh/sla/sla_cpt/qwen2.5-0.5b_english_wiki_750M_chinese_wikipedia_corpus_2e_240925/checkpoint-2944...
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00...
[rank0]:     )
[rank0]: huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/khanh/sla/sla_cpt/qwen2.5-0.5b_english_wiki_1.5Bbasque_corpus_24092/checkpoint-2424'. Use `repo_type` argument if needed.
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/khanh/sla/lsn-analysis/activation_all.py", line 127, in <module>
[rank0]:     model = LLM(
[rank0]:         model=model_name,
[rank0]:     ...<2 lines>...
[rank0]:         trust_remote_code=True
[rank0]:     )
[rank0]:   File "/home/khanh/miniforge3/envs/sla/lib/python3.13/site-packages/vllm/utils.py", line 1096, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/khanh/miniforge3/envs/sla/lib/python3.13/site-packages/vllm/entrypoints/llm.py", line 243, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:         ~~~~~~~~~~~~~~~~~~~~~~~~~~^
[rank0]:         engine_args=engine_args, usage_context=UsageContext.LLM_CLASS)
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/khanh/miniforge3/envs/sla/lib/python3.13/site-packages/vllm/engine/llm_engine.py", line 514, in from_engine_args
[rank0]:     vllm_config = engine_args.create_engine_config(usage_context)
[rank0]:   File "/home/khanh/miniforge3/envs/sla/lib/python3.13/site-packages/vllm/engine/arg_utils.py", line 1137, in create_engine_config
[rank0]:     model_config = self.create_model_config()
[rank0]:   File "/home/khanh/miniforge3/envs/sla/lib/python3.13/site-packages/vllm/engine/arg_utils.py", line 1026, in create_model_config
[rank0]:     return ModelConfig(
[rank0]:         model=self.model,
[rank0]:     ...<35 lines>...
[rank0]:         model_impl=self.model_impl,
[rank0]:     )
[rank0]:   File "/home/khanh/miniforge3/envs/sla/lib/python3.13/site-packages/vllm/config.py", line 343, in __init__
[rank0]:     hf_config = get_config(self.hf_config_path or self.model,
[rank0]:                            trust_remote_code, revision, code_revision,
[rank0]:                            config_format)
[rank0]:   File "/home/khanh/miniforge3/envs/sla/lib/python3.13/site-packages/vllm/transformers_utils/config.py", line 283, in get_config
[rank0]:     raise ValueError(error_message) from e
[rank0]: ValueError: Invalid repository ID or local directory specified: '/home/khanh/sla/sla_cpt/qwen2.5-0.5b_english_wiki_1.5Bbasque_corpus_24092/checkpoint-2424'.
[rank0]: Please verify the following requirements:
[rank0]: 1. Provide a valid Hugging Face repository ID.
[rank0]: 2. Specify a local directory that contains a recognized configuration file.
[rank0]:    - For Hugging Face models: ensure the presence of a 'config.json'.
[rank0]:    - For Mistral models: ensure the presence of a 'params.json'.
[rank0]:[W926 15:17:03.990277284 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) srun: error: gpu0: task 0: Exited with exit code 1
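The ValueError above says vLLM could not treat the path as a Hugging Face repo id or as a local directory containing a recognized config file. A minimal pre-flight sketch, assuming the checkpoints are local directories as in this log: check for the config file vLLM expects ('config.json' for Hugging Face models, 'params.json' for Mistral models) before constructing the `LLM` object, so one bad checkpoint path does not crash the whole batch. The function name `has_loadable_config` and the loop shape are hypothetical, not taken from `activation_all.py`.

```python
# Hypothetical pre-flight check for local checkpoint directories before
# handing them to vllm.LLM; mirrors the requirements listed in the
# ValueError message above.
from pathlib import Path


def has_loadable_config(model_path: str) -> bool:
    """Return True if `model_path` is a directory containing a config file
    vLLM recognizes: 'config.json' (Hugging Face) or 'params.json' (Mistral)."""
    p = Path(model_path)
    return p.is_dir() and (
        (p / "config.json").is_file() or (p / "params.json").is_file()
    )


# Hypothetical usage inside a batch loop like the one in activation_all.py:
# for ckpt in checkpoints:
#     if not has_loadable_config(ckpt):
#         print(f"Skipping {ckpt}: no config.json/params.json found")
#         continue
#     model = LLM(model=ckpt, trust_remote_code=True)
```

Checking the directory up front also surfaces typo'd paths (such as a wrong checkpoint number) as a readable skip message instead of a traceback.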