ValueError: GGUF model with architecture qwen3vl is not supported yet.

#2
by furquan - opened

Hi, just curious how you were able to run these quants. Do I need llama.cpp?

$ vllm serve ./chandra-Q8_0.gguf --tokenizer datalab-to/chandra
/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
INFO 12-19 10:43:36 [scheduler.py:216] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=3163855) INFO 12-19 10:43:36 [api_server.py:1977] vLLM API server version 0.11.2
(APIServer pid=3163855) INFO 12-19 10:43:36 [utils.py:253] non-default args: {'model_tag': './chandra-Q8_0.gguf', 'model': './chandra-Q8_0.gguf', 'tokenizer': 'datalab-to/chandra'}
(APIServer pid=3163855) Traceback (most recent call last):
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/bin/vllm", line 8, in <module>
(APIServer pid=3163855)     sys.exit(main())
(APIServer pid=3163855)              ^^^^^^
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=3163855)     args.dispatch_function(args)
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 60, in cmd
(APIServer pid=3163855)     uvloop.run(run_server(args))
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=3163855)     return __asyncio.run(
(APIServer pid=3163855)            ^^^^^^^^^^^^^^
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=3163855)     return runner.run(main)
(APIServer pid=3163855)            ^^^^^^^^^^^^^^^^
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=3163855)     return self._loop.run_until_complete(task)
(APIServer pid=3163855)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3163855)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=3163855)     return await main
(APIServer pid=3163855)            ^^^^^^^^^^
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 2024, in run_server
(APIServer pid=3163855)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 2043, in run_server_worker
(APIServer pid=3163855)     async with build_async_engine_client(
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=3163855)     return await anext(self.gen)
(APIServer pid=3163855)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
(APIServer pid=3163855)     async with build_async_engine_client_from_engine_args(
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=3163855)     return await anext(self.gen)
(APIServer pid=3163855)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 221, in build_async_engine_client_from_engine_args
(APIServer pid=3163855)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=3163855)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1351, in create_engine_config
(APIServer pid=3163855)     maybe_override_with_speculators(
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 530, in maybe_override_with_speculators
(APIServer pid=3163855)     config_dict, _ = PretrainedConfig.get_config_dict(
(APIServer pid=3163855)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/transformers/configuration_utils.py", line 662, in get_config_dict
(APIServer pid=3163855)     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
(APIServer pid=3163855)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/transformers/configuration_utils.py", line 753, in _get_config_dict
(APIServer pid=3163855)     config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
(APIServer pid=3163855)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3163855)   File "/local/home/hfurquan/miniconda3/lib/python3.12/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 431, in load_gguf_checkpoint
(APIServer pid=3163855)     raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
(APIServer pid=3163855) ValueError: GGUF model with architecture qwen3vl is not supported yet.
