Instructions to use openbmb/MiniCPM-V-4_5-AWQ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openbmb/MiniCPM-V-4_5-AWQ with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="openbmb/MiniCPM-V-4_5-AWQ", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("openbmb/MiniCPM-V-4_5-AWQ", trust_remote_code=True, dtype="auto")
```
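The remote code loaded with `trust_remote_code=True` also exposes a `chat` method on the model. A minimal sketch of that interface, following the pattern from earlier MiniCPM-V model cards; verify the exact signature against this model's card, and note the AWQ-specific loading caveats discussed further below:

```python
# Sketch of the MiniCPM-V chat interface, based on earlier MiniCPM-V
# releases; the exact signature should be checked against the model card.
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-4_5-AWQ"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, dtype="auto").eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # any local test image
msgs = [{"role": "user", "content": [image, "What is in the image?"]}]

print(model.chat(msgs=msgs, tokenizer=tokenizer))
```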
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openbmb/MiniCPM-V-4_5-AWQ with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "openbmb/MiniCPM-V-4_5-AWQ"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/MiniCPM-V-4_5-AWQ",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/openbmb/MiniCPM-V-4_5-AWQ
```
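Once the server is running (via `vllm serve` or Docker), it can also be called from Python with the OpenAI client instead of curl. A minimal sketch, assuming the default local host/port and no API key; the same client works against the SGLang server below by switching the port to 30000:

```python
# Call the local vLLM server through its OpenAI-compatible API.
# base_url and api_key are assumptions for a default local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openbmb/MiniCPM-V-4_5-AWQ",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```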
- SGLang
How to use openbmb/MiniCPM-V-4_5-AWQ with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "openbmb/MiniCPM-V-4_5-AWQ" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/MiniCPM-V-4_5-AWQ",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "openbmb/MiniCPM-V-4_5-AWQ" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/MiniCPM-V-4_5-AWQ",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use openbmb/MiniCPM-V-4_5-AWQ with Docker Model Runner:
```shell
docker model run hf.co/openbmb/MiniCPM-V-4_5-AWQ
```
I get an error when loading MiniCPM-V-4_5-AWQ:

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-4_5-AWQ',  # or openbmb/MiniCPM-o-2_6
                                  trust_remote_code=True,
                                  attn_implementation='sdpa',
                                  torch_dtype=torch.bfloat16)
```

The error:
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
return func(*args, **kwargs)
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4800, in from_pretrained
hf_quantizer.preprocess_model(
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/quantizers/base.py", line 225, in preprocess_model
return self._process_model_before_weight_loading(model, **kwargs)
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py", line 113, in _process_model_before_weight_loading
model, has_been_replaced = replace_with_awq_linear(
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/integrations/awq.py", line 187, in replace_with_awq_linear
_, has_been_replaced = replace_with_awq_linear(
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/integrations/awq.py", line 187, in replace_with_awq_linear
_, has_been_replaced = replace_with_awq_linear(
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/integrations/awq.py", line 187, in replace_with_awq_linear
_, has_been_replaced = replace_with_awq_linear(
[Previous line repeated 2 more times]
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/integrations/awq.py", line 174, in replace_with_awq_linear
model._modules[name] = target_cls(
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/awq/modules/linear/gemm.py", line 132, in init
assert self.in_features % self.group_size == 0
AssertionError
```
torch==2.8.0
torchvision==0.23.0
transformers==4.53.2
safetensors==0.5.3
tokenizers==0.21.2
decord==0.6.0
imageio==2.37.0
Pillow==11.0.0
tqdm==4.67.1
huggingface-hub==0.34.3
sympy==1.13.3
```
https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/awq/minicpm-v4_5_awq_quantize.md
Are you using the awq repo we provide? You can refer to this document for usage.
I see. The demo runs with vLLM, but I want to run it with Transformers.
Any suggestions?
It can be used independently; this is not difficult.
Let me think about whether I can add some extra instructions to the tutorial.
Thanks! Looking forward to the update!
https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/awq/minicpm-v4_5_awq_quantize.md#method-2-use-the-pre-quantized-model
@xiexie1234567 I have added a script for direct inference; you can try it.
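For reference, direct inference along those lines would look roughly like the following; this is a hedged sketch assuming the cookbook's recompiled AutoAWQ fork is installed, not the cookbook script itself:

```python
# Sketch of direct AWQ inference via AutoAWQ; requires the recompiled fork
# from the MiniCPM-V cookbook, since the stock PyPI autoawq does not
# recognize this model type (see the traceback below).
from transformers import AutoTokenizer
from awq import AutoAWQForCausalLM

model_path = "openbmb/MiniCPM-V-4_5-AWQ"
model = AutoAWQForCausalLM.from_quantized(model_path, trust_remote_code=True).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```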
thanks!
Unfortunately, I get an error:
```
Traceback (most recent call last):
  File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
    cli.main()
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
    run()
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
    return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
    _run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
    exec(code, run_globals)
  File "/mnt/d/zjd/openbmb/demo_awq.py", line 13, in <module>
    model = AutoAWQForCausalLM.from_quantized(model_path, trust_remote_code=True).to('cuda')
  File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/awq/models/auto.py", line 116, in from_quantized
    model_type = check_and_get_model_type(quant_path, trust_remote_code)
  File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/awq/models/auto.py", line 55, in check_and_get_model_type
    raise TypeError(f"{config.model_type} isn't supported yet.")
```
```
torch==2.8.0
torchvision==0.23.0
transformers==4.53.2
safetensors==0.5.3
tokenizers==0.21.2
decord==0.6.0
imageio==2.37.0
Pillow==11.0.0
tqdm==4.67.1
huggingface-hub==0.34.3
sympy==1.13.3
```
@xiexie1234567 awq is not being referenced correctly.
- Did you recompile and install our awq according to the documentation?
- If you have installed it, please confirm that the awq in your environment is the installed version; the install may not have succeeded, or the import may still resolve to the old location.
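A quick generic check (not from the cookbook) to confirm which awq Python actually imports:

```python
# If this prints a path to the stock PyPI package in site-packages rather
# than the recompiled fork, the fork was not picked up by the environment.
import awq
print(awq.__file__)
print(getattr(awq, "__version__", "unknown"))
```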
No, my fault. I did not recompile and install it.
Thank you for your help!