Instructions to use openbmb/MiniCPM-V-4_5-AWQ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openbmb/MiniCPM-V-4_5-AWQ with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="openbmb/MiniCPM-V-4_5-AWQ", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("openbmb/MiniCPM-V-4_5-AWQ", trust_remote_code=True, dtype="auto")
```
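The remote code loaded with `trust_remote_code=True` also exposes a `chat` method on the model. A minimal sketch of that interface, following the pattern from earlier MiniCPM-V model cards; verify the exact signature against this model's card, and note the AWQ-specific loading caveats discussed further below:

```python
# Sketch of the MiniCPM-V chat interface, based on earlier MiniCPM-V
# releases; the exact signature should be checked against the model card.
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-4_5-AWQ"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, dtype="auto").eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # any local test image
msgs = [{"role": "user", "content": [image, "What is in the image?"]}]

print(model.chat(msgs=msgs, tokenizer=tokenizer))
```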
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openbmb/MiniCPM-V-4_5-AWQ with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "openbmb/MiniCPM-V-4_5-AWQ"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/MiniCPM-V-4_5-AWQ",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/openbmb/MiniCPM-V-4_5-AWQ
```
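Once the server is running (via `vllm serve` or Docker), it can also be called from Python with the OpenAI client instead of curl. A minimal sketch, assuming the default local host/port and no API key; the same client works against the SGLang server below by switching the port to 30000:

```python
# Call the local vLLM server through its OpenAI-compatible API.
# base_url and api_key are assumptions for a default local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openbmb/MiniCPM-V-4_5-AWQ",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```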
- SGLang
How to use openbmb/MiniCPM-V-4_5-AWQ with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "openbmb/MiniCPM-V-4_5-AWQ" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/MiniCPM-V-4_5-AWQ",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "openbmb/MiniCPM-V-4_5-AWQ" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/MiniCPM-V-4_5-AWQ",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use openbmb/MiniCPM-V-4_5-AWQ with Docker Model Runner:
```shell
docker model run hf.co/openbmb/MiniCPM-V-4_5-AWQ
```
I get an error when loading MiniCPM-V-4_5-AWQ:

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-4_5-AWQ',  # or openbmb/MiniCPM-o-2_6
                                  trust_remote_code=True,
                                  attn_implementation='sdpa',
                                  torch_dtype=torch.bfloat16)
```

The error:
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 311, in _wrapper
return func(*args, **kwargs)
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4800, in from_pretrained
hf_quantizer.preprocess_model(
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/quantizers/base.py", line 225, in preprocess_model
return self._process_model_before_weight_loading(model, **kwargs)
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py", line 113, in _process_model_before_weight_loading
model, has_been_replaced = replace_with_awq_linear(
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/integrations/awq.py", line 187, in replace_with_awq_linear
_, has_been_replaced = replace_with_awq_linear(
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/integrations/awq.py", line 187, in replace_with_awq_linear
_, has_been_replaced = replace_with_awq_linear(
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/integrations/awq.py", line 187, in replace_with_awq_linear
_, has_been_replaced = replace_with_awq_linear(
[Previous line repeated 2 more times]
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/transformers/integrations/awq.py", line 174, in replace_with_awq_linear
model._modules[name] = target_cls(
File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/awq/modules/linear/gemm.py", line 132, in init
assert self.in_features % self.group_size == 0
AssertionError
```
torch==2.8.0
torchvision==0.23.0
transformers==4.53.2
safetensors==0.5.3
tokenizers==0.21.2
decord==0.6.0
imageio==2.37.0
Pillow==11.0.0
tqdm==4.67.1
huggingface-hub==0.34.3
sympy==1.13.3
```
https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/awq/minicpm-v4_5_awq_quantize.md
Are you using the awq repo we provide? You can refer to this document for usage.
I see. The demo runs with vLLM, but I want to run it with Transformers.
Any suggestions?
It can be used independently; this is not difficult.
Let me think about whether I can add some extra instructions to the tutorial.
Thanks! Looking forward to the update!
https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/awq/minicpm-v4_5_awq_quantize.md#method-2-use-the-pre-quantized-model
@xiexie1234567 I have added a script for direct inference; you can try it.
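For reference, direct inference along those lines would look roughly like the following; this is a hedged sketch assuming the cookbook's recompiled AutoAWQ fork is installed, not the cookbook script itself:

```python
# Sketch of direct AWQ inference via AutoAWQ; requires the recompiled fork
# from the MiniCPM-V cookbook, since the stock PyPI autoawq does not
# recognize this model type (see the traceback below).
from transformers import AutoTokenizer
from awq import AutoAWQForCausalLM

model_path = "openbmb/MiniCPM-V-4_5-AWQ"
model = AutoAWQForCausalLM.from_quantized(model_path, trust_remote_code=True).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```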
thanks!
Unfortunately, I get an error:
```
Traceback (most recent call last):
  File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
    cli.main()
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
    run()
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
    return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
    _run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
  File "/home/usr/.vscode-server/extensions/ms-python.debugpy-2025.10.0/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
    exec(code, run_globals)
  File "/mnt/d/zjd/openbmb/demo_awq.py", line 13, in <module>
    model = AutoAWQForCausalLM.from_quantized(model_path, trust_remote_code=True).to('cuda')
  File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/awq/models/auto.py", line 116, in from_quantized
    model_type = check_and_get_model_type(quant_path, trust_remote_code)
  File "/home/usr/anaconda3/envs/minicpm/lib/python3.10/site-packages/awq/models/auto.py", line 55, in check_and_get_model_type
    raise TypeError(f"{config.model_type} isn't supported yet.")
```
```
torch==2.8.0
torchvision==0.23.0
transformers==4.53.2
safetensors==0.5.3
tokenizers==0.21.2
decord==0.6.0
imageio==2.37.0
Pillow==11.0.0
tqdm==4.67.1
huggingface-hub==0.34.3
sympy==1.13.3
```
@xiexie1234567 awq is not being referenced correctly.
- Did you recompile and install our awq according to the documentation?
- If you have installed it, please confirm that the awq in your environment is the installed version; the install may not have succeeded, or the import may still resolve to the old location.
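A quick generic check (not from the cookbook) to confirm which awq Python actually imports:

```python
# If this prints a path to the stock PyPI package in site-packages rather
# than the recompiled fork, the fork was not picked up by the environment.
import awq
print(awq.__file__)
print(getattr(awq, "__version__", "unknown"))
```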
No, my fault. I did not recompile and install it.
Thank you for your help!