need a demo for the visual grounding task
I need a demo for the visual grounding task. The coordinates I obtained using the prompt "Please provide the bounding box coordinates. (for boxes)" cannot be correctly mapped back to the image.
Could you provide the prompt and the model response?
my prompt:
"Find the human body parts ,Include face, eye, nose, mouth,breast,hand, foot, leg in the image. Please provide the bounding box coordinates.Coordinates are normalized to [0,1) with the origin (0,0) at the top-left corner of the image.output format json example: [ { "score": 0.7, "bbox_2d": [0.401,0.526,0.430,0.557], "label": "eye" }, { "score": 0.6, "bbox_2d": [0.489,0.494, 0.516,0.526], "label": "eye" }, { "score": 0.5, "bbox_2d": [0.296,0.529, 0.324,0.576], "label": "face" } ]
"
The code for converting to pixel coordinates is as follows:
pixel_x1 = int(x1 * image_width)
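For reference, the one-line conversion above can be extended to all four corners. This is only a minimal sketch: it assumes the box is ordered [x1, y1, x2, y2] and normalized to [0, 1) as the prompt requests, and `to_pixel_box` is a hypothetical helper name, not something from the Ovis repository.

```python
def to_pixel_box(bbox, image_width, image_height):
    """Convert a normalized [x1, y1, x2, y2] box to integer pixel coordinates.

    Assumes x values scale with the image width and y values with the
    image height, with the origin at the top-left corner.
    """
    x1, y1, x2, y2 = bbox

    def scale(value, size):
        # Truncate to an int (as in the snippet above) and clamp to the image.
        return min(max(int(value * size), 0), size - 1)

    return (
        scale(x1, image_width),
        scale(y1, image_height),
        scale(x2, image_width),
        scale(y2, image_height),
    )


# Example with a 640x480 image.
print(to_pixel_box([0.25, 0.5, 0.75, 1.0], 640, 480))
```

Note that x values are multiplied by the width and y values by the height; mixing these up is a common reason boxes do not line up with the image.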
Try like this:
Find the human body parts ,Include face, eye, nose, mouth,breast,hand, foot, leg in the image. Please provide the bounding box coordinates. Respond in JSON format like [{"label": "eye", "bbox": "<box>...</box>"}, ...].
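A reply in that JSON shape could be parsed roughly as follows. This is a sketch under assumptions: the content inside the box tags is not spelled out in this thread, so the code only assumes that each tag contains four numbers in x1, y1, x2, y2 order and skips anything else. `parse_grounding_reply` and the sample reply string are hypothetical, made up for illustration.

```python
import json
import re

# Matches integers and decimals such as 0.25 or 512.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_grounding_reply(reply):
    """Return a list of (label, [x1, y1, x2, y2]) pairs from a JSON reply.

    Assumption: each "bbox" string holds exactly four numbers; entries
    that do not are skipped rather than guessed at.
    """
    results = []
    for item in json.loads(reply):
        numbers = [float(n) for n in NUMBER.findall(str(item.get("bbox", "")))]
        if len(numbers) == 4:
            results.append((item.get("label"), numbers))
    return results


# Hypothetical reply; the real box format may differ.
sample = '[{"label": "eye", "bbox": "<box>(0.25,0.5),(0.75,1.0)</box>"}]'
print(parse_grounding_reply(sample))
```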
How should we convert the output bbox back to image coordinates? The README says to do it like this:
But when I try it, the output seems to be in the (0, 1000) range.
Thank you for the clarification
Image:
Ovis 9B Space, prompt: Find the human hands in the image. Please provide the bounding box coordinates.
Prompt like you said:
I saw the same thing in Ovis-2B as well
But sometimes it works. I think the output is sometimes in the [0, 1000] range and sometimes in the [0, 1] range, and it can mix both within the same answer.
I am guessing it's okay to fix this with: if x > 1 then x = x / 1000.
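Spelled out, that guess might look like the sketch below. It is a heuristic, not a documented behavior of the model: it assumes any coordinate above 1 is on a 0 to 1000 scale, and `normalize_box` is a hypothetical helper name.

```python
def normalize_coord(value, scale=1000.0):
    """Map a coordinate to [0, 1], treating values above 1 as 0..scale."""
    return value / scale if value > 1.0 else value


def normalize_box(bbox, scale=1000.0):
    """Apply the heuristic to each coordinate independently, since the
    two ranges can be mixed within a single answer."""
    return [normalize_coord(v, scale) for v in bbox]


print(normalize_box([0.25, 500, 0.75, 1000]))
```

The heuristic is ambiguous at the edges: a value of exactly 0 or 1 reads the same on both scales, so a box touching the top-left corner on the 0 to 1000 scale could be misread by up to a pixel or so.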
@liamtoran The thinking training data does not include grounding-related datasets. Therefore, please disable the thinking feature when using Ovis 2.5 for grounding tasks.


