Instructions to use AIDC-AI/Ovis2.5-9B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use AIDC-AI/Ovis2.5-9B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="AIDC-AI/Ovis2.5-9B", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Ovis2.5-9B", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use AIDC-AI/Ovis2.5-9B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AIDC-AI/Ovis2.5-9B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AIDC-AI/Ovis2.5-9B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/AIDC-AI/Ovis2.5-9B

SGLang

How to use AIDC-AI/Ovis2.5-9B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AIDC-AI/Ovis2.5-9B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AIDC-AI/Ovis2.5-9B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AIDC-AI/Ovis2.5-9B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AIDC-AI/Ovis2.5-9B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use AIDC-AI/Ovis2.5-9B with Docker Model Runner:
```
docker model run hf.co/AIDC-AI/Ovis2.5-9B
```

need a demo for the visual grounding task

#11

by zanepoe - opened Sep 2, 2025

Discussion

zanepoe

Sep 2, 2025

I need a demo for the visual grounding task. The coordinates I obtained using the prompt "Please provide the bounding box coordinates. (for boxes)" cannot be correctly mapped back to the image.

runninglsy

AIDC-AI org Sep 2, 2025

Could you provide the prompt and model response?

zanepoe

Sep 2, 2025

edited Sep 2, 2025

My prompt:
"Find the human body parts ,Include face, eye, nose, mouth,breast,hand, foot, leg in the image. Please provide the bounding box coordinates.Coordinates are normalized to [0,1) with the origin (0,0) at the top-left corner of the image.output format json example: [ { "score": 0.7, "bbox_2d": [0.401,0.526,0.430,0.557], "label": "eye" }, { "score": 0.6, "bbox_2d": [0.489,0.494, 0.516,0.526], "label": "eye" }, { "score": 0.5, "bbox_2d": [0.296,0.529, 0.324,0.576], "label": "face" } ]
"
The code for converting to pixel coordinates is as follows: pixel_x1 = int(x1 * image_width)
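For reference, the one-line conversion above can be extended to all four corners. This is only a minimal sketch: it assumes the box is ordered [x1, y1, x2, y2] and normalized to [0, 1) as the prompt requests, and the function name `bbox_to_pixels` and the clamping step are illustrative, not part of the Ovis code.

```python
def bbox_to_pixels(bbox, image_width, image_height):
    """Map a normalized [x1, y1, x2, y2] box to integer pixel coordinates.

    Assumption: values are in [0, 1) with the origin at the top-left corner.
    """
    x1, y1, x2, y2 = bbox

    def scale(value, size):
        # Same truncation as the post's int(x1 * image_width), then clamp
        # so a value of exactly 1.0 still lands inside the image.
        return min(max(int(value * size), 0), size - 1)

    return (
        scale(x1, image_width),
        scale(y1, image_height),
        scale(x2, image_width),
        scale(y2, image_height),
    )


if __name__ == "__main__":
    print(bbox_to_pixels([0.25, 0.25, 0.75, 0.5], 640, 480))
```

x values are scaled by the width and y values by the height; mixing the two is a common reason boxes fail to line up with the image.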

zanepoe

Sep 3, 2025

Could you provide the prompt and model response?

runninglsy

AIDC-AI org Sep 3, 2025

edited Sep 3, 2025

Try like this:

Find the human body parts ,Include face, eye, nose, mouth,breast,hand, foot, leg in the image. Please provide the bounding box coordinates. Respond in JSON format like [{"label": "eye", "bbox": "<box>...</box>"}, ...].
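A response in that JSON shape could be parsed roughly as follows. This is a sketch under assumptions: the thread does not show what goes inside `<box>...</box>`, so the code only assumes the box body contains exactly four numbers, and the sample string in the usage section is hypothetical rather than real model output. The helper name `parse_grounding_response` is illustrative.

```python
import json
import re

BOX_RE = re.compile(r"<box>(.*?)</box>", re.S)
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def parse_grounding_response(text):
    """Extract (label, four numbers) pairs from a JSON list of detections.

    Assumption: each item looks like {"label": ..., "bbox": "<box>...</box>"}
    and the box body holds four numbers. Items that do not match are skipped.
    """
    results = []
    for item in json.loads(text):
        match = BOX_RE.search(str(item.get("bbox", "")))
        if match is None:
            continue
        numbers = [float(n) for n in NUMBER_RE.findall(match.group(1))]
        if len(numbers) != 4:
            continue
        results.append({"label": item.get("label"), "bbox": numbers})
    return results


if __name__ == "__main__":
    # Hypothetical sample, not actual model output.
    sample = '[{"label": "eye", "bbox": "<box>(0.25,0.5),(0.75,1.0)</box>"}]'
    print(parse_grounding_response(sample))
```

Skipping malformed items instead of raising keeps one bad box from discarding the rest of the answer; whether that is the right trade-off depends on the application.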

liamtoran

Sep 6, 2025

@runninglsy

Try like this:

Find the human body parts ,Include face, eye, nose, mouth,breast,hand, foot, leg in the image. Please provide the bounding box coordinates. Respond in JSON format like [{"label": "eye", "bbox": "<box>...</box>"}, ...].

How should we convert the output bbox back to image coordinates? The README guide says to use it like this:

But when I try it, the output seems to be in the (0, 1000) range.

Thank you for the clarification.

runninglsy

AIDC-AI org Sep 8, 2025

@liamtoran Could you provide the image, prompt, and generation config?

liamtoran

Sep 8, 2025

edited Sep 8, 2025

Image:

Ovis 9B Space, prompt: Find the human hands in the image. Please provide the bounding box coordinates.

Prompt like you said:

I saw the same thing in Ovis-2B as well

But sometimes it works. I think the output is sometimes in the [0, 1000] range and sometimes in the [0, 1] range, and it can mix both within the same answer.
I am guessing it's okay to fix this with: if x > 1 then x = x / 1000.
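The guessed workaround can be written out as a small helper. This is a sketch of that heuristic only, not a fix confirmed by the model authors; the names `normalize_coord` and `normalize_bbox` are illustrative. Note that the rule is ambiguous for values between 0 and 1 inclusive: a 1 on the [0, 1000] scale would be left unchanged and read as the far edge of the image.

```python
def normalize_coord(x, scale=1000.0):
    """Heuristic from the post: values above 1 are assumed to be on a
    [0, 1000] scale and are divided by `scale`; others are kept as-is."""
    return x / scale if x > 1 else x


def normalize_bbox(bbox, scale=1000.0):
    """Apply the heuristic to each coordinate independently, since the
    post reports that both ranges can be mixed within one answer."""
    return [normalize_coord(v, scale) for v in bbox]


if __name__ == "__main__":
    print(normalize_bbox([0.25, 500, 0.75, 1000]))
```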

runninglsy

AIDC-AI org Sep 9, 2025

@liamtoran The thinking training data does not include grounding-related datasets. Therefore, please disable the thinking feature when using Ovis 2.5 for grounding tasks.
