Instructions for using VideoSearchR1/didemo-stage2 with libraries and local apps.

Libraries

How to use VideoSearchR1/didemo-stage2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="VideoSearchR1/didemo-stage2")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("VideoSearchR1/didemo-stage2")
model = AutoModelForMultimodalLM.from_pretrained("VideoSearchR1/didemo-stage2")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use VideoSearchR1/didemo-stage2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "VideoSearchR1/didemo-stage2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VideoSearchR1/didemo-stage2",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
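
The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send` are hypothetical, and it assumes the server started above is reachable at `http://localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url, base_url="http://localhost:8000"):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint.

    `base_url` is an assumption: it must match the host and port the server
    was started with.
    """
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# request = build_chat_request(
#     "VideoSearchR1/didemo-stage2",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(send(request))
```

Passing a different `base_url` (for example `http://localhost:30000`) should let the same helper target any other OpenAI-compatible server.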

Use Docker

docker model run hf.co/VideoSearchR1/didemo-stage2

SGLang

How to use VideoSearchR1/didemo-stage2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "VideoSearchR1/didemo-stage2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VideoSearchR1/didemo-stage2",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "VideoSearchR1/didemo-stage2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VideoSearchR1/didemo-stage2",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use VideoSearchR1/didemo-stage2 with Docker Model Runner:
```
docker model run hf.co/VideoSearchR1/didemo-stage2
```

didemo-stage2/eval/result.json

seohyun8825

Initial model release

a4d1f4d 11 days ago


{
  "source_jsonl": "eval/external_verified_test_temporal_grounding.check.jsonl",
  "num_examples": 4013,
  "final_retrieval": {
    "N": 4013,
    "R@1": 0.5898330426115126,
    "R@5": 0.8195863443807625,
    "R@10": 0.8778968352853227,
    "R@100": 0.975080986792923,
    "MRR": 0.6891669260789297,
    "mean_rank": 11.761774233740343
  },
  "original_retrieval": {
    "N": 4013,
    "R@1": 0.551457762272614,
    "R@5": 0.7931721903812609,
    "R@10": 0.8562172937951658,
    "R@100": 0.9698479940194368,
    "MRR": 0.6583015862230686,
    "mean_rank": 13.97408422626464
  },
  "final_temporal": {
    "N": 4013,
    "mIoU@R1": 0.26725640214899044,
    "IoU@0.3@R1": 0.3331672065786195,
    "IoU@0.5@R1": 0.3025168203339148,
    "IoU@0.7@R1": 0.1976077747321206
  },
  "turns": {
    "1": {
      "retrieval": {
        "N": 4013,
        "R@1": 0.5895838524794418,
        "R@5": 0.8193371542486918,
        "R@10": 0.8778968352853227,
        "R@100": 0.975080986792923,
        "MRR": 0.689068532671443,
        "mean_rank": 11.76052828307999
      },
      "temporal": {
        "N": 4013,
        "mIoU@R1": 0.24688095568333712,
        "IoU@0.3@R1": 0.30824819337154247,
        "IoU@0.5@R1": 0.2790929479192624,
        "IoU@0.7@R1": 0.18165960627959132
      }
    },
    "2": {
      "retrieval": {
        "N": 1181,
        "R@1": 0.2938187976291279,
        "R@5": 0.6375952582557155,
        "R@10": 0.7476714648602879,
        "R@100": 0.9483488569009314,
        "MRR": 0.4417973908465668,
        "mean_rank": 23.166807790008466
      },
      "temporal": {
        "N": 4013,
        "mIoU@R1": 0.020126256333582525,
        "IoU@0.3@R1": 0.02466982307500623,
        "IoU@0.5@R1": 0.02317468228258161,
        "IoU@0.7@R1": 0.01569897832045851
      }
    },
    "3": {
      "retrieval": {
        "N": 878,
        "R@1": 0.21867881548974943,
        "R@5": 0.5990888382687927,
        "R@10": 0.7209567198177677,
        "R@100": 0.9419134396355353,
        "MRR": 0.3830839938747691,
        "mean_rank": 25.518223234624145
      },
      "temporal": {
        "N": 4013,
        "mIoU@R1": 0.00024919013207077,
        "IoU@0.3@R1": 0.00024919013207077,
        "IoU@0.5@R1": 0.00024919013207077,
        "IoU@0.7@R1": 0.00024919013207077
      }
    }
  }
}
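
Two relationships in these numbers are easy to check by hand. The sketch below copies values from result.json and tests them; the interpretation is inferred from the numbers alone and is not documented by the evaluation script. First, the per-turn temporal metrics all report N = 4013 and appear to be contributions normalised by the full test set, so they should sum to `final_temporal`. Second, the gap between `final_retrieval` and `original_retrieval` R@1 can be converted to a count of queries. The helper name `sum_over_turns` is hypothetical.

```python
import math

NUM_EXAMPLES = 4013  # "num_examples" in result.json

# Values copied verbatim from result.json.
final_temporal = {
    "mIoU@R1": 0.26725640214899044,
    "IoU@0.3@R1": 0.3331672065786195,
    "IoU@0.5@R1": 0.3025168203339148,
    "IoU@0.7@R1": 0.1976077747321206,
}
per_turn_temporal = {
    "1": {
        "mIoU@R1": 0.24688095568333712,
        "IoU@0.3@R1": 0.30824819337154247,
        "IoU@0.5@R1": 0.2790929479192624,
        "IoU@0.7@R1": 0.18165960627959132,
    },
    "2": {
        "mIoU@R1": 0.020126256333582525,
        "IoU@0.3@R1": 0.02466982307500623,
        "IoU@0.5@R1": 0.02317468228258161,
        "IoU@0.7@R1": 0.01569897832045851,
    },
    "3": {
        "mIoU@R1": 0.00024919013207077,
        "IoU@0.3@R1": 0.00024919013207077,
        "IoU@0.5@R1": 0.00024919013207077,
        "IoU@0.7@R1": 0.00024919013207077,
    },
}


def sum_over_turns(per_turn):
    """Add up each metric across all turns."""
    totals = {}
    for metrics in per_turn.values():
        for name, value in metrics.items():
            totals[name] = totals.get(name, 0.0) + value
    return totals


# Check 1: per-turn temporal contributions versus the final temporal metrics.
totals = sum_over_turns(per_turn_temporal)
for name, expected in final_temporal.items():
    matches = math.isclose(totals[name], expected, rel_tol=1e-9)
    print(f"{name}: sum over turns = {totals[name]:.6f}, matches final: {matches}")

# Check 2: R@1 gain of final over original retrieval, as a query count.
r1_gain = 0.5898330426115126 - 0.551457762272614
print(f"R@1 gain: {r1_gain:.4f} (about {round(r1_gain * NUM_EXAMPLES)} queries)")
```

All four temporal sums match `final_temporal`, and the R@1 gain corresponds to about 154 of the 4013 queries. Under the same reading, the turn 3 values all equal 1/4013, which would mean a single example contributed at that turn.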