emanuelevivoli committed 13 days ago

Commit 9ca0570 (verified) · 1 parent: c0111ce

fix: transformers 5.x compat (cache_position + kwargs naming)

## Summary

This PR fixes two issues that prevent dots.ocr from working with transformers>=5.0:

### 1. cache_position TypeError on generation

In transformers 5.x, `cache_position` is no longer maintained in the generation loop, so it can arrive as `None`. The current check, `cache_position[0] == 0`, then crashes with `TypeError: 'NoneType' object is not subscriptable`.

Fix: Use a combined check compatible with both transformers 4.x and 5.x. The step is treated as prefill when `cache_position[0] == 0`, when `past_key_values is None`, or when the cache reports a sequence length of 0.
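
The combined check can be exercised in isolation. The following is a minimal sketch, not the model code itself: `is_prefill_step` and `FakeCache` are hypothetical names introduced for illustration, and plain Python lists stand in for the `cache_position` tensor.

```python
from typing import Any, Optional, Sequence


class FakeCache:
    """Hypothetical stand-in for a KV cache object exposing get_seq_length()."""

    def __init__(self, seq_length: int) -> None:
        self._seq_length = seq_length

    def get_seq_length(self) -> int:
        return self._seq_length


def is_prefill_step(cache_position: Optional[Sequence[int]], past_key_values: Any) -> bool:
    """Return True when this looks like the first generation step (prefill)."""
    # 4.x-style signal: cache_position is provided and starts at 0.
    if cache_position is not None and cache_position[0] == 0:
        return True
    # No cache object has been created yet.
    if past_key_values is None:
        return True
    # A cache object exists but holds no tokens yet.
    if hasattr(past_key_values, "get_seq_length") and past_key_values.get_seq_length() == 0:
        return True
    return False


# Prefill: cache_position is present and starts at 0.
print(is_prefill_step([0, 1, 2], None))
# Prefill: cache_position is missing and the cache is empty.
print(is_prefill_step(None, FakeCache(0)))
# Decode step: cache_position is missing and the cache already holds tokens.
print(is_prefill_step(None, FakeCache(7)))
```

Checking `cache_position is not None` before indexing is what avoids the `TypeError`; the two `past_key_values` conditions take over when `cache_position` is unavailable.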

### 2. _validate_model_kwargs ValueError for processor outputs

`forward()` declares `**loss_kwargs` instead of `**kwargs`. Transformers 5.x validation only recognizes `**kwargs` or `**model_kwargs` as catch-all parameters, so processor outputs such as `mm_token_type_ids` fail validation with a `ValueError`.

Fix: Rename `**loss_kwargs` to `**kwargs` (functionally identical).
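
To illustrate why the parameter name matters, here is a simplified sketch of a name-based catch-all check. It is an assumption about the general shape of such validation, not the actual transformers implementation; `accepts_extra_kwargs`, `unexpected_keys`, `forward_old`, and `forward_new` are hypothetical names.

```python
import inspect
from typing import Any, Callable, Dict, List

# Catch-all parameter names the PR description says are recognized.
CATCH_ALL_NAMES = {"kwargs", "model_kwargs"}


def accepts_extra_kwargs(func: Callable[..., Any]) -> bool:
    """Return True if func has a **kwargs or **model_kwargs parameter (checked by name)."""
    for name, param in inspect.signature(func).parameters.items():
        if param.kind is inspect.Parameter.VAR_KEYWORD and name in CATCH_ALL_NAMES:
            return True
    return False


def unexpected_keys(func: Callable[..., Any], model_kwargs: Dict[str, Any]) -> List[str]:
    """List the keys a name-based validator would reject for func."""
    if accepts_extra_kwargs(func):
        return []
    known = set(inspect.signature(func).parameters)
    return [key for key in model_kwargs if key not in known]


def forward_old(input_ids=None, pixel_values=None, **loss_kwargs):
    """Hypothetical signature using the old catch-all name."""
    return None


def forward_new(input_ids=None, pixel_values=None, **kwargs):
    """Hypothetical signature using the renamed catch-all."""
    return None


processor_outputs = {"pixel_values": "...", "mm_token_type_ids": "..."}
print(unexpected_keys(forward_old, processor_outputs))
print(unexpected_keys(forward_new, processor_outputs))
```

Both signatures accept arbitrary keyword arguments at call time, which is why the rename is functionally identical; only a validator that inspects the parameter name tells them apart.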

### Backward compatibility

Both fixes maintain full backward compatibility with transformers 4.x.

Files changed (1)

modeling_dots_ocr.py +11 -3

@@ -80,7 +80,7 @@ class DotsOCRForCausalLM(Qwen2ForCausalLM):
         return_dict: Optional[bool] = None,
         use_cache: Optional[bool] = None,
         logits_to_keep: int = 0,
-        **loss_kwargs,
+        **kwargs,
     ) -> Union[Tuple, CausalLMOutputWithPast]:
         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
         assert len(input_ids) >= 1, f"empty input_ids {input_ids.shape=} will cause gradnorm nan"
@@ -99,7 +99,7 @@ class DotsOCRForCausalLM(Qwen2ForCausalLM):
             output_hidden_states=output_hidden_states,
             # return_dict=return_dict,
             logits_to_keep=logits_to_keep,
-            **loss_kwargs,
+            **kwargs,
         )
         return outputs
@@ -125,7 +125,15 @@ class DotsOCRForCausalLM(Qwen2ForCausalLM):
             **kwargs,
         )
-        if cache_position[0] == 0:
+        # Pass pixel_values only on the first generation step (prefill).
+        # Compatible with both transformers 4.x (cache_position available)
+        # and 5.x (cache_position removed, use past_key_values instead).
+        is_prefill = (
+            (cache_position is not None and cache_position[0] == 0)
+            or past_key_values is None
+            or (hasattr(past_key_values, "get_seq_length") and past_key_values.get_seq_length() == 0)
+        )
+        if is_prefill:
             model_inputs["pixel_values"] = pixel_values
         return model_inputs