Instructions to use rednote-hilab/dots.ocr.base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use rednote-hilab/dots.ocr.base with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="rednote-hilab/dots.ocr.base", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("rednote-hilab/dots.ocr.base", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use rednote-hilab/dots.ocr.base with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "rednote-hilab/dots.ocr.base"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "rednote-hilab/dots.ocr.base",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
```

Use Docker
```shell
docker model run hf.co/rednote-hilab/dots.ocr.base
```
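The curl call above can also be made from Python. The sketch below builds the same OpenAI-compatible chat-completions payload and includes a small helper for posting it; the endpoint (`http://localhost:8000`) assumes a locally running vLLM server, and the helper names (`build_chat_payload`, `post_chat`) are illustrative, not part of any library.

```python
import json
import urllib.request


def build_chat_payload(model, text, image_url):
    # Mirror the JSON body of the curl example above.
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(payload, base_url="http://localhost:8000"):
    # Requires a running vLLM server; not executed in this sketch.
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_payload(
    "rednote-hilab/dots.ocr.base",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
```

With the server up, `post_chat(payload)` returns the parsed chat-completions response.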
- SGLang
How to use rednote-hilab/dots.ocr.base with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rednote-hilab/dots.ocr.base" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "rednote-hilab/dots.ocr.base",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
```

Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "rednote-hilab/dots.ocr.base" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "rednote-hilab/dots.ocr.base",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
```

- Docker Model Runner
How to use rednote-hilab/dots.ocr.base with Docker Model Runner:
```shell
docker model run hf.co/rednote-hilab/dots.ocr.base
```
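The vLLM and SGLang servers above both return OpenAI-style chat-completions JSON, so the assistant text lives at the same place in either response. A minimal sketch of extracting it (the `sample` response here is illustrative, showing only the fields the helper touches):

```python
def extract_reply(response: dict) -> str:
    # The assistant text sits at choices[0].message.content in the
    # OpenAI-compatible chat-completions schema used by both servers.
    return response["choices"][0]["message"]["content"]


# Illustrative response shape, trimmed to the relevant fields.
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "A statue on an island."}}
    ]
}
print(extract_reply(sample))
```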
Update model card: Correct `library_name`, add paper/code/project links, and sync with GitHub README
This PR significantly improves the model card for `rednote-hilab/dots.ocr` by:
* **Updating `library_name`**: Changed the `library_name` in the metadata from `dots_ocr` to `transformers`. This is crucial as the model uses `transformers.AutoModelForCausalLM` and `transformers.AutoProcessor`, enabling the "How to use" widget on the Hub for easier adoption.
* **Adding prominent links**: Introduced new badges at the top for the paper, GitHub repository, and live demo (project page) for better discoverability. The existing live demo link in the text has been replaced by the badge. The `X` (Twitter) link from the GitHub README has also been added.
* **Syncing content with GitHub README**:
* Updated the "News" section with the latest release information.
* Revised the "Download Model Weights" section to include the ModelScope option.
* Refreshed the "vLLM inference" instructions under "Deployment" to reflect official vLLM integration (v0.11.0+) and simplified usage.
* Added a new "Huggingface inference with CPU" section.
* Updated the "Document Parse" section with the correct `--num_thread` argument and instructions for Transformers-based parsing.
These changes ensure the model card is up-to-date, more accurate, and more user-friendly, providing clearer guidance for researchers and users.
````diff
@@ -1,6 +1,10 @@
 ---
+language:
+- en
+- zh
+- multilingual
+library_name: transformers
 license: mit
-library_name: dots_ocr
 pipeline_tag: image-text-to-text
 tags:
 - image-to-text
@@ -11,10 +15,6 @@ tags:
 - formula
 - transformers
 - custom_code
-language:
-- en
-- zh
-- multilingual
 ---
 
 <div align="center">
@@ -27,14 +27,17 @@ language:
 dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
 </h1>
 
-[](https://huggingface.co/rednote-hilab/dots.ocr)
+[](https://huggingface.co/papers/2512.02498)
+[](https://github.com/rednote-hilab/dots.ocr)
+[](https://dotsocr.xiaohongshu.com)
+[](https://huggingface.co/rednote-hilab/dots.ocr)
+[](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)
 
 
 <div align="center">
-<a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
 <a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
-<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a>
+<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> |
+<a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a>
 </div>
 
 </div>
@@ -138,6 +141,7 @@ print(output_text)
 
 
 ## News
+* ```2025.10.31 ``` 🚀 We release [dots.ocr.base](https://huggingface.co/rednote-hilab/dots.ocr.base), foundation VLM focus on OCR tasks, also the base model of [dots.ocr](https://github.com/rednote-hilab/dots.ocr). Try it out!
 * ```2025.07.30 ``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots.ocr), — a multilingual documents parsing model based on 1.7b llm, with SOTA performance.
 
 
@@ -433,7 +437,6 @@ print(output_text)
 <td>0.100</td>
 <td>0.185</td>
 </tr>
-<tr>
 
 <td rowspan="5"><strong>General<br>VLMs</strong></td>
 <td>GPT4o</td>
@@ -1113,28 +1116,23 @@ pip install -e .
 > 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
 ```shell
 python3 tools/download_model.py
+
+# with modelscope
+python3 tools/download_model.py --type modelscope
 ```
 
 
 ## 2. Deployment
 ### vLLM inference
-We highly recommend using
-The [Docker Image](https://hub.docker.com/r/rednotehilab/dots.ocr) is based on the official vllm image. You can also follow [Dockerfile](https://github.com/rednote-hilab/dots.ocr/blob/master/docker/Dockerfile) to build the deployment environment by yourself.
+We highly recommend using vLLM for deployment and inference. All of our evaluations results are based on vLLM 0.9.1 via out-of-tree model registration. **Since vLLM version 0.11.0, Dots OCR has been officially integrated into vLLM with verified performance** and you can use vLLM docker image directly (e.g, `vllm/vllm-openai:v0.11.0`) to deploy the model server.
 
 ```shell
-#
-
-export hf_model_path=./weights/DotsOCR # Path to your downloaded model weights, Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
-export PYTHONPATH=$(dirname "$hf_model_path"):$PYTHONPATH
-sed -i '/^from vllm\.entrypoints\.cli\.main import main$/a\
-from DotsOCR import modeling_dots_ocr_vllm' `which vllm` # If you downloaded model weights by yourself, please replace `DotsOCR` by your model saved directory name, and remember to use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`)
-
-# launch vllm server
-CUDA_VISIBLE_DEVICES=0 vllm serve ${hf_model_path} --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --chat-template-content-format string --served-model-name model --trust-remote-code
+# Launch vLLM model server
+vllm serve rednote-hilab/dots.ocr --trust-remote-code --async-scheduling --gpu-memory-utilization 0.95
 
-#
-
-#
+# vLLM API Demo
+# See dots_ocr/model/inference.py for details on parameter and prompt settings
+# that help achieve the best output quality.
 python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
 ```
 
@@ -1226,6 +1224,10 @@ print(output_text)
 
 </details>
 
+### Hugginface inference with CPU
+Please refer to [CPU inference](https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536)
+
+
 ## 3. Document Parse
 **Based on vLLM server**, you can parse an image or a pdf file using the following commands:
 ```bash
@@ -1234,7 +1236,7 @@ print(output_text)
 # Parse a single image
 python3 dots_ocr/parser.py demo/demo_image1.jpg
 # Parse a single PDF
-python3 dots_ocr/parser.py demo/demo_pdf1.pdf --
+python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_thread 64 # try bigger num_threads for pdf with a large number of pages
 
 # Layout detection only
 python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
@@ -1246,6 +1248,9 @@ python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
 python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_grounding_ocr --bbox 163 241 1536 705
 
 ```
+**Based on Transformers**, you can parse an image or a pdf file using the same commands above, just add `--use_hf true`.
+
+> Notice: transformers is slower than vllm, if you want to use demo/* with transformers,just add `use_hf=True` in `DotsOCRParser(..,use_hf=True)`
 
 <details>
 <summary><b>Output Results</b></summary>
````