Update VisionSdpaAttention to support memory efficient backend.
#27
by warrenwjk - opened
- modeling_dots_vision.py +14 -5
modeling_dots_vision.py
CHANGED
@@ -274,12 +274,21 @@ class VisionSdpaAttention(nn.Module):
         for i in range(1, len(cu_seqlens)):
             attention_mask[..., cu_seqlens[i - 1]: cu_seqlens[i], cu_seqlens[i - 1]: cu_seqlens[i]] = True

-
-
-
+        # Convert q, k, v to 4D to enable : (1, num_heads, seq_length, head_dim)
+        q = q.transpose(0, 1).unsqueeze(0)  # (1, num_heads, seq_length, head_dim)
+        k = k.transpose(0, 1).unsqueeze(0)
+        v = v.transpose(0, 1).unsqueeze(0)

-
-
+        # See: https://github.com/pytorch/pytorch/issues/127523
+        if attention_mask.stride(-1) != 1:
+            attention_mask = torch.empty_like(attention_mask, memory_format=torch.contiguous_format).copy_(attention_mask)
+
+        # use memory efficient backend
+        from torch.nn.attention import SDPBackend, sdpa_kernel
+        with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
+            attn_output = F.scaled_dot_product_attention(q, k, v, attention_mask, dropout_p=0.0)
+
+        attn_output = attn_output.squeeze(0).transpose(0, 1)  # (seq_length, num_heads, head_dim)
         attn_output = attn_output.reshape(seq_length, -1)

         attn_output = self.proj(attn_output)