Dots MOCR: 4-bit Quantized (NF4)
Introduction
This repository provides a 4-bit version of dots.mocr, quantized with BitsAndBytes to the NF4 data type for efficient, low-memory inference.
The original model is a powerful multimodal OCR system capable of:
- Document parsing
- Layout understanding
- Multilingual OCR
- Structured outputs (JSON / Markdown / SVG)
This version enables deployment on low-VRAM GPUs while maintaining strong performance.
Key Features
- 4-bit quantization (NF4)
- Reduced VRAM usage (~70-80%)
- Faster inference
- Compatible with Hugging Face Transformers
- Supports OCR and document parsing
- Suitable for edge and local deployments
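The VRAM figure in the list above can be sanity-checked with back-of-envelope arithmetic on weight storage alone. The sketch below assumes the usual blockwise layout for NF4 with double quantization (64 weights per block, 8-bit block scales, one 32-bit scale per 256 blocks); these block sizes and the parameter count `N_PARAMS` are assumptions for illustration, not values taken from this repository. Activations, the KV cache, and any layers kept in higher precision are not counted.

```python
def bits_per_param(block_size=64, double_quant=True, second_block=256):
    """Approximate storage bits per weight for 4-bit blockwise quantization."""
    if double_quant:
        # One 8-bit scale per block, plus one fp32 scale per group of blocks.
        overhead = 8 / block_size + 32 / (block_size * second_block)
    else:
        # One fp32 scale per block.
        overhead = 32 / block_size
    return 4 + overhead


def weight_gib(n_params, bits):
    """Weight storage in GiB for n_params parameters at the given bit width."""
    return n_params * bits / 8 / 2**30


N_PARAMS = 3_000_000_000  # hypothetical placeholder, not the real model size

fp16_gib = weight_gib(N_PARAMS, 16)
nf4_gib = weight_gib(N_PARAMS, bits_per_param())
reduction = 1 - nf4_gib / fp16_gib

print(f"fp16 weights: {fp16_gib:.2f} GiB")
print(f"NF4 weights:  {nf4_gib:.2f} GiB")
print(f"reduction:    {reduction:.1%}")
```

Under these assumptions the weight-only saving lands inside the quoted range; measured savings on a real GPU will differ because of the items left out of the estimate.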
Installation (Base Setup)
Note: This model depends on the original dots.mocr repository.
conda create -n dots_mocr python=3.12
conda activate dots_mocr
git clone https://github.com/rednote-hilab/dots.mocr.git
cd dots.mocr
pip install -e .
pip install flash-attn==2.8.0.post2
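After running the steps above, it can help to confirm that the packages are importable before loading the model. This is a minimal sketch; the helper name `missing_packages` and the exact package list are assumptions (`flash_attn` is the import name of the `flash-attn` distribution, and `bitsandbytes` is needed for the 4-bit loading described later).

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if find_spec(name) is None]


# Assumed list of top-level modules used by the quantized inference example.
REQUIRED = ["torch", "transformers", "bitsandbytes", "flash_attn"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```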
Usage (Quantized Inference)
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base checkpoint; it is quantized to NF4 on the fly at load time.
model_id = "rednote-hilab/dots.mocr"

# 4-bit NF4 settings (these match the Quantization Details table).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Example usage (text-only smoke test; no image is passed here)
inputs = tokenizer("Extract text from image", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
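The text-only call above does not pass an image to the model. For OCR, the prompt needs an image entry alongside the text. The sketch below builds such a chat-style message, assuming the `image`/`url` and `text` content types of the Transformers image-text-to-text chat format. The helper name `build_ocr_messages` and the example URL are hypothetical, and how the resulting list is fed to dots.mocr (processor, chat template) depends on the upstream repository's code.

```python
from typing import Any


def build_ocr_messages(image_url: str, prompt: str) -> list[dict[str, Any]]:
    """Build a single-turn chat message with one image and one text prompt."""
    if not image_url:
        raise ValueError("image_url must be a non-empty string")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]


messages = build_ocr_messages(
    "https://example.com/scanned_page.png",  # hypothetical URL
    "Extract text from image",
)
print(messages[0]["content"][0]["type"])
```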
Quantization Details
| Parameter | Value |
|---|---|
| Precision | 4-bit |
| Quant Type | NF4 |
| Compute Dtype | float16 |
| Double Quant | Enabled |
| Library | BitsAndBytes |
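To make the "NF4" row more concrete, here is a simplified pure-Python illustration of blockwise 4-bit quantization with an NF4-style lookup table. It is a sketch under stated assumptions, not the BitsAndBytes implementation: the code values are rounded to four decimals and should be treated as approximate, the function names are hypothetical, and the real library uses fused GPU kernels, packs two codes per byte, and optionally quantizes the per-block scales as well (double quantization).

```python
# Approximate NF4 code values (rounded; assumed for illustration).
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]


def quantize_block(values):
    """Map each weight in a block to the index of its nearest NF4 level."""
    # Scale by the block's absolute maximum so values fall in [-1, 1].
    absmax = max(abs(v) for v in values) or 1.0
    codes = [
        min(range(16), key=lambda i: abs(NF4_LEVELS[i] - v / absmax))
        for v in values
    ]
    return codes, absmax


def dequantize_block(codes, absmax):
    """Recover approximate weights from 4-bit codes and the block scale."""
    return [NF4_LEVELS[c] * absmax for c in codes]


weights = [0.5, -0.25, 0.1, 0.0]
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(codes, scale, max(errors))
```

Each weight is stored as a 4-bit index plus a shared per-block scale, which is where both the memory saving and the small rounding error come from.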
Use Cases
- Document OCR
- PDF parsing
- Layout detection
- Structured data extraction
- AI-powered document understanding
- Edge deployment of large OCR models
Limitations
- Slight accuracy drop compared to full precision
- GPU recommended for optimal performance
- Some layers remain in higher precision
- Not fully optimized for CPU inference
Future Work
- GGUF conversion for CPU inference
- FlashAttention optimization improvements
- Integration with full OCR pipelines
- Web UI (Gradio / Streamlit demo)
- Benchmark comparisons (VRAM vs accuracy)
Acknowledgement
- Base Model: rednote-hilab/dots.mocr
- Quantization: BitsAndBytes
- Framework: Hugging Face Transformers
License
MIT License