Instructions for using datalab-to/chandra with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Transformers
How to use datalab-to/chandra with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="datalab-to/chandra")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("datalab-to/chandra")
model = AutoModelForImageTextToText.from_pretrained("datalab-to/chandra")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
  - Google Colab
  - Kaggle
- Local Apps
  - vLLM
How to use datalab-to/chandra with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "datalab-to/chandra"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "datalab-to/chandra",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```
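Since the server exposes an OpenAI-compatible API, you can also call it from Python instead of curl. A minimal sketch using the `openai` client, assuming the server started above is listening on localhost:8000 (the `api_key` value is a placeholder; vLLM ignores it unless you configure one):

```python
# Query the vLLM server through its OpenAI-compatible endpoint.
# Assumes `vllm serve "datalab-to/chandra"` is running on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="datalab-to/chandra",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```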
  - SGLang
How to use datalab-to/chandra with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "datalab-to/chandra" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "datalab-to/chandra",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "datalab-to/chandra" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "datalab-to/chandra",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

  - Docker Model Runner
How to use datalab-to/chandra with Docker Model Runner:
```bash
docker model run hf.co/datalab-to/chandra
```
How Much GPU Memory (VRAM) is Needed to Run This Model?
I'm deploying the datalab-to/chandra OCR model for document processing and need to understand GPU memory requirements for inference.
Key Questions:
- What's the minimum GPU VRAM needed for FP16/BF16 inference?
- Does the model support quantization (AWQ, GPTQ, 8-bit)?
- How does memory scale with variable document resolution?
- Any real-world experience running on RTX 4090, A100, or consumer GPUs?
Context:
- Python transformers library via HF
- Docker with NVIDIA CUDA support
- Batch document processing optimization
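As a starting point, here's my own back-of-envelope sketch for the FP16/BF16 question. Note that the parameter count below is my guess, not a confirmed figure for Chandra:

```python
# Rough VRAM estimate: weights at 16-bit take ~2 bytes/param, plus headroom
# for activations and KV cache. num_params is a GUESS, not Chandra's real size.
def estimate_vram_gb(num_params, bytes_per_param=2, overhead=1.2):
    return num_params * bytes_per_param * overhead / 1e9

print(f"~9B params @ 16-bit: ~{estimate_vram_gb(9e9):.0f} GB")     # ~22 GB
print(f"~9B params @ 32-bit: ~{estimate_vram_gb(9e9, 4):.0f} GB")  # ~43 GB
```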
Any deployment insights appreciated!
Here are my test results:
- Chandra method: HF
- GPU: RTX 5090 32GB
- Memory usage: around 18.9 GB (utilization 30–50%)
- PDF: 40 MB, 340 pages
- Speed: roughly 2–3 minutes per page
Just for reference.
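If anyone wants to reproduce the memory number on their own hardware, a minimal sketch using PyTorch's built-in peak-memory counter (the inference call is a placeholder for whatever Chandra entry point you use):

```python
import torch

# Reset the counter, run inference, then read the high-water mark.
torch.cuda.reset_peak_memory_stats()

# ... run Chandra inference here (pipeline call, model.generate, etc.) ...

print(f"Peak GPU memory allocated: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```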
Thanks a lot @LastXuanShen42 ! That's exactly the data I needed. Appreciate you sharing the benchmarks!
Can anyone help me with this? I'm trying to run Chandra on Kaggle, but running the OCR model on even a single image demands 36 GB of GPU memory. I haven't found a way around this, and I really need this model. Can you help?
We hit the same issue when we tried the model. Can anyone help?
Why don't you use a lightweight model instead (Docling or PaddleOCR)?
Hi folks - you will need 18 GB+ of GPU memory to run this model unquantized. Having FlashAttention installed helps bring down memory usage, and serving it via vLLM works best.
For GPU-constrained setups, you can use our other model stack, marker. Another option is our hosted API at https://www.datalab.to/
@Oblivion07 @PreyumKr
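One thing worth checking on Kaggle: if `from_pretrained` is loading the weights in FP32, that alone would roughly double the footprint toward the ~36 GB you're seeing. A sketch of forcing 16-bit weights plus FlashAttention at load time; this assumes `flash-attn` is installed and an Ampere-or-newer GPU (on older Kaggle GPUs like the T4 or P100, drop the `attn_implementation` line and use `torch.float16`):

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("datalab-to/chandra")
model = AutoModelForImageTextToText.from_pretrained(
    "datalab-to/chandra",
    torch_dtype=torch.bfloat16,               # 16-bit weights: roughly half the FP32 footprint
    attn_implementation="flash_attention_2",  # optional; needs flash-attn and Ampere or newer
    device_map="auto",
)
```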