HunyuanOCR-1.5 · DFlash Draft · Preview
Speculative-decoding draft for EthannW/HunyuanOCR-1-5
📝 Note. This is a preview release of the DFlash draft weights. The technical report and official weights are coming very soon; the checkpoint, interface, and file layout may still evolve before the final release. The full toolkit and docs live in the GitHub repo (branch `develop`): https://github.com/Tencent-Hunyuan/HunyuanOCR.
⚠️ This model is not usable standalone. It is a draft model used only for speculative decoding together with the target model EthannW/HunyuanOCR-1-5.
📖 What is DFlash?
End-to-end OCR often involves long autoregressive decoding, which is the main bottleneck for dense documents, tables, formulas, and other long structured outputs.
HunyuanOCR-1.5 adopts a speculative-decoding framework based on DFlash:
- A lightweight block-diffusion draft model (this repo) proposes multiple candidate tokens in parallel.
- The target model (EthannW/HunyuanOCR-1-5) verifies them in a single forward pass.
- Accepted tokens are committed as-is, so the output distribution of the target model is preserved: DFlash is a lossless acceleration.
The result is significantly reduced decoding latency for long structured OCR outputs, without sacrificing accuracy.
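The draft-then-verify loop described above can be illustrated with a toy sketch in plain Python. It is not DFlash, transformers, or vLLM code. `toy_target` and `toy_draft` are hypothetical stand-ins over a 10-token vocabulary. The sketch covers greedy decoding only; sampling needs a rejection-sampling acceptance rule instead. It also queries the target once per position, where a real target would score the whole block in one forward pass. What it does show is why the scheme is lossless: whatever the draft proposes, every committed token is the target's own greedy choice.

```python
from typing import Callable, List, Tuple

Target = Callable[[List[int]], int]            # prefix -> greedy next token
Draft = Callable[[List[int], int], List[int]]  # (prefix, k) -> k proposals


def speculative_step(prefix: List[int], draft: Draft, target: Target,
                     k: int) -> List[int]:
    """One propose-then-verify step under greedy decoding."""
    proposal = draft(prefix, k)
    accepted: List[int] = []
    # A real target scores all k positions in ONE forward pass; this toy
    # calls it once per position, which yields the same greedy choices.
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            # First disagreement: commit the target's own token and stop.
            return accepted + [expected]
        accepted.append(tok)
    # Every proposal matched: the same pass also yields one "bonus" token.
    return accepted + [target(prefix + accepted)]


def speculative_generate(prefix: List[int], draft: Draft, target: Target,
                         k: int, max_new_tokens: int) -> Tuple[List[int], int]:
    """Return (tokens, number of verify steps)."""
    out, steps = list(prefix), 0
    while len(out) - len(prefix) < max_new_tokens:
        out += speculative_step(out, draft, target, k)
        steps += 1
    return out[: len(prefix) + max_new_tokens], steps


def greedy_generate(prefix: List[int], target: Target,
                    max_new_tokens: int) -> List[int]:
    """Plain autoregressive (AR) reference: one target call per token."""
    out = list(prefix)
    for _ in range(max_new_tokens):
        out.append(target(out))
    return out


# Hypothetical toy models over a 10-token vocabulary.
def toy_target(seq: List[int]) -> int:
    return (3 * seq[-1] + 1) % 10


def toy_draft(seq: List[int], k: int) -> List[int]:
    """Stand-in draft that guesses wrong whenever len(context) % 4 == 3."""
    out: List[int] = []
    cur = list(seq)
    for _ in range(k):
        tok = toy_target(cur)
        if len(cur) % 4 == 3:
            tok = (tok + 5) % 10  # deliberate mistake
        out.append(tok)
        cur.append(tok)
    return out


if __name__ == "__main__":
    ref = greedy_generate([7], toy_target, 20)
    spec, steps = speculative_generate([7], toy_draft, toy_target,
                                       k=4, max_new_tokens=20)
    print("matches AR reference:", spec == ref, "| verify steps:", steps)
```

With a draft that is always right, each verify step commits k + 1 tokens, so 20 new tokens need only 4 target passes instead of 20. A weaker draft lowers the number of tokens committed per step but never changes the output.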
Architecture: 5-layer Qwen3-style block-diffusion draft (~360M parameters in bfloat16) that predicts 16 masked tokens in a single block. The draft is bound to target-layer indices [1, 8, 15, 22] of the 24-layer HunyuanOCR-1.5 base.
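One plausible reading of "bound to target-layer indices" is that the draft conditions on the target's hidden states taken from those four layers, as EAGLE-3-style drafts do. The sketch below shows only that feature-gathering step, in plain Python with lists standing in for tensors. The function name and data layout are hypothetical and are not the dflash.py API. Whether index 0 refers to the embedding output or the first decoder layer is also left unresolved.

```python
from typing import List, Sequence

# Target-layer indices from the architecture note above.
TARGET_LAYER_IDS = (1, 8, 15, 22)


def gather_target_features(per_layer_hidden: Sequence[Sequence[float]],
                           layer_ids: Sequence[int] = TARGET_LAYER_IDS) -> List[float]:
    """Concatenate one token position's hidden vectors from selected layers.

    per_layer_hidden[i] is (hypothetically) the hidden vector that target
    layer i produced at this position.
    """
    if max(layer_ids) >= len(per_layer_hidden):
        raise IndexError("layer id out of range for the given hidden states")
    width = len(per_layer_hidden[layer_ids[0]])
    fused: List[float] = []
    for i in layer_ids:
        vec = per_layer_hidden[i]
        if len(vec) != width:
            raise ValueError("all layers must share the same hidden size")
        fused.extend(vec)
    # An EAGLE-3-style draft would then project this 4 * hidden_size vector
    # back to hidden_size with a learned layer; that DFlash does exactly
    # this is an assumption, not something this card states.
    return fused


if __name__ == "__main__":
    hidden = [[float(layer)] * 3 for layer in range(24)]  # 24 toy layers, width 3
    print(gather_target_features(hidden))
```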
⚙️ Environment
- Python 3.10+ (3.12 tested)
- PyTorch 2.1+ (CUDA 12.1+; a cu130 build has been tested end-to-end)
- transformers ≥ 4.57
- vLLM nightly (0.23.x, cu130 build tested) — required for real speculative-decoding speedup at deployment time. DFlash support is included in the nightly wheel; no separate patch is needed.
uv pip install -U vllm \
--torch-backend=cu130 \
--extra-index-url https://wheels.vllm.ai/nightly
uv pip install runai-model-streamer
💡 On CUDA 12.x, replace `--torch-backend=cu130` with the matching tag (e.g. `cu121`, `cu124`).
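A small pre-flight check against the version floors listed above can save a failed launch. The sketch below is a hypothetical stdlib-only helper and is not part of the toolkit. It compares only the leading numeric part of each version, so `4.57.0.dev0` counts as `4.57.0`. If the `packaging` library is available, `packaging.version.Version` is the stricter choice.

```python
import re
import sys
from importlib import metadata
from typing import Optional, Tuple

# Version floors from the Environment list above.
MINIMUMS = {"torch": "2.1", "transformers": "4.57"}


def numeric_prefix(version: str) -> Tuple[int, ...]:
    """Leading dotted integers, e.g. '2.1.0+cu130' -> (2, 1, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", version)
    if m is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def at_least(installed: str, minimum: str) -> bool:
    """Compare numeric prefixes, padding the shorter one with zeros."""
    a, b = numeric_prefix(installed), numeric_prefix(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> bool:
    """Print a short report; return True if every floor is met."""
    ok = sys.version_info >= (3, 10)
    print(f"python {sys.version.split()[0]}: {'ok' if ok else 'needs 3.10+'}")
    for dist, minimum in MINIMUMS.items():
        found = installed_version(dist)
        if found is None:
            print(f"{dist}: not installed (needs >= {minimum})")
            ok = False
        elif at_least(found, minimum):
            print(f"{dist} {found}: ok")
        else:
            print(f"{dist} {found}: needs >= {minimum}")
            ok = False
    # The card asks for a vLLM nightly; a version string alone cannot prove
    # that a build includes DFlash, so this line is informational only.
    print(f"vllm: {installed_version('vllm') or 'not installed'}")
    return ok


if __name__ == "__main__":
    check_environment()
```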
🚀 How to use
A. transformers — single-image correctness / draft-load check
Use the shipped script from the GitHub repo. It loads the draft, runs it alongside the target on one image, and checks the result against the plain autoregressive (AR) reference:
git clone -b develop https://github.com/Tencent-Hunyuan/HunyuanOCR.git
cd HunyuanOCR
python inference/infer_dflash.py \
--model EthannW/HunyuanOCR-1-5 \
--dflash-model EthannW/HunyuanOCR-1-5-DFlash \
--image /path/to/document.png \
--num-spec-tokens 15
ℹ️ `infer_dflash.py` only verifies that the DFlash draft loads and produces output matching the AR reference on a single image. Real speculative-decoding acceleration is only realized under vLLM (see below).
B. vLLM speculative decoding (recommended for real speedup)
MODEL_PATH=EthannW/HunyuanOCR-1-5 \
DFLASH_PATH=EthannW/HunyuanOCR-1-5-DFlash \
GPU=0 PORT=8001 GPU_MEM_UTIL=0.9 \
NUM_SPEC_TOKENS=15 \
bash inference/serve_dflash.sh
Under the hood, the launch script passes the following flag to the vLLM entrypoint:
--speculative-config '{"method":"dflash","model":"EthannW/HunyuanOCR-1-5-DFlash","num_speculative_tokens":15}'
To send an OpenAI-compatible request, use the shipped single-image client:
python inference/infer_vllm_client.py \
--host 127.0.0.1 --port 8001 \
--model tencent/HunyuanOCR-1-5 \
--image /path/to/document.png
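If you prefer to drive the server from Python, the sketch below assembles an equivalent `vllm serve` command line and an OpenAI-compatible chat request using only the standard library. It is an assumption-laden stand-in and does not reproduce the contents of serve_dflash.sh or infer_vllm_client.py. The real scripts may set additional flags, for example for GPU selection or the served model name. The OCR prompt text is a hypothetical placeholder.

```python
import base64
import json
import mimetypes
import shlex
import urllib.request
from typing import Any, Dict, List


def build_serve_command(model: str, draft: str, port: int = 8001,
                        gpu_mem_util: float = 0.9,
                        num_spec_tokens: int = 15) -> List[str]:
    """argv for `vllm serve` with the DFlash speculative config shown above."""
    spec = {"method": "dflash", "model": draft,
            "num_speculative_tokens": num_spec_tokens}
    return [
        "vllm", "serve", model,
        "--port", str(port),
        "--gpu-memory-utilization", str(gpu_mem_util),
        "--speculative-config", json.dumps(spec),
    ]


def build_chat_request(model: str, image_bytes: bytes, mime: str,
                       prompt: str, max_tokens: int = 4096) -> Dict[str, Any]:
    """OpenAI-compatible chat payload with the image inlined as a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime};base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "max_tokens": max_tokens,
        "temperature": 0,  # greedy decoding
    }


def ocr_image(path: str, model: str, prompt: str,
              host: str = "127.0.0.1", port: int = 8001) -> str:
    """POST one image to /v1/chat/completions and return the reply text."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        payload = build_chat_request(model, f.read(), mime, prompt)
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    cmd = build_serve_command("EthannW/HunyuanOCR-1-5",
                              "EthannW/HunyuanOCR-1-5-DFlash")
    print(shlex.join(cmd))  # paste into a shell to start the server
    # Once the server is up (the prompt below is a hypothetical placeholder):
    # print(ocr_image("/path/to/document.png", "EthannW/HunyuanOCR-1-5",
    #                 "Extract all text from this image."))
```

In vLLM, the request's `model` field must match the name the server registers. By default this is the path given to `vllm serve`, unless `--served-model-name` is set. Pass whichever name your server reports at `/v1/models`.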
C. llama.cpp (PC-side)
A DFlash-adapted llama.cpp fork is provided for speculative decoding on CPUs, consumer GPUs, and laptops. See docs/llama_cpp.md in the GitHub repo for the full guide (GGUF conversion of both target and draft, llama-server launch, and a smoke-test client).
📦 Files in this repo
| file | purpose |
|---|---|
| `model.safetensors` | draft weights (bfloat16) |
| `config.json` | draft config; sets `auto_map` to `dflash.DFlashDraftModel` |
| `dflash.py` | `DFlashDraftModel` implementation (loaded via `trust_remote_code=True`) |
| `chat_template.jinja`, `tokenizer.json`, `tokenizer_config.json`, `processor_config.json` | tokenizer / processor, kept in sync with the target model |
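After downloading, a quick check that the files listed in the table are present can catch a partial download. The sketch below is a hypothetical stdlib-only helper that assumes the layout shown in the table, which the preview note says may still change. It does not know which `auto_map` key the config uses, so it only checks that some entry points at `dflash.DFlashDraftModel`.

```python
import json
from pathlib import Path
from typing import List

# File names taken from the table above.
EXPECTED_FILES = [
    "model.safetensors", "config.json", "dflash.py",
    "chat_template.jinja", "tokenizer.json",
    "tokenizer_config.json", "processor_config.json",
]


def check_snapshot(root: str) -> List[str]:
    """Return a list of problems found in a local copy of this repo."""
    base = Path(root)
    problems = [f"missing {name}" for name in EXPECTED_FILES
                if not (base / name).is_file()]
    cfg_path = base / "config.json"
    if cfg_path.is_file():
        auto_map = json.loads(cfg_path.read_text()).get("auto_map", {})
        # Only the target class is documented, not the key it is stored under.
        if "dflash.DFlashDraftModel" not in auto_map.values():
            problems.append(
                "config.json auto_map does not reference dflash.DFlashDraftModel")
    return problems
```

For example, pass it the directory returned by `huggingface_hub.snapshot_download("EthannW/HunyuanOCR-1-5-DFlash")`. An empty list means nothing obvious is missing.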
🔗 Related repositories
- Target model (required): EthannW/HunyuanOCR-1-5
- GitHub (training & inference toolkit, branch `develop`): https://github.com/Tencent-Hunyuan/HunyuanOCR
- HunyuanOCR-1.0 (previous generation): tencent/HunyuanOCR
📜 License
HunyuanOCR-1.5 (including the DFlash draft) is released under the same license as HunyuanOCR 1.0 — the Tencent Hunyuan Community License Agreement.
⚠️ Preview notice. This draft checkpoint is a preview snapshot. The technical report and official model release will follow shortly; interfaces and weights may be updated before the final release.