HunyuanOCR MLX

HunyuanOCR converted to Apple MLX for native Apple Silicon inference on Mac.

This is a conversion of Tencent's HunyuanOCR, a 1B-parameter OCR expert Vision-Language Model. Tencent reports state-of-the-art results for it across text spotting, complex document parsing, information extraction, video subtitle extraction, and photo translation.

Model Architecture

Component	Spec
Type	Vision-Language Model (VLM)
Parameters	~1B
Vision Encoder	27-layer ViT, 1152 dim, 16 heads
Language Model	24-layer decoder, 1024 dim, GQA (16Q/8KV)
Features	xdrope RoPE, QK normalization, RMS norm, SiLU SwiGLU
Dtype	float16
Format	MLX
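
The "GQA (16Q/8KV)" entry means that 16 query heads share 8 key/value heads, so each KV head serves a group of 2 query heads. The sketch below illustrates that mapping only. It assumes the common contiguous grouping (query heads 0 and 1 use KV head 0, and so on), and `kv_head_for` is a hypothetical helper name, not part of this repository's code.

```python
# Head counts taken from the architecture table above.
NUM_Q_HEADS = 16
NUM_KV_HEADS = 8


def kv_head_for(q_head: int, num_q: int = NUM_Q_HEADS, num_kv: int = NUM_KV_HEADS) -> int:
    """Return the index of the KV head that a given query head attends with.

    Assumes contiguous grouping, which is the usual convention but is not
    confirmed by the model card.
    """
    if num_q % num_kv != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    if not 0 <= q_head < num_q:
        raise IndexError("query head index out of range")
    group_size = num_q // num_kv  # query heads per KV head (2 here)
    return q_head // group_size


mapping = [kv_head_for(q) for q in range(NUM_Q_HEADS)]
print(mapping)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
```

Halving the number of KV heads halves the size of the key/value cache relative to standard multi-head attention with 16 KV heads.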

Quick Start

pip install mlx transformers torch torchvision Pillow
# git-lfs must be installed so the clone fetches model.safetensors itself
git clone https://huggingface.co/AnandSingh/hunyuanocr-mlx
cd hunyuanocr-mlx

import mlx.core as mx
from PIL import Image

# Import the model code shipped in the cloned repository (run this from inside it)
from hunyuan_ocr_mlx import HunyuanOCR, HunyuanOCRProcessor

model = HunyuanOCR("config.json")
model.load_weights("model.safetensors")
processor = HunyuanOCRProcessor.from_pretrained(".")

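# Run OCR
img = Image.open("document.jpg")
# Text-spotting prompt. In English: "Detect and recognize the text in the image,
# and output the text coordinates in a formatted manner."
prompt = "检测并识别图片中的文字，将文本坐标格式化输出。"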
processed = processor.process([img], [prompt])

hidden_states, past_kvs = model(
    input_ids=processed.input_ids,
    pixel_values=processed.pixel_values,
    position_ids=processed.position_ids,
    attention_mask=processed.attention_mask,
    grid_thw=processed.grid_thw,
)

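# Generate: project the last position's hidden state to vocabulary logits and
# greedily pick the first output token (this is a single decoding step only)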
logits = model.lm_head(hidden_states[:, -1:, :])
next_token = mx.argmax(logits[:, -1, :], axis=-1)
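
The snippet above produces only the first output token. A full transcription needs a loop that feeds each chosen token back in until an end-of-sequence token appears. The following is a minimal, dependency-free sketch of that greedy loop. `greedy_decode`, `step_fn`, and `toy_step` are hypothetical names introduced for illustration. In real use, `step_fn` would wrap the model forward pass and `model.lm_head` shown above, ideally reusing `past_kvs` rather than re-encoding the whole sequence; how this repository expects the cache to be passed back is not documented here.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    step_fn: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    eos_id: int,
    max_new_tokens: int = 64,
) -> List[int]:
    """Repeatedly pick the highest-scoring next token until EOS or the limit."""
    tokens = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = step_fn(tokens)  # scores over the vocabulary for the next token
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        if next_id == eos_id:
            break
        generated.append(next_id)
        tokens.append(next_id)
    return generated


def toy_step(tokens: List[int]) -> List[float]:
    """Toy stand-in for the model: always prefers (last token + 1) mod 5."""
    vocab_size = 5
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_decode(toy_step, prompt_ids=[0], eos_id=4))  # [1, 2, 3]
```

The generated ids would then be converted back to text with the tokenizer held by the processor.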

Prompt Examples

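Task	Prompt	English translation
Text Spotting	`检测并识别图片中的文字，将文本坐标格式化输出。`	Detect and recognize the text in the image, and output the text coordinates in a formatted manner.
Document Parsing	`提取文档图片中正文的所有信息用markdown格式表示，其中页眉、页脚部分忽略，表格用html格式表达，文档中公式用latex格式表示，按照阅读顺序组织进行解析。`	Extract all information from the main body of the document image as Markdown, ignoring headers and footers; express tables as HTML and formulas as LaTeX, and organize the output in reading order.
Formula Recognition	`识别图片中的公式，用LaTeX格式表示。`	Recognize the formula in the image and express it in LaTeX.
Table Extraction	`把图中的表格解析为 HTML。`	Parse the table in the image into HTML.
Chart Parsing	`解析图中的图表，对于流程图使用Mermaid格式表示，其他图表使用Markdown格式表示。`	Parse the chart in the image; use Mermaid for flowcharts and Markdown for other charts.
Information Extraction	`提取图片中的: ['key1','key2', ...] 的字段内容，并按照JSON格式返回。`	Extract the content of the fields ['key1','key2', ...] from the image and return it as JSON.
Translation	`先提取文字，再将文字内容翻译为英文。`	First extract the text, then translate it into English.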
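
The Information Extraction prompt is a template: the `['key1','key2', ...]` part is replaced with the field names you want. A small helper can fill it in, as sketched below. `build_extraction_prompt` is a hypothetical name, and the exact list formatting (single quotes, no space after commas) simply mirrors the template in the table; whether the model is sensitive to that formatting is an assumption.

```python
from typing import Sequence


def build_extraction_prompt(keys: Sequence[str]) -> str:
    """Fill the Information Extraction template with the requested field names."""
    if not keys:
        raise ValueError("at least one field name is required")
    # Mirror the template's formatting: 'a','b' inside square brackets.
    key_list = ",".join(f"'{k}'" for k in keys)
    return f"提取图片中的: [{key_list}] 的字段内容，并按照JSON格式返回。"


print(build_extraction_prompt(["name", "date"]))
# 提取图片中的: ['name','date'] 的字段内容，并按照JSON格式返回。
```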

Requirements

Apple Silicon Mac (M1/M2/M3/M4)
macOS 14+
Python 3.9+
MLX, transformers, torch, torchvision, Pillow
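
The platform requirements above can be checked from Python before installing anything. This is a rough sketch using only the standard library. `meets_requirements` is a hypothetical helper, and one caveat applies: a Python interpreter running under Rosetta reports `x86_64` as its machine type even on Apple Silicon, so a failed check may mean the wrong Python build rather than the wrong hardware.

```python
import platform
import sys
from typing import Tuple


def meets_requirements(
    system: str, machine: str, macos_version: str, python_version: Tuple[int, int]
) -> bool:
    """Check for Apple Silicon, macOS 14+, and Python 3.9+."""
    if system != "Darwin" or machine != "arm64":
        return False
    try:
        macos_major = int(macos_version.split(".")[0])
    except ValueError:
        return False  # empty or malformed version string
    return macos_major >= 14 and python_version >= (3, 9)


ok = meets_requirements(
    platform.system(),
    platform.machine(),
    platform.mac_ver()[0],  # empty string on non-macOS systems
    sys.version_info[:2],
)
print("requirements met" if ok else "requirements not met")
```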

License

This model is a derivative of Tencent HunyuanOCR, licensed under the Tencent Hunyuan Community License Agreement.

Attribution

Original model by Tencent Hunyuan Vision Team. This MLX conversion is not affiliated with or endorsed by Tencent.
