HunyuanOCR MLX

HunyuanOCR converted to Apple MLX for native Apple Silicon inference on Mac.

This is a conversion of Tencent's HunyuanOCR, a 1B-parameter Vision-Language Model specialized for OCR. The original model is reported to achieve state-of-the-art (SOTA) results across text spotting, complex document parsing, information extraction, video subtitle extraction, and photo translation.

Model Architecture

Component        Spec
Type             Vision-Language Model (VLM)
Parameters       ~1B
Vision Encoder   27-layer ViT, 1152 dim, 16 heads
Language Model   24-layer decoder, 1024 dim, GQA (16Q/8KV)
Features         xdrope RoPE, QK normalization, RMS norm, SiLU SwiGLU
Dtype            float16
Format           MLX
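
The figures in the table imply some simple arithmetic, sketched here: how 16 query heads share 8 key/value heads under GQA, and roughly how much memory the float16 weights occupy. The parameter count is the rounded "~1B" from the table, so the footprint is only an estimate. The contiguous grouping of query heads is the common GQA convention and is assumed here, not confirmed from the model code; all function and constant names are illustrative and not part of the repository.

```python
# Illustrative arithmetic from the spec table; all names here are hypothetical.
NUM_Q_HEADS = 16               # query heads in the language model (from the table)
NUM_KV_HEADS = 8               # key/value heads (from the table)
APPROX_PARAMS = 1_000_000_000  # "~1B", rounded; the real count may differ
BYTES_PER_FLOAT16 = 2


def kv_head_for_query_head(q_head: int) -> int:
    """Return the KV head a query head would use, assuming contiguous groups."""
    if not 0 <= q_head < NUM_Q_HEADS:
        raise ValueError(f"q_head must be in [0, {NUM_Q_HEADS})")
    group_size = NUM_Q_HEADS // NUM_KV_HEADS  # 2 query heads per KV head
    return q_head // group_size


def approx_weight_gib(num_params: int = APPROX_PARAMS) -> float:
    """Estimate the size of float16 weights in GiB."""
    return num_params * BYTES_PER_FLOAT16 / 2**30


if __name__ == "__main__":
    print([kv_head_for_query_head(q) for q in range(NUM_Q_HEADS)])
    print(f"approx. weights: {approx_weight_gib():.2f} GiB")
```

With 8 KV heads instead of 16, the key/value cache is half the size it would be under standard multi-head attention with the same head dimension.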

Quick Start

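Install the dependencies, then clone the repository and enter it (the Python example loads its files by relative path):

pip install mlx transformers torch torchvision Pillow
git clone https://huggingface.co/AnandSingh/hunyuanocr-mlx
cd hunyuanocr-mlx

Then run inference from Python:

import mlx.core as mx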
from PIL import Image

# Import the model code
from hunyuan_ocr_mlx import HunyuanOCR, HunyuanOCRProcessor

model = HunyuanOCR("config.json")
model.load_weights("model.safetensors")
processor = HunyuanOCRProcessor.from_pretrained(".")

# Run OCR
img = Image.open("document.jpg")
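# Text-spotting prompt. English: "Detect and recognize the text in the image,
# and output the text coordinates in a formatted form."
prompt = "检测并识别图片中的文字,将文本坐标格式化输出。"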
processed = processor.process([img], [prompt])

hidden_states, past_kvs = model(
    input_ids=processed.input_ids,
    pixel_values=processed.pixel_values,
    position_ids=processed.position_ids,
    attention_mask=processed.attention_mask,
    grid_thw=processed.grid_thw,
)

# Generate the first output token greedily: argmax over the logits at the last position
logits = model.lm_head(hidden_states[:, -1:, :])
next_token = mx.argmax(logits[:, -1, :], axis=-1)
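
The snippet above stops after picking a single token. A full transcription needs that step repeated until an end-of-sequence token appears. The loop here is a minimal, model-agnostic sketch of greedy decoding in plain Python. `next_logits_fn`, `eos_token_id`, and `max_new_tokens` are hypothetical names, and `stub_logits` is a toy stand-in for the model. How to connect this to `HunyuanOCR` (for example, how to reuse `past_kvs`) depends on the repository code and is not shown.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_logits_fn: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    eos_token_id: int,
    max_new_tokens: int = 32,
) -> List[int]:
    """Append the highest-scoring token repeatedly until EOS or the length cap."""
    tokens = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_logits_fn(tokens)
        # argmax over the vocabulary: index of the largest logit
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_token_id:
            break
        generated.append(next_id)
        tokens.append(next_id)
    return generated


def stub_logits(tokens: List[int]) -> List[float]:
    """Toy stand-in for a model: favors token (last + 1) in a vocabulary of 5."""
    scores = [0.0] * 5
    scores[(tokens[-1] + 1) % 5] = 1.0
    return scores


if __name__ == "__main__":
    print(greedy_decode(stub_logits, prompt_ids=[0], eos_token_id=4))
```

Greedy decoding is deterministic, which usually suits OCR, where the goal is a faithful transcription and not varied output.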

Prompt Examples

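Each task is listed with the prompt to send to the model, followed by an English translation for reference.

  • Text Spotting: 检测并识别图片中的文字,将文本坐标格式化输出。
    English: Detect and recognize the text in the image, and output the text coordinates in a formatted form.
  • Document Parsing: 提取文档图片中正文的所有信息用markdown格式表示,其中页眉、页脚部分忽略,表格用html格式表达,文档中公式用latex格式表示,按照阅读顺序组织进行解析。
    English: Extract all information from the body of the document image and represent it in markdown, ignoring headers and footers; express tables in HTML and formulas in LaTeX, and organize the output in reading order.
  • Formula Recognition: 识别图片中的公式,用LaTeX格式表示。
    English: Recognize the formula in the image and represent it in LaTeX.
  • Table Extraction: 把图中的表格解析为 HTML。
    English: Parse the table in the image into HTML.
  • Chart Parsing: 解析图中的图表,对于流程图使用Mermaid格式表示,其他图表使用Markdown格式表示。
    English: Parse the chart in the image; use Mermaid for flowcharts and Markdown for other charts.
  • Information Extraction: 提取图片中的: ['key1','key2', ...] 的字段内容,并按照JSON格式返回。
    English: Extract the content of the fields ['key1','key2', ...] from the image and return it as JSON.
  • Translation: 先提取文字,再将文字内容翻译为英文。
    English: First extract the text, then translate it into English.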
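
To avoid retyping these strings, the prompts can be kept in a lookup table. The sketch below is one way to do that; the dictionary keys and the helper `information_extraction_prompt` are hypothetical names, not part of the repository. The helper fills the `['key1','key2', ...]` placeholder with single-quoted, comma-separated field names, which is an assumption about the expected format based on the example prompt.

```python
from typing import Sequence

# Prompt strings copied from the list above; the keys are illustrative.
PROMPTS = {
    "text_spotting": "检测并识别图片中的文字,将文本坐标格式化输出。",
    "formula_recognition": "识别图片中的公式,用LaTeX格式表示。",
    "table_extraction": "把图中的表格解析为 HTML。",
    "translation": "先提取文字,再将文字内容翻译为英文。",
}


def information_extraction_prompt(keys: Sequence[str]) -> str:
    """Build the information-extraction prompt for the given field names."""
    if not keys:
        raise ValueError("at least one field name is required")
    # Assumed format: single-quoted names separated by commas, as in the example.
    key_list = ",".join(f"'{key}'" for key in keys)
    return f"提取图片中的: [{key_list}] 的字段内容,并按照JSON格式返回。"


if __name__ == "__main__":
    print(PROMPTS["table_extraction"])
    print(information_extraction_prompt(["name", "date"]))
```

The resulting string can be passed wherever the Quick Start example uses `prompt`.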

Requirements

  • Apple Silicon Mac (M1/M2/M3/M4)
  • macOS 14+
  • Python 3.9+
  • MLX, transformers, torch, torchvision, Pillow
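
The platform requirements above can be checked before installing anything. The following is a minimal sketch using only the Python standard library; `check_requirements` is a hypothetical helper, and it checks only the operating system, CPU architecture, macOS version, and Python version, not the installed packages.

```python
import platform
import sys
from typing import List, Tuple


def check_requirements(
    system: str,
    machine: str,
    macos_version: str,
    python_version: Tuple[int, int],
) -> List[str]:
    """Return a list of unmet requirements (empty if everything looks fine)."""
    problems = []
    if system != "Darwin":
        problems.append("macOS is required")
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    major = macos_version.split(".")[0]
    if not major.isdigit() or int(major) < 14:
        problems.append("macOS 14 or newer is required")
    if python_version < (3, 9):
        problems.append("Python 3.9 or newer is required")
    return problems


if __name__ == "__main__":
    issues = check_requirements(
        platform.system(),
        platform.machine(),
        platform.mac_ver()[0],  # empty string on non-macOS systems
        sys.version_info[:2],
    )
    print("OK" if not issues else "\n".join(issues))
```

A Python interpreter running under Rosetta reports `x86_64` as its machine type, so the architecture check can fail on an Apple Silicon Mac if the interpreter is not a native arm64 build.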

License

This model is a derivative of Tencent HunyuanOCR, licensed under the Tencent Hunyuan Community License Agreement.

Attribution

Original model by Tencent Hunyuan Vision Team. This MLX conversion is not affiliated with or endorsed by Tencent.
