How to use mlx-community/docling-layout-heron-mlx-bf16 with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir docling-layout-heron-mlx-bf16 mlx-community/docling-layout-heron-mlx-bf16
docling-layout-heron - MLX (bfloat16)
MLX-converted weights of docling-project/docling-layout-heron, the default document-layout model of the Docling project. Apache-2.0, same as upstream.
The architecture is RT-DETRv2 with a ResNet-50-vd backbone, 300 queries, and 17 layout classes (caption, footnote, formula, list_item, page_footer, page_header, picture, section_header, table, text, title, document_index, code, checkbox_selected, checkbox_unselected, form, key_value_region).
Inference
Requires the RT-DETRv2 MLX port in mlx-vlm.
from pathlib import Path
from PIL import Image
from huggingface_hub import snapshot_download
from transformers import AutoProcessor
from mlx_vlm.utils import load_model
from mlx_vlm.models.rt_detr_v2.generate import RTDetrV2Predictor
import mlx_vlm.models.rt_detr_v2 # registers the processor with AutoProcessor
path = Path(snapshot_download("mlx-community/docling-layout-heron-mlx-bf16"))
model = load_model(path)
processor = AutoProcessor.from_pretrained(path)
predictor = RTDetrV2Predictor(model, processor, threshold=0.3)
result = predictor.predict(Image.open("page.png"))
for name, score, box in zip(result.class_names, result.scores, result.boxes):
    print(f"{name:20s} {score:.3f} {box.tolist()}")
result is a DetectionResult with vectorized fields: boxes (N, 4) xyxy in original-image pixels, scores (N,), labels (N,) integer class ids, and class_names.
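Because the fields are parallel, post-processing is mostly a matter of zipping them together. The following is a minimal sketch, assuming the fields have first been converted to plain Python lists (for example with .tolist(), as in the loop above). The helpers group_by_class and xyxy_to_xywh are hypothetical names introduced for illustration and are not part of mlx-vlm; the sample values are made up.

```python
from collections import defaultdict


def xyxy_to_xywh(box):
    """Convert [x0, y0, x1, y1] to [x, y, width, height]."""
    x0, y0, x1, y1 = box
    return [x0, y0, x1 - x0, y1 - y0]


def group_by_class(class_names, scores, boxes, min_score=0.5):
    """Group detections by class name, keeping scores >= min_score.

    Returns {class_name: [(score, xyxy_box), ...]}, each list sorted by
    descending score.
    """
    grouped = defaultdict(list)
    for name, score, box in zip(class_names, scores, boxes):
        if score >= min_score:
            grouped[name].append((score, list(box)))
    for items in grouped.values():
        items.sort(key=lambda item: item[0], reverse=True)
    return dict(grouped)


# Made-up values standing in for result.class_names, result.scores
# and result.boxes after conversion to lists.
names = ["text", "table", "text"]
scores = [0.91, 0.42, 0.77]
boxes = [[10, 20, 110, 60], [0, 100, 200, 300], [10, 70, 110, 95]]

regions = group_by_class(names, scores, boxes, min_score=0.5)
```

Here the low-scoring table detection is dropped and the two text regions are returned in descending score order. Filtering a second time is only useful when a stricter cut-off than the predictor's own threshold is wanted.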
Conversion
Produced with:
python -m mlx_vlm.models.rt_detr_v2.convert \
    --hf-path docling-project/docling-layout-heron \
    --output ./docling-layout-heron-mlx-bf16 \
    --dtype bfloat16
Numerical validation against transformers.RTDetrV2ForObjectDetection on real document inputs: max abs error ~2e-5 on logits, sub-pixel on bboxes.
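The metric quoted here, maximum absolute error, is the largest element-wise difference between the reference output and the converted model's output. The sketch below only illustrates that metric and is not the script used for the validation above. It assumes both outputs have been flattened to Python lists of floats; max_abs_error is a hypothetical helper and the numbers are made up.

```python
def max_abs_error(reference, candidate):
    """Return the largest element-wise absolute difference."""
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    if not reference:
        raise ValueError("inputs must not be empty")
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Made-up logits: the last element differs by 2**-16 (about 1.5e-5).
reference_logits = [0.5, -1.25, 2.0]
converted_logits = [0.5, -1.25, 2.0 + 2.0 ** -16]

err = max_abs_error(reference_logits, converted_logits)
within_tolerance = err < 2e-5
```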
License and citation
Apache-2.0. The original work is described in "Advanced Layout Analysis Models for Docling" by Livathinos et al. (arXiv:2509.11720); please cite the upstream paper if you use this model.