docling-layout-heron — MLX (bfloat16)

MLX-converted weights of docling-project/docling-layout-heron, the default document-layout model of the Docling project. Apache-2.0, same as upstream.

The architecture is RT-DETRv2 with a ResNet-50-vd backbone, 300 queries, and 17 layout classes (caption, footnote, formula, list_item, page_footer, page_header, picture, section_header, table, text, title, document_index, code, checkbox_selected, checkbox_unselected, form, key_value_region).
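As a small illustration of the class list above, the sketch below builds an id-to-name lookup. It assumes that integer class ids follow the order in which the classes are listed here, which this card does not state; the config shipped with the weights is the authoritative mapping. The names LAYOUT_CLASSES, ID2LABEL, LABEL2ID, and label_name are hypothetical.

```python
# Layout classes in the order listed above.
# ASSUMPTION: integer class ids follow this order; verify against the model config.
LAYOUT_CLASSES = [
    "caption", "footnote", "formula", "list_item", "page_footer",
    "page_header", "picture", "section_header", "table", "text", "title",
    "document_index", "code", "checkbox_selected", "checkbox_unselected",
    "form", "key_value_region",
]

ID2LABEL = dict(enumerate(LAYOUT_CLASSES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_name(class_id: int) -> str:
    """Return the class name for an id, or 'unknown' if the id is out of range."""
    return ID2LABEL.get(class_id, "unknown")
```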

Inference

Requires the RT-DETRv2 MLX port in mlx-vlm.

from pathlib import Path
from PIL import Image
from huggingface_hub import snapshot_download
from transformers import AutoProcessor
from mlx_vlm.utils import load_model
from mlx_vlm.models.rt_detr_v2.generate import RTDetrV2Predictor
import mlx_vlm.models.rt_detr_v2  # registers the processor with AutoProcessor

# Download the converted weights and load the model and processor from the snapshot.
path = Path(snapshot_download("mlx-community/docling-layout-heron-mlx-bf16"))
model = load_model(path)
processor = AutoProcessor.from_pretrained(path)
# threshold is the score cutoff applied to detections.
predictor = RTDetrV2Predictor(model, processor, threshold=0.3)

# Run detection on one page image and print one line per detected region.
result = predictor.predict(Image.open("page.png"))
for name, score, box in zip(result.class_names, result.scores, result.boxes):
    print(f"{name:20s} {score:.3f} {box.tolist()}")

The returned result is a DetectionResult with vectorized fields: boxes with shape (N, 4) in xyxy format and original-image pixels, scores with shape (N,), labels with shape (N,) holding integer class ids, and class_names with one name per detection.
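Because these fields are parallel sequences, post-processing is mostly index bookkeeping. The sketch below keeps only detections of chosen classes above a score cutoff and converts an xyxy box to (x, y, width, height). It uses plain Python lists as stand-ins for the array fields, and the helpers select_detections and xyxy_to_xywh are hypothetical, not part of mlx-vlm.

```python
from typing import List, Sequence, Set, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def xyxy_to_xywh(box: Sequence[float]) -> Box:
    """Convert corner coordinates to (x, y, width, height)."""
    x0, y0, x1, y1 = box
    return (x0, y0, x1 - x0, y1 - y0)


def select_detections(
    class_names: Sequence[str],
    scores: Sequence[float],
    boxes: Sequence[Sequence[float]],
    keep: Set[str],
    min_score: float = 0.5,
) -> List[Tuple[str, float, Box]]:
    """Return (name, score, box) triples for the wanted classes, highest score first."""
    picked = [
        (name, float(score), tuple(float(v) for v in box))
        for name, score, box in zip(class_names, scores, boxes)
        if name in keep and score >= min_score
    ]
    return sorted(picked, key=lambda d: d[1], reverse=True)


# Stand-in values shaped like the fields described above (not real model output).
names = ["text", "table", "picture", "table"]
scores = [0.91, 0.42, 0.88, 0.97]
boxes = [[10, 10, 200, 50], [20, 60, 300, 400], [310, 60, 500, 220], [15, 420, 480, 700]]

tables = select_detections(names, scores, boxes, keep={"table"})
```

With a real result, the array fields would first need converting to lists, for example with .tolist() as the loop above does for each box.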

Conversion

Produced with:

python -m mlx_vlm.models.rt_detr_v2.convert \
    --hf-path docling-project/docling-layout-heron \
    --output ./docling-layout-heron-mlx-bf16 \
    --dtype bfloat16

Outputs were validated numerically against transformers.RTDetrV2ForObjectDetection on real document inputs: the maximum absolute error is about 2e-5 on logits, and bounding boxes differ by less than one pixel.
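The metric quoted above is a maximum absolute difference between the outputs of the two implementations. A minimal sketch of that metric on nested Python lists follows. It is not the script used for the validation, the function names are hypothetical, and the sample values are made up for illustration.

```python
def flatten(values):
    """Yield floats from arbitrarily nested lists or tuples."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield float(v)


def max_abs_error(reference, candidate):
    """Largest element-wise absolute difference between two same-sized outputs."""
    ref = list(flatten(reference))
    cand = list(flatten(candidate))
    if len(ref) != len(cand):
        raise ValueError("outputs differ in size")
    return max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)


# Illustrative values only; real inputs would be the two models' logits or boxes.
ref_logits = [[0.50, -1.25], [2.00, 0.75]]
mlx_logits = [[0.50, -1.25], [2.00, 0.75 + 2e-5]]
err = max_abs_error(ref_logits, mlx_logits)
```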

License and citation

Apache-2.0. The original work is described in "Advanced Layout Analysis Models for Docling" by Livathinos et al. (arXiv:2509.11720); please cite the upstream paper if you use this model.

Paper: Advanced Layout Analysis Models for Docling (arXiv:2509.11720), published Sep 15, 2025.