	---
	license: apache-2.0
	library_name: mlx
	base_model: infly/Infinity-Parser2-Flash
	tags:
	- mlx
	- mlx-vlm
	- ocr
	- document-parsing
	- vision-language
	pipeline_tag: image-text-to-text
	language:
	- en
	---

	# Infinity-Parser2-Flash MLX BF16

	This model was converted to MLX format from [`infly/Infinity-Parser2-Flash`](https://huggingface.co/infly/Infinity-Parser2-Flash) using mlx-vlm version 0.5.0. Refer to the [original model card](https://huggingface.co/infly/Infinity-Parser2-Flash) for more details on the model.

	## Use with mlx-vlm

	```bash
	pip install -U mlx-vlm
	```

	The model is RL-tuned for the canonical layout-extraction prompt below; a different prompt may yield unexpected output:

	```bash
	PROMPT=$(cat <<'EOF'
	- Extract layout information from the provided PDF image.
	- For each layout element, output its bbox, category, and the text content within the bbox.
	- Bbox format: [x1, y1, x2, y2].
	- Allowed layout categories: ['header', 'title', 'text', 'figure', 'table', 'formula', 'figure_caption', 'table_caption', 'formula_caption', 'figure_footnote', 'table_footnote', 'page_footnote', 'footer'].
	- Text extraction and formatting:
	1) For 'figure', the text field must be an empty string.
	2) For 'formula', format text as LaTeX.
	3) For 'table', format text as HTML.
	4) For all other categories (e.g., text, title), format text as Markdown.
	- The output text must be exactly the original text from the image, with no translation or rewriting.
	- Sort all layout elements in human reading order.
	- Final output must be a single JSON object.
	EOF
	)

	python -m mlx_vlm.generate \
	--model BotResources/Infinity-Parser2-Flash-mlx-bf16 \
	--max-tokens 32768 --temperature 0.0 \
	--prompt "$PROMPT" \
	--image <path_to_image>
	```
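
The prompt asks for JSON containing a `bbox`, a `category`, and a `text` for each layout element. The following is a minimal sketch of how that output could be post-processed, not part of mlx-vlm. It assumes the generated text has been captured as a string, that it may be wrapped in a Markdown code fence, and that the elements arrive either as a bare list or as a list nested inside an object. The exact schema is not documented here, so the helper name `parse_layout` and the wrapping logic are hypothetical.

```python
import json
import re

# Categories copied from the canonical prompt above.
ALLOWED_CATEGORIES = {
    "header", "title", "text", "figure", "table", "formula",
    "figure_caption", "table_caption", "formula_caption",
    "figure_footnote", "table_footnote", "page_footnote", "footer",
}


def parse_layout(raw: str) -> list[dict]:
    """Parse raw model output into a validated list of layout elements."""
    text = raw.strip()

    # Assumption: the model may wrap its answer in a ```json ... ``` fence.
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)

    data = json.loads(text)

    # Assumption: accept a single element, a bare list, or an object
    # whose first list-valued field holds the elements.
    if isinstance(data, dict):
        if "bbox" in data:
            data = [data]
        else:
            data = next((v for v in data.values() if isinstance(v, list)), [])

    elements = []
    for item in data:
        bbox = item.get("bbox")
        category = item.get("category")
        if not (isinstance(bbox, list) and len(bbox) == 4):
            raise ValueError(f"bbox must be [x1, y1, x2, y2], got {bbox!r}")
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unexpected category: {category!r}")
        elements.append(
            {"bbox": bbox, "category": category, "text": item.get("text", "")}
        )
    return elements
```

Validating categories against the prompt's allow-list is a cheap way to notice when a modified prompt has pushed the model off its expected output format.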

	## Quantization quality

	A companion 8-bit quantization is published at [`BotResources/Infinity-Parser2-Flash-mlx-q8`](https://huggingface.co/BotResources/Infinity-Parser2-Flash-mlx-q8).

	In a BotResources internal benchmark of 50 pages from various PDFs (text, tables, formulas, scans), the BF16 build and the 8-bit build produced byte-identical outputs on all 50 pages at `temperature=0`, `top_p=1`. Token count, character count, and final text are strictly equal between the two builds.
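
A byte-level comparison of this kind could be reproduced roughly as follows. This is a sketch under stated assumptions, not the benchmark script itself: it assumes each build's outputs were saved as one `.json` file per page in its own directory with matching file names. The directory layout and the function name `diff_outputs` are hypothetical.

```python
from pathlib import Path


def diff_outputs(dir_a: str, dir_b: str) -> list[str]:
    """Return the file names whose bytes differ between two output folders.

    A file that exists in only one folder is also reported as a difference.
    """
    a = {p.name: p.read_bytes() for p in Path(dir_a).glob("*.json")}
    b = {p.name: p.read_bytes() for p in Path(dir_b).glob("*.json")}
    names = sorted(set(a) | set(b))
    return [name for name in names if a.get(name) != b.get(name)]
```

An empty list means every page matched byte for byte, which is the result reported above for all 50 pages.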

	With both builds measured on the same Apple M4 Max (128 GB unified memory), only size, memory use, and generation speed differ:

	| Build | On-disk size | Peak RAM | Generation speed |
	|---|---:|---:|---:|
	| BF16 (this build) | 4.43 GB | 5.4 GB | 101 tok/s |
	| 8-bit | 2.48 GB | 3.7 GB | 167 tok/s |

	The 8-bit build generates ~65 % more tokens per second, uses ~31 % less peak RAM, and takes ~44 % less disk space, with no measured quality loss for this use case.
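
The relative differences can be recomputed directly from the table figures. The short sketch below uses only the numbers reported above; the variable and function names are illustrative.

```python
def percent_change(new: float, old: float) -> float:
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


# Figures copied from the table above.
bf16 = {"disk_gb": 4.43, "peak_ram_gb": 5.4, "tok_per_s": 101}
q8 = {"disk_gb": 2.48, "peak_ram_gb": 3.7, "tok_per_s": 167}

changes = {key: percent_change(q8[key], bf16[key]) for key in bf16}
for key, value in changes.items():
    print(f"{key}: {value:+.0f} %")
# disk_gb: -44 %
# peak_ram_gb: -31 %
# tok_per_s: +65 %
```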

	## License

	Inherits the Apache-2.0 license from the base model [`infly/Infinity-Parser2-Flash`](https://huggingface.co/infly/Infinity-Parser2-Flash). All credit for the underlying model goes to the inflyAI team.