Instructions to use BotResources/Infinity-Parser2-Flash-mlx-bf16 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use BotResources/Infinity-Parser2-Flash-mlx-bf16 with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("BotResources/Infinity-Parser2-Flash-mlx-bf16") config = load_config("BotResources/Infinity-Parser2-Flash-mlx-bf16") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi new
How to use BotResources/Infinity-Parser2-Flash-mlx-bf16 with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "BotResources/Infinity-Parser2-Flash-mlx-bf16"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "BotResources/Infinity-Parser2-Flash-mlx-bf16" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use BotResources/Infinity-Parser2-Flash-mlx-bf16 with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "BotResources/Infinity-Parser2-Flash-mlx-bf16"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default BotResources/Infinity-Parser2-Flash-mlx-bf16
Run Hermes
hermes
Infinity-Parser2-Flash MLX BF16
This model was converted to MLX format from infly/Infinity-Parser2-Flash using mlx-vlm version 0.5.0. Refer to the original model card for more details on the model.
Use with mlx-vlm
pip install -U mlx-vlm
The model is RL-tuned for the canonical layout-extraction prompt below — using a different prompt may yield unexpected output:
PROMPT=$(cat <<'EOF'
- Extract layout information from the provided PDF image.
- For each layout element, output its bbox, category, and the text content within the bbox.
- Bbox format: [x1, y1, x2, y2].
- Allowed layout categories: ['header', 'title', 'text', 'figure', 'table', 'formula', 'figure_caption', 'table_caption', 'formula_caption', 'figure_footnote', 'table_footnote', 'page_footnote', 'footer'].
- Text extraction and formatting:
1) For 'figure', the text field must be an empty string.
2) For 'formula', format text as LaTeX.
3) For 'table', format text as HTML.
4) For all other categories (e.g., text, title), format text as Markdown.
- The output text must be exactly the original text from the image, with no translation or rewriting.
- Sort all layout elements in human reading order.
- Final output must be a single JSON object.
EOF
)
python -m mlx_vlm.generate \
--model BotResources/Infinity-Parser2-Flash-mlx-bf16 \
--max-tokens 32768 --temperature 0.0 \
--prompt "$PROMPT" \
--image <path_to_image>
Quantization quality
A companion 8-bit quantization is published at BotResources/Infinity-Parser2-Flash-mlx-q8.
In a BotResources internal benchmark of 50 pages from various PDFs (text, tables, formulas, scans), the BF16 build and the 8-bit build produced byte-identical outputs on all 50 pages at temperature=0, top_p=1. Token count, character count, and final text are strictly equal between the two builds.
On the same Apple M4 Max (128 GB unified memory) only the runtime differs:
| Build | On-disk | Peak RAM | Generation |
|---|---|---|---|
| BF16 (this build) | 4.43 GB | 5.4 GB | 101 tok/s |
| 8-bit | 2.48 GB | 3.7 GB | 167 tok/s |
The 8-bit build is ~65 % faster per token and uses ~33 % less peak RAM, with no measured quality loss for this use case.
License
Inherits the Apache-2.0 license from the base model infly/Infinity-Parser2-Flash. All credit for the underlying model goes to the inflyAI team.
- Downloads last month
- -
Quantized
Model tree for BotResources/Infinity-Parser2-Flash-mlx-bf16
Base model
infly/Infinity-Parser2-Flash