# Surya OCR 2 NVFP4A16

This repository contains an **experimental quantized** artifact derived from [datalab-to/surya-ocr-2](https://huggingface.co/datalab-to/surya-ocr-2).

This NVFP4 artifact is intended for NVIDIA/NVFP4 runtime experimentation. On the mini benchmark it matches the per-section profile of the 8-bit MLX variant: strong on most clean and layout-heavy sections, weak on long tiny text, and not yet usable for old degraded scans.

## What is included

- Source model: `datalab-to/surya-ocr-2`
- Runtime/format: llm-compressor / NVIDIA NVFP4-capable runtimes
- Quantization: NVFP4A16 (4-bit floating-point NVFP4 weights, 16-bit activations); sensitive/unsupported modules remain bf16
- Vision weights included: Yes; the current recipe keeps the vision tower in bf16 rather than dropping it.
- Processor/tokenizer assets: included
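To make the NVFP4A16 bullet concrete: NVFP4 stores each weight as a 4-bit E2M1 float, with one shared scale per block of 16 values (kept in FP8 E4M3) plus a per-tensor FP32 scale, while "A16" means activations stay in 16-bit. The sketch below is a simplified pure-Python fake-quantization round trip under those assumptions. It skips the FP8 rounding of the block scale and the global scale, breaks rounding ties toward the smaller magnitude, and is not llm-compressor's implementation; all function names are illustrative.

```python
# Simplified NVFP4 fake-quantization sketch (illustrative, not llm-compressor's code).
# Each block of 16 weights shares one scale; each weight becomes a signed E2M1 value.
# Omitted for clarity: FP8 (E4M3) rounding of the block scale and the FP32 global scale.

E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # all non-negative E2M1 values
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, codes) where each code is a signed E2M1 value."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest magnitude onto 6.0
    codes = []
    for x in block:
        target = abs(x) / scale
        # Nearest representable magnitude; ties go to the smaller value (a simplification).
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def fake_quantize(weights):
    """Quantize then dequantize a flat list of weights, block by block."""
    out = []
    for start in range(0, len(weights), BLOCK_SIZE):
        scale, codes = quantize_block(weights[start:start + BLOCK_SIZE])
        out.extend(c * scale for c in codes)
    return out


def bits_per_weight(block_size=BLOCK_SIZE, scale_bits=8):
    """4 bits per element plus one FP8 block scale amortized over the block."""
    return 4 + scale_bits / block_size


weights = [0.02 * i - 0.3 for i in range(32)]
approx = fake_quantize(weights)
print("max abs error:", max(abs(a - b) for a, b in zip(weights, approx)))
print("bits per quantized weight:", bits_per_weight())
```

At 4 bits per element plus one 8-bit scale per 16 elements, quantized linear weights cost about 4.5 bits each versus 16 for bf16; modules kept in bf16, such as the vision tower, do not get this saving.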

## Mini olmOCR-bench results

| Candidate | Overall | Arxiv math | Headers/footers | Long tiny text | Multi-column | Old scans | Old scans math | Tables | Baseline |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Source mini baseline | 91.0% ± 6.3% | 100.0% | 100.0% | 100.0% | 100.0% | 33.3% | 100.0% | 100.0% | 94.7% |
| Surya OCR 2 NVFP4A16 | 79.2% ± 6.2% | 100.0% | 100.0% | 33.3% | 100.0% | 0.0% | 100.0% | 100.0% | 100.0% |

### How to read the benchmark table

This is an early quant release with transparent limitations. The table uses our local 40-test mini slice of allenai/olmOCR-bench: 3 samples from each named section plus the benchmark's baseline checks. It is not the full public score, and it does not claim >98% parity with the source model.

The useful signal is the split behavior: this artifact is currently strong on clean academic/math, headers/footers, multi-column layouts, tables, old-scan math, and baseline OCR checks, but it should not be used for old degraded scans and is weak on long tiny text.
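The Overall column appears consistent with an unweighted (macro) average of the eight per-section columns. That is our reading of the published numbers, not a documented scoring rule; the sketch below simply recomputes it from the table values.

```python
# Hypothesis (not stated by the benchmark): Overall = unweighted mean of the
# eight per-section scores. Values are copied from the table above.
SECTIONS = ["Arxiv math", "Headers/footers", "Long tiny text", "Multi-column",
            "Old scans", "Old scans math", "Tables", "Baseline"]

SCORES = {
    "Source mini baseline": [100.0, 100.0, 100.0, 100.0, 33.3, 100.0, 100.0, 94.7],
    "Surya OCR 2 NVFP4A16": [100.0, 100.0, 33.3, 100.0, 0.0, 100.0, 100.0, 100.0],
}


def macro_average(scores):
    """Unweighted mean across sections, so every section counts equally."""
    return sum(scores) / len(scores)


for name, scores in SCORES.items():
    assert len(scores) == len(SECTIONS)
    print(f"{name}: {macro_average(scores):.1f}%")
# prints 91.0% for the source baseline and 79.2% for the NVFP4A16 artifact
```

Because each named section has only 3 tests, section scores move in steps of about 33.3 points: a single failed test separates 100.0% from 66.7%, so the per-section columns are coarse signals rather than precise estimates.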

## Recommended use

Use this checkpoint for local experimentation and constrained OCR workloads whose documents resemble the passing sections above. Avoid using it as a production replacement for the original model on degraded historical scans, very small dense body text, or workloads requiring full benchmark parity.
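One way to apply this guidance in a pipeline is a small routing rule that falls back to the original model for the failing document types. This is a hypothetical sketch: the trait labels and `choose_checkpoint` are illustrative names, and it assumes you already tag documents somehow (for example with your own classifier), which this card does not provide.

```python
# Hypothetical routing helper: use the NVFP4 quant only for documents whose traits
# resemble the sections that passed on the mini slice. Trait names are illustrative.

UNSAFE_TRAITS = {"degraded_historical_scan", "long_tiny_text"}


def choose_checkpoint(traits, require_full_parity=False):
    """Return "nvfp4a16" or "original" for an iterable of document trait labels."""
    if require_full_parity:
        return "original"  # full benchmark parity is not established for the quant
    if set(traits) & UNSAFE_TRAITS:
        return "original"  # these sections scored 0.0% / 33.3% on the mini split
    return "nvfp4a16"


print(choose_checkpoint({"multi_column", "tables"}))
print(choose_checkpoint(["degraded_historical_scan"]))
```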

## Loading

Load with a runtime stack that understands NVFP4A16 serialized weights. Generic Transformers runtimes may not execute this checkpoint without NVFP4 support.
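Before choosing a runtime, it can help to confirm what a downloaded shard actually contains by reading only its safetensors header (an 8-byte little-endian length followed by a JSON map of tensor name to `dtype`, `shape`, `data_offsets`). The hosting page lists tensor types F32, BF16, F8_E4M3 and U8 for this repo. Treating U8 as packed FP4 pairs and F8_E4M3 as block scales is our assumption about llm-compressor's compressed-tensors layout, not something this card states; the function names are illustrative.

```python
import json
import struct
from collections import Counter


def safetensors_dtypes(path):
    """Count tensors per dtype using only the safetensors header (no weights loaded)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64 header size
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional string-to-string metadata entry
    return Counter(entry["dtype"] for entry in header.values())


def looks_nvfp4_packed(counts):
    """Heuristic (assumption): packed FP4 weights show up as U8, block scales as F8_E4M3."""
    return counts.get("U8", 0) > 0 and counts.get("F8_E4M3", 0) > 0
```

For example, `safetensors_dtypes("model.safetensors")` (a hypothetical local shard name) returns per-dtype counts; if `looks_nvfp4_packed` is false, the file is not in the packed form this heuristic expects.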

## Limitations

- This is not a full-parity release yet.
- Do **not** use this artifact for degraded old scans; the current mini split score is 0.0% there.
- Do **not** use this artifact for long tiny text unless you independently validate your data; the current mini split score is 33.3%.
- Math-heavy and table/layout-heavy mini examples looked good in this slice, but full olmOCR-bench is still pending.
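Independent validation can be as simple as running both this artifact and the original model on a labeled sample of your own pages and comparing per-section pass rates. A minimal sketch: the section labels, how `passed` is decided for each page, and the 5-point `max_drop` threshold are hypothetical choices, not part of olmOCR-bench.

```python
from collections import defaultdict


def section_pass_rates(results):
    """results: iterable of (section, passed) pairs from your own test set."""
    totals = defaultdict(lambda: [0, 0])  # section -> [passed, total]
    for section, passed in results:
        totals[section][0] += int(bool(passed))
        totals[section][1] += 1
    return {s: 100.0 * p / n for s, (p, n) in totals.items()}


def failing_sections(quant, reference, max_drop=5.0):
    """Sections where the quant trails the reference by more than max_drop points."""
    return sorted(s for s in reference if reference[s] - quant.get(s, 0.0) > max_drop)


quant = section_pass_rates([("tables", True), ("tables", True),
                            ("long_tiny_text", False), ("long_tiny_text", True)])
reference = {"tables": 100.0, "long_tiny_text": 100.0}
print(failing_sections(quant, reference))
```

Use more than a handful of pages per section; with very small samples a single page swings the rate by tens of points.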

## Provenance

Generated non-destructively from the original Hugging Face checkpoint. This is not a fine-tune. The goal of publishing this artifact now is transparency: the files are usable for the passing workload slices above, and the known failing slices are documented clearly.
Checkpoint format: Safetensors, 0.5B params; tensor types F32, BF16, F8_E4M3, U8.