
Libraries

How to use Reza2kn/surya-ocr-2-nvfp4a16 with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Reza2kn/surya-ocr-2-nvfp4a16")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Reza2kn/surya-ocr-2-nvfp4a16")
model = AutoModelForMultimodalLM.from_pretrained("Reza2kn/surya-ocr-2-nvfp4a16")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use Reza2kn/surya-ocr-2-nvfp4a16 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Reza2kn/surya-ocr-2-nvfp4a16"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Reza2kn/surya-ocr-2-nvfp4a16",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
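The same request can also be sent from Python. The following is a minimal, standard-library-only sketch, assuming the server started above is listening on `localhost:8000`. The helper names `build_chat_payload` and `post_chat_completion` are illustrative and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Reza2kn/surya-ocr-2-nvfp4a16"


def build_chat_payload(image_url: str, prompt: str, model: str = MODEL_ID) -> dict:
    """Build an OpenAI-style chat completion body with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat_completion(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to /v1/chat/completions and return the parsed JSON response.

    base_url is an assumption: it must match the host and port the server was started with.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `post_chat_completion(build_chat_payload(url, prompt))["choices"][0]["message"]["content"]` should hold the generated text. Passing `base_url="http://localhost:30000"` would target a server on port 30000 instead.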

Use Docker

```
docker model run hf.co/Reza2kn/surya-ocr-2-nvfp4a16
```

SGLang

How to use Reza2kn/surya-ocr-2-nvfp4a16 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Reza2kn/surya-ocr-2-nvfp4a16" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Reza2kn/surya-ocr-2-nvfp4a16",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Reza2kn/surya-ocr-2-nvfp4a16" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Reza2kn/surya-ocr-2-nvfp4a16",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Docker Model Runner
How to use Reza2kn/surya-ocr-2-nvfp4a16 with Docker Model Runner:
```
docker model run hf.co/Reza2kn/surya-ocr-2-nvfp4a16
```

# Surya OCR 2 NVFP4A16

This repository contains an **experimental quantized** artifact derived from [datalab-to/surya-ocr-2](https://huggingface.co/datalab-to/surya-ocr-2).

This NVFP4 artifact is intended for experimentation with NVIDIA NVFP4-capable runtimes. On the mini benchmark it matches the 8-bit MLX split profile: strong on most clean and layout-heavy sections, weak on long tiny text (33.3%), and not yet usable for old degraded scans (0.0%).

## What is included

- Source model: `datalab-to/surya-ocr-2`
- Runtime/format: llm-compressor / NVIDIA NVFP4-capable runtimes
- Quantization: NVFP4A16 4-bit float weight quantization; sensitive/unsupported modules remain bf16
- Vision weights: included; the current recipe preserves the vision tower in bf16 rather than dropping it.
- Processor/tokenizer assets: included

## Mini olmOCR-bench results

| Candidate | Overall | Arxiv math | Headers/footers | Long tiny text | Multi-column | Old scans | Old scans math | Tables | Baseline |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Source mini baseline | 91.0% ± 6.3% | 100.0% | 100.0% | 100.0% | 100.0% | 33.3% | 100.0% | 100.0% | 94.7% |
| Surya OCR 2 NVFP4A16 | 79.2% ± 6.2% | 100.0% | 100.0% | 33.3% | 100.0% | 0.0% | 100.0% | 100.0% | 100.0% |

## How to read the benchmark table

This is an early quant release with transparent limitations. The table uses our local 40-test mini slice of allenai/olmOCR-bench, with 3 samples from each named section plus the benchmark baseline checks. It is not the full public score and it is not a claim of >98% parity.

The useful signal is the split behavior: this artifact is currently strong on clean academic/math, headers/footers, multi-column layouts, tables, old-scan math, and baseline OCR checks, but it should not be used for old degraded scans and is weak on long tiny text.
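As a reading aid, the published numbers are consistent with the Overall column being an unweighted mean of the eight per-section scores. This is an observation from the table, not a documented definition of the metric. The sketch below recomputes that mean from the table values and lists the sections that fall under a cutoff; the 50% default for `threshold` is a hypothetical choice for illustration only.

```python
# Per-section scores in percent, copied from the mini olmOCR-bench table.
SECTION_SCORES = {
    "Source mini baseline": {
        "Arxiv math": 100.0, "Headers/footers": 100.0, "Long tiny text": 100.0,
        "Multi-column": 100.0, "Old scans": 33.3, "Old scans math": 100.0,
        "Tables": 100.0, "Baseline": 94.7,
    },
    "Surya OCR 2 NVFP4A16": {
        "Arxiv math": 100.0, "Headers/footers": 100.0, "Long tiny text": 33.3,
        "Multi-column": 100.0, "Old scans": 0.0, "Old scans math": 100.0,
        "Tables": 100.0, "Baseline": 100.0,
    },
}


def macro_average(scores: dict) -> float:
    """Unweighted mean over sections (assumed, not confirmed, to be how Overall is computed)."""
    return sum(scores.values()) / len(scores)


def weak_sections(scores: dict, threshold: float = 50.0) -> list:
    """Names of sections scoring below the (hypothetical) threshold, in table order."""
    return [name for name, score in scores.items() if score < threshold]


for candidate, scores in SECTION_SCORES.items():
    print(f"{candidate}: overall {macro_average(scores):.1f}%, weak: {weak_sections(scores)}")
# Source mini baseline: overall 91.0%, weak: ['Old scans']
# Surya OCR 2 NVFP4A16: overall 79.2%, weak: ['Long tiny text', 'Old scans']
```

Because each named section holds only 3 samples, a single failed test moves that section's score by about 33 points, which is why the split behavior matters more than the headline number.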

## Recommended use

Use this checkpoint for local experimentation and constrained OCR workloads whose documents resemble the passing sections above. Avoid using it as a production replacement for the original model on degraded historical scans, very small dense body text, or workloads requiring full benchmark parity.

## Loading

Load with a runtime stack that understands NVFP4A16 serialized weights. Generic Transformers runtimes may not execute this checkpoint without NVFP4 support.
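Before picking a runtime, it can help to check what quantization metadata a downloaded checkpoint declares. The sketch below reads a local `config.json` and summarizes its quantization section. It assumes the metadata lives under a `quantization_config` key with `quant_method` and `format` fields, which is a common layout but is not confirmed for this repository. The function names and the example dictionary values are hypothetical.

```python
import json
from pathlib import Path


def describe_quantization(config: dict) -> str:
    """Summarize the quantization section of a parsed config.json.

    The key names used here are assumptions about the checkpoint layout.
    """
    qcfg = config.get("quantization_config")
    if qcfg is None:
        return "no quantization_config found"
    method = qcfg.get("quant_method", "unknown")
    fmt = qcfg.get("format", "unknown")
    return f"quant_method={method}, format={fmt}"


def describe_local_checkpoint(checkpoint_dir: str) -> str:
    """Read config.json from a locally downloaded checkpoint directory and summarize it."""
    config_path = Path(checkpoint_dir) / "config.json"
    return describe_quantization(json.loads(config_path.read_text(encoding="utf-8")))


# Placeholder values for illustration; they are not taken from this repository.
example_config = {"quantization_config": {"quant_method": "example-method", "format": "example-format"}}
print(describe_quantization(example_config))
```

If the summary names a method or format your runtime does not list as supported, expect loading to fail or fall back, and validate outputs before relying on them.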

## Limitations

- This is not a full-parity release yet.
- Do **not** use this artifact for degraded old scans; the current mini split score is 0.0% there.
- Do **not** use this artifact for long tiny text unless you independently validate your data; the current mini split score is 33.3%.
- Math-heavy and table/layout-heavy mini examples looked good in this slice, but full olmOCR-bench is still pending.

## Provenance

Generated non-destructively from the original Hugging Face checkpoint. This is not a fine-tune. The goal of publishing this artifact now is transparency: the files are usable for the passing workload slices above, and the known failing slices are documented clearly.


Safetensors

- Model size: 0.5B params
- Tensor types: F32, BF16, F8_E4M3

Model tree for Reza2kn/surya-ocr-2-nvfp4a16

- Base model: datalab-to/surya-ocr-2
- Quantized (5): this model