Instructions to use ZQTTTT/DOCR-Inspector-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ZQTTTT/DOCR-Inspector-7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ZQTTTT/DOCR-Inspector-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ZQTTTT/DOCR-Inspector-7B")
model = AutoModelForImageTextToText.from_pretrained("ZQTTTT/DOCR-Inspector-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ZQTTTT/DOCR-Inspector-7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ZQTTTT/DOCR-Inspector-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ZQTTTT/DOCR-Inspector-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/ZQTTTT/DOCR-Inspector-7B

SGLang

How to use ZQTTTT/DOCR-Inspector-7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ZQTTTT/DOCR-Inspector-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ZQTTTT/DOCR-Inspector-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ZQTTTT/DOCR-Inspector-7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ZQTTTT/DOCR-Inspector-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use ZQTTTT/DOCR-Inspector-7B with Docker Model Runner:
```
docker model run hf.co/ZQTTTT/DOCR-Inspector-7B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM

This repository hosts DOCR-Inspector-7B, a Vision-Language Model (VLM) for fine-grained and automated evaluation of document parsing, as presented in the paper DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM.

DOCR-Inspector is a VLM-based evaluation framework that automatically assesses document parsing results without requiring ground-truth annotations. This repository includes DOCR-Inspector-7B, a document parsing evaluation model fine-tuned from Qwen2.5-VL-7B-Instruct, along with demo code for inference and evaluation.

For more information, code, and datasets, visit the official GitHub repository: DOCR-Inspector GitHub. The associated dataset can be found here: DOCRcase-Datasets

🔍 Introduction

DOCR-Inspector is a Vision-Language Model (VLM) designed for quality inspection of document parsing elements. It takes document element images and their corresponding parsing results as input, detects errors in the parsed content, categorizes them into 28 fine-grained error types, and delivers detailed quality assessment feedback. This approach formalizes document parsing assessment as fine-grained error detection and analysis, leveraging a VLM-as-a-Judge paradigm.

🌟 Key Features

No Ground Truth Needed: Evaluates parsing results directly, enabling scalable quality assessment of real-world documents.
28 Fine-grained Error Types: Covers text, tables, and formulas with multi-level error granularity.
Reliable Quality Judgement: Uses Chain-of-Checklist (CoCL) reasoning for robust error discovery and explainable evaluation reports.

🧩 Examples

The GitHub repository provides detailed examples for Text, Table, and Equation elements, showcasing the model's fine-grained error detection and quality assessment. You can find full examples at: DOCR-Inspector Examples

📁 Full definition of error types available at: assets/error_type_definition.json

📊 DOCRcase-200K & DOCRcaseBench

DOCRcase-200K

DOCRcase-200K is a large-scale dataset designed for fine-grained error detection and analysis. It contains 212K element-level parsing cases spanning 28 error types across text, table and equation elements; each error is paired with detailed reasoning annotations.

DOCRcaseBench

DOCRcaseBench is a high-quality benchmark dataset tailored for evaluating document quality assessment models. It comprises real parsed outputs from several state-of-the-art models, including MinerU2.0-pipeline, PP-StructureV3, GPT-4o, Qwen2.5-VL-7B-Instruct, MonkeyOCR-1.2B-Pro, and MinerU2.0-VLM. These models were selected as they represent strong, yet imperfect, performance across various benchmarks. To ensure a balanced distribution of error types for robust evaluation, we meticulously supplemented the dataset with additional, hand-crafted examples. Every parsing result is annotated with human-verified error types.

The overall composition of DOCRcaseBench by model source is detailed in the table below.

Model	Count	Percentage
MonkeyOCR-1.2B-Pro	142	16.1%
PP-StructureV3	99	11.2%
MinerU2.0-pipeline	149	16.9%
MinerU2.0-VLM	22	2.5%
GPT-4o	181	20.5%
Qwen2.5-VL-7B-Instruct	105	11.9%
Experts (Hand-crafted/Supplemented)	64	7.3%
Total	882	100.0%

The distribution of document elements (cases) in DOCRcaseBench is summarized below:

	Text	Table	Equation	Total
Good Case	39	46	62	147
Bad Case with Single Error	339	141	81	561
Bad Case with Multi Error	70	55	49	174
Total	448	242	192	882
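The three row categories above can be read as a bucketing by how many error types are annotated on a case. A minimal sketch of that bucketing follows, assuming each case is represented simply as a list of error-type labels and that "Multi Error" means more than one distinct error type. The representation and the label strings are hypothetical, not the dataset's actual schema.

```python
from collections import Counter

def case_category(error_types):
    """Bucket a case by the number of distinct annotated error types."""
    n = len(set(error_types))
    if n == 0:
        return "Good Case"
    if n == 1:
        return "Bad Case with Single Error"
    return "Bad Case with Multi Error"

def summarize(cases):
    """Count cases per category; `cases` is an iterable of error-type lists."""
    return Counter(case_category(errors) for errors in cases)

# Hypothetical labels, for illustration only.
example_cases = [[], ["missing_text"], ["missing_text", "wrong_symbol"], []]
print(summarize(example_cases))
```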

🔥 Performance

We report the evaluation results of various models on DOCRcaseBench.

F1 of Case: F1 score for the binary classification of each parsing output as Good or Bad.
Recall, Precision, and F1 of Error Type: Measure how well the model detects and correctly classifies the specific error types present in the parsing results.
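To make the two metric families concrete, here is a minimal sketch of how they could be computed. It assumes "Bad" is the positive class for the case-level F1 and micro-averages error-type counts over all cases; the averaging scheme actually used for the reported numbers is not stated here, so both choices are assumptions, and the function names are illustrative.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def case_f1(gold_bad, pred_bad):
    """Case-level F1, treating 'Bad' as the positive class (an assumption)."""
    pairs = list(zip(gold_bad, pred_bad))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    return prf(tp, fp, fn)[2]

def error_type_prf(gold_sets, pred_sets):
    """Micro-averaged P/R/F1 over error-type labels (an assumption)."""
    pairs = list(zip(gold_sets, pred_sets))
    tp = sum(len(g & p) for g, p in pairs)
    fp = sum(len(p - g) for g, p in pairs)
    fn = sum(len(g - p) for g, p in pairs)
    return prf(tp, fp, fn)

# Hypothetical labels "a", "b", "c" stand in for real error types.
gold = [{"a"}, {"a", "b"}, set()]
pred = [{"a"}, {"a"}, {"c"}]
print(case_f1([bool(g) for g in gold], [bool(p) for p in pred]))
print(error_type_prf(gold, pred))
```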

DOCR-Inspector-7B achieves the highest score on every reported metric for all three element types (text, table, and equation).

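Model	Text Case F1	Text Error Type Recall	Text Error Type F1	Text Error Type Precision	Table Case F1	Table Error Type Recall	Table Error Type F1	Table Error Type Precision	Equation Case F1	Equation Error Type Recall	Equation Error Type F1	Equation Error Type Precision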
Proprietary Non-Reasoning Models
GPT-4o w/o CoT	72.05	31.66	28.8	28.04	73.69	29.89	26.36	25.03	79.2	49.31	47.2	46.31
GPT-4o w/ CoT	77.69	30.54	27.25	26.35	81.23	34.23	29.64	28.17	79.38	46.44	45.45	45.4
Gemini 2.5 Flash w/o CoT	84.89	43.29	29.88	25.43	82.21	41.94	25.97	21.29	80.46	53.73	48.17	45.96
Gemini 2.5 Flash w/ CoT	84.75	42.24	29.74	25.69	81.16	42.36	24.1	19.25	80.94	50.61	46.17	44.63
Open-source Non-Reasoning Models
Qwen2.5-VL-7B-Instruct w/o CoT	46.15	12.28	11.98	11.83	48.8	19.42	19.42	19.42	55.8	32.81	32.81	32.81
Qwen2.5-VL-7B-Instruct w/ CoT	38.17	12.05	11.64	11.5	43.48	21.56	21.72	22.11	68.1	32.29	32.12	32.03
Qwen2.5-VL-72B-Instruct w/o CoT	82.68	28.49	24.74	23.43	83.51	40.91	33.94	31.03	78.51	39.93	37.19	35.76
Qwen2.5-VL-72B-Instruct w/ CoT	74.55	30.97	26.23	24.56	76.82	40.7	31.77	28.43	79.14	44.53	41.23	39.79
Reasoning Models
Qwen3-VL-235B-A22B-Thinking	83.9	42.02	31.19	27.46	83.13	39.12	28.57	25.49	78.56	40.8	38.45	37.76
Gemini 2.5 Pro Thinking	88.46	47.17	32.9	28.16	82.01	43.60	32.93	29.63	77.19	53.04	48.58	47.27
Ours:
DOCR-Inspector-7B	96.43	81.06	80.21	81.03	86.41	63.09	62.11	62.95	85.42	74.39	73.81	74.48

🛠️ Usage

For more details on installation and usage, please visit the DOCR-Inspector GitHub repository.

Installation

DOCR-Inspector-7B is fine-tuned from Qwen2.5-VL-7B-Instruct, so you can follow the Qwen2.5-VL-7B-Instruct installation guide to set up the environment.

We highly recommend installing vLLM >= 0.7.2 to improve inference speed.
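A quick way to confirm that the installed vLLM meets the recommended minimum is to compare version tuples. The sketch below uses only the standard library; the helper names are illustrative, and the parser deliberately ignores pre-release and local suffixes, so treat it as a rough check rather than a full version comparison.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_VLLM = (0, 7, 2)  # minimum recommended in this README

def parse_version(text):
    """Turn '0.7.2' or '0.7.3+cu121' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def meets_minimum(installed, minimum=MIN_VLLM):
    """True if the parsed version is at least the minimum tuple."""
    return parse_version(installed) >= minimum

try:
    installed = version("vllm")
    print(f"vllm {installed}: meets minimum = {meets_minimum(installed)}")
except PackageNotFoundError:
    print("vllm is not installed")
```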

Inference with vLLM

Prepare your element-cropped image and the corresponding parsing results. The required data format should conform to the structure found in ./DOCR-Inspector/demo_data on the GitHub repository.

Then, run the following command to perform inference:

python run_case_inf_vllm.py --model_path ZQTTTT/DOCR-Inspector-7B --image_path /path/to/image --ocr_path /path/to/parsing_result
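If the model is instead served through vLLM's OpenAI-compatible endpoint (as in the `vllm serve` example earlier on this page), an element crop and its parsing result can be sent together in one chat request. The sketch below only builds the JSON payload. The prompt wording and the helper name `build_inspection_request` are hypothetical placeholders; the prompt template the model was trained with should be taken from the GitHub repository. It also assumes the server accepts base64 data URLs in `image_url`.

```python
import base64
import json

MODEL_ID = "ZQTTTT/DOCR-Inspector-7B"

def build_inspection_request(image_bytes, parsing_result, mime="image/png"):
    """Build a chat-completions payload with one image part and one text part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Hypothetical prompt: replace with the template from the official repo.
    prompt = "Parsing result to inspect:\n" + parsing_result
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime};base64,{encoded}"},
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }

# Placeholder bytes stand in for a real element-cropped image file.
payload = build_inspection_request(b"\x89PNG placeholder bytes", "E = mc^2")
print(json.dumps(payload, indent=2)[:200])
```

The resulting dictionary can be POSTed as JSON to `/v1/chat/completions`, mirroring the curl examples.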

Evaluation

Download the benchmark dataset from DOCRcaseBench. We provide a complete evaluation pipeline that supports inference with DOCR-Inspector, API models, and other VLMs run locally with vLLM.

Component	Description	Path
vLLM Inference Scripts	Run DOCR-Inspector locally	`bench_inf_DOCR-Inspector.py`
vLLM Inference Scripts	Run other VLMs locally	`bench_inf_qwenvl_vllm.py`
API Evaluation Scripts	Evaluate GPT/Gemini etc.	`bench_inf_api.py`
Pre-computed Paper Results	Results used in the main paper	`evaluation/results`
Metric Computation Notebook	Compute F1/Precision/Recall	`metrics.ipynb`

Citation

If you find our work helpful or inspiring, please feel free to cite it:

@article{zhang2025docr,
  title={DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM},
  author={Zhang, Qintong and Zhang, Junyuan and Ren, Zhifei and Ouyang, Linke and Wen, Zichen and Niu, Junbo and Qu, Yuan and Wang, Bin and Chow, Ka-Ho and He, Conghui and others},
  journal={arXiv preprint arXiv:2512.10619},
  year={2025}
}

Safetensors
Model size: 8B params
Tensor type: BF16

Paper: DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM (arXiv 2512.10619, published Dec 11, 2025)