	# ABot-OCR

	ABot-OCR is a document image OCR model that converts PDF/document page images into structured Markdown output, supporting recognition and reconstruction of text, mathematical formulas (LaTeX), tables (HTML), and other elements.
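
As an illustration of consuming this kind of output downstream, the sketch below pulls HTML tables and LaTeX display formulas out of a Markdown string. The sample text and the function name `split_elements` are hypothetical, and it assumes display formulas are delimited by `$$ ... $$`; the delimiters the model actually emits may differ.

```python
import re

# Hypothetical sample standing in for model output; not produced by ABot-OCR.
SAMPLE_MARKDOWN = """# Results

The loss is defined as

$$
L = \\frac{1}{N} \\sum_i (y_i - \\hat{y}_i)^2
$$

<table><tr><td>Model</td><td>Score</td></tr><tr><td>A</td><td>0.9</td></tr></table>
"""

# Non-greedy matches so that several tables or formulas are captured separately.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)
FORMULA_RE = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)  # assumed delimiter


def split_elements(markdown: str) -> dict:
    """Return the HTML tables and display formulas found in a Markdown string."""
    tables = TABLE_RE.findall(markdown)
    formulas = [body.strip() for body in FORMULA_RE.findall(markdown)]
    return {"tables": tables, "formulas": formulas}


if __name__ == "__main__":
    elements = split_elements(SAMPLE_MARKDOWN)
    print(len(elements["tables"]), "table(s),", len(elements["formulas"]), "formula(s)")
```

A regular expression is enough for a quick inspection like this; nested tables or inline formulas would need a real HTML or Markdown parser.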

	Code: https://github.com/amap-cvlab/ABot-OCR

	Paper: https://arxiv.org/abs/2605.27978

	## Benchmarks

	![ABot-OCR Benchmark Results](./metric.png)


	## Requirements

	Python 3.11 is recommended. Install the following dependencies:

	```bash
	pip install vllm==0.18.0 torch==2.10.0
	```

	> Note: Inference uses vLLM to load the model, so a GPU with sufficient memory is required. The model weights take about 4 GB; actual usage also depends on `batch_size` and image resolution.
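
Before running inference, it can help to confirm that the installed packages match the pinned versions. The following is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical and is not part of this repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install command in this section.
PINNED = {"vllm": "0.18.0", "torch": "2.10.0"}


def check_pins(pins: dict) -> dict:
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Builds may carry a local suffix after "+"; compare the public part only.
        public = found.split("+", 1)[0]
        report[name] = "ok" if public == wanted else f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    for name, status in check_pins(PINNED).items():
        print(f"{name}: {status}")
```

This only inspects package metadata, so it runs without importing `torch` or `vllm` and without a GPU.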

	---

	## Inference

	Inference script: [`abot-ocr-infer.py`](./abot-ocr-infer.py)

	### 1. Configure Model Path

	Update the default model path in the script:

	```python
	MODEL_PATH = "./abot-ocr" # Path to the model directory in this repo
	```
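
If hard-coding the path is inconvenient, one option is to resolve it at startup and fail early when the directory looks wrong. The sketch below is an assumption-laden example, not what the script does: the environment variable `ABOT_OCR_MODEL_PATH` and the helper `resolve_model_path` are hypothetical names, and the `config.json` check assumes the model directory follows the usual Hugging Face layout.

```python
import os
from pathlib import Path


def resolve_model_path(default: str = "./abot-ocr") -> Path:
    """Pick the model directory and fail early if it does not look usable."""
    # ABOT_OCR_MODEL_PATH is a hypothetical variable name; the script does not read it.
    path = Path(os.environ.get("ABOT_OCR_MODEL_PATH", default)).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Model directory not found: {path}")
    # Assumption: the directory uses the Hugging Face layout with a config.json.
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"No config.json in {path}")
    return path
```

With this helper in the script, the assignment could become `MODEL_PATH = str(resolve_model_path())`.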

	### 2. Run from Command Line

	Edit the parameters in the `__main__` block at the bottom of `abot-ocr-infer.py`, then run:

	```bash
	python abot-ocr-infer.py
	```
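
The parameters in the `__main__` block are not listed here. As a hedged illustration of preparing inputs for it, the sketch below collects page images from a folder and groups them into batches of `batch_size`. The helper names, the folder argument, and the accepted file extensions are assumptions, not part of `abot-ocr-infer.py`.

```python
from pathlib import Path
from typing import Iterator, List

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed input formats


def list_page_images(folder: str) -> List[Path]:
    """Return image files in `folder`, sorted by name for a stable page order."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def batched(items: List[Path], batch_size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Because GPU memory use depends on `batch_size`, grouping inputs this way makes it easy to retry with a smaller value if a run exhausts memory.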

	---

	## Acknowledgements

	Our work is inspired by many excellent open-source projects. We sincerely thank the developers of [Qwen-VL](https://github.com/QwenLM/Qwen-VL), [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), [MinerU](https://github.com/opendatalab/MinerU), and the broader OCR community.