| # ABot-OCR | |
| ABot-OCR is a document image OCR model that converts PDF/document page images into structured **Markdown** output, supporting recognition and reconstruction of text, mathematical formulas (LaTeX), tables (HTML), and other elements. | |
| Code: https://github.com/amap-cvlab/ABot-OCR | |
| Paper: https://arxiv.org/abs/2605.27978 | |
| ## Benchmarks | |
|  | |
| ## Requirements | |
| Python 3.11 is recommended. Install the following dependencies: | |
| ```bash | |
| pip install vllm==0.18.0 torch==2.10.0 | |
| ``` | |
| > **Note:** Inference uses vLLM to load the model. Sufficient GPU memory is required (~4GB model weights; actual usage depends on `batch_size` and image resolution). | |
| --- | |
| ## Inference | |
| Inference script: [`abot-ocr-infer.py`](./abot-ocr-infer.py) | |
| ### 1. Configure Model Path | |
| Update the default model path in the script: | |
| ```python | |
| MODEL_PATH = "./abot-ocr" # Path to the model directory in this repo | |
| ``` | |
| ### 2. Run from Command Line | |
| Edit the parameters in the `__main__` block at the bottom of `abot-ocr-infer.py`, then run: | |
| ```bash | |
| python abot-ocr-infer.py | |
| ``` | |
| --- | |
| ## Acknowledgements | |
| Our work is inspired by many excellent open-source projects. We sincerely thank the developers of [Qwen-VL](https://github.com/QwenLM/Qwen-VL), [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), [MinerU](https://github.com/opendatalab/MinerU), and the broader OCR community. | |