nemotron-ocr-v2 PDF runner

run_ocr_pdf.py is a wrapper script around nvidia/nemotron-ocr-v2. It rasterizes a PDF (via pdf2image / poppler) and runs the V2 pipeline page by page, writing one .png and one .txt file per page.
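The per-page flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of run_ocr_pdf.py: `run_pages`, `rasterize`, the 300 dpi default, and the `ocr_fn` callable (standing in for the nemotron-ocr-v2 pipeline call) are all hypothetical, and the three-digit, 1-based `page_NNN` numbering is an assumption. Only `pdf2image.convert_from_path` is a real library call.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def run_pages(pages: Iterable, ocr_fn: Callable[[str], str],
              out_dir: str = "ocr_out", first_page: int = 1) -> List[Path]:
    """Save each page image, OCR it, and write one text file per page.

    `pages` yields objects with a .save(path) method (e.g. PIL images).
    `ocr_fn` is a hypothetical stand-in for the OCR pipeline: it takes an
    image path and returns the recognized text as a string.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for number, page in enumerate(pages, start=first_page):
        stem = f"page_{number:03d}"  # assumed: zero-padded, 1-based numbering
        png_path = out / f"{stem}.png"
        page.save(png_path)  # keep the rasterized page next to its text
        text = ocr_fn(str(png_path))
        txt_path = out / f"{stem}.txt"
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    return written


def rasterize(pdf_path: str, dpi: int = 300):
    """Render every PDF page to a PIL image (requires poppler on PATH).

    The dpi value is an assumption, not a setting taken from the script.
    """
    from pdf2image import convert_from_path  # imported lazily on purpose
    return convert_from_path(pdf_path, dpi=dpi)
```

With a real OCR callable in hand, `run_pages(rasterize("sample.pdf"), ocr_fn)` would populate `ocr_out/`. Passing the OCR step in as a function keeps the sketch independent of the exact pipeline API.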

Verified working

Hardware: NVIDIA L4 (sm_89), CUDA toolkit 12.8, PyTorch 2.11+cu128, Python 3.12
Test input: arXiv 1706.03762 (Attention Is All You Need), page 1
Result: 25 text regions extracted, full abstract recovered as readable paragraphs

Setup

# CUDA toolkit: its major version must match the CUDA version PyTorch was built with (12.8 here).
# Install the cuda-nvcc-12-8, cuda-cudart-dev-12-8, and cuda-libraries-dev-12-8 packages.

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install hatchling hatch-vcs huggingface_hub pdf2image pillow

huggingface-cli download nvidia/nemotron-ocr-v2 --local-dir nemotron-ocr-v2
cd nemotron-ocr-v2/nemotron-ocr
# 8.9 is the compute capability of the L4 (sm_89); use your own GPU's value on other hardware.
TORCH_CUDA_ARCH_LIST="8.9" pip install --no-build-isolation -v .

apt-get install -y poppler-utils
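
On hardware other than the L4, the value for TORCH_CUDA_ARCH_LIST and the toolkit-version rule can be checked with a few lines of Python before building. This is a rough sketch; the helper names are hypothetical and not part of the repository. It relies only on `torch.cuda.get_device_capability` and `torch.version.cuda`, which are standard PyTorch attributes.

```python
from typing import Optional, Tuple


def arch_list_value(capability: Tuple[int, int]) -> str:
    """Format a (major, minor) compute capability as a TORCH_CUDA_ARCH_LIST entry."""
    major, minor = capability
    return f"{major}.{minor}"


def same_cuda_major(torch_cuda: Optional[str], toolkit: str) -> bool:
    """Return True when the torch build and the toolkit share a CUDA major version.

    `torch_cuda` is None for CPU-only torch builds, which never match.
    """
    if not torch_cuda:
        return False
    return torch_cuda.split(".")[0] == toolkit.split(".")[0]


def detect_arch_list() -> str:
    """Query the first visible GPU; needs torch and a working CUDA device."""
    import torch  # imported lazily so the helpers above work without it
    return arch_list_value(torch.cuda.get_device_capability(0))
```

On a machine set up as described here, `detect_arch_list()` should give the same string that is passed to the build command, and `same_cuda_major(torch.version.cuda, "12.8")` should hold.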

Usage

python run_ocr_pdf.py sample.pdf --page 1
python run_ocr_pdf.py sample.pdf --page 0   # all pages
python run_ocr_pdf.py sample.pdf --lang multi --merge-level sentence

For each processed page, the script writes ocr_out/page_NNN.png (the rasterized page) and ocr_out/page_NNN.txt (the extracted text).
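
The command-line interface shown under Usage could be parsed along these lines. This is a sketch under stated assumptions, not the script's real argument handling: the flag names and the "0 means all pages" convention come from the examples above, while the defaults, the 1-based page numbering, and the function names `build_parser` and `select_pages` are hypothetical.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags that appear in the usage examples."""
    parser = argparse.ArgumentParser(description="Run OCR on a PDF, page by page.")
    parser.add_argument("pdf", help="path to the input PDF")
    # Assumed default: process every page when --page is omitted.
    parser.add_argument("--page", type=int, default=0,
                        help="page number (assumed 1-based); 0 means all pages")
    # Only the values 'multi' and 'sentence' appear in the README, so the
    # accepted choices are left open and the defaults are placeholders.
    parser.add_argument("--lang", default=None, help="language mode, e.g. multi")
    parser.add_argument("--merge-level", default=None,
                        help="text merge granularity, e.g. sentence")
    return parser


def select_pages(page: int, total: int) -> List[int]:
    """Expand the --page value into a list of 1-based page numbers."""
    if page == 0:
        return list(range(1, total + 1))
    if not 1 <= page <= total:
        raise ValueError(f"page {page} is outside 1..{total}")
    return [page]
```

For example, `build_parser().parse_args(["sample.pdf", "--page", "1"])` gives `args.page == 1`, and `select_pages(args.page, total)` then yields only that page.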
