nemotron-ocr-v2 PDF runner
A wrapper script (run_ocr_pdf.py) around nvidia/nemotron-ocr-v2 that rasterizes a PDF with pdf2image (backed by poppler) and runs the V2 pipeline page by page, writing one .txt file per page.
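The page-by-page flow can be sketched roughly as follows. This is a minimal illustration, not the actual run_ocr_pdf.py. It assumes pdf2image's `pdfinfo_from_path` and `convert_from_path` for rasterization. Because the pipeline's Python API is not shown here, the Nemotron OCR V2 call is represented by a hypothetical `ocr_fn(image_path) -> str` callable. The 200 DPI default, 1-based page numbers and 3-digit zero padding in file names are also assumptions.

```python
from pathlib import Path
from typing import Callable, List


def pages_to_process(page: int, total: int) -> List[int]:
    """Map the --page convention (0 = all pages, otherwise one 1-based page) to page numbers."""
    if page == 0:
        return list(range(1, total + 1))
    if not 1 <= page <= total:
        raise ValueError(f"page {page} out of range 1..{total}")
    return [page]


def run_pdf(pdf_path: str, ocr_fn: Callable[[str], str], page: int = 0,
            out_dir: str = "ocr_out", dpi: int = 200) -> List[Path]:
    """Rasterize the selected pages one at a time and write page_NNN.png / page_NNN.txt."""
    # Imported lazily so pages_to_process() works without pdf2image/poppler installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    total = int(pdfinfo_from_path(pdf_path)["Pages"])
    written = []
    for n in pages_to_process(page, total):
        # Render only page n so memory stays bounded on long PDFs.
        image = convert_from_path(pdf_path, dpi=dpi, first_page=n, last_page=n)[0]
        png_path = out / f"page_{n:03d}.png"
        image.save(png_path)
        # Hypothetical hook: run the Nemotron OCR V2 pipeline on the page image here.
        text = ocr_fn(str(png_path))
        txt_path = out / f"page_{n:03d}.txt"
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    return written
```

For example, `run_pdf("sample.pdf", my_ocr, page=1)` would roughly mirror `--page 1`, where `my_ocr` is a hypothetical function wrapping the model. Rendering one page per `convert_from_path` call trades a little poppler startup overhead for not holding every page image in memory at once.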
Verified working
- Environment: NVIDIA L4 GPU (sm_89), CUDA toolkit 12.8, PyTorch 2.11+cu128, Python 3.12
- Test input: arXiv 1706.03762 (Attention Is All You Need), page 1
- Result: 25 text regions extracted; the full abstract was recovered as readable paragraphs
Setup
```bash
# CUDA toolkit (its major version must match the CUDA version torch was built for)
# Install cuda-nvcc-12-8, cuda-cudart-dev-12-8 and cuda-libraries-dev-12-8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install hatchling hatch-vcs huggingface_hub pdf2image pillow
huggingface-cli download nvidia/nemotron-ocr-v2 --local-dir nemotron-ocr-v2
cd nemotron-ocr-v2/nemotron-ocr
# "8.9" targets the L4 (sm_89); set it to your GPU's compute capability
TORCH_CUDA_ARCH_LIST="8.9" pip install --no-build-isolation -v .
# poppler-utils provides the PDF rendering tools that pdf2image calls
apt-get install -y poppler-utils
```
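The source install compiles CUDA code for the architectures listed in TORCH_CUDA_ARCH_LIST, so a quick preflight check can catch a toolkit/torch major-version mismatch and print the arch string for the local GPU. This is a minimal sketch, assuming `nvcc` is on PATH and torch is already installed. The helper names are hypothetical and are not part of the repository.

```python
import re
import subprocess
from typing import Optional, Tuple


def nvcc_cuda_version(nvcc_output: str) -> Optional[str]:
    """Extract 'X.Y' from `nvcc --version` output (e.g. '... release 12.8, V12.8.93')."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def same_major(a: str, b: str) -> bool:
    """True when two 'X.Y' CUDA version strings share the same major number."""
    return a.split(".")[0] == b.split(".")[0]


def arch_list_entry(capability: Tuple[int, int]) -> str:
    """Format a (major, minor) compute capability as a TORCH_CUDA_ARCH_LIST entry."""
    major, minor = capability
    return f"{major}.{minor}"


def preflight() -> None:
    """Compare the nvcc toolkit with torch's CUDA build and suggest an arch list value."""
    import torch  # imported here so the pure helpers above have no torch dependency

    nvcc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=True)
    toolkit = nvcc_cuda_version(nvcc.stdout)
    print("nvcc CUDA:", toolkit, "| torch CUDA:", torch.version.cuda)
    if toolkit is None or torch.version.cuda is None or not same_major(toolkit, torch.version.cuda):
        print("WARNING: CUDA toolkit and torch CUDA major versions differ (or were not found)")
    if torch.cuda.is_available():
        entry = arch_list_entry(torch.cuda.get_device_capability())
        print(f'TORCH_CUDA_ARCH_LIST="{entry}"')
```

Calling `preflight()` before the `pip install --no-build-isolation` step would, on the verified L4 setup, be expected to report matching 12.x versions and suggest `8.9`.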
Usage
```bash
python run_ocr_pdf.py sample.pdf --page 1
python run_ocr_pdf.py sample.pdf --page 0  # all pages
python run_ocr_pdf.py sample.pdf --lang multi --merge-level sentence
```
Each processed page produces ocr_out/page_NNN.png (the rasterized image) and ocr_out/page_NNN.txt (the recognized text), where NNN is the page number.
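The runner writes one text file per page, so any downstream step that wants a single document has to stitch those files together. The following is a small sketch. It assumes the NNN in each file name is a page number that should be sorted numerically. `merge_pages` and `all_pages.txt` are hypothetical names and are not part of the script.

```python
from pathlib import Path


def merge_pages(out_dir: str = "ocr_out", merged_name: str = "all_pages.txt") -> Path:
    """Concatenate page_NNN.txt files in page order into one file, separated by blank lines."""
    out = Path(out_dir)
    # Sort on the numeric part of the stem so page_010 follows page_009, not page_001.
    pages = sorted(out.glob("page_*.txt"), key=lambda p: int(p.stem.split("_")[1]))
    parts = [p.read_text(encoding="utf-8").rstrip("\n") for p in pages]
    merged = out / merged_name
    merged.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return merged
```

The merged file's name does not start with `page_`, so it is not picked up as input if the function is run again.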