# Quickstart
## Prerequisites
- **Python 3.12** (the package requires `>=3.12,<3.13`)
- **CUDA toolkit** with `nvcc` on `PATH` (the package compiles a CUDA C++ extension at install time)
- **A CUDA GPU** (or set `TORCH_CUDA_ARCH_LIST` to cross-compile, e.g. `TORCH_CUDA_ARCH_LIST="8.0 9.0"`)
The CUDA toolkit version must share the same **major version** as the CUDA
bindings in your PyTorch install (e.g. toolkit 12.4 with `torch+cu128` is fine;
toolkit 12.4 with `torch+cu130` will fail).
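The major-version rule above can be checked before starting a build. The following is a minimal sketch, not part of this package: the helper names (`cuda_major`, `parse_nvcc_release`, `same_cuda_major`, `check_local_install`) are hypothetical, and it assumes `nvcc --version` prints a `release X.Y` entry and that `torch.version.cuda` holds a string such as `"12.8"`.

```python
import re
import shutil
import subprocess


def cuda_major(version: str) -> int:
    """Return the major component of a version string such as '12.4'."""
    return int(version.strip().split(".")[0])


def parse_nvcc_release(nvcc_output: str) -> str:
    """Extract 'X.Y' from the 'release X.Y' entry printed by `nvcc --version`."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a 'release X.Y' entry in nvcc output")
    return match.group(1)


def same_cuda_major(toolkit_version: str, torch_cuda_version: str) -> bool:
    """True when the toolkit and the PyTorch CUDA bindings share a major version."""
    return cuda_major(toolkit_version) == cuda_major(torch_cuda_version)


def check_local_install() -> bool:
    """Compare the local nvcc against the installed PyTorch build (hypothetical helper)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    if torch.version.cuda is None:
        raise RuntimeError("this PyTorch build has no CUDA support")
    if shutil.which("nvcc") is None:
        raise RuntimeError("nvcc not found on PATH")
    output = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return same_cuda_major(parse_nvcc_release(output), torch.version.cuda)
```

Calling `check_local_install()` after installing PyTorch and before installing this package should return `True` when the two major versions agree, for example toolkit 12.4 with a `cu128` build.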
On Slurm clusters, run the install on a GPU node or load the CUDA module first:
```bash
module load cuda12.4/toolkit/12.4.1 # example; adjust for your cluster
export CUDA_HOME=/usr/local/cuda # or wherever the toolkit lives
```
## Installation
Install PyTorch **first** with bindings matching your CUDA toolkit, then install
this package with `--no-build-isolation` so it builds the C++ extension against
your existing PyTorch:
```bash
# 1. Install PyTorch (adjust the index URL for your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
# 2. Install nemotron-ocr
cd nemotron-ocr
pip install --no-build-isolation -v .
```
> **Why `--no-build-isolation`?** Without it, pip creates a temporary build
> environment and installs the latest PyTorch from PyPI. That PyTorch's CUDA
> version may not match your system's `nvcc`, causing the C++ extension build
> to fail with a CUDA version mismatch error.
Verify the installation (the C++ extension must load without errors):
```bash
python -c "from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2; print('OK')"
```
## Usage
`NemotronOCRV2` is the recommended entry point for OCR inference:
```python
from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2
ocr = NemotronOCRV2()
predictions = ocr("ocr-example-input-1.png")
for pred in predictions:
    print(f" - Text: '{pred['text']}', Confidence: {pred['confidence']:.2f}")
```
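Because each prediction exposes `text` and `confidence`, low-confidence results can be dropped with ordinary Python. The sketch below is illustrative only: `filter_by_confidence` is a hypothetical helper, the threshold of 0.5 is arbitrary, and the sample data is made up so that the snippet runs without a GPU. It assumes predictions behave like dictionaries with at least those two keys.

```python
def filter_by_confidence(predictions, min_confidence=0.5):
    """Keep only predictions whose confidence is at or above the threshold."""
    return [p for p in predictions if p["confidence"] >= min_confidence]


# Made-up sample in the same shape as the loop above expects.
sample = [
    {"text": "Invoice", "confidence": 0.98},
    {"text": "T0tal", "confidence": 0.41},
]

kept = filter_by_confidence(sample, min_confidence=0.5)
print([p["text"] for p in kept])  # prints ['Invoice']
```

With a real model, the same helper would be applied to the value returned by `ocr("ocr-example-input-1.png")`.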
The level of detection merging can be adjusted with `merge_level`:
```python
image_path = "ocr-example-input-1.png"  # reuses the `ocr` instance created above
ocr(image_path, merge_level="word") # individual words
ocr(image_path, merge_level="sentence") # merged into sentences
ocr(image_path, merge_level="paragraph") # merged into paragraphs (default)
```
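To see how `merge_level` changes the granularity of the output, one option is to run the same image at each level and count the results. This is a rough sketch under stated assumptions: `count_by_merge_level` is a hypothetical helper, and it only assumes that the `ocr` object is callable as shown above and returns an iterable of predictions.

```python
MERGE_LEVELS = ("word", "sentence", "paragraph")


def count_by_merge_level(ocr, image_path):
    """Run `ocr` once per merge level and return {level: number of predictions}."""
    return {
        level: len(list(ocr(image_path, merge_level=level)))
        for level in MERGE_LEVELS
    }
```

One would expect the counts to shrink from `word` to `paragraph`, since each coarser level merges several detections into one, but the actual numbers depend on the image.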
### Inference modes
```python
# Detector only — bounding boxes, no text (fastest, lowest memory)
ocr_det = NemotronOCRV2(detector_only=True)
# Skip relational — per-word text, no reading-order grouping
ocr_fast = NemotronOCRV2(skip_relational=True)
# Profiling — per-phase CUDA-synced timing in logs
ocr_profile = NemotronOCRV2(verbose_post=True)
```
### Example script
```bash
python example.py ocr-example-input-1.png
python example.py ocr-example-input-1.png --merge-level word
python example.py ocr-example-input-1.png --detector-only
python example.py ocr-example-input-1.png --skip-relational
```
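The flags above mirror the constructor arguments and the `merge_level` call argument described earlier. As a hedged illustration of that mapping (this is not the repository's `example.py`, whose contents are not shown here), a wrapper script might parse the same flags like this. `build_parser` and `constructor_kwargs` are hypothetical names.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Hypothetical OCR wrapper script")
    parser.add_argument("image", help="path to the input image")
    parser.add_argument(
        "--merge-level",
        choices=("word", "sentence", "paragraph"),
        default="paragraph",  # the documented default
    )
    parser.add_argument("--detector-only", action="store_true")
    parser.add_argument("--skip-relational", action="store_true")
    return parser


def constructor_kwargs(args):
    """Return only the mode flags the user actually set, as constructor kwargs."""
    flags = ("detector_only", "skip_relational")
    return {name: True for name in flags if getattr(args, name)}
```

A script built this way could then create `NemotronOCRV2(**constructor_kwargs(args))` and call it with `merge_level=args.merge_level`. Whether `merge_level` has any effect in detector-only mode is not stated here, so treat that combination as untested.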