# Quickstart

## Prerequisites

- **Python 3.12** (the package requires `>=3.12,<3.13`)
- **CUDA toolkit** with `nvcc` on `PATH` (the package compiles a CUDA C++ extension at install time)
- **A CUDA GPU** on the build machine (or set `TORCH_CUDA_ARCH_LIST` to cross-compile for specific architectures, e.g. `TORCH_CUDA_ARCH_LIST="8.0 9.0"`)
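
The first two prerequisites can be checked before starting a build. The following is a minimal sketch using only the standard library; the helper names `python_version_ok` and `find_nvcc` are illustrative and not part of this package.

```python
import shutil
import sys
from typing import Optional


def python_version_ok(version_info=sys.version_info) -> bool:
    """Return True when the interpreter satisfies >=3.12,<3.13."""
    return tuple(version_info[:2]) == (3, 12)


def find_nvcc() -> Optional[str]:
    """Return the full path to nvcc if it is on PATH, otherwise None."""
    return shutil.which("nvcc")


if __name__ == "__main__":
    print("Python 3.12:", python_version_ok())
    print("nvcc:", find_nvcc() or "not found on PATH")
```

If `nvcc` is reported as missing, loading the CUDA module or extending `PATH` before installing is likely the first thing to try.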

The CUDA toolkit must have the same **major version** as the CUDA version your
PyTorch build targets (e.g. toolkit 12.4 with `torch+cu128` is fine because both
are CUDA 12.x; toolkit 12.4 with `torch+cu130` will fail because PyTorch targets CUDA 13).
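
The major-version rule can be checked programmatically. The sketch below assumes the toolkit version is read from the `release X.Y` field that `nvcc --version` prints and the PyTorch side from `torch.version.cuda`; the helper names are illustrative, and the sample string is only an example of the output format.

```python
import re


def nvcc_release(nvcc_output: str) -> str:
    """Extract the 'X.Y' release number from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return match.group(1)


def same_cuda_major(toolkit_version: str, torch_cuda_version: str) -> bool:
    """Compare only the major component, e.g. '12.4' vs '12.8'."""
    return toolkit_version.split(".")[0] == torch_cuda_version.split(".")[0]


# Example of the line nvcc prints (exact build numbers vary by install).
sample_output = "Cuda compilation tools, release 12.4, V12.4.131"
toolkit = nvcc_release(sample_output)

print(same_cuda_major(toolkit, "12.8"))  # torch+cu128
print(same_cuda_major(toolkit, "13.0"))  # torch+cu130

# On a real machine (requires nvcc and PyTorch):
#   import subprocess, torch
#   out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
#   print(same_cuda_major(nvcc_release(out), torch.version.cuda))
```

Note that `torch.version.cuda` is `None` on CPU-only PyTorch builds, so a real check would need to handle that case first.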

On Slurm clusters, run the install on a GPU node or load the CUDA module first:

```bash
module load cuda12.4/toolkit/12.4.1   # example; adjust for your cluster
export CUDA_HOME=/usr/local/cuda       # or wherever the toolkit lives
```

## Installation

Install PyTorch **first** with bindings matching your CUDA toolkit, then install
this package with `--no-build-isolation` so it builds the C++ extension against
your existing PyTorch:

```bash
# 1. Install PyTorch (adjust the index URL for your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

# 2. Install nemotron-ocr
cd nemotron-ocr
pip install --no-build-isolation -v .
```

> **Why `--no-build-isolation`?** Without it, pip creates a temporary build
> environment and installs the latest PyTorch from PyPI. That PyTorch's CUDA
> version may not match your system's `nvcc`, causing the C++ extension build
> to fail with a CUDA version mismatch error.

Verify the installation (the C++ extension must load without errors):

```bash
python -c "from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2; print('OK')"
```

## Usage

`NemotronOCRV2` is the recommended entry point for OCR inference:

```python
from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2

ocr = NemotronOCRV2()
predictions = ocr("ocr-example-input-1.png")

for pred in predictions:
    print(f"  - Text: '{pred['text']}', Confidence: {pred['confidence']:.2f}")
```
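
Since each prediction exposes `text` and `confidence`, low-confidence results can be dropped before further processing. This is a minimal sketch: the `filter_by_confidence` helper, the 0.5 threshold, and the sample data are all illustrative assumptions, and real predictions may carry additional keys.

```python
def filter_by_confidence(predictions, min_confidence=0.5):
    """Keep only predictions at or above the given confidence."""
    return [p for p in predictions if p["confidence"] >= min_confidence]


# Made-up data shaped like the predictions printed above.
sample = [
    {"text": "Invoice", "confidence": 0.98},
    {"text": "T0tal", "confidence": 0.41},
]

for pred in filter_by_confidence(sample, min_confidence=0.5):
    print(f"  - Text: '{pred['text']}', Confidence: {pred['confidence']:.2f}")
```

A suitable threshold depends on the images and on how costly a wrong word is downstream, so it is worth tuning on your own data.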

Use `merge_level` to control how detections are merged:

```python
image_path = "ocr-example-input-1.png"

ocr(image_path, merge_level="word")       # individual words
ocr(image_path, merge_level="sentence")   # merged into sentences
ocr(image_path, merge_level="paragraph")  # merged into paragraphs (default)
```
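
To see how `merge_level` changes the output for a given image, one option is to count the returned regions at each level. The sketch below accepts any callable with the same call shape as the `ocr` object above; `fake_ocr` is a stand-in with made-up counts so the example runs without a GPU, and `count_regions_per_level` is an illustrative helper, not part of the package.

```python
MERGE_LEVELS = ("word", "sentence", "paragraph")


def count_regions_per_level(ocr, image_path):
    """Run OCR once per merge level and count the returned predictions."""
    return {
        level: len(ocr(image_path, merge_level=level))
        for level in MERGE_LEVELS
    }


def fake_ocr(image_path, merge_level="paragraph"):
    # Stand-in for a NemotronOCRV2 instance; the counts are invented.
    counts = {"word": 6, "sentence": 2, "paragraph": 1}
    return [{"text": "", "confidence": 1.0} for _ in range(counts[merge_level])]


print(count_regions_per_level(fake_ocr, "ocr-example-input-1.png"))
```

With a real model, pass the `NemotronOCRV2` instance in place of `fake_ocr`.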

### Inference modes

```python
# Detector only — bounding boxes, no text (fastest, lowest memory)
ocr_det = NemotronOCRV2(detector_only=True)

# Skip relational — per-word text, no reading-order grouping
ocr_fast = NemotronOCRV2(skip_relational=True)

# Profiling — per-phase CUDA-synced timing in logs
ocr_profile = NemotronOCRV2(verbose_post=True)
```

### Example script

```bash
python example.py ocr-example-input-1.png
python example.py ocr-example-input-1.png --merge-level word
python example.py ocr-example-input-1.png --detector-only
python example.py ocr-example-input-1.png --skip-relational
```
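
For readers writing their own wrapper, the flags above map naturally onto `argparse`. This is a sketch of how such a command line could be parsed, not the contents of the shipped `example.py`; the `build_parser` name and the help strings are assumptions, and only the flag names and merge levels come from the commands and options shown in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Run OCR on one image.")
    parser.add_argument("image", help="path to the input image")
    parser.add_argument(
        "--merge-level",
        choices=("word", "sentence", "paragraph"),
        default="paragraph",
        help="how detections are merged",
    )
    parser.add_argument("--detector-only", action="store_true",
                        help="bounding boxes only, no text")
    parser.add_argument("--skip-relational", action="store_true",
                        help="per-word text, no reading-order grouping")
    return parser


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(
    ["ocr-example-input-1.png", "--merge-level", "word"]
)
print(args.image, args.merge_level, args.detector_only, args.skip_relational)
```

The parsed values could then be forwarded to `NemotronOCRV2(detector_only=..., skip_relational=...)` and to the `merge_level` argument of the call.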