# Quickstart
## Prerequisites
- **Python 3.12** (the package requires `>=3.12,<3.13`)
- **CUDA toolkit** with `nvcc` on `PATH` (the package compiles a CUDA C++ extension at install time)
- **A CUDA GPU** (or set `TORCH_CUDA_ARCH_LIST` to cross-compile, e.g. `TORCH_CUDA_ARCH_LIST="8.0 9.0"`)
The CUDA toolkit version must share the same **major version** as the CUDA
bindings in your PyTorch install (e.g. toolkit 12.4 with `torch+cu128` is fine;
toolkit 12.4 with `torch+cu130` will fail).
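
The major-version rule above can be checked before starting a long build. The following is a minimal sketch, not part of this package: the helper names (`cuda_major`, `parse_nvcc_release`, `majors_match`, `check_local_install`) are hypothetical, and it assumes `nvcc --version` prints a `release X.Y` entry and that `torch.version.cuda` reports the CUDA version PyTorch was built for.

```python
import re
import shutil
import subprocess


def cuda_major(version: str) -> int:
    """Return the major component of a CUDA version string such as '12.4'."""
    return int(version.strip().split(".")[0])


def parse_nvcc_release(nvcc_output: str) -> str:
    """Extract the 'release X.Y' version from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' entry found in nvcc output")
    return match.group(1)


def majors_match(toolkit_version: str, torch_cuda_version: str) -> bool:
    """True when the toolkit and the PyTorch bindings share a major version."""
    return cuda_major(toolkit_version) == cuda_major(torch_cuda_version)


def check_local_install() -> bool:
    """Compare the nvcc on PATH with the CUDA version PyTorch was built for."""
    import torch  # imported lazily so the helpers above work without PyTorch

    if shutil.which("nvcc") is None or torch.version.cuda is None:
        return False  # no toolkit on PATH, or a CPU-only PyTorch build
    output = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return majors_match(parse_nvcc_release(output), torch.version.cuda)


# The two cases from the text: cu128 is CUDA 12.8, cu130 is CUDA 13.0.
print(majors_match("12.4", "12.8"))  # same major version
print(majors_match("12.4", "13.0"))  # different major version
```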
On Slurm clusters, run the install on a GPU node or load the CUDA module first:
```bash
module load cuda12.4/toolkit/12.4.1 # example; adjust for your cluster
export CUDA_HOME=/usr/local/cuda # or wherever the toolkit lives
```
## Installation
Install PyTorch **first** with bindings matching your CUDA toolkit, then install
this package with `--no-build-isolation` so it builds the C++ extension against
your existing PyTorch:
```bash
# 1. Install PyTorch (adjust the index URL for your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
# 2. Install nemotron-ocr
cd nemotron-ocr
pip install --no-build-isolation -v .
```
> **Why `--no-build-isolation`?** Without it, pip creates a temporary build
> environment and installs the latest PyTorch from PyPI. That PyTorch's CUDA
> version may not match your system's `nvcc`, causing the C++ extension build
> to fail with a CUDA version mismatch error.
Verify the installation (the C++ extension must load without errors):
```bash
python -c "from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2; print('OK')"
```
## Usage
`NemotronOCRV2` is the recommended entry point for OCR inference:
```python
from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2
ocr = NemotronOCRV2()
predictions = ocr("ocr-example-input-1.png")
for pred in predictions:
    print(f" - Text: '{pred['text']}', Confidence: {pred['confidence']:.2f}")
```
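Since each prediction exposes `text` and `confidence` keys, low-confidence detections can be dropped with plain Python. This is a small sketch under assumptions: `filter_confident` and the `0.5` threshold are hypothetical and not part of the package, and the sample list stands in for real `ocr(...)` output.

```python
def filter_confident(predictions, min_confidence=0.5):
    """Keep predictions whose confidence is at least min_confidence.

    Assumes each prediction is a mapping with 'text' and 'confidence' keys,
    as in the usage example above. The threshold is an arbitrary choice.
    """
    return [p for p in predictions if p["confidence"] >= min_confidence]


# Hand-written stand-in for the list returned by ocr("ocr-example-input-1.png").
sample_predictions = [
    {"text": "Invoice", "confidence": 0.98},
    {"text": "l1I|", "confidence": 0.21},
    {"text": "Total", "confidence": 0.87},
]

kept = filter_confident(sample_predictions, min_confidence=0.5)
for pred in kept:
    print(f" - Text: '{pred['text']}', Confidence: {pred['confidence']:.2f}")
```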
The level of detection merging can be adjusted with `merge_level`:
```python
ocr(image_path, merge_level="word") # individual words
ocr(image_path, merge_level="sentence") # merged into sentences
ocr(image_path, merge_level="paragraph") # merged into paragraphs (default)
```
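To see how `merge_level` changes the output for a given image, one option is to run the same input at all three levels and compare the number of results. The sketch below assumes only what the snippet above shows (the OCR object is callable with an image path and a `merge_level` keyword and returns an iterable of predictions); `count_by_merge_level` and `fake_ocr` are hypothetical names used for illustration.

```python
MERGE_LEVELS = ("word", "sentence", "paragraph")


def count_by_merge_level(ocr, image_path):
    """Run ocr at every merge level and count the predictions returned.

    `ocr` is any callable accepting (image_path, merge_level=...), for
    example a NemotronOCRV2 instance.
    """
    return {
        level: len(list(ocr(image_path, merge_level=level)))
        for level in MERGE_LEVELS
    }


def fake_ocr(image_path, merge_level="paragraph"):
    """Stand-in for a real OCR object so the sketch runs without a GPU."""
    sizes = {"word": 6, "sentence": 2, "paragraph": 1}
    return [{"text": "...", "confidence": 1.0}] * sizes[merge_level]


counts = count_by_merge_level(fake_ocr, "ocr-example-input-1.png")
print(counts)
```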
### Inference modes
```python
# Detector only — bounding boxes, no text (fastest, lowest memory)
ocr_det = NemotronOCRV2(detector_only=True)
# Skip relational — per-word text, no reading-order grouping
ocr_fast = NemotronOCRV2(skip_relational=True)
# Profiling — per-phase CUDA-synced timing in logs
ocr_profile = NemotronOCRV2(verbose_post=True)
```
### Example script
```bash
python example.py ocr-example-input-1.png
python example.py ocr-example-input-1.png --merge-level word
python example.py ocr-example-input-1.png --detector-only
python example.py ocr-example-input-1.png --skip-relational
```