Ryan Chesler committed · Commit 65abd8a · 1 Parent(s): 196b410

improve install docs and process


The .gitignore excludes *.so files, which caused hatchling to silently
drop the compiled CUDA extension from the wheel.

pyproject.toml: add artifacts glob so hatchling includes the .so
despite gitignore; add ninja to build hook deps for faster compilation.

hatch_build.py: set a platform-specific wheel tag (cpXY-cpXY-linux_*)
instead of py3-none-any when the extension is built.

README.md + quickstart.md: rewrite install instructions to recommend
installing torch first with the correct CUDA index URL, then using
--no-build-isolation to avoid CUDA version mismatches during the
extension build.
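One way to confirm the `.gitignore` fix took effect is to list the wheel's members and check for the shared object. The sketch below builds a stand-in wheel so the check itself is demonstrable; the wheel filename and member paths are illustrative, not the project's real ones:

```python
import os
import tempfile
import zipfile

# Stand-in wheel: in practice, point wheel_path at the file produced by
# `pip wheel .` and skip the creation step.
tmp = tempfile.mkdtemp()
wheel_path = os.path.join(tmp, "nemotron_ocr-0.0.0-cp312-cp312-linux_x86_64.whl")
with zipfile.ZipFile(wheel_path, "w") as wh:
    wh.writestr("nemotron_ocr/__init__.py", "")
    wh.writestr("nemotron_ocr_cpp/_ext.so", b"\x7fELF")  # placeholder payload

# The actual check: wheels are zip archives, so list members and keep the .so files.
with zipfile.ZipFile(wheel_path) as wh:
    so_files = [name for name in wh.namelist() if name.endswith(".so")]

print(so_files)
```

An empty list here would mean the extension was dropped from the wheel again.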

README.md CHANGED
````diff
@@ -158,36 +158,53 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 #### Prerequisites
 
 - **OS**: Linux amd64 with NVIDIA GPU
-- **CUDA**: CUDA Toolkit 12.8 and compatible NVIDIA driver installed (for PyTorch CUDA). Verify with `nvidia-smi`.
-- **Python**: 3.12 (both subpackages require `python = ~3.12`)
-- **Build tools (when building the C++ extension)**:
+- **CUDA toolkit** with `nvcc` on `PATH`. The toolkit version must be compatible with
+  the version of PyTorch you install (same major version). For example, if you install
+  `torch` with CUDA 12.8 bindings, you need CUDA toolkit 12.x. Verify with
+  `nvcc --version` and `nvidia-smi`.
+- **Python**: 3.12 (the package requires `>=3.12,<3.13`)
+- **Build tools** (for the C++ CUDA extension compiled at install time):
   - GCC/G++ with C++17 support
-  - CUDA toolkit headers (for building CUDA kernels)
-  - OpenMP (used by the C++ extension)
-
+  - CUDA toolkit headers
+  - OpenMP
 
 #### Installation
-The model requires torch, and the custom code available in this repository.
+The package includes a C++ CUDA extension that is compiled during installation.
+Because the extension must be built against the **same PyTorch CUDA version** as
+your system's CUDA toolkit, **install PyTorch first**, then install this package
+with `--no-build-isolation` so it uses your existing PyTorch.
 
 1. Clone the repository
 
 - Make sure git-lfs is installed (https://git-lfs.com)
 ```
 git lfs install
+git clone https://huggingface.co/nvidia/nemotron-ocr-v2
 ```
 
 2. Installation
 
 ##### With pip
 
-- Create and activate a Python 3.12 environment (optional)
-
-- Run the following command to install the package:
+- Create and activate a Python 3.12 environment
+- Install PyTorch matching your CUDA toolkit (see https://pytorch.org/get-started/locally/):
+
+  ```bash
+  # Example for CUDA 12.8:
+  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
+  ```
+
+- Install the package:
 
 ```bash
 cd nemotron-ocr
-pip install hatchling
-pip install -v .
+pip install --no-build-isolation -v .
+```
+
+- Verify the C++ extension loads:
+
+```bash
+python -c "from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2; print('OK')"
 ```
 
 ##### With docker
````
nemotron-ocr/hatch_build.py CHANGED
````diff
@@ -41,31 +41,45 @@ def _extension_up_to_date(project_root: Path) -> bool:
     return newest_so_mtime >= newest_src_mtime
 
 
+def _get_platform_tag() -> str:
+    """Return a PEP 425 platform tag for the current system."""
+    import sysconfig
+    platform = sysconfig.get_platform().replace("-", "_").replace(".", "_")
+    return platform
+
+
 class CustomBuildHook(BuildHookInterface):
     def initialize(self, version: str, build_data: dict) -> None:
         project_root = Path(__file__).parent
         script_path = project_root / "scripts" / "build-extension.py"
 
         env = os.environ.copy()
-        # Ensure the extension actually builds during package build
        env.setdefault("BUILD_CPP_EXTENSION", "1")
 
-        # Allow users to force rebuild or skip if up-to-date
        force_rebuild = env.get("BUILD_CPP_FORCE", "0") == "1"
        build_enabled = env.get("BUILD_CPP_EXTENSION", "1") == "1"
 
-        if build_enabled and not force_rebuild and _extension_up_to_date(project_root):
-            # Cached build found and sources unchanged; skip rebuild
+        if not build_enabled:
            return
 
-        subprocess.run(
-            [
-                os.fspath(sys.executable),
-                os.fspath(script_path),
-            ],
-            cwd=os.fspath(project_root),
-            env=env,
-            check=True,
-        )
+        if not force_rebuild and _extension_up_to_date(project_root):
+            pass  # skip rebuild, but still set the tag below
+        else:
+            subprocess.run(
+                [
+                    os.fspath(sys.executable),
+                    os.fspath(script_path),
+                ],
+                cwd=os.fspath(project_root),
+                env=env,
+                check=True,
+            )
+
+        # Tag the wheel as platform-specific so the .so is usable
+        python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
+        abi_tag = python_tag
+        platform_tag = _get_platform_tag()
+        build_data["tag"] = f"{python_tag}-{abi_tag}-{platform_tag}"
+        build_data["pure_python"] = False
````
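The hook's tag computation can be exercised on its own. This sketch reproduces the `cpXY-cpXY-<platform>` logic outside the build hook (same stdlib calls as in `hatch_build.py` above):

```python
import sys
import sysconfig

# As in the hook: the CPython version tag doubles as the ABI tag, and the
# platform comes from sysconfig with '-' and '.' normalized to '_'.
python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
wheel_tag = f"{python_tag}-{python_tag}-{platform_tag}"
print(wheel_tag)  # e.g. cp312-cp312-linux_x86_64
```

A wheel carrying this tag will only install on a matching interpreter and platform, which is exactly what a bundled `.so` requires.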
nemotron-ocr/pyproject.toml CHANGED
````diff
@@ -28,6 +28,9 @@ dev = [
 requires = ["hatchling", "editables"]
 build-backend = "hatchling.build"
 
+[tool.hatch.build]
+artifacts = ["src/nemotron_ocr_cpp/*.so"]
+
 [tool.hatch.build.targets.wheel]
 packages = [
     "src/nemotron_ocr",
@@ -36,7 +39,7 @@ packages = [
 
 [tool.hatch.build.targets.wheel.hooks.custom]
 path = "hatch_build.py"
-dependencies = ["setuptools>=68", "torch>=2.0"]
+dependencies = ["setuptools>=68", "torch>=2.0", "ninja"]
 
 [tool.hatch.build.targets.sdist]
 include = [
````
quickstart.md CHANGED
````diff
@@ -1,16 +1,43 @@
 # Quickstart
 
+## Prerequisites
+
+- **Python 3.12** (the package requires `>=3.12,<3.13`)
+- **CUDA toolkit** with `nvcc` on `PATH` (the package compiles a CUDA C++ extension at install time)
+- **A CUDA GPU** (or set `TORCH_CUDA_ARCH_LIST` to cross-compile, e.g. `TORCH_CUDA_ARCH_LIST="8.0 9.0"`)
+
+The CUDA toolkit version must share the same **major version** as the CUDA
+bindings in your PyTorch install (e.g. toolkit 12.4 with `torch+cu128` is fine;
+toolkit 12.4 with `torch+cu130` will fail).
+
+On Slurm clusters, run the install on a GPU node or load the CUDA module first:
+
+```bash
+module load cuda12.4/toolkit/12.4.1  # example; adjust for your cluster
+export CUDA_HOME=/usr/local/cuda     # or wherever the toolkit lives
+```
+
 ## Installation
 
-Create a Python 3.12 environment and install the package:
+Install PyTorch **first** with bindings matching your CUDA toolkit, then install
+this package with `--no-build-isolation` so it builds the C++ extension against
+your existing PyTorch:
 
 ```bash
+# 1. Install PyTorch (adjust the index URL for your CUDA version)
+pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
+
+# 2. Install nemotron-ocr
 cd nemotron-ocr
-pip install hatchling
-pip install -v .
+pip install --no-build-isolation -v .
 ```
 
-Verify the installation:
+> **Why `--no-build-isolation`?** Without it, pip creates a temporary build
+> environment and installs the latest PyTorch from PyPI. That PyTorch's CUDA
+> version may not match your system's `nvcc`, causing the C++ extension build
+> to fail with a CUDA version mismatch error.
+
+Verify the installation (the C++ extension must load without errors):
 
 ```bash
 python -c "from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2; print('OK')"
@@ -54,8 +81,8 @@ ocr_profile = NemotronOCRV2(verbose_post=True)
 ### Example script
 
 ```bash
-uv run python example.py ocr-example-input-1.png
-uv run python example.py ocr-example-input-1.png --merge-level word
-uv run python example.py ocr-example-input-1.png --detector-only
-uv run python example.py ocr-example-input-1.png --skip-relational
+python example.py ocr-example-input-1.png
+python example.py ocr-example-input-1.png --merge-level word
+python example.py ocr-example-input-1.png --detector-only
+python example.py ocr-example-input-1.png --skip-relational
 ```
````
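The "same major version" rule stated in the quickstart diff is easy to encode. Below, `cuda_majors_match` is a hypothetical helper (not part of the package) that compares the CUDA version reported by `torch.version.cuda` with the release shown by `nvcc --version`:

```python
# Hypothetical helper: encodes the "same major version" rule from the docs.
def cuda_majors_match(torch_cuda: str, nvcc_release: str) -> bool:
    """Compare e.g. torch.version.cuda == '12.8' with an nvcc release of '12.4'."""
    return torch_cuda.split(".")[0] == nvcc_release.split(".")[0]

print(cuda_majors_match("12.8", "12.4"))  # True: toolkit 12.x with torch cu12x
print(cuda_majors_match("13.0", "12.4"))  # False: torch cu130 needs a 13.x toolkit
```

Running a check like this before `pip install --no-build-isolation` surfaces the mismatch early, instead of partway through the extension compile.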