nvidia/nemotron-ocr-v1

@@ -16,7 +16,7 @@ tags:
 - ingestion
 ---
-# NeMo Retriever OCR v1
 ## **Model Overview**
@@ -27,11 +27,11 @@ tags:
 ### **Description**
-The NeMo Retriever OCR v1 model is a state-of-the-art text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a detector for text region localization, a recognizer for transcription of detected regions, and a relational model for layout and structure analysis.
-This model is optimized for a wide variety of OCR tasks, including multi-line, multi-block, and natural scene text, and it supports advanced reading order analysis via its relational model component. NeMo Retriever OCR v1 has been developed to be production-ready and commercially usable, with a focus on speed and accuracy on both document and natural scene images.
-The NeMo Retriever OCR v1 model is part of the NVIDIA NeMo Retriever collection of NIM microservices, which provides state-of-the-art, commercially-ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can readily customize them for domain-specific use cases, such as information technology, human resource help assistants, and research & development research assistants.
 This model is ready for commercial use.
@@ -57,7 +57,7 @@ Global
 ### Use Case
-The **NeMo Retriever OCR v1** model is designed for high-accuracy and high-speed extraction of textual information from images, making it ideal for powering multimodal retrieval systems, Retrieval-Augmented Generation (RAG) pipelines, and agentic applications that require seamless integration of visual and language understanding. Its robust performance and efficiency make it an excellent choice for next-generation AI systems that demand both precision and scalability across diverse real-world content.
 ### Release Date
@@ -71,7 +71,7 @@ The **NeMo Retriever OCR v1** model is designed for high-accuracy and high-speed
 **Architecture Type:** Hybrid detector–recognizer with document-level relational modeling
-The NeMo Retriever OCR v1 model integrates three specialized neural components:
 - **Text Detector:** Utilizes a RegNetY-8GF convolutional backbone for high-accuracy localization of text regions within images.
 - **Text Recognizer:** Employs a Transformer-based sequence recognizer to transcribe text from detected regions, supporting variable word and line lengths.
@@ -230,7 +230,7 @@ for pred in predictions:
 ### Software Integration
 **Runtime Engine(s):**
-- **NeMo Retriever Page Elements v3** NIM
 **Supported Hardware Microarchitecture Compatibility [List in Alphabetic Order]:**
@@ -267,7 +267,7 @@ The model is trained on a large-scale, curated mix of public and proprietary OCR
 ### **Evaluation Datasets**
-The NeMo Retriever OCR v1 model is evaluated on several NVIDIA internal datasets for various tasks, such as pure OCR, table content extraction, and document retrieval.
 **Data Collection Method:** Hybrid (Automated, Human, Synthetic)<br>
 **Labeling Method:** Hybrid (Automated, Human, Synthetic)<br>
@@ -275,9 +275,9 @@ The NeMo Retriever OCR v1 model is evaluated on several NVIDIA internal datasets
 ### **Evaluation Results**
-We benchmarked NeMo Retriever OCR v1 on internal evaluation datasets against PaddleOCR on various tasks, such as pure OCR (Character Error Rate), table content extraction (TEDS), and document retrieval (Recall@5).
-| Metric                                   | NeMo Retriever OCR v1 | PaddleOCR | Net change |
 |-------------------------------------------|--------------------|-----------|-----------------|
 | Character Error Rate                      | 0.1633             | 0.2029    | -19.5% ✔️         |
 | Bag-of-character Error Rate               | 0.0453             | 0.0512    | -11.5% ✔️         |

 - ingestion
 ---
+# Nemotron OCR v1
 ## **Model Overview**
 ### **Description**
+The Nemotron OCR v1 model is a state-of-the-art text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a detector for text region localization, a recognizer for transcription of detected regions, and a relational model for layout and structure analysis.
+This model is optimized for a wide variety of OCR tasks, including multi-line, multi-block, and natural scene text, and it supports advanced reading order analysis via its relational model component. Nemotron OCR v1 has been developed to be production-ready and commercially usable, with a focus on speed and accuracy on both document and natural scene images.
+The Nemotron OCR v1 model is part of the NVIDIA NeMo Retriever collection of NIM microservices, which provides state-of-the-art, commercially-ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can readily customize them for domain-specific use cases, such as information technology, human resource help assistants, and research & development research assistants.
 This model is ready for commercial use.
 ### Use Case
+The **Nemotron OCR v1** model is designed for high-accuracy and high-speed extraction of textual information from images, making it ideal for powering multimodal retrieval systems, Retrieval-Augmented Generation (RAG) pipelines, and agentic applications that require seamless integration of visual and language understanding. Its robust performance and efficiency make it an excellent choice for next-generation AI systems that demand both precision and scalability across diverse real-world content.
 ### Release Date
 **Architecture Type:** Hybrid detector–recognizer with document-level relational modeling
+The Nemotron OCR v1 model integrates three specialized neural components:
 - **Text Detector:** Utilizes a RegNetY-8GF convolutional backbone for high-accuracy localization of text regions within images.
 - **Text Recognizer:** Employs a Transformer-based sequence recognizer to transcribe text from detected regions, supporting variable word and line lengths.
 ### Software Integration
 **Runtime Engine(s):**
+- **NeMo Nemotron OCR V1** NIM
 **Supported Hardware Microarchitecture Compatibility [List in Alphabetic Order]:**
 ### **Evaluation Datasets**
+The Nemotron OCR v1 model is evaluated on several NVIDIA internal datasets for various tasks, such as pure OCR, table content extraction, and document retrieval.
 **Data Collection Method:** Hybrid (Automated, Human, Synthetic)<br>
 **Labeling Method:** Hybrid (Automated, Human, Synthetic)<br>
 ### **Evaluation Results**
+We benchmarked Nemotron OCR v1 on internal evaluation datasets against PaddleOCR on various tasks, such as pure OCR (Character Error Rate), table content extraction (TEDS), and document retrieval (Recall@5).
+| Metric                                   | Nemotron OCR v1 | PaddleOCR | Net change |
 |-------------------------------------------|--------------------|-----------|-----------------|
 | Character Error Rate                      | 0.1633             | 0.2029    | -19.5% ✔️         |
 | Bag-of-character Error Rate               | 0.0453             | 0.0512    | -11.5% ✔️         |
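
The "Net change" column in the two rows above is consistent with a relative difference against the PaddleOCR baseline, i.e. `(model - baseline) / baseline`. The card does not state the formula, so this is an assumption. A minimal sketch that reproduces the two visible rows under that assumption (the helper name `relative_change` is hypothetical, not part of any NVIDIA package):

```python
def relative_change(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline` (assumed formula)."""
    return (new - baseline) / baseline * 100.0


# (Nemotron OCR v1, PaddleOCR) values copied from the table above.
rows = {
    "Character Error Rate": (0.1633, 0.2029),
    "Bag-of-character Error Rate": (0.0453, 0.0512),
}

# Round to one decimal place, matching the precision shown in the table.
net_change = {
    metric: round(relative_change(model, baseline), 1)
    for metric, (model, baseline) in rows.items()
}

for metric, value in net_change.items():
    print(f"{metric}: {value:+.1f}%")
```

For error-rate metrics a negative value is an improvement, which matches the check marks in the table.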

nemotron-ocr/pyproject.toml CHANGED

@@ -1,8 +1,8 @@
 [project]
 name = "nemotron-ocr"
 version = "1.0.0"
-description = "NeMo Retriever OCR"
-authors = [{ name = "NVIDIA NeMo Retriever" }]
 requires-python = ">=3.12,<3.13"
 dependencies = [
     "pandas>=2.3.3",

 [project]
 name = "nemotron-ocr"
 version = "1.0.0"
+description = "Nemoton OCR"
+authors = [{ name = "NVIDIA Nemotron" }]
 requires-python = ">=3.12,<3.13"
 dependencies = [
     "pandas>=2.3.3",