Rename NeMo Retriever OCR to Nemotron OCR
- README.md +10 -10
- nemotron-ocr/pyproject.toml +2 -2
README.md
@@ -16,7 +16,7 @@ tags:
 - ingestion
 ---
 
-#
+# Nemotron OCR v1
 
 ## **Model Overview**
 
@@ -27,11 +27,11 @@ tags:
 
 ### **Description**
 
-The
+The Nemotron OCR v1 model is a state-of-the-art text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a detector for text region localization, a recognizer for transcription of detected regions, and a relational model for layout and structure analysis.
 
-This model is optimized for a wide variety of OCR tasks, including multi-line, multi-block, and natural scene text, and it supports advanced reading order analysis via its relational model component.
+This model is optimized for a wide variety of OCR tasks, including multi-line, multi-block, and natural scene text, and it supports advanced reading order analysis via its relational model component. Nemotron OCR v1 has been developed to be production-ready and commercially usable, with a focus on speed and accuracy on both document and natural scene images.
 
-The
+The Nemotron OCR v1 model is part of the NVIDIA NeMo Retriever collection of NIM microservices, which provides state-of-the-art, commercially-ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can readily customize them for domain-specific use cases, such as information technology, human resource help assistants, and research & development research assistants.
 
 This model is ready for commercial use.
 
@@ -57,7 +57,7 @@ Global
 
 ### Use Case
 
-The **
+The **Nemotron OCR v1** model is designed for high-accuracy and high-speed extraction of textual information from images, making it ideal for powering multimodal retrieval systems, Retrieval-Augmented Generation (RAG) pipelines, and agentic applications that require seamless integration of visual and language understanding. Its robust performance and efficiency make it an excellent choice for next-generation AI systems that demand both precision and scalability across diverse real-world content.
 
 ### Release Date
 
@@ -71,7 +71,7 @@ The **NeMo Retriever OCR v1** model is designed for high-accuracy and high-speed
 
 **Architecture Type:** Hybrid detector–recognizer with document-level relational modeling
 
-The
+The Nemotron OCR v1 model integrates three specialized neural components:
 
 - **Text Detector:** Utilizes a RegNetY-8GF convolutional backbone for high-accuracy localization of text regions within images.
 - **Text Recognizer:** Employs a Transformer-based sequence recognizer to transcribe text from detected regions, supporting variable word and line lengths.
@@ -230,7 +230,7 @@ for pred in predictions:
 ### Software Integration
 
 **Runtime Engine(s):**
-- **NeMo
+- **NeMo Nemotron OCR V1** NIM
 
 
 **Supported Hardware Microarchitecture Compatibility [List in Alphabetic Order]:**
@@ -267,7 +267,7 @@ The model is trained on a large-scale, curated mix of public and proprietary OCR
 
 ### **Evaluation Datasets**
 
-The
+The Nemotron OCR v1 model is evaluated on several NVIDIA internal datasets for various tasks, such as pure OCR, table content extraction, and document retrieval.
 
 **Data Collection Method:** Hybrid (Automated, Human, Synthetic)<br>
 **Labeling Method:** Hybrid (Automated, Human, Synthetic)<br>
@@ -275,9 +275,9 @@ The NeMo Retriever OCR v1 model is evaluated on several NVIDIA internal datasets
 
 ### **Evaluation Results**
 
-We benchmarked
+We benchmarked Nemotron OCR v1 on internal evaluation datasets against PaddleOCR on various tasks, such as pure OCR (Character Error Rate), table content extraction (TEDS), and document retrieval (Recall@5).
 
-| Metric |
+| Metric | Nemotron OCR v1 | PaddleOCR | Net change |
 |-------------------------------------------|--------------------|-----------|-----------------|
 | Character Error Rate | 0.1633 | 0.2029 | -19.5% ✔️ |
 | Bag-of-character Error Rate | 0.0453 | 0.0512 | -11.5% ✔️ |
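The Net change column in the evaluation table reads as the relative change of Nemotron OCR v1 against the PaddleOCR baseline, for example (0.1633 - 0.2029) / 0.2029 ≈ -19.5%. The minimal Python sketch that follows reproduces both rows and shows the standard Character Error Rate (Levenshtein edit distance divided by reference length). The model card does not define Bag-of-character Error Rate, so `bag_of_character_error_rate` is an assumption: it uses the order-insensitive "bag distance" over character multisets. All function names here are hypothetical and are not part of the nemotron-ocr package.

```python
from collections import Counter


def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))  # distances from the empty reference prefix
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # turning reference[:i] into "" takes i deletions
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                           # delete ref_char
                current[j - 1] + 1,                        # insert hyp_char
                previous[j - 1] + (ref_char != hyp_char),  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length (the usual CER definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def bag_of_character_error_rate(reference: str, hypothesis: str) -> float:
    """ASSUMPTION: bag distance over character multisets, normalized by reference
    length. The model card does not specify how its metric is computed."""
    if not reference:
        raise ValueError("reference must be non-empty")
    ref_bag, hyp_bag = Counter(reference), Counter(hypothesis)
    missing = sum((ref_bag - hyp_bag).values())  # reference chars absent from the output
    extra = sum((hyp_bag - ref_bag).values())    # output chars absent from the reference
    # Pair missing and extra chars as substitutions; the remainder are insertions/deletions.
    return max(missing, extra) / len(reference)


def net_change_percent(candidate: float, baseline: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100


print(character_error_rate("abc", "cba"))            # 2 edits over 3 reference chars
print(bag_of_character_error_rate("abc", "cba"))     # 0.0: same characters, different order
print(f"{net_change_percent(0.1633, 0.2029):.1f}%")  # -19.5%
print(f"{net_change_percent(0.0453, 0.0512):.1f}%")  # -11.5%
```

Under this assumed definition the bag score ignores character order, so it can never exceed the CER for the same pair of strings; a reordered transcription such as "cba" for "abc" scores 0.0 on the bag metric but 2/3 on CER.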
nemotron-ocr/pyproject.toml
@@ -1,8 +1,8 @@
 [project]
 name = "nemotron-ocr"
 version = "1.0.0"
-description = "
-authors = [{ name = "NVIDIA
+description = "Nemotron OCR"
+authors = [{ name = "NVIDIA Nemotron" }]
 requires-python = ">=3.12,<3.13"
 dependencies = [
 "pandas>=2.3.3",