nvidia-oliver-holworthy committed on
Commit b0b0d24 · unverified · 1 Parent(s): fb6df58

Rename NeMo Retriever OCR to Nemotron OCR

Files changed (2)
  1. README.md +10 -10
  2. nemotron-ocr/pyproject.toml +2 -2
README.md CHANGED
@@ -16,7 +16,7 @@ tags:
  - ingestion
  ---
 
- # NeMo Retriever OCR v1
+ # Nemotron OCR v1
 
  ## **Model Overview**
 
@@ -27,11 +27,11 @@ tags:
 
  ### **Description**
 
- The NeMo Retriever OCR v1 model is a state-of-the-art text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a detector for text region localization, a recognizer for transcription of detected regions, and a relational model for layout and structure analysis.
+ The Nemotron OCR v1 model is a state-of-the-art text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a detector for text region localization, a recognizer for transcription of detected regions, and a relational model for layout and structure analysis.
 
- This model is optimized for a wide variety of OCR tasks, including multi-line, multi-block, and natural scene text, and it supports advanced reading order analysis via its relational model component. NeMo Retriever OCR v1 has been developed to be production-ready and commercially usable, with a focus on speed and accuracy on both document and natural scene images.
+ This model is optimized for a wide variety of OCR tasks, including multi-line, multi-block, and natural scene text, and it supports advanced reading order analysis via its relational model component. Nemotron OCR v1 has been developed to be production-ready and commercially usable, with a focus on speed and accuracy on both document and natural scene images.
 
- The NeMo Retriever OCR v1 model is part of the NVIDIA NeMo Retriever collection of NIM microservices, which provides state-of-the-art, commercially-ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can readily customize them for domain-specific use cases, such as information technology, human resource help assistants, and research & development research assistants.
+ The Nemotron OCR v1 model is part of the NVIDIA NeMo Retriever collection of NIM microservices, which provides state-of-the-art, commercially-ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can readily customize them for domain-specific use cases, such as information technology, human resource help assistants, and research & development research assistants.
 
  This model is ready for commercial use.
 
@@ -57,7 +57,7 @@ Global
 
  ### Use Case
 
- The **NeMo Retriever OCR v1** model is designed for high-accuracy and high-speed extraction of textual information from images, making it ideal for powering multimodal retrieval systems, Retrieval-Augmented Generation (RAG) pipelines, and agentic applications that require seamless integration of visual and language understanding. Its robust performance and efficiency make it an excellent choice for next-generation AI systems that demand both precision and scalability across diverse real-world content.
+ The **Nemotron OCR v1** model is designed for high-accuracy and high-speed extraction of textual information from images, making it ideal for powering multimodal retrieval systems, Retrieval-Augmented Generation (RAG) pipelines, and agentic applications that require seamless integration of visual and language understanding. Its robust performance and efficiency make it an excellent choice for next-generation AI systems that demand both precision and scalability across diverse real-world content.
 
  ### Release Date
 
@@ -71,7 +71,7 @@ The **NeMo Retriever OCR v1** model is designed for high-accuracy and high-speed
 
  **Architecture Type:** Hybrid detector–recognizer with document-level relational modeling
 
- The NeMo Retriever OCR v1 model integrates three specialized neural components:
+ The Nemotron OCR v1 model integrates three specialized neural components:
 
  - **Text Detector:** Utilizes a RegNetY-8GF convolutional backbone for high-accuracy localization of text regions within images.
  - **Text Recognizer:** Employs a Transformer-based sequence recognizer to transcribe text from detected regions, supporting variable word and line lengths.
@@ -230,7 +230,7 @@ for pred in predictions:
  ### Software Integration
 
  **Runtime Engine(s):**
- - **NeMo Retriever Page Elements v3** NIM
+ - **NeMo Nemotron OCR V1** NIM
 
 
  **Supported Hardware Microarchitecture Compatibility [List in Alphabetic Order]:**
@@ -267,7 +267,7 @@ The model is trained on a large-scale, curated mix of public and proprietary OCR
 
  ### **Evaluation Datasets**
 
- The NeMo Retriever OCR v1 model is evaluated on several NVIDIA internal datasets for various tasks, such as pure OCR, table content extraction, and document retrieval.
+ The Nemotron OCR v1 model is evaluated on several NVIDIA internal datasets for various tasks, such as pure OCR, table content extraction, and document retrieval.
 
  **Data Collection Method:** Hybrid (Automated, Human, Synthetic)<br>
  **Labeling Method:** Hybrid (Automated, Human, Synthetic)<br>
@@ -275,9 +275,9 @@ The NeMo Retriever OCR v1 model is evaluated on several NVIDIA internal datasets
 
  ### **Evaluation Results**
 
- We benchmarked NeMo Retriever OCR v1 on internal evaluation datasets against PaddleOCR on various tasks, such as pure OCR (Character Error Rate), table content extraction (TEDS), and document retrieval (Recall@5).
+ We benchmarked Nemotron OCR v1 on internal evaluation datasets against PaddleOCR on various tasks, such as pure OCR (Character Error Rate), table content extraction (TEDS), and document retrieval (Recall@5).
 
- | Metric | NeMo Retriever OCR v1 | PaddleOCR | Net change |
+ | Metric | Nemotron OCR v1 | PaddleOCR | Net change |
  |-------------------------------------------|--------------------|-----------|-----------------|
  | Character Error Rate | 0.1633 | 0.2029 | -19.5% ✔️ |
  | Bag-of-character Error Rate | 0.0453 | 0.0512 | -11.5% ✔️ |
nemotron-ocr/pyproject.toml CHANGED
@@ -1,8 +1,8 @@
  [project]
  name = "nemotron-ocr"
  version = "1.0.0"
- description = "NeMo Retriever OCR"
- authors = [{ name = "NVIDIA NeMo Retriever" }]
+ description = "Nemoton OCR"
+ authors = [{ name = "NVIDIA Nemotron" }]
  requires-python = ">=3.12,<3.13"
  dependencies = [
  "pandas>=2.3.3",