TheoViel committed
Commit 3eef38a · verified · 1 Parent(s): d45b8b3

Update README.md

Files changed (1):
  1. README.md +1 -1

README.md
@@ -240,7 +240,7 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
 | Intended Task/Domain: | Document Understanding |
 | Model Type: | YOLOX Object Detection for Charts, Tables, Infographics, Header/footers, Texts, and Titles |
 | Intended User: | Enterprise developers, data scientists, and other technical users who need to extract structural elements from documents. |
-| Output: | A List of dictionaries containing lists of dictionaries of floating point numbers (representing bounding box information). <br> **Example**: `{"data": [{"index": 0,"bounding_boxes": {"table": [{"x_min": 0.6503,"y_min": 0.2161,"x_max": 0.7835,"y_max": 0.3236,"confidence": 0.9306}]}}]}` |
+| Output: | After post-processing, the output consists of three NumPy arrays containing the detections: `boxes [N x 4]` (in normalized `(x_min, y_min, x_max, y_max)` format), the associated classes `labels [N]`, and confidence scores `scores [N]`. |
 | Describe how the model works: | The model identifies objects in an image by first dividing the image into a grid. For each grid cell, it extracts visual features and simultaneously predicts which objects are present (for example, 'chart' or 'table') and where they are located in that cell, all in a single pass through the image. |
 | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
 | Technical Limitations & Mitigation: | The model may not generalize to unknown document types/formats not commonly found on the web. Further fine-tuning might be required for such documents. |
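The updated Output row can be illustrated with a minimal sketch of consuming the three post-processed arrays. The array values, the class-index mapping, and the `to_pixels` helper below are illustrative assumptions, not part of the model's actual API:

```python
import numpy as np

# Hypothetical post-processed detections for one page image (shapes as in the
# model card): boxes [N x 4] normalized (x_min, y_min, x_max, y_max),
# labels [N], scores [N].
boxes = np.array([[0.6503, 0.2161, 0.7835, 0.3236]])
labels = np.array([1])
scores = np.array([0.9306])

# Illustrative class-index mapping (the real mapping may differ).
CLASS_NAMES = {0: "chart", 1: "table", 2: "infographic"}

def to_pixels(boxes: np.ndarray, width: int, height: int) -> np.ndarray:
    """Scale normalized (x_min, y_min, x_max, y_max) boxes to pixel coords."""
    scale = np.array([width, height, width, height], dtype=float)
    return boxes * scale

pixel_boxes = to_pixels(boxes, width=1000, height=800)
for box, label, score in zip(pixel_boxes, labels, scores):
    print(CLASS_NAMES[int(label)], box.round(1), round(float(score), 4))
```

Because the boxes are normalized, the same detections can be projected onto the page at any render resolution by scaling with the target width and height.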