# Table-Transformer-Detection

## Model Description

**Table-Transformer-Detection** is a 28.8-million-parameter object detection model from Microsoft Research, fine-tuned specifically for table detection in documents. Built on the DETR (DEtection TRansformer) architecture, it locates and identifies tables within unstructured document images such as PDFs and scanned pages.

Trained on PubTables-1M, a large-scale dataset containing nearly one million fully annotated tables from scientific articles, Table-Transformer-Detection delivers strong performance for document table extraction without requiring task-specific architectural customization.
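A minimal inference sketch, assuming the checkpoint is loadable through the Hugging Face `transformers` library under the upstream id `microsoft/table-transformer-detection` (adjust the id to point at your local copy; the `detect_tables` helper name is ours, not part of any library):

```python
# Illustrative sketch; requires transformers, torch, pillow (and timm for the
# ResNet backbone). The model id below is an assumption about where the
# upstream checkpoint lives.
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

MODEL_ID = "microsoft/table-transformer-detection"

def detect_tables(image_path, threshold=0.7):
    """Return [(label, score, (xmin, ymin, xmax, ymax)), ...] for one image."""
    image = Image.open(image_path).convert("RGB")
    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = TableTransformerForObjectDetection.from_pretrained(MODEL_ID)
    with torch.no_grad():
        inputs = processor(images=image, return_tensors="pt")
        outputs = model(**inputs)
    # Rescale normalized predictions back to the original image size.
    target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
    results = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes
    )[0]
    return [
        (model.config.id2label[label.item()], score.item(), tuple(box.tolist()))
        for score, label, box in zip(
            results["scores"], results["labels"], results["boxes"]
        )
    ]
```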

## Features

- **Table detection**: accurately locates tables in document images, PDFs, and scanned pages.
- **DETR-based architecture**: leverages a Transformer encoder-decoder on top of a ResNet CNN backbone for end-to-end object detection.
- **Pre-normalization**: uses the "normalize before" setting, applying LayerNorm before self- and cross-attention for improved training stability.
- **Lightweight**: at only 28.8M parameters (F32), the model is efficient to deploy and run inference with.
- **Fine-tunable**: can be further fine-tuned on domain-specific document datasets for improved accuracy.

## Use Cases

- Automated document processing and digitization pipelines
- Table extraction from academic papers and research articles
- Invoice and financial document parsing
- Legal and regulatory document analysis
- Healthcare and clinical report table extraction
- Preprocessing step for downstream table structure recognition

## Inputs and Outputs

**Input**:
- Document images (JPEG, PNG, etc.) containing one or more tables.

**Output**:
- Bounding box predictions with confidence scores for each detected table in the image.
- Class labels identifying detected objects as tables.
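DETR-style models predict boxes as normalized `(cx, cy, w, h)` per query, while downstream code usually wants absolute `(xmin, ymin, xmax, ymax)` pixels plus a confidence cutoff. A sketch of that conversion (an illustration of the output format, not the library's own post-processing code):

```python
# Convert one normalized center-format box to absolute corner coordinates.
def to_pixel_xyxy(box_cxcywh, image_width, image_height):
    cx, cy, w, h = box_cxcywh
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )

# Keep only detections above a confidence threshold.
def filter_detections(detections, threshold=0.7):
    return [d for d in detections if d["score"] >= threshold]
```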

## License

This repository is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes with proper attribution. All NPU-related models, runtimes, and code in this project are covered by this non-commercial license and may not be used in any commercial or revenue-generating application. Commercial licensing or enterprise usage requires a separate agreement. For inquiries, please contact `dev@nexa.ai`.