| # Table-Transformer-Detection |
|
|
| ## Model Description |
| **Table-Transformer-Detection** is a 28.8-million-parameter object detection model from Microsoft Research, fine-tuned specifically for table detection in documents. |
| Built on the DETR (DEtection TRansformer) architecture, it locates and identifies tables within unstructured document images such as rendered PDF pages and scanned pages. |
|
|
| Trained on PubTables-1M, a large-scale dataset of nearly one million fully annotated tables from scientific articles, Table-Transformer-Detection performs strongly on document table detection without requiring task-specific architectural customization. |
|
|
| ## Quickstart |
|
|
| Follow the three-step instructions on the [Nexa SDK model page](https://sdk.nexa.ai/model/Table-transformer-detection) to get started. |
|
|
| ## Features |
| - **Table detection**: accurately locates tables in document images, PDFs, and scanned pages. |
| - **DETR-based architecture**: leverages a Transformer encoder-decoder on top of a CNN backbone (ResNet) for end-to-end object detection. |
| - **Pre-normalization**: uses the "normalize before" setting, applying LayerNorm before self- and cross-attention for improved training stability. |
| - **Lightweight**: at only 28.8M parameters (F32), the model is small enough to deploy easily and to run inference efficiently. |
| - **Fine-tunable**: can be further fine-tuned on domain-specific document datasets for improved accuracy. |
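
To make the pre-normalization bullet concrete, here is a minimal sketch contrasting the two placements of LayerNorm around a residual sublayer. It is an illustration only, not the model's code: `toy_sublayer` is a hypothetical stand-in for attention, and the `layer_norm` here has no learned scale or shift, unlike the real layers.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm_block(x, sublayer):
    """Pre-normalization ("normalize before"): x + sublayer(LayerNorm(x))."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_norm_block(x, sublayer):
    """Post-normalization: LayerNorm(x + sublayer(x))."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def toy_sublayer(x):
    # Hypothetical stand-in for self- or cross-attention: halves each feature.
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
pre = pre_norm_block(x, toy_sublayer)
post = post_norm_block(x, toy_sublayer)
print("pre-norm: ", pre)
print("post-norm:", post)
```

In the pre-norm variant the input `x` passes through the residual path unnormalized, so the identity signal reaches later layers directly. This is the property usually credited for the stability benefit.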
|
|
| ## Use Cases |
| - Automated document processing and digitization pipelines |
| - Table extraction from academic papers and research articles |
| - Invoice and financial document parsing |
| - Legal and regulatory document analysis |
| - Healthcare and clinical report table extraction |
| - Preprocessing step for downstream table structure recognition |
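
When detection feeds a downstream structure-recognition step, a common approach is to crop each detected table with a small margin so borders are not cut off. The sketch below shows one way to pad a pixel-space box and clamp it to the image bounds. The helper name `pad_and_clamp` and the default padding of 10 pixels are assumptions for illustration, not part of the model or the SDK.

```python
def pad_and_clamp(box, img_w, img_h, padding=10):
    """Expand an (x0, y0, x1, y1) pixel box by `padding` and keep it inside the image.

    Hypothetical helper: the padding value is an arbitrary illustrative default.
    """
    x0, y0, x1, y1 = box
    return (
        max(0, x0 - padding),
        max(0, y0 - padding),
        min(img_w, x1 + padding),
        min(img_h, y1 + padding),
    )


# A detected table that touches the left and bottom edges of a 1000x800 page image.
crop_box = pad_and_clamp((5, 320, 700, 795), img_w=1000, img_h=800)
print(crop_box)
```

The resulting tuple can be passed to an image library's crop function before handing the region to a structure-recognition model.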
|
|
| ## Inputs and Outputs |
| **Input**: |
| - Document images (JPEG, PNG, etc.) containing one or more tables. |
|
|
| **Output**: |
| - Bounding box predictions with confidence scores for each detected table in the image. |
| - Class labels identifying detected objects as tables. |
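
The outputs above can be sketched in code. DETR-style models emit a fixed set of query predictions, each with class logits (the last entry being a "no object" class) and a box in normalized `(center_x, center_y, width, height)` form. The following is a minimal, dependency-free sketch of turning such raw predictions into scored pixel-space boxes. The label names in `LABELS`, the 0.7 threshold, and the sample numbers are assumptions chosen for illustration; check the checkpoint's own configuration for the actual labels.

```python
import math

# Assumed label names for illustration; verify against the model configuration.
LABELS = ["table", "table rotated"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def decode_detections(logits, boxes, img_w, img_h, threshold=0.7):
    """Keep queries whose best real-class probability reaches the threshold."""
    detections = []
    for query_logits, box in zip(logits, boxes):
        probs = softmax(query_logits)
        class_probs = probs[:-1]  # drop the trailing "no object" class
        score = max(class_probs)
        if score < threshold:
            continue
        detections.append({
            "label": LABELS[class_probs.index(score)],
            "score": score,
            "box": cxcywh_to_xyxy(box, img_w, img_h),
        })
    return detections


# Made-up predictions for two queries on a 1000x800 image.
sample_logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
sample_boxes = [[0.5, 0.5, 0.4, 0.2], [0.2, 0.2, 0.1, 0.1]]
detections = decode_detections(sample_logits, sample_boxes, img_w=1000, img_h=800)
for det in detections:
    print(det["label"], round(det["score"], 3), det["box"])
```

Raising the threshold trades recall for precision. The second sample query is dominated by the "no object" class and is therefore discarded.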
|
|
| ## License |
| This repo is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes and with proper attribution. All NPU-related models, runtimes, and code in this project are covered by this non-commercial license and cannot be used in any commercial or revenue-generating application. Commercial licensing or enterprise usage requires a separate agreement. For inquiries, please contact `dev@nexa.ai`. |