	# Table-Transformer-Detection

	## Model Description
	Table-Transformer-Detection is a 28.8-million-parameter object detection model from Microsoft Research, fine-tuned specifically for table detection in documents.
	Built on the DETR (DEtection TRansformer) architecture, it locates and identifies tables within unstructured document images such as PDFs and scanned pages.

	Trained on PubTables-1M, a large-scale dataset containing nearly one million fully annotated tables from scientific articles, Table-Transformer-Detection delivers strong performance for document table extraction without requiring task-specific architectural customization.

	## Quickstart

	Follow the three-step instructions on the [Nexa SDK model page](https://sdk.nexa.ai/model/Table-transformer-detection) to get started.

	## Features
	- Table detection: accurately locates tables in document images, PDFs, and scanned pages.
	- DETR-based architecture: leverages a Transformer encoder-decoder on top of a CNN backbone (ResNet) for end-to-end object detection.
	- Pre-normalization: uses the "normalize before" setting, applying LayerNorm before self- and cross-attention for improved training stability.
	- Lightweight: at only 28.8M parameters (F32 weights), the model is inexpensive to deploy and to run inference with.
	- Fine-tunable: can be further fine-tuned on domain-specific document datasets for improved accuracy.
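
The "normalize before" setting in the list above can be illustrated with a toy residual block. The sketch below is not the model's code: `layer_norm` omits the learned scale and shift, and `toy_sublayer` is a hypothetical stand-in for an attention or feed-forward sublayer. It only shows where the normalization sits relative to the residual connection.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def toy_sublayer(x):
    """Hypothetical stand-in for an attention or feed-forward sublayer."""
    return [2.0 * v + 1.0 for v in x]

def pre_norm_block(x, sublayer):
    """'Normalize before': x + sublayer(LayerNorm(x))."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_norm_block(x, sublayer):
    """'Normalize after': LayerNorm(x + sublayer(x))."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

x = [1.0, 2.0, 3.0, 4.0]
pre = pre_norm_block(x, toy_sublayer)    # residual path carries x through unnormalized
post = post_norm_block(x, toy_sublayer)  # the whole sum is normalized afterwards
```

In the pre-norm variant the input reaches the output through an untouched identity path, which is the property usually credited with more stable training in deep Transformer stacks.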

	## Use Cases
	- Automated document processing and digitization pipelines
	- Table extraction from academic papers and research articles
	- Invoice and financial document parsing
	- Legal and regulatory document analysis
	- Healthcare and clinical report table extraction
	- Preprocessing step for downstream table structure recognition
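
When detection feeds a downstream structure-recognition step, a common pattern is to crop each detected table with a small margin so that borders are not cut off. The helper below is a minimal sketch of that idea; `padded_crop_box` and its default `padding` of 10 pixels are assumptions for illustration and are not part of this model or the Nexa SDK.

```python
import math

def padded_crop_box(box, img_w, img_h, padding=10):
    """Expand a pixel-space (x0, y0, x1, y1) box by `padding` and clamp it to the image."""
    x0, y0, x1, y1 = box
    left = max(0, math.floor(x0) - padding)
    top = max(0, math.floor(y0) - padding)
    right = min(img_w, math.ceil(x1) + padding)
    bottom = min(img_h, math.ceil(y1) + padding)
    return (left, top, right, bottom)

# A table detected in the middle of a 1000x800 page image.
crop = padded_crop_box((250.0, 300.0, 750.0, 500.0), img_w=1000, img_h=800)
print(crop)  # (240, 290, 760, 510)
```

The resulting tuple uses the (left, upper, right, lower) order, so it can be passed to an image library's crop function that expects that layout, such as Pillow's `Image.crop`.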

	## Inputs and Outputs
	Input:
	- Document images (JPEG, PNG, etc.) containing one or more tables.

	Output:
	- Bounding box predictions with confidence scores for each detected table in the image.
	- Class labels identifying detected objects as tables.
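
The raw output layout of the NPU runtime is not documented here, so the following is only a sketch under DETR conventions: each query yields class logits with a trailing "no object" entry, plus a box in normalized (center x, center y, width, height) form. The function names, the 0.9 threshold, and the label list (taken from the upstream checkpoint's `table` / `table rotated` classes) are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )

def decode_detections(logits, boxes, img_w, img_h, labels, threshold=0.9):
    """Keep queries whose best real-class probability reaches `threshold`."""
    detections = []
    for query_logits, box in zip(logits, boxes):
        probs = softmax(query_logits)
        object_probs = probs[:-1]  # drop the trailing "no object" class
        score = max(object_probs)
        if score < threshold:
            continue
        detections.append({
            "label": labels[object_probs.index(score)],
            "score": score,
            "box": cxcywh_to_xyxy(box, img_w, img_h),
        })
    return detections

# Two hypothetical queries: one confident table, one "no object".
labels = ["table", "table rotated"]  # assumed label set
logits = [[8.0, 0.0, 0.0], [0.0, 0.0, 6.0]]
boxes = [(0.5, 0.5, 0.5, 0.25), (0.2, 0.2, 0.1, 0.1)]
detections = decode_detections(logits, boxes, img_w=1000, img_h=800, labels=labels)
# One detection is kept: label "table", box (250.0, 300.0, 750.0, 500.0).
```

Lowering `threshold` trades precision for recall; the right value depends on the document type and should be tuned on representative pages.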

	## License
	This repo is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes with proper attribution. All NPU-related models, runtimes, and code in this project are protected under this non-commercial license and may not be used in any commercial or revenue-generating applications. Commercial licensing or enterprise usage requires a separate agreement. For inquiries, please contact `dev@nexa.ai`.