Kreuzberg/layout-models

@@ -29,18 +29,27 @@ ONNX models used by [Kreuzberg](https://kreuzberg.dev) for document layout detec
 **Layout Classes:** Caption, Footnote, Formula, ListItem, PageFooter, PageHeader, Picture, SectionHeader, Table, Text, Title, DocumentIndex, Code, CheckboxSelected, CheckboxUnselected, Form, KeyValueRegion
-### SLANet-plus (Table Structure Recognition)
 | Property | Value |
 |----------|-------|
-| **Path** | `slanet-plus/model.onnx` |
-| **Size** | 7.8 MB |
-| **Precision** | FP32 |
-| **Architecture** | SLANet-plus (Sequence-to-Sequence table decoder) |
-| **Input** | `x`: `[1, 3, 488, 488]` f32 (BGR channel order, ImageNet-normalized) |
-| **Outputs** | `[1, seq_len, 8]` cell bbox corners, `[1, seq_len, 50]` HTML token probabilities |
-| **Vocabulary** | 50 tokens (HTML structure tags, rowspan/colspan 1-20, sos/eos) |
-| **SHA256** | `e0bff8da087f9b83629f1e1a6e0f8252fc2de85a7d80415b3510fc521338da3d` |
 ## Attribution & Provenance
@@ -53,14 +62,16 @@ This model is mirrored from [docling-project/docling-layout-heron-onnx](https://
 - **Architecture paper:** Zhao et al., "DETRs Beat YOLOs on Real-time Object Detection" ([arXiv:2304.08069](https://arxiv.org/abs/2304.08069))
 - **Training data:** DocLayNet and internal IBM document datasets
-### SLANet-plus
-This model was converted from PaddlePaddle format to ONNX using [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX). The original model is from the [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) project by PaddlePaddle.
-- **Original repository:** [PaddlePaddle/SLANet_plus](https://huggingface.co/PaddlePaddle/SLANet_plus)
-- **License:** Apache-2.0
-- **Architecture paper:** "PP-StructureV2: A Stronger Document Analysis System" ([arXiv:2210.05391](https://arxiv.org/abs/2210.05391))
-- **Conversion:** PaddlePaddle inference format → ONNX via Paddle2ONNX (opset 17)
 ## Usage
@@ -68,4 +79,5 @@ These models are automatically downloaded and cached by the [Kreuzberg](https://
 ## License
-All models in this repository are distributed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0), consistent with the licenses of the original models.

 **Layout Classes:** Caption, Footnote, Formula, ListItem, PageFooter, PageHeader, Picture, SectionHeader, Table, Text, Title, DocumentIndex, Code, CheckboxSelected, CheckboxUnselected, Form, KeyValueRegion
+### TATR (Table Structure Recognition)
 | Property | Value |
 |----------|-------|
+| **Path** | `tatr/model.onnx` |
+| **Size** | 29 MB |
+| **Precision** | INT8 quantized |
+| **Architecture** | DETR (DEtection TRansformer) — non-autoregressive object detection |
+| **Input** | `pixel_values`: `[batch, 3, H, W]` f32 (variable size, typically 800×800) |
+| **Outputs** | `logits` f32 `[batch, 125, 7]` (raw class scores per query; apply softmax to get probabilities), `pred_boxes` f32 `[batch, 125, 4]` (normalized cx/cy/w/h) |
+| **Classes** | 7 classes (see below) |
+| **SHA256** | see release commit |
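Once the digest is available from the release commit, a downloaded file can be checked against it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `verify_model` are hypothetical and not part of Kreuzberg, and the expected digest must be supplied by the caller.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_sha256):
    """Compare a file's digest with the published one (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A call such as `verify_model("tatr/model.onnx", published_digest)` returns `True` only when the local file matches the published digest byte for byte.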
+**Table Structure Classes:**
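+0. `table`: the entire table region
+1. `table column`: the bounding box of one full column
+2. `table row`: the bounding box of one full row
+3. `table column header`: the header region at the top of the table (one or more header rows)
+4. `table projected row header`: a row whose single cell spans the table width and labels the rows below it
+5. `table spanning cell`: a cell that spans multiple rows or columns
+6. `no object`: background (the query matched nothing)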
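To illustrate how the two output tensors and the class list fit together, here is a dependency-free sketch of post-processing for a single image. It assumes the `logits` and `pred_boxes` rows have already been converted to plain Python lists. The function name `decode_detections` and the `threshold` value are hypothetical choices for illustration, not Kreuzberg's actual implementation.

```python
import math

# Class names in index order, as listed above; the last index is background.
TATR_CLASSES = [
    "table",
    "table column",
    "table row",
    "table column header",
    "table projected row header",
    "table spanning cell",
    "no object",
]
NO_OBJECT = len(TATR_CLASSES) - 1


def softmax(scores):
    """Numerically stable softmax over one query's class logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_detections(logits, pred_boxes, image_width, image_height, threshold=0.5):
    """Turn one image's query outputs into (label, score, pixel box) tuples.

    logits:     rows of 7 raw class scores (one row per query)
    pred_boxes: rows of normalized (cx, cy, w, h), one row per query
    threshold:  assumed confidence cutoff, not taken from the README
    """
    detections = []
    for scores, (cx, cy, w, h) in zip(logits, pred_boxes):
        probs = softmax(scores)
        # Best class among the real classes, excluding "no object".
        best = max(range(NO_OBJECT), key=lambda i: probs[i])
        if probs[best] < threshold or probs[NO_OBJECT] > probs[best]:
            continue
        # Convert normalized center/size to pixel corner coordinates.
        x0 = (cx - w / 2) * image_width
        y0 = (cy - h / 2) * image_height
        x1 = (cx + w / 2) * image_width
        y1 = (cy + h / 2) * image_height
        detections.append((TATR_CLASSES[best], probs[best], (x0, y0, x1, y1)))
    return detections


# Two made-up queries: one confident "table row", one background.
logits = [
    [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0],
]
pred_boxes = [[0.5, 0.25, 1.0, 0.1], [0.5, 0.5, 0.2, 0.2]]
detections = decode_detections(logits, pred_boxes, 800, 800)
```

Rows and columns come back as separate boxes, so a consumer would typically intersect each `table row` box with each `table column` box to obtain individual cells.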
 ## Attribution & Provenance
 - **Architecture paper:** Zhao et al., "DETRs Beat YOLOs on Real-time Object Detection" ([arXiv:2304.08069](https://arxiv.org/abs/2304.08069))
 - **Training data:** DocLayNet and internal IBM document datasets
+### TATR (Table Transformer)
+This model is based on [microsoft/table-transformer-structure-recognition](https://huggingface.co/microsoft/table-transformer-structure-recognition) by Microsoft Research. The ONNX conversion comes from [Xenova/table-transformer-structure-recognition](https://huggingface.co/Xenova/table-transformer-structure-recognition) and was produced with Hugging Face Optimum. The model is quantized to INT8 for inference efficiency.
+- **Original repository:** [microsoft/table-transformer-structure-recognition](https://huggingface.co/microsoft/table-transformer-structure-recognition)
+- **ONNX source:** [Xenova/table-transformer-structure-recognition](https://huggingface.co/Xenova/table-transformer-structure-recognition)
+- **License:** MIT
+- **Architecture paper:** Smock et al., "PubTables-1M: Towards comprehensive table extraction from unstructured documents" ([arXiv:2110.00061](https://arxiv.org/abs/2110.00061))
+- **Training data:** PubTables-1M dataset
+- **Quantization:** INT8 (dynamic quantization via ONNX Runtime)
 ## Usage
 ## License
+- RT-DETR: [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0)
+- TATR: [MIT License](https://opensource.org/licenses/MIT)