Commit cd2cb70 (verified), committed by naamanhirschfeld
1 Parent(s): 0c140bd

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +28 -16
README.md CHANGED
@@ -29,18 +29,27 @@ ONNX models used by [Kreuzberg](https://kreuzberg.dev) for document layout detec

  **Layout Classes:** Caption, Footnote, Formula, ListItem, PageFooter, PageHeader, Picture, SectionHeader, Table, Text, Title, DocumentIndex, Code, CheckboxSelected, CheckboxUnselected, Form, KeyValueRegion

- ### SLANet-plus (Table Structure Recognition)
+ ### TATR (Table Structure Recognition)

  | Property | Value |
  |----------|-------|
- | **Path** | `slanet-plus/model.onnx` |
- | **Size** | 7.8 MB |
- | **Precision** | FP32 |
- | **Architecture** | SLANet-plus (Sequence-to-Sequence table decoder) |
- | **Input** | `x`: `[1, 3, 488, 488]` f32 (BGR channel order, ImageNet-normalized) |
- | **Outputs** | `[1, seq_len, 8]` cell bbox corners, `[1, seq_len, 50]` HTML token probabilities |
- | **Vocabulary** | 50 tokens (HTML structure tags, rowspan/colspan 1-20, sos/eos) |
- | **SHA256** | `e0bff8da087f9b83629f1e1a6e0f8252fc2de85a7d80415b3510fc521338da3d` |
+ | **Path** | `tatr/model.onnx` |
+ | **Size** | 29 MB |
+ | **Precision** | INT8 quantized |
+ | **Architecture** | DETR (DEtection TRansformer), non-autoregressive object detection |
+ | **Input** | `pixel_values`: `[batch, 3, H, W]` f32 (variable size, typically 800×800) |
+ | **Outputs** | `logits` f32 `[batch, 125, 7]` (raw class scores), `pred_boxes` f32 `[batch, 125, 4]` (normalized cx/cy/w/h) |
+ | **Classes** | 7 classes (see below) |
+ | **SHA256** | see release commit |
+
+ **Table Structure Classes:**
+ 0. `table`: entire table region
+ 1. `table column`: column span
+ 2. `table row`: row span
+ 3. `table column header`: header row cells
+ 4. `table projected row header`: projected row header
+ 5. `table spanning cell`: cells spanning multiple rows/columns
+ 6. `no object`: background

  ## Attribution & Provenance

@@ -53,14 +62,16 @@ This model is mirrored from [docling-project/docling-layout-heron-onnx](https://
  - **Architecture paper:** Zhao et al., "DETRs Beat YOLOs on Real-time Object Detection" ([arXiv:2304.08069](https://arxiv.org/abs/2304.08069))
  - **Training data:** DocLayNet and internal IBM document datasets

- ### SLANet-plus
+ ### TATR (Table Transformer)

- This model was converted from PaddlePaddle format to ONNX using [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX). The original model is from the [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) project by PaddlePaddle.
+ This model is based on [microsoft/table-transformer-structure-recognition](https://huggingface.co/microsoft/table-transformer-structure-recognition) by Microsoft Research. The ONNX conversion was produced by [Xenova/table-transformer-structure-recognition](https://huggingface.co/Xenova/table-transformer-structure-recognition) using Hugging Face Optimum. It is quantized to INT8 for inference efficiency.

- - **Original repository:** [PaddlePaddle/SLANet_plus](https://huggingface.co/PaddlePaddle/SLANet_plus)
- - **License:** Apache-2.0
- - **Architecture paper:** "PP-StructureV2: A Stronger Document Analysis System" ([arXiv:2210.05391](https://arxiv.org/abs/2210.05391))
- - **Conversion:** PaddlePaddle inference format → ONNX via Paddle2ONNX (opset 17)
+ - **Original repository:** [microsoft/table-transformer-structure-recognition](https://huggingface.co/microsoft/table-transformer-structure-recognition)
+ - **ONNX source:** [Xenova/table-transformer-structure-recognition](https://huggingface.co/Xenova/table-transformer-structure-recognition)
+ - **License:** MIT
+ - **Architecture paper:** Smock et al., "PubTables-1M: Towards comprehensive table extraction from unstructured documents" ([arXiv:2110.00061](https://arxiv.org/abs/2110.00061))
+ - **Training data:** PubTables-1M dataset
+ - **Quantization:** INT8 (dynamic quantization via ONNX Runtime)

  ## Usage

@@ -68,4 +79,5 @@ These models are automatically downloaded and cached by the [Kreuzberg](https://

  ## License

- All models in this repository are distributed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0), consistent with the licenses of the original models.
+ - RT-DETR: [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0)
+ - TATR: [MIT License](https://opensource.org/licenses/MIT)
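
The TATR output layout that this commit documents (`logits` as `[batch, 125, 7]` and `pred_boxes` as normalized cx/cy/w/h) can be turned into labelled pixel boxes roughly as follows. This is a minimal pure-Python sketch for a single image, not Kreuzberg's actual post-processing: `decode_tatr`, `TATR_CLASSES`, and the `0.5` default `threshold` are hypothetical names and values chosen for illustration, and the sketch assumes the `logits` are raw scores that still need a softmax.

```python
import math

# Class order as listed under "Table Structure Classes" in the README.
TATR_CLASSES = [
    "table",
    "table column",
    "table row",
    "table column header",
    "table projected row header",
    "table spanning cell",
    "no object",
]
NO_OBJECT = len(TATR_CLASSES) - 1


def softmax(scores):
    """Numerically stable softmax over one query's class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_tatr(logits, pred_boxes, image_width, image_height, threshold=0.5):
    """Decode one image's outputs into labelled pixel-space boxes.

    logits:     rows of 7 raw class scores (one row per query)
    pred_boxes: rows of normalized (cx, cy, w, h)
    threshold:  hypothetical confidence cut-off, not taken from the README
    """
    detections = []
    for scores, (cx, cy, w, h) in zip(logits, pred_boxes):
        probs = softmax(scores)
        class_id = max(range(len(probs)), key=probs.__getitem__)
        # Skip background queries and low-confidence predictions.
        if class_id == NO_OBJECT or probs[class_id] < threshold:
            continue
        detections.append({
            "label": TATR_CLASSES[class_id],
            "score": probs[class_id],
            # Center/size -> corner coordinates, scaled to pixels.
            "box": (
                (cx - w / 2) * image_width,
                (cy - h / 2) * image_height,
                (cx + w / 2) * image_width,
                (cy + h / 2) * image_height,
            ),
        })
    return detections
```

In practice the two inputs would be the first batch element of each ONNX output converted to nested lists; the row and column boxes returned here would then be intersected to recover individual cells, a step this sketch leaves out.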