Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -29,18 +29,27 @@ ONNX models used by [Kreuzberg](https://kreuzberg.dev) for document layout detec
|
|
| 29 |
|
| 30 |
**Layout Classes:** Caption, Footnote, Formula, ListItem, PageFooter, PageHeader, Picture, SectionHeader, Table, Text, Title, DocumentIndex, Code, CheckboxSelected, CheckboxUnselected, Form, KeyValueRegion
|
| 31 |
|
| 32 |
-
###
|
| 33 |
|
| 34 |
| Property | Value |
|
| 35 |
|----------|-------|
|
| 36 |
-
| **Path** | `
|
| 37 |
-
| **Size** |
|
| 38 |
-
| **Precision** |
|
| 39 |
-
| **Architecture** |
|
| 40 |
-
| **Input** | `
|
| 41 |
-
| **Outputs** | `[
|
| 42 |
-
| **
|
| 43 |
-
| **SHA256** |
|
| 44 |
|
| 45 |
## Attribution & Provenance
|
| 46 |
|
|
@@ -53,14 +62,16 @@ This model is mirrored from [docling-project/docling-layout-heron-onnx](https://
|
|
| 53 |
- **Architecture paper:** Zhao et al., "DETRs Beat YOLOs on Real-time Object Detection" ([arXiv:2304.08069](https://arxiv.org/abs/2304.08069))
|
| 54 |
- **Training data:** DocLayNet and internal IBM document datasets
|
| 55 |
|
| 56 |
-
###
|
| 57 |
|
| 58 |
-
This model
|
| 59 |
|
| 60 |
-
- **Original repository:** [
|
| 61 |
-
- **
|
| 62 |
-
- **
|
| 63 |
-
- **
|
| 64 |
|
| 65 |
## Usage
|
| 66 |
|
|
@@ -68,4 +79,5 @@ These models are automatically downloaded and cached by the [Kreuzberg](https://
|
|
| 68 |
|
| 69 |
## License
|
| 70 |
|
| 71 |
-
|
| 29 |
|
| 30 |
**Layout Classes:** Caption, Footnote, Formula, ListItem, PageFooter, PageHeader, Picture, SectionHeader, Table, Text, Title, DocumentIndex, Code, CheckboxSelected, CheckboxUnselected, Form, KeyValueRegion
|
| 31 |
|
| 32 |
+
### TATR (Table Structure Recognition)
|
| 33 |
|
| 34 |
| Property | Value |
|
| 35 |
|----------|-------|
|
| 36 |
+
| **Path** | `tatr/model.onnx` |
|
| 37 |
+
| **Size** | 29 MB |
|
| 38 |
+
| **Precision** | INT8 quantized |
|
| 39 |
+
| **Architecture** | DETR (DEtection TRansformer) — non-autoregressive object detection |
|
| 40 |
+
| **Input** | `pixel_values`: `[batch, 3, H, W]` f32 (variable size, typically 800×800) |
|
| 41 |
+
| **Outputs** | `logits` f32 `[batch, 125, 7]` (unnormalized class scores; apply softmax to get probabilities), `pred_boxes` f32 `[batch, 125, 4]` (normalized cx/cy/w/h) |
|
| 42 |
+
| **Classes** | 7 classes (see below) |
|
| 43 |
+
| **SHA256** | see release commit |
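
Because the table points to the release commit for the checksum, a downloaded copy of `tatr/model.onnx` can be checked locally before use. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of_file` and `matches_checksum` are hypothetical, and the expected digest must be copied from the release commit; it is deliberately not reproduced here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A caller would pass the local model path and the digest taken from the release commit, and refuse to load the model if `matches_checksum` returns `False`.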
|
| 44 |
+
|
| 45 |
+
**Table Structure Classes:**
|
| 46 |
+
0. `table` — entire table region
|
| 47 |
+
1. `table column` — column span
|
| 48 |
+
2. `table row` — row span
|
| 49 |
+
3. `table column header` — header row cells
|
| 50 |
+
4. `table projected row header` — projected row header
|
| 51 |
+
5. `table spanning cell` — cells spanning multiple rows/columns
|
| 52 |
+
6. `no object` — background
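
The `logits` and `pred_boxes` outputs listed in the table can be turned into labeled pixel boxes with a small amount of post-processing. The sketch below is one plausible way to do it in plain Python, not necessarily how Kreuzberg does it. The function name `decode_detections`, the `0.5` score threshold, and the choice to drop a query when `no object` is its top class are all assumptions. Inputs are the outputs for a single image, already converted to nested lists.

```python
import math

# Class names in index order, matching the list above.
TATR_CLASSES = [
    "table",
    "table column",
    "table row",
    "table column header",
    "table projected row header",
    "table spanning cell",
    "no object",
]
NO_OBJECT = len(TATR_CLASSES) - 1


def softmax(scores):
    """Numerically stable softmax over one list of raw class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_detections(logits, pred_boxes, image_width, image_height, threshold=0.5):
    """Decode one image: logits is [queries][7], pred_boxes is [queries][4]."""
    detections = []
    for scores, (cx, cy, w, h) in zip(logits, pred_boxes):
        probs = softmax(scores)
        class_id = max(range(len(probs)), key=probs.__getitem__)
        # Assumption: skip background queries and low-confidence predictions.
        if class_id == NO_OBJECT or probs[class_id] < threshold:
            continue
        detections.append({
            "label": TATR_CLASSES[class_id],
            "score": probs[class_id],
            # Convert normalized center/size to pixel (x0, y0, x1, y1).
            "box": (
                (cx - w / 2) * image_width,
                (cy - h / 2) * image_height,
                (cx + w / 2) * image_width,
                (cy + h / 2) * image_height,
            ),
        })
    return detections
```

The returned boxes are in the coordinate space of the image size passed in, so passing the original table crop's width and height maps detections back onto that crop.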
|
| 53 |
|
| 54 |
## Attribution & Provenance
|
| 55 |
|
|
|
|
| 62 |
- **Architecture paper:** Zhao et al., "DETRs Beat YOLOs on Real-time Object Detection" ([arXiv:2304.08069](https://arxiv.org/abs/2304.08069))
|
| 63 |
- **Training data:** DocLayNet and internal IBM document datasets
|
| 64 |
|
| 65 |
+
### TATR (Table Transformer)
|
| 66 |
|
| 67 |
+
This model is based on [microsoft/table-transformer-structure-recognition](https://huggingface.co/microsoft/table-transformer-structure-recognition) by Microsoft Research. The ONNX conversion comes from [Xenova/table-transformer-structure-recognition](https://huggingface.co/Xenova/table-transformer-structure-recognition) and was produced with Hugging Face Optimum. The model is quantized to INT8 for inference efficiency.
|
| 68 |
|
| 69 |
+
- **Original repository:** [microsoft/table-transformer-structure-recognition](https://huggingface.co/microsoft/table-transformer-structure-recognition)
|
| 70 |
+
- **ONNX source:** [Xenova/table-transformer-structure-recognition](https://huggingface.co/Xenova/table-transformer-structure-recognition)
|
| 71 |
+
- **License:** MIT
|
| 72 |
+
- **Architecture paper:** Smock et al., "PubTables-1M: Towards comprehensive table extraction from unstructured documents" ([arXiv:2110.00061](https://arxiv.org/abs/2110.00061))
|
| 73 |
+
- **Training data:** PubTables-1M dataset
|
| 74 |
+
- **Quantization:** INT8 (dynamic quantization via ONNX Runtime)
|
| 75 |
|
| 76 |
## Usage
|
| 77 |
|
|
|
|
| 79 |
|
| 80 |
## License
|
| 81 |
|
| 82 |
+
- RT-DETR: [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0)
|
| 83 |
+
- TATR: [MIT License](https://opensource.org/licenses/MIT)
|