---

<div align="center">
<p align="center">
    <img src="https://storage.googleapis.com/mle-courses-prod/users/61b6fa1ba83a7e37c8309756/private-files/ff27e200-e181-11f0-b179-8566ca0312de-Untitled_design_(3).png" width="400"/>
</p>

<h1 align="center">
ProtonX OCR tool: Table Detector
</h1>
<h3 align="center">Only 11 MB in size</h3>

[![GitHub](https://img.shields.io/badge/ProtonX-GitHub-black?logo=github)](https://github.com/protonx-engineering/protonx-text-correction)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-Model-black?logo=huggingface)](https://huggingface.co/protonx-models/protonx-tc)
[![Website](https://img.shields.io/badge/protonx.co-Website-blue)](https://protonx.co)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1V9B38kbQP17RR0-WqVcPt0R7C5RiZ1_x?usp=sharing)
</div>

## **Introduction**
This model helps ProtonX reduce OCR processing costs for its customers. Documents that contain no tables are routed to open-source OCR models such as Dots OCR or DeepSeek OCR, while documents with complex tables are routed to more powerful OCR models such as Gemini OCR, ensuring high accuracy where it matters most.

This model is a **binary image classification model** designed to determine **whether an input document image contains at least one table**.

![](https://storage.googleapis.com/mle-courses-prod/users/61b6fa1ba83a7e37c8309756/private-files/db73d3b0-e180-11f0-b179-8566ca0312de-table_detection_examples.png)

Built on MobileNetV2 architecture, the model is optimized for **document images and scanned PDFs**, especially **Vietnamese documents**, and is intended to be used as a **fast pre-filtering step** in OCR and document understanding pipelines.
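
The pre-filtering step described above can be sketched as a simple router that picks an OCR backend from the detector's label. This is an illustrative sketch only: `route_document` and the backend names are placeholders, not part of any ProtonX API.

```python
# Hypothetical routing sketch -- the backend identifiers are placeholders,
# not an official ProtonX configuration.

def route_document(label: str) -> str:
    """Pick an OCR backend from the table detector's label."""
    if label == "table":
        # Complex tables: send to a stronger (and costlier) OCR model
        return "gemini-ocr"
    # Text-only pages: an open-source OCR model is sufficient
    return "deepseek-ocr"
```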

---

## **Task Definition**

**Task**: Binary image classification  
**Objective**: Detect **table presence** in an image  

### **Labels**
| ID | Label     | Meaning |
|--|--|--|
| 0 | `no_table` | Image contains **no tables** |
| 1 | `table`    | Image contains **one or more tables** |

> ⚠️ The model detects **presence**, not the number or location of tables.
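
If you consume the model's raw two-logit output directly, the ID-to-label mapping above can be applied like this (a minimal sketch, not part of the released code; the softmax confidence is optional):

```python
import math

# Matches the label table above
ID2LABEL = {0: "no_table", 1: "table"}

def decode(logits):
    """Turn the model's two raw logits into (label, confidence)."""
    shifted = [x - max(logits) for x in logits]  # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    probs = [e / sum(exps) for e in exps]
    idx = probs.index(max(probs))
    return ID2LABEL[idx], probs[idx]
```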

---

## **Training Data**

The model is trained using a combination of:

### **DocLayNet Dataset**
- Public document layout dataset
- High-quality annotations
- Diverse document layouts

### **In-house Labeled Vietnamese Document Dataset**
- Scanned PDFs from Vietnamese legal documents
- Mixed-quality OCR inputs
- Real-world layouts:
  - Contracts
  - Administrative forms
  - Reports
  - Tables embedded in text-heavy pages

This combination improves **generalization** across both clean and noisy document images.

## **Quick Usage**

### Using torchvision
```python
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision import models as pretrained_models
from PIL import Image
from huggingface_hub import hf_hub_download

class TableDetector:
    def __init__(self, model_name: str, device: str = 'cpu'):
        self.device = torch.device(device)
        self.model_path = hf_hub_download(repo_id=model_name, filename="model/table_detector.pth")
        self.model = self.load_model(self.model_path)
        self.model.to(self.device)
        self.model.eval()

    def load_model(self, model_path: str):
        # MobileNetV2 backbone with a 2-class head (no_table / table)
        model = pretrained_models.mobilenet_v2(weights=None)
        model.classifier[1] = nn.Linear(in_features=model.classifier[1].in_features, out_features=2)
        model.load_state_dict(torch.load(model_path, map_location=self.device))
        return model

    def preprocess_image(self, image_path: str):
        transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])
        image = Image.open(image_path).convert('RGB')
        image = transform(image).unsqueeze(0)  # Add batch dimension
        return image.to(self.device)

    def predict(self, image_path: str):
        image = self.preprocess_image(image_path)
        with torch.no_grad():
            outputs = self.model(image)
            _, preds = torch.max(outputs, 1)
        return 'table' if preds.item() == 1 else 'no_table'

if __name__ == "__main__":
    model = TableDetector(model_name='protonx-models/table-detector', device='cpu')

    prediction = model.predict("images/document_page_01.png")

    print(prediction)  # 'table' or 'no_table'
```
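
In a pipeline you will usually pre-filter a whole batch of page images. A small helper like the one below groups pages by label; it accepts any `predict(path) -> label` callable (for example, the `predict` method of the detector above), and `split_pages` itself is an illustrative sketch rather than part of the released code.

```python
from collections import defaultdict
from pathlib import Path

def split_pages(image_dir: str, predict) -> dict:
    """Group page images in `image_dir` by the label that `predict` returns.

    `predict` is any callable mapping an image path to a label string,
    e.g. TableDetector(...).predict from the snippet above.
    """
    groups = defaultdict(list)
    for path in sorted(Path(image_dir).glob("*.png")):
        groups[predict(str(path))].append(str(path))
    return dict(groups)
```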

### Using ProtonX library
```python
from protonx import ProtonX

client = ProtonX(mode="offline")
prediction = client.ocr.detect_table(image_path="images/document_page_01.png")

print(prediction)
```

## **Acknowledgments**

Thanks to:

* [DocLayNet](https://huggingface.co/datasets/docling-project/DocLayNet)