hoangth2k4 committed
Commit f18ee32 · verified · 1 Parent(s): 0f6ea68

Update README.md

Files changed (1)
  1. README.md +83 -3
README.md CHANGED
@@ -1,5 +1,79 @@
  ## **Quick Usage**

  ```python
  import torch
  import torch.nn as nn
@@ -10,9 +84,9 @@ from PIL import Image
  from huggingface_hub import hf_hub_download

  class TableDetector:
-     def __init__(self, device: str = 'cpu'):
          self.device = torch.device(device)
-         self.model_path = hf_hub_download(repo_id="protonx-models/protonx-ocr-tool-table-detector", filename="model/table_detector.pth")
          self.model = self.load_model(self.model_path)
          self.model.to(self.device)
          self.model.eval()
@@ -40,9 +114,15 @@ class TableDetector:
          return 'have_table' if preds.item() == 1 else 'no_table'

  if __name__ == "__main__":
-     model = TableDetector(device='cpu')

      prediction = model.predict("images/document_page_01.png")

      print(prediction)
  ```
+ <!-- ---
+
+ # <div align="center">
+
+ # <p align="center">
+ # <img src="https://storage.googleapis.com/mle-courses-prod/users/61b6fa1ba83a7e37c8309756/private-files/678dadd0-603b-11ef-b0a7-998b84b38d43-ProtonX_logo_horizontally__1_.png" width="260"/>
+ # </p>
+
+ # <h1 align="center">
+ # ProtonX OCR tool: Table Detector
+ # </h1>
+
+ # </div> -->
+ ---
+
+ ## **Introduction**
+ ### **ProtonX OCR tool: Table Detector**
+ This is a **binary image classification model** that determines **whether an input document image contains at least one table**.
+
+ Built on the MobileNetV2 architecture, the model is optimized for **document images and scanned PDFs**, especially **Vietnamese documents**, and is intended as a **fast pre-filtering step** in OCR and document understanding pipelines.
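As a rough illustration of the pre-filtering role described above, the following sketch routes pages by predicted label so that only pages flagged as containing a table reach a slower table-extraction stage. The names `route_pages` and `fake_classify` are hypothetical and not part of this repository. The label strings `have_table` and `no_table` follow the `predict` method in the Quick Usage code, and in practice `classify` would be that method rather than the filename-based stand-in used here.

```python
from typing import Callable, Dict, Iterable, List


def route_pages(
    image_paths: Iterable[str],
    classify: Callable[[str], str],
    table_label: str = "have_table",
) -> Dict[str, List[str]]:
    """Split pages into those that need table extraction and those that do not."""
    routed: Dict[str, List[str]] = {"table_pages": [], "text_only_pages": []}
    for path in image_paths:
        # Only pages the classifier flags as containing a table go to the
        # (assumed more expensive) table-extraction stage.
        key = "table_pages" if classify(path) == table_label else "text_only_pages"
        routed[key].append(path)
    return routed


def fake_classify(path: str) -> str:
    """Stand-in classifier for demonstration only; it looks at the filename."""
    return "have_table" if "table" in path else "no_table"


if __name__ == "__main__":
    pages = ["scan_table_01.png", "scan_letter_02.png"]
    print(route_pages(pages, fake_classify))
```

Passing the classifier in as a callable keeps the routing logic independent of how the model is loaded, so the same function could wrap either usage style shown in Quick Usage.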
+
+ ---
+
+ ## **Task Definition**
+
+ **Task**: Binary image classification
+ **Objective**: Detect **table presence** in an image
+
+ ### **Labels**
+ | ID | Label | Meaning |
+ |--|--|--|
+ | 0 | `no_table` | Image contains **no tables** |
+ | 1 | `table` | Image contains **one or more tables** |
+
+ > ⚠️ The model detects **presence**, not the number or location of tables.
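To make the label table concrete, here is a minimal sketch of turning per-class scores into a label name. It assumes the classifier emits one score per class in ID order, which the README does not state; `ID2LABEL` and `label_from_scores` are illustrative names only. Note that the Quick Usage code returns the string `have_table` rather than `table` for ID 1.

```python
from typing import Sequence

# Mapping taken from the Labels table; the helper name is hypothetical.
ID2LABEL = {0: "no_table", 1: "table"}


def label_from_scores(scores: Sequence[float]) -> str:
    """Return the label whose class ID has the highest score."""
    if len(scores) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} scores, got {len(scores)}")
    # Index of the largest score; ties resolve to the lowest class ID.
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL[class_id]


if __name__ == "__main__":
    print(label_from_scores([0.2, 1.7]))
```

The output is a single label per image, which matches the note above: the classifier reports presence only, with no count or bounding boxes.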
+
+ ---
+
+ ## **Training Data**
+
+ The model is trained using a combination of:
+
+ ### **DocLayNet Dataset**
+ - Public document layout dataset
+ - High-quality annotations
+ - Diverse document layouts
+
+ ### **In-house Labeled Vietnamese Document Dataset**
+ - Scanned PDFs from Vietnamese documents
+ - Mixed-quality OCR inputs
+ - Real-world layouts:
+   - Contracts
+   - Administrative forms
+   - Reports
+   - Tables embedded in text-heavy pages
+
+ This combination improves **generalization** across both clean and noisy document images.
+
  ## **Quick Usage**

+ ### Using ProtonX library
+ ```python
+ import os
+ import unittest
+ import torch
+ import torchvision
+ from protonx import ProtonX
+
+ client = ProtonX()
+ prediction = client.ocr.detect_table(image_path="images/document_page_01.png")
+
+ print(prediction)
+
+ ```
+
+ ### Using torchvision
  ```python
  import torch
  import torch.nn as nn
@@ -10,9 +84,9 @@ from PIL import Image
  from huggingface_hub import hf_hub_download

  class TableDetector:
+     def __init__(self, model_name: str, device: str = 'cpu'):
          self.device = torch.device(device)
+         self.model_path = hf_hub_download(repo_id=model_name, filename="model/table_detector.pth")
          self.model = self.load_model(self.model_path)
          self.model.to(self.device)
          self.model.eval()
@@ -40,9 +114,15 @@ class TableDetector:
          return 'have_table' if preds.item() == 1 else 'no_table'

  if __name__ == "__main__":
+     model = TableDetector(model_name='protonx-models/table-detector', device='cpu')

      prediction = model.predict("images/document_page_01.png")

      print(prediction)
  ```
+
+ ## **Acknowledgments**
+
+ Thanks to:
+
+ * [DocLayNet](https://huggingface.co/datasets/docling-project/DocLayNet)