phonsobon committed
Commit b7ff647 · verified · 1 Parent(s): 231bc18

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +188 -0
README.md ADDED
@@ -0,0 +1,188 @@
+ ---
+ language:
+ - km
+ - en
+ tags:
+ - object-detection
+ - text-detection
+ - yolo
+ - yolo11
+ - khmer
+ - ultralytics
+ - pytorch
+ license: mit
+ ---
+
+ # mini-text-detection: Khmer & English Text Detection
+
+ A **YOLO11n**-based text detection model fine-tuned to locate and classify text regions in images containing **Khmer and English** content.
+ It detects three types of text block and can serve as the first stage of a pipeline, before the crops are passed to an OCR model (e.g. [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr)).
+
+ ---
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | Architecture | YOLO11n (nano) |
+ | Task | Object detection (3 classes) |
+ | Weights file | `khmer-text-detection-mini.pt` |
+ | Framework | Ultralytics / PyTorch |
+ | Input | RGB image, any size (auto-resized internally) |
+
+ ---
+
+ ## Classes
+
+ | ID | Name | Description |
+ |----|------|-------------|
+ | `0` | `subject` | Title or heading text |
+ | `1` | `reference` | Reference, label, or metadata text |
+ | `2` | `content` | Main body / paragraph text |
+
+ ---
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `khmer-text-detection-mini.pt` | Full Ultralytics YOLO model (weights + config) |
+
+ ---
+
+ ## Quick Start
+
+ ### Install dependencies
+
+ ```bash
+ pip install ultralytics huggingface_hub
+ ```
+
+ ### Run inference
+
+ ```python
+ from ultralytics import YOLO
+ from huggingface_hub import hf_hub_download
+
+ # ── Download model ────────────────────────────────────────────────────────────
+ model_path = hf_hub_download(
+     repo_id="phonsobon/mini-text-detection",
+     filename="khmer-text-detection-mini.pt",
+ )
+
+ # ── Class names ───────────────────────────────────────────────────────────────
+ CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}
+
+ # ── Load & predict ────────────────────────────────────────────────────────────
+ model = YOLO(model_path)
+
+ results = model.predict(
+     source="your_image.jpg",  # path, URL, or numpy array
+     conf=0.25,                # confidence threshold
+     iou=0.45,                 # NMS IoU threshold
+     imgsz=640,
+ )
+
+ # ── Print results ─────────────────────────────────────────────────────────────
+ for r in results:
+     r.show()  # display with bounding boxes
+     for box in r.boxes:
+         cls_id = int(box.cls)
+         label = CLASS_NAMES[cls_id]
+         conf = float(box.conf)
+         x1, y1, x2, y2 = box.xyxy[0].tolist()
+         print(f"[{label}] conf={conf:.2f} box=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
+ ```
+
+ ### Filter by class
+
+ ```python
+ # Get only subject (heading) boxes
+ subject_boxes = [b for b in results[0].boxes if int(b.cls) == 0]
+
+ # Get only content (body) boxes
+ content_boxes = [b for b in results[0].boxes if int(b.cls) == 2]
+ ```
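
When every class is needed at once, a single pass that groups detections by name can replace one list comprehension per class. The sketch below is a minimal illustration, not part of the model's API: `group_by_class` is a hypothetical helper, and it works on plain `(cls_id, conf, xyxy)` tuples so that it runs without the model.

```python
from collections import defaultdict

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}


def group_by_class(detections, class_names=CLASS_NAMES):
    """Group (cls_id, conf, (x1, y1, x2, y2)) tuples by class name.

    Hypothetical helper: within each class, detections are ordered
    from highest to lowest confidence.
    """
    groups = defaultdict(list)
    for cls_id, conf, xyxy in detections:
        groups[class_names[cls_id]].append((conf, xyxy))
    for name in groups:
        groups[name].sort(key=lambda item: item[0], reverse=True)
    return dict(groups)


# Made-up detections for illustration. With a real result they could be built as:
# [(int(b.cls), float(b.conf), tuple(b.xyxy[0].tolist())) for b in results[0].boxes]
detections = [
    (2, 0.81, (40, 120, 600, 400)),
    (0, 0.93, (40, 20, 600, 80)),
    (2, 0.64, (40, 420, 600, 700)),
]
grouped = group_by_class(detections)
print(grouped.get("subject", []))
print(len(grouped.get("content", [])), "content boxes")
```

Classes with no detections are simply absent from the returned dict, hence the `.get(name, [])` lookups.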
+
+ ### Save annotated images
+
+ ```python
+ results = model.predict(source="your_image.jpg", save=True, project="runs/detect")
+ # Saved under runs/detect/predict/ (Ultralytics appends a numeric suffix,
+ # e.g. predict2, on repeated runs)
+ ```
+
+ ### Batch inference on a folder
+
+ ```python
+ # For large folders, pass stream=True to get a generator instead of a list
+ results = model.predict(source="path/to/images/", conf=0.25, imgsz=640)
+ for r in results:
+     counts = {name: 0 for name in CLASS_NAMES.values()}
+     for box in r.boxes:
+         counts[CLASS_NAMES[int(box.cls)]] += 1
+     print(r.path, "→", counts)
+ ```
+
+ ---
+
+ ## Crop + OCR Pipeline
+
+ Combine this model with [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr) for end-to-end document reading, with each region labelled by type:
+
+ ```python
+ from ultralytics import YOLO
+ from huggingface_hub import hf_hub_download
+ from PIL import Image
+
+ CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}
+
+ # ── Load detection model ──────────────────────────────────────────────────────
+ det_path = hf_hub_download("phonsobon/mini-text-detection", "khmer-text-detection-mini.pt")
+ detector = YOLO(det_path)
+
+ # ── Detect text regions ───────────────────────────────────────────────────────
+ image_path = "your_image.jpg"
+ results = detector.predict(source=image_path, conf=0.25, imgsz=640)
+
+ img = Image.open(image_path).convert("RGB")
+
+ # ── Crop each region and tag it with its class ───────────────────────────────
+ for i, box in enumerate(results[0].boxes):
+     cls_id = int(box.cls)
+     label = CLASS_NAMES[cls_id]
+     x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
+
+     crop = img.crop((x1, y1, x2, y2))
+     crop.save(f"crop_{i}_{label}.png")
+     print(f"Saved crop {i} → class: {label}")
+     # → feed each crop to phonsobon/mini-ocr for text recognition
+ ```
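
The loop above crops boxes in the order the detector returns them, which is not necessarily reading order. The sketch below shows one way to sort boxes top-to-bottom and left-to-right, and to pad each crop slightly before OCR. Both helpers (`reading_order`, `pad_and_clamp`) and their default values are illustrative assumptions, not part of this model or of Ultralytics.

```python
def reading_order(boxes, row_tolerance=10):
    """Sort (x1, y1, x2, y2) boxes top-to-bottom, then left-to-right.

    Boxes whose top edges fall in the same `row_tolerance`-pixel band are
    treated as one row. This is a simple bucketing heuristic: two boxes just
    either side of a band boundary end up in different rows.
    """
    return sorted(boxes, key=lambda b: (b[1] // row_tolerance, b[0]))


def pad_and_clamp(box, img_w, img_h, pad=4):
    """Expand a box by `pad` pixels per side, clamped to the image bounds."""
    x1, y1, x2, y2 = box
    return (
        max(0, int(x1) - pad),
        max(0, int(y1) - pad),
        min(img_w, int(x2) + pad),
        min(img_h, int(y2) + pad),
    )


# Made-up boxes: two side by side on one row, one heading above them
boxes = [(300, 102, 500, 140), (20, 100, 280, 140), (20, 10, 500, 60)]
ordered = reading_order(boxes)
print(ordered)

# A box touching the image border stays inside a 640x480 image after padding
padded = pad_and_clamp((2, 3, 638, 479.6), img_w=640, img_h=480)
print(padded)
```

The padded tuple can be passed directly to `img.crop(...)` in the pipeline above.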
+
+ ---
+
+ ## Input Tips
+
+ - Works on **any image size**: Ultralytics resizes the input to `imgsz` (640 px by default) before inference.
+ - Best results on **document photos, screenshots, and scanned pages**.
+ - Adjust `conf` (typically 0.1–0.5): lower values favour recall, higher values favour precision.
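
One way to choose a `conf` value is to run prediction once at a low threshold and count how many detections would survive at stricter ones. The sketch below assumes the confidences have already been collected into a plain list; `count_kept` and the sample values are hypothetical.

```python
def count_kept(confidences, thresholds=(0.1, 0.25, 0.5)):
    """Return {threshold: number of detections with confidence >= threshold}."""
    return {t: sum(c >= t for c in confidences) for t in thresholds}


# Made-up confidences. With a real result from a conf=0.1 run they could be
# collected as: [float(b.conf) for b in results[0].boxes]
confidences = [0.92, 0.48, 0.31, 0.12]
kept = count_kept(confidences)
print(kept)  # {0.1: 4, 0.25: 3, 0.5: 1}
```

Filtering after the fact should closely match re-running `predict` at the higher threshold, though other limits such as the maximum number of detections per image may cause small differences.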
+
+ ---
+
+ ## Limitations
+
+ - May miss very small text (< ~8 px height in the original image).
+ - Not designed for handwritten or heavily stylised/artistic fonts.
+ - Performance is best on document-style layouts similar to training data.
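
Small text shrinks further when a large scan is resized for inference. The sketch below estimates the text height after resizing and suggests a larger `imgsz`. It rests on assumptions that are not stated in this README: that the longer image side is scaled to `imgsz` with the aspect ratio preserved, that `imgsz` should be a multiple of 32, and that about 8 px after resizing is a reasonable target. Both helpers are hypothetical.

```python
import math


def resized_text_height(text_h, img_w, img_h, imgsz=640):
    """Estimate text height in pixels after the longer side is scaled to imgsz."""
    return text_h * imgsz / max(img_w, img_h)


def suggest_imgsz(text_h, img_w, img_h, min_text_h=8, stride=32):
    """Smallest multiple of `stride` that keeps text at least `min_text_h` px tall."""
    needed = min_text_h * max(img_w, img_h) / text_h
    return math.ceil(needed / stride) * stride


# Made-up example: 20 px tall text on a 2480 x 3508 px page scan
height_at_640 = resized_text_height(20, 2480, 3508)
print(f"{height_at_640:.2f} px at imgsz=640")
print("suggested imgsz:", suggest_imgsz(20, 2480, 3508))
```

A larger `imgsz` increases inference time and memory use, so splitting a big page into tiles and running detection on each tile is an alternative worth considering.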
+
+ ---
+
+ ## Related Model
+
+ | Model | Task |
+ |-------|------|
+ | [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr) | Text recognition (CRNN + CTC) for Khmer & English |
+
+ ---
+
+ ## License
+
+ MIT