---
language:
- km
- en
tags:
- object-detection
- text-detection
- yolo
- yolo11
- khmer
- ultralytics
- pytorch
license: mit
---
# mini-text-detection: Khmer & English Text Detection
A **YOLO11n**-based text detection model fine-tuned to locate and classify text regions in images containing **Khmer and English** content.
It detects 3 types of text blocks and can be used as the first stage before passing crops to an OCR model (e.g. [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr)).
---
## Model Details
| Property | Value |
|----------|-------|
| Architecture | YOLO11n (nano) |
| Task | Object Detection (3 classes) |
| Weights file | `khmer-text-detection-mini.pt` |
| Framework | Ultralytics / PyTorch |
| Input | RGB image, any size (auto-resized internally) |
---
## Classes
| ID | Name | Description |
|----|------|-------------|
| `0` | `subject` | Title or heading text |
| `1` | `reference` | Reference, label, or metadata text |
| `2` | `content` | Main body / paragraph text |
---
## Files
| File | Description |
|------|-------------|
| `khmer-text-detection-mini.pt` | Full Ultralytics YOLO model (weights + config) |
---
## Quick Start
### Install dependencies
```bash
pip install ultralytics huggingface_hub
```
### Run inference
```python
from ultralytics import YOLO
from huggingface_hub import hf_hub_download
# --- Download model ---
model_path = hf_hub_download(
    repo_id="phonsobon/mini-text-detection",
    filename="khmer-text-detection-mini.pt",
)

# --- Class names ---
CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}

# --- Load & predict ---
model = YOLO(model_path)
results = model.predict(
    source="your_image.jpg",  # path, URL, or numpy array
    conf=0.25,                # confidence threshold
    iou=0.45,                 # NMS IoU threshold
    imgsz=640,
)

# --- Print results ---
for r in results:
    r.show()  # display with bounding boxes
    for box in r.boxes:
        cls_id = int(box.cls)
        label = CLASS_NAMES[cls_id]
        conf = float(box.conf)
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"[{label}] conf={conf:.2f} box=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
```
### Filter by class
```python
# Get only subject (heading) boxes
subject_boxes = [b for b in results[0].boxes if int(b.cls) == 0]
# Get only content (body) boxes
content_boxes = [b for b in results[0].boxes if int(b.cls) == 2]
```
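When several classes are needed at once, grouping detections in a single pass avoids repeating the list comprehension for each class. The following is a minimal sketch, assuming each detection has first been converted to a plain `(cls_id, conf, box)` tuple; the helper `group_by_class` and the sample values are hypothetical and not part of this repository.

```python
from collections import defaultdict

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}


def group_by_class(detections):
    """Group (cls_id, conf, box) tuples into {class_name: [(conf, box), ...]}."""
    groups = defaultdict(list)
    for cls_id, conf, box in detections:
        groups[CLASS_NAMES[cls_id]].append((conf, box))
    return dict(groups)


# Made-up detections, for illustration only
sample = [
    (0, 0.91, (10, 10, 300, 40)),
    (2, 0.80, (10, 60, 300, 200)),
    (2, 0.55, (10, 220, 300, 400)),
]

grouped = group_by_class(sample)
print({name: len(items) for name, items in grouped.items()})
```

With real results, the tuples could be built from the same attributes used in the inference example, for instance `(int(b.cls), float(b.conf), tuple(b.xyxy[0].tolist()))` for each `b` in `results[0].boxes`.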
### Save annotated images
```python
results = model.predict(source="your_image.jpg", save=True, project="runs/detect")
# Saved to runs/detect/predict/ (repeated runs go to predict2/, predict3/, ...)
```
### Batch inference on a folder
```python
results = model.predict(source="path/to/images/", conf=0.25, imgsz=640)
for r in results:
    counts = {name: 0 for name in CLASS_NAMES.values()}
    for box in r.boxes:
        counts[CLASS_NAMES[int(box.cls)]] += 1
    print(r.path, "->", counts)
```
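For larger folders, it may be more convenient to collect the per-image counts into a table than to print them. The sketch below is one possible way to do that with the standard library, assuming the counts have been gathered into a dictionary keyed by image path; `counts_to_csv` and the sample file names are hypothetical.

```python
import csv
import io

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}


def counts_to_csv(per_image_counts):
    """Render {image_path: {class_name: count}} as CSV text, one row per image."""
    buffer = io.StringIO()
    fieldnames = ["image"] + list(CLASS_NAMES.values())
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for path, counts in per_image_counts.items():
        row = {"image": path}
        # Classes with no detections in this image are written as 0
        row.update({name: counts.get(name, 0) for name in CLASS_NAMES.values()})
        writer.writerow(row)
    return buffer.getvalue()


# Made-up counts, for illustration only
sample = {
    "page_01.jpg": {"subject": 1, "reference": 0, "content": 4},
    "page_02.jpg": {"content": 2},
}

csv_text = counts_to_csv(sample)
print(csv_text)
```

In the batch loop, `r.path` would serve as the key and the `counts` dictionary as the value.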
---
## Crop + OCR Pipeline
Combine this model with [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr) for full end-to-end document reading, with each region labelled by type:
```python
from ultralytics import YOLO
from huggingface_hub import hf_hub_download
from PIL import Image
CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}
# --- Load detection model ---
det_path = hf_hub_download("phonsobon/mini-text-detection", "khmer-text-detection-mini.pt")
detector = YOLO(det_path)

# --- Detect text regions ---
image_path = "your_image.jpg"
results = detector.predict(source=image_path, conf=0.25, imgsz=640)
img = Image.open(image_path).convert("RGB")

# --- Crop each region and label it by class ---
for i, box in enumerate(results[0].boxes):
    cls_id = int(box.cls)
    label = CLASS_NAMES[cls_id]
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    crop = img.crop((x1, y1, x2, y2))
    crop.save(f"crop_{i}_{label}.png")
    print(f"Saved crop {i}, class: {label}")
    # Next step: feed each crop to phonsobon/mini-ocr for text recognition
```
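Before handing crops to an OCR model, it can help to put the boxes in reading order and to add a small margin so that characters at the box edge are not clipped. The sketch below shows one naive way to do both on plain `(x1, y1, x2, y2)` tuples. The helpers `reading_order` and `pad_and_clamp`, the 4 px margin, and the sample boxes are all assumptions for illustration; sorting by the top-left corner only suits simple single-column layouts.

```python
def pad_and_clamp(box, img_w, img_h, pad=4):
    """Expand a box by `pad` pixels on each side, clamped to the image bounds."""
    x1, y1, x2, y2 = box
    return (
        max(0, int(x1) - pad),
        max(0, int(y1) - pad),
        min(img_w, int(x2) + pad),
        min(img_h, int(y2) + pad),
    )


def reading_order(boxes):
    """Sort boxes top-to-bottom, then left-to-right, by their top-left corner."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# Made-up boxes on a 640x480 image, for illustration only
boxes = [(50, 300, 400, 350), (50, 20, 400, 60), (2, 120, 638, 280)]

ordered = reading_order(boxes)
padded = [pad_and_clamp(b, img_w=640, img_h=480) for b in ordered]
print(padded)
```

The padded tuples can be passed directly to `img.crop(...)`, with `img.width` and `img.height` supplying the bounds.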
---
## Input Tips
- Works on **any image size**: YOLO resizes internally to 640 px by default.
- Best results on **document photos, screenshots, and scanned pages**.
- Adjust `conf` (0.1 to 0.5) to trade recall vs. precision depending on your use case.
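To get a feel for the `conf` trade-off, one option is to run prediction once at the lowest threshold of interest and then count how many boxes survive stricter cut-offs. The following is a minimal sketch on made-up `(label, confidence)` pairs; `keep_above` is a hypothetical helper, and the scores are not real model output.

```python
def keep_above(detections, conf):
    """Keep (label, confidence) pairs whose confidence is at least `conf`."""
    return [d for d in detections if d[1] >= conf]


# Made-up scores, for illustration only
sample = [
    ("subject", 0.92),
    ("content", 0.61),
    ("content", 0.34),
    ("reference", 0.18),
    ("content", 0.12),
]

kept_counts = {}
for threshold in (0.1, 0.25, 0.5):
    kept_counts[threshold] = len(keep_above(sample, threshold))
    print(f"conf={threshold:.2f}: {kept_counts[threshold]} boxes kept")
```

A lower threshold keeps more candidate regions (higher recall, more false positives), while a higher one keeps only the most confident boxes.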
---
## Limitations
- May miss very small text (< ~8 px height in the original image).
- Not designed for handwritten or heavily stylised/artistic fonts.
- Performance is best on document-style layouts similar to training data.
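The small-text limitation interacts with the internal resize mentioned under Input Tips: text in a large photo shrinks in proportion when the image is scaled down. The sketch below gives a rough estimate of the text height the network sees, assuming the resize preserves aspect ratio and scales the longer side to `imgsz`. The function `resized_text_height` and the example dimensions are hypothetical, and the estimate ignores padding details.

```python
def resized_text_height(img_w, img_h, text_h, imgsz=640):
    """Approximate text height in pixels after the longer side is scaled to imgsz."""
    scale = imgsz / max(img_w, img_h)
    return text_h * scale


# 40 px tall text in a 4000x3000 photo vs. a 1280x720 screenshot
photo = resized_text_height(4000, 3000, 40)
screenshot = resized_text_height(1280, 720, 40)
print(f"photo: {photo:.1f} px, screenshot: {screenshot:.1f} px")
```

Whether a larger `imgsz` recovers such text with this model is not stated here, so treat it as something to test on your own images.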
---
## Related Model
| Model | Task |
|-------|------|
| [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr) | Text recognition (CRNN + CTC) for Khmer & English |
---
## License
MIT