+---
+language:
+  - km
+  - en
+tags:
+  - object-detection
+  - text-detection
+  - yolo
+  - yolo11
+  - khmer
+  - ultralytics
+  - pytorch
+license: mit
+---
+# mini-text-detection — Khmer & English Text Detection
+A **YOLO11n**-based text detection model fine-tuned to locate and classify text regions in images containing **Khmer and English** content.
+It detects three types of text block and can serve as the first stage of a reading pipeline, with each detected region cropped and passed to an OCR model (e.g. [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr)).
+
+---
+## Model Details
+| Property | Value |
+|----------|-------|
+| Architecture | YOLO11n (nano) |
+| Task | Object Detection — 3 classes |
+| Weights file | `khmer-text-detection-mini.pt` |
+| Framework | Ultralytics / PyTorch |
+| Input | RGB image, any size (auto-resized internally) |
+---
+## Classes
+| ID | Name | Description |
+|----|------|-------------|
+| `0` | `subject` | Title or heading text |
+| `1` | `reference` | Reference, label, or metadata text |
+| `2` | `content` | Main body / paragraph text |
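Code that filters or reports by class name can invert this table instead of hard-coding numeric IDs. The following is a minimal plain-Python sketch. `CLASS_IDS` and `class_name` are hypothetical helper names and are not part of Ultralytics. A loaded Ultralytics model also exposes its own ID-to-name mapping as `model.names`, and the sketch assumes that mapping matches the table above.

```python
# Class table from this model card (assumed to match model.names in the weights).
CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}

# Reverse lookup so callers can filter by name instead of numeric ID.
CLASS_IDS = {name: cls_id for cls_id, name in CLASS_NAMES.items()}


def class_name(cls_id: int) -> str:
    """Return the label for a class ID, failing loudly on IDs outside the table."""
    try:
        return CLASS_NAMES[cls_id]
    except KeyError:
        raise ValueError(
            f"unknown class id {cls_id}; expected one of {sorted(CLASS_NAMES)}"
        ) from None


print(CLASS_IDS["content"])  # 2
print(class_name(0))         # subject
```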
+---
+## Files
+| File | Description |
+|------|-------------|
+| `khmer-text-detection-mini.pt` | Full Ultralytics YOLO model (weights + config) |
+---
+## Quick Start
+### Install dependencies
+```bash
+pip install ultralytics huggingface_hub
+```
+### Run inference
+```python
+from ultralytics import YOLO
+from huggingface_hub import hf_hub_download
+# ── Download model ────────────────────────────────────────────────────────────
+model_path = hf_hub_download(
+    repo_id="phonsobon/mini-text-detection",
+    filename="khmer-text-detection-mini.pt",
+)
+# ── Class names ───────────────────────────────────────────────────────────────
+CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}
+# ── Load & predict ────────────────────────────────────────────────────────────
+model = YOLO(model_path)
+results = model.predict(
+    source="your_image.jpg",   # path, URL, or numpy array
+    conf=0.25,                 # confidence threshold
+    iou=0.45,                  # NMS IoU threshold
+    imgsz=640,
+)
+# ── Print results ─────────────────────────────────────────────────────────────
+for r in results:
+    r.show()                                        # display with bounding boxes
+    for box in r.boxes:
+        cls_id = int(box.cls)
+        label  = CLASS_NAMES[cls_id]
+        conf   = float(box.conf)
+        x1, y1, x2, y2 = box.xyxy[0].tolist()
+        print(f"[{label}] conf={conf:.2f}  box=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
+```
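For logging or downstream tools, it can help to turn each detection into a plain, JSON-serialisable record. This sketch assumes the detections have already been pulled out of `r.boxes` as `(cls_id, conf, (x1, y1, x2, y2))` tuples using the same calls as the loop above. `to_records` is a hypothetical helper, and the sample values are made up for illustration.

```python
import json

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}


def to_records(detections):
    """Convert (cls_id, conf, (x1, y1, x2, y2)) tuples into JSON-friendly dicts."""
    records = []
    for cls_id, conf, (x1, y1, x2, y2) in detections:
        records.append({
            "label": CLASS_NAMES[cls_id],
            "conf": round(conf, 3),
            # Pixel coordinates in the original image, xyxy order, rounded to ints.
            "box": [round(x1), round(y1), round(x2), round(y2)],
        })
    return records


# Illustrative values only. From a real result, build each tuple as:
# (int(box.cls), float(box.conf), tuple(box.xyxy[0].tolist()))
sample = [
    (0, 0.913, (12.4, 8.0, 410.7, 52.2)),
    (2, 0.6712, (15.0, 70.4, 600.2, 320.9)),
]
print(json.dumps(to_records(sample), ensure_ascii=False, indent=2))
```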
+### Filter by class
+```python
+# Get only subject (heading) boxes
+subject_boxes = [b for b in results[0].boxes if int(b.cls) == 0]
+# Get only content (body) boxes
+content_boxes = [b for b in results[0].boxes if int(b.cls) == 2]
+```
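The list comprehensions above filter on the numeric class ID alone. This sketch also applies a per-class confidence floor and works on the same kind of `(cls_id, conf, box)` tuples. `select` and `MIN_CONF` are hypothetical names, and the threshold values are placeholders to tune on your own data, not recommendations.

```python
CLASS_IDS = {"subject": 0, "reference": 1, "content": 2}

# Hypothetical per-class minimum confidences; tune these on your own images.
MIN_CONF = {"subject": 0.40, "reference": 0.30, "content": 0.25}


def select(detections, label):
    """Keep detections of one class whose confidence meets that class's floor."""
    cls_id = CLASS_IDS[label]
    floor = MIN_CONF[label]
    return [d for d in detections if d[0] == cls_id and d[1] >= floor]


detections = [
    (0, 0.91, (12, 8, 411, 52)),
    (0, 0.35, (20, 400, 300, 430)),  # low-confidence heading: below the 0.40 floor
    (2, 0.67, (15, 70, 600, 321)),
]
print(select(detections, "subject"))  # [(0, 0.91, (12, 8, 411, 52))]
```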
+### Save annotated images
+```python
+results = model.predict(source="your_image.jpg", save=True, project="runs/detect")
+# Saved to runs/detect/predict/ (later runs go to predict2/, predict3/, ... unless exist_ok=True)
+```
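Because repeated runs do not overwrite an existing run folder by default, a script that post-processes the saved images has to find the newest one. This is a minimal `pathlib` sketch. `latest_run_dir` is a hypothetical helper that assumes the default `predict`, `predict2`, `predict3`, ... naming. It compares the numeric suffixes as numbers, so `predict10` sorts after `predict9`.

```python
import re
from pathlib import Path


def latest_run_dir(project="runs/detect", name="predict"):
    """Return the highest-numbered <name>, <name>2, <name>3, ... folder under project, or None."""
    pattern = re.compile(rf"^{re.escape(name)}(\d*)$")
    best, best_n = None, -1
    for p in Path(project).glob(f"{name}*"):
        m = pattern.match(p.name)
        if not (p.is_dir() and m):
            continue  # skip files and names like "predict_old"
        # The un-suffixed folder counts as run 1; "predict2" is run 2, and so on.
        n = int(m.group(1)) if m.group(1) else 1
        if n > best_n:
            best, best_n = p, n
    return best


print(latest_run_dir())  # None if runs/detect does not exist yet
```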
+### Batch inference on a folder
+```python
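+# stream=True yields results one image at a time instead of holding the whole folder in memory
+results = model.predict(source="path/to/images/", conf=0.25, imgsz=640, stream=True)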
+for r in results:
+    counts = {name: 0 for name in CLASS_NAMES.values()}
+    for box in r.boxes:
+        counts[CLASS_NAMES[int(box.cls)]] += 1
+    print(r.path, "→", counts)
+```
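To summarise a whole folder instead of printing one line per image, you can sum the per-image counts with `collections.Counter`. This sketch assumes that the class IDs for each image have already been collected into a list (for example `[int(b.cls) for b in r.boxes]`) and keyed by image path. `summarise` is a hypothetical helper, and the sample data is invented.

```python
from collections import Counter

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}


def summarise(per_image_cls_ids):
    """Sum class counts over all images and count images with no detections."""
    # Start every class at zero so absent classes still appear in the summary.
    totals = Counter({name: 0 for name in CLASS_NAMES.values()})
    empty = 0
    for cls_ids in per_image_cls_ids.values():
        if not cls_ids:
            empty += 1
        totals.update(CLASS_NAMES[c] for c in cls_ids)
    return dict(totals), empty


# Invented example: image path -> class IDs taken from r.boxes.cls
per_image = {"a.jpg": [0, 2, 2], "b.jpg": [1, 2], "c.jpg": []}
totals, empty = summarise(per_image)
print(totals, "images without text:", empty)
# {'subject': 1, 'reference': 1, 'content': 3} images without text: 1
```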
+---
+## Crop + OCR Pipeline
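+Combine this model with [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr) for end-to-end document reading. The snippet below detects text regions, crops each one, and saves it with its class label in the filename, ready to be passed to mini-ocr for recognition: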
+```python
+from ultralytics import YOLO
+from huggingface_hub import hf_hub_download
+from PIL import Image
+CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}
+# ── Load detection model ──────────────────────────────────────────────────────
+det_path = hf_hub_download("phonsobon/mini-text-detection", "khmer-text-detection-mini.pt")
+detector = YOLO(det_path)
+# ── Detect text regions ───────────────────────────────────────────────────────
+image_path = "your_image.jpg"
+results = detector.predict(source=image_path, conf=0.25, imgsz=640)
+img = Image.open(image_path).convert("RGB")
+# ── Crop each region and save it, labelled by class ──────────────────────────
+for i, box in enumerate(results[0].boxes):
+    cls_id         = int(box.cls)
+    label          = CLASS_NAMES[cls_id]
+    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
+    crop = img.crop((x1, y1, x2, y2))
+    crop.save(f"crop_{i}_{label}.png")
+    print(f"Saved crop {i} → class: {label}")
+    # → feed each crop to phonsobon/mini-ocr for text recognition
+```
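Two preprocessing steps often help before recognition. The first adds a small margin so that glyph edges are not clipped. The second orders the crops top-to-bottom and left-to-right so that the recognised text comes out in reading order. This sketch works on plain integer `(x1, y1, x2, y2)` boxes. `pad_box` and `reading_order` are hypothetical helpers, not part of either model. The 4 px margin and the row-grouping rule are assumptions rather than requirements of mini-ocr: boxes share a row when their vertical centres lie within half the median box height of each other. The rule does not handle multi-column layouts.

```python
from statistics import median


def pad_box(box, margin, width, height):
    """Grow a box by `margin` px on every side, clamped to the image bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))


def reading_order(boxes):
    """Group boxes into rows by vertical centre, then sort each row left-to-right."""
    if not boxes:
        return []
    tol = median(y2 - y1 for _, y1, _, y2 in boxes) / 2
    rows = []  # each entry: (centre of the row's first box, [boxes in the row])
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        cy = (box[1] + box[3]) / 2
        # Join the current row if this centre is within tol of the row's first centre;
        # otherwise start a new row.
        if rows and abs(cy - rows[-1][0]) <= tol:
            rows[-1][1].append(box)
        else:
            rows.append((cy, [box]))
    return [b for _, row in rows for b in sorted(row, key=lambda b: b[0])]


boxes = [(300, 12, 500, 40), (10, 10, 200, 42), (10, 60, 600, 90)]
for box in reading_order(boxes):
    # With PIL, a padded crop would be img.crop(pad_box(box, 4, *img.size)).
    print(pad_box(box, 4, 640, 480))
```

In the pipeline above, the boxes could come from `[tuple(map(int, b.xyxy[0].tolist())) for b in results[0].boxes]`. Sorting plain boxes drops the class label, so keep the label alongside each box if you need it.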
+---
+## Input Tips
+- Works on **any image size**: YOLO letterboxes each image so its longer side matches `imgsz` (640 px by default), preserving the aspect ratio.
+- Best results on **document photos, screenshots, and scanned pages**.
+- Adjust `conf` between about 0.1 and 0.5 to suit your use case. Lower values raise recall (fewer missed regions, more false positives), and higher values raise precision.
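The resize arithmetic explains why small text in large photos is hard to detect. This is a simplified sketch that assumes a plain square letterbox: the longer side is scaled to `imgsz` and the rest is padded. Ultralytics may pad to a smaller stride-aligned rectangle at prediction time, which changes the padding but not the scale factor. `letterbox_scale` and `text_height_in_input` are hypothetical helpers.

```python
def letterbox_scale(width, height, imgsz=640):
    """Scale factor, resized (w, h), and total padding for a square imgsz letterbox."""
    r = min(imgsz / width, imgsz / height)  # fit the longer side, keep aspect ratio
    new_w, new_h = round(width * r), round(height * r)
    # Total padding needed to reach an imgsz x imgsz canvas (split between both sides).
    pad_w, pad_h = imgsz - new_w, imgsz - new_h
    return r, (new_w, new_h), (pad_w, pad_h)


def text_height_in_input(text_px, width, height, imgsz=640):
    """Approximate height (px) that a text line of `text_px` pixels has after resizing."""
    r, _, _ = letterbox_scale(width, height, imgsz)
    return text_px * r


# A 4000 x 3000 phone photo is scaled by 640 / 4000 = 0.16.
print(letterbox_scale(4000, 3000))                       # (0.16, (640, 480), (0, 160))
print(text_height_in_input(40, 4000, 3000))              # about 6.4 px at imgsz=640
print(text_height_in_input(40, 4000, 3000, imgsz=1280))  # about 12.8 px at imgsz=1280
```

If text is being missed on large photos, passing a larger `imgsz` (for example 1280) to `model.predict` raises the effective text height, at the cost of slower inference. This card does not document accuracy at input sizes other than 640.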
+---
+## Limitations
+- May miss very small text (< ~8 px height in the original image).
+- Not designed for handwritten or heavily stylised/artistic fonts.
+- Performance is best on document-style layouts similar to the training data.
+---
+## Related Model
+| Model | Task |
+|-------|------|
+| [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr) | Text recognition (CRNN + CTC) for Khmer & English |
+---
+## License
+MIT