	---
	language:
	- km
	- en
	tags:
	- object-detection
	- text-detection
	- yolo
	- yolo11
	- khmer
	- ultralytics
	- pytorch
	license: mit
	---

	# mini-text-detection — Khmer & English Text Detection

	A YOLO11n-based text detection model fine-tuned to locate and classify text regions in images containing Khmer and English content.
	It detects three types of text block (`subject`, `reference`, `content`) and can serve as the first stage of a pipeline, before the crops are passed to an OCR model (e.g. [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr)).

	---

	## Model Details

	| Property | Value |
	|----------|-------|
	| Architecture | YOLO11n (nano) |
	| Task | Object Detection (3 classes) |
	| Weights file | `khmer-text-detection-mini.pt` |
	| Framework | Ultralytics / PyTorch |
	| Input | RGB image, any size (auto-resized internally) |

	---

	## Classes

	| ID | Name | Khmer | Description |
	|----|------|-------|-------------|
	| `0` | `subject` | កម្មវត្ថុ | Title or subject heading |
	| `1` | `reference` | យោង | Reference or citation |
	| `2` | `content` | អត្ថបទ | Main body / paragraph text |

	---

	## Files

	| File | Description |
	|------|-------------|
	| `khmer-text-detection-mini.pt` | Full Ultralytics YOLO model (weights + config) |

	---

	## Quick Start

	### Install dependencies

	```bash
	pip install ultralytics huggingface_hub
	```

	### Run inference

	```python
	from ultralytics import YOLO
	from huggingface_hub import hf_hub_download

	# ── Download model ────────────────────────────────────────────────────────────
	model_path = hf_hub_download(
	    repo_id="phonsobon/mini-text-detection",
	    filename="khmer-text-detection-mini.pt",
	)

	# ── Class names ───────────────────────────────────────────────────────────────
	CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}

	# ── Load & predict ────────────────────────────────────────────────────────────
	model = YOLO(model_path)

	results = model.predict(
	    source="your_image.jpg",  # path, URL, or numpy array
	    conf=0.25,                # confidence threshold
	    iou=0.45,                 # NMS IoU threshold
	    imgsz=640,                # inference image size
	)

	# ── Print results ─────────────────────────────────────────────────────────────
	for r in results:
	    r.show()  # display with bounding boxes
	    for box in r.boxes:
	        cls_id = int(box.cls)
	        label = CLASS_NAMES[cls_id]
	        conf = float(box.conf)
	        x1, y1, x2, y2 = box.xyxy[0].tolist()
	        print(f"[{label}] conf={conf:.2f} box=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
	```

	### Filter by class

	```python
	# Get only subject (heading) boxes
	subject_boxes = [b for b in results[0].boxes if int(b.cls) == 0]

	# Get only content (body) boxes
	content_boxes = [b for b in results[0].boxes if int(b.cls) == 2]
	```
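
When more than one class is needed, filtering once per class gets repetitive. The sketch below groups detections by class name in a single pass. It works on plain `(cls_id, conf, xyxy)` tuples so it runs without the model; the helper name `group_by_class` and the tuple layout are illustrative assumptions, not part of this repository.

```python
from collections import defaultdict

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}


def group_by_class(detections, class_names=CLASS_NAMES):
    """Group (cls_id, conf, xyxy) tuples by class name (hypothetical helper)."""
    grouped = defaultdict(list)
    for cls_id, conf, xyxy in detections:
        grouped[class_names[cls_id]].append((conf, xyxy))
    return dict(grouped)


# Made-up detections, for illustration only.
sample = [
    (0, 0.91, [40, 20, 600, 60]),
    (2, 0.88, [40, 80, 600, 300]),
    (2, 0.47, [40, 320, 600, 500]),
]
grouped = group_by_class(sample)
print({name: len(items) for name, items in grouped.items()})
```

With real predictions, the tuples can be built from the same attributes used above, for example `[(int(b.cls), float(b.conf), b.xyxy[0].tolist()) for b in results[0].boxes]`.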

	### Save annotated images

	```python
	results = model.predict(source="your_image.jpg", save=True, project="runs/detect")
	# Saved to runs/detect/predict/ (later runs get a numeric suffix, e.g. predict2/)
	```

	### Batch inference on a folder

	```python
	results = model.predict(source="path/to/images/", conf=0.25, imgsz=640)
	for r in results:
	    counts = {name: 0 for name in CLASS_NAMES.values()}
	    for box in r.boxes:
	        counts[CLASS_NAMES[int(box.cls)]] += 1
	    print(r.path, "→", counts)
	```

	---

	## Crop + OCR Pipeline

	Combine this model with [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr) for end-to-end document reading, with each region labelled by its class:

	```python
	from ultralytics import YOLO
	from huggingface_hub import hf_hub_download
	from PIL import Image

	CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}

	# ── Load detection model ──────────────────────────────────────────────────────
	det_path = hf_hub_download("phonsobon/mini-text-detection", "khmer-text-detection-mini.pt")
	detector = YOLO(det_path)

	# ── Detect text regions ───────────────────────────────────────────────────────
	image_path = "your_image.jpg"
	results = detector.predict(source=image_path, conf=0.25, imgsz=640)

	img = Image.open(image_path).convert("RGB")

	# ── Crop each region and label it by class ───────────────────────────────────
	for i, box in enumerate(results[0].boxes):
	    cls_id = int(box.cls)
	    label = CLASS_NAMES[cls_id]
	    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())

	    crop = img.crop((x1, y1, x2, y2))
	    crop.save(f"crop_{i}_{label}.png")
	    print(f"Saved crop {i} → class: {label}")
	    # → feed each crop to phonsobon/mini-ocr for text recognition
	```
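
Boxes generally come back in the detector's own order rather than page order, so it can help to sort them before sending crops to OCR. This is a minimal sketch, assuming a single-column page where bucketing the top edge into bands and then sorting left to right is good enough. `sort_reading_order` and `row_tol` are hypothetical names, and multi-column layouts would need a smarter grouping.

```python
def sort_reading_order(boxes, row_tol=20):
    """Sort (x1, y1, x2, y2) boxes top-to-bottom, then left-to-right.

    Top edges are bucketed into `row_tol`-pixel bands; boxes in the same
    band count as one row and are ordered by their left edge.
    """
    return sorted(boxes, key=lambda b: (b[1] // row_tol, b[0]))


# Made-up boxes: two on the first row (right one listed first), one below.
boxes = [
    (300, 12, 500, 40),
    (40, 100, 500, 300),
    (40, 15, 250, 42),
]
for box in sort_reading_order(boxes):
    print(box)
```

In the pipeline above, the input list could be built with `[tuple(map(int, b.xyxy[0].tolist())) for b in results[0].boxes]`. Note that fixed bands can split two boxes that straddle a band boundary, so `row_tol` may need tuning per document type.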

	---

	## Input Tips

	- Works on any image size: by default the input is resized internally to 640 px (`imgsz=640`).
	- Best results on document photos, screenshots, and scanned pages.
	- Adjust `conf` (0.1 – 0.5) to trade recall against precision: lower values find more regions but admit more false positives.
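
To see how `conf` affects a given image, one option is to predict once at a low threshold and then count how many boxes would survive stricter ones. The sketch below does that counting on a plain list of confidence scores. The scores are made up for illustration, and `count_at_thresholds` is a hypothetical helper.

```python
def count_at_thresholds(confidences, thresholds=(0.1, 0.25, 0.5)):
    """Return {threshold: number of detections with confidence >= threshold}."""
    return {t: sum(c >= t for c in confidences) for t in thresholds}


# Made-up confidence scores, for illustration only.
scores = [0.92, 0.81, 0.44, 0.31, 0.18, 0.12]
print(count_at_thresholds(scores))  # {0.1: 6, 0.25: 4, 0.5: 2}
```

With real predictions, the scores can be taken from `results[0].boxes.conf.tolist()` after running `predict` with `conf=0.1`.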

	---

	## Limitations

	- May miss very small text (< ~8 px height in the original image).
	- Not designed for handwritten or heavily stylised/artistic fonts.
	- Performance is best on document-style layouts similar to training data.
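
Because the image is resized to `imgsz` before detection, text that is legible in a large scan can end up only a few pixels tall at the model's input. This is a rough sketch of that arithmetic, assuming the longer image side is scaled to `imgsz` with the aspect ratio preserved. `effective_text_height` is a hypothetical helper, and the ~8 px figure above is quoted for the original image, so treat the result as a rule of thumb only.

```python
def effective_text_height(text_px, image_w, image_h, imgsz=640):
    """Approximate text height in pixels after resizing to the model input."""
    scale = imgsz / max(image_w, image_h)
    return text_px * scale


# A 20 px line of text on a 2480x3508 scan (A4 at 300 dpi).
at_640 = effective_text_height(20, 2480, 3508)
at_1280 = effective_text_height(20, 2480, 3508, imgsz=1280)
print(round(at_640, 1))   # 3.6
print(round(at_1280, 1))  # 7.3
```

If the estimate is very small, passing a larger `imgsz` to `predict` (for example `imgsz=1280`) may recover some of the missed text, at the cost of slower inference.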

	---

	## Related Model

	| Model | Task |
	|-------|------|
	| [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr) | Text recognition (CRNN + CTC) for Khmer & English |

	---

	## License

	MIT