---
language:
- km
- en
tags:
- object-detection
- text-detection
- yolo
- yolo11
- khmer
- ultralytics
- pytorch
license: mit
---

# mini-text-detection: Khmer & English Text Detection

A **YOLO11n**-based text detection model fine-tuned to locate and classify text regions in images containing **Khmer and English** content.
It detects three types of text block and can serve as the first stage of a pipeline, before the crops are passed to an OCR model (e.g. [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr)).

---

## Model Details

| Property | Value |
|----------|-------|
| Architecture | YOLO11n (nano) |
| Task | Object detection (3 classes) |
| Weights file | `khmer-text-detection-mini.pt` |
| Framework | Ultralytics / PyTorch |
| Input | RGB image, any size (auto-resized internally) |
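
The "auto-resized internally" entry can be made concrete with a little arithmetic. The sketch below assumes a simple square letterbox: the image is scaled so that its longer side equals the target size, and the shorter side is padded symmetrically. The helper names `letterbox_params` and `to_original` are hypothetical, and the actual Ultralytics preprocessing may pad differently (for example to a stride multiple rather than a full square), so this is an illustration rather than the library's implementation.

```python
def letterbox_params(width, height, target=640):
    """Return (scale, new_w, new_h, pad_x, pad_y) for a square letterbox resize.

    Assumption: the longer side is scaled to `target` and the shorter side
    is padded equally on both sides.
    """
    scale = target / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    pad_x = (target - new_w) / 2
    pad_y = (target - new_h) / 2
    return scale, new_w, new_h, pad_x, pad_y


def to_original(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from letterboxed space back to the source image."""
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )


# A 1280x960 photo is halved to 640x480 and padded by 80 px at top and bottom.
params = letterbox_params(1280, 960)
print(params)
```

In normal use this mapping is not needed by hand: the crop example in this card relies on `box.xyxy` already being expressed in original-image pixels.
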

---

## Classes

| ID | Name | Khmer | Description |
|----|------|-------|-------------|
| `0` | `subject` | កម្មវត្ថុ | Title or subject heading |
| `1` | `reference` | យោង | Reference or citation |
| `2` | `content` | អត្ថបទ | Main body / paragraph text |

---

## Files

| File | Description |
|------|-------------|
| `khmer-text-detection-mini.pt` | Full Ultralytics YOLO model (weights + config) |

---

## Quick Start

### Install dependencies

```bash
pip install ultralytics huggingface_hub
```

### Run inference

```python
from ultralytics import YOLO
from huggingface_hub import hf_hub_download

# -- Download model ----------------------------------------------------------
model_path = hf_hub_download(
    repo_id="phonsobon/mini-text-detection",
    filename="khmer-text-detection-mini.pt",
)

# -- Class names -------------------------------------------------------------
CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}

# -- Load & predict ----------------------------------------------------------
model = YOLO(model_path)
results = model.predict(
    source="your_image.jpg",  # path, URL, or numpy array
    conf=0.25,                # confidence threshold
    iou=0.45,                 # NMS IoU threshold
    imgsz=640,
)

# -- Print results -----------------------------------------------------------
for r in results:
    r.show()  # display with bounding boxes
    for box in r.boxes:
        cls_id = int(box.cls)
        label = CLASS_NAMES[cls_id]
        conf = float(box.conf)
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"[{label}] conf={conf:.2f} box=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
```
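
The `iou=0.45` argument is the overlap threshold used during non-maximum suppression (NMS): roughly, when two candidate boxes overlap by more than that ratio, the lower-confidence one is discarded. As a rough illustration of what that ratio measures, here is a minimal sketch of intersection-over-union for two `(x1, y1, x2, y2)` boxes. The helper name `box_iou` is hypothetical and is not part of the Ultralytics API.

```python
def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes that share half their width.
print(round(box_iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))
```

Two equal boxes that share half their width have an IoU of one third, which is below 0.45, so under this reading of the threshold both would survive NMS.
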

### Filter by class

```python
# Get only subject (heading) boxes
subject_boxes = [b for b in results[0].boxes if int(b.cls) == 0]

# Get only content (body) boxes
content_boxes = [b for b in results[0].boxes if int(b.cls) == 2]
```
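
The list comprehensions above scan the boxes once per class. When all three classes are needed, grouping in a single pass may be more convenient. The sketch below works on plain `(cls_id, conf, xyxy)` tuples so that it runs without a model; the tuple layout, the sample values, and the helper name `group_by_class` are assumptions made for illustration.

```python
from collections import defaultdict

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}


def group_by_class(detections):
    """Group (cls_id, conf, xyxy) tuples into {class_name: [(conf, xyxy), ...]}."""
    grouped = defaultdict(list)
    for cls_id, conf, xyxy in detections:
        grouped[CLASS_NAMES[cls_id]].append((conf, xyxy))
    return dict(grouped)


# Hypothetical detections standing in for results[0].boxes.
sample = [
    (0, 0.91, (40, 30, 600, 80)),
    (2, 0.88, (40, 120, 600, 400)),
    (2, 0.67, (40, 420, 600, 700)),
]
grouped = group_by_class(sample)
print({name: len(items) for name, items in grouped.items()})
```

With real output, the tuples could be built as `[(int(b.cls), float(b.conf), b.xyxy[0].tolist()) for b in results[0].boxes]`.
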

### Save annotated images

```python
results = model.predict(source="your_image.jpg", save=True, project="runs/detect")
# Saved to runs/detect/predict/
```

### Batch inference on a folder

```python
results = model.predict(source="path/to/images/", conf=0.25, imgsz=640)
for r in results:
    counts = {name: 0 for name in CLASS_NAMES.values()}
    for box in r.boxes:
        counts[CLASS_NAMES[int(box.cls)]] += 1
    print(r.path, "->", counts)
```

---

## Crop + OCR Pipeline

Combine this model with [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr) for end-to-end document reading, with each region labelled by type:

```python
from ultralytics import YOLO
from huggingface_hub import hf_hub_download
from PIL import Image

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}

# -- Load detection model ----------------------------------------------------
det_path = hf_hub_download("phonsobon/mini-text-detection", "khmer-text-detection-mini.pt")
detector = YOLO(det_path)

# -- Detect text regions -----------------------------------------------------
image_path = "your_image.jpg"
results = detector.predict(source=image_path, conf=0.25, imgsz=640)
img = Image.open(image_path).convert("RGB")

# -- Crop each region and tag it with its class ------------------------------
for i, box in enumerate(results[0].boxes):
    cls_id = int(box.cls)
    label = CLASS_NAMES[cls_id]
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    crop = img.crop((x1, y1, x2, y2))
    crop.save(f"crop_{i}_{label}.png")
    print(f"Saved crop {i}, class: {label}")
    # Next: feed each crop to phonsobon/mini-ocr for text recognition
```
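
For end-to-end reading, the crops usually need to reach the OCR model in reading order, and detections are not necessarily returned that way. Here is a minimal sketch, assuming a single-column layout in which boxes whose top edges lie within a few pixels of each other belong to the same row. The helper `sort_reading_order` and its `line_tol` parameter are hypothetical, and multi-column pages would need more careful grouping.

```python
def sort_reading_order(boxes, line_tol=10):
    """Order (x1, y1, x2, y2) boxes top-to-bottom, then left-to-right.

    Boxes whose top edge is within `line_tol` pixels of the first box in the
    current row are treated as belonging to that row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if rows and abs(box[1] - rows[-1][0][1]) <= line_tol:
            rows[-1].append(box)
        else:
            rows.append([box])

    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b[0]))
    return ordered


# Hypothetical boxes: two side by side on the first row, one paragraph below.
boxes = [(300, 52, 500, 80), (40, 200, 600, 400), (40, 50, 280, 80)]
ordered = sort_reading_order(boxes)
print(ordered)
```

The same function could be applied to `[tuple(b.xyxy[0].tolist()) for b in results[0].boxes]` before cropping, at the cost of tracking each box's class label separately.
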

---

## Input Tips

- Works on **any image size**: YOLO resizes internally to 640 px by default.
- Best results on **document photos, screenshots, and scanned pages**.
- Adjust `conf` (0.1 to 0.5) to trade recall against precision, depending on your use case.
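
To see what the `conf` trade-off looks like in practice, one option is to run the detector once at a low threshold and then count how many detections would survive at stricter ones. The sketch below uses made-up confidence scores in place of real model output, and `count_at_thresholds` is a hypothetical helper.

```python
def count_at_thresholds(confidences, thresholds=(0.1, 0.25, 0.5)):
    """Count how many confidence scores meet or exceed each threshold."""
    return {t: sum(c >= t for c in confidences) for t in thresholds}


# Made-up scores, standing in for [float(b.conf) for b in results[0].boxes]
# from a run with conf=0.1.
scores = [0.92, 0.71, 0.48, 0.33, 0.27, 0.14]
counts = count_at_thresholds(scores)
print(counts)
```

Lower thresholds keep more boxes, which tends to raise recall at the cost of more false positives; higher thresholds do the opposite.
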

---

## Limitations

- May miss very small text (< ~8 px height in the original image).
- Not designed for handwritten or heavily stylised/artistic fonts.
- Performance is best on document-style layouts similar to the training data.

---

## Related Model

| Model | Task |
|-------|------|
| [phonsobon/mini-ocr](https://huggingface.co/phonsobon/mini-ocr) | Text recognition (CRNN + CTC) for Khmer & English |

---

## License

MIT