
Model Combination Guide

Overview

This guide explains how to combine predictions from three YOLO models to produce a unified COCO-format output with only the classes defined in coco_class_mapping.

The Three Models

1. best_emanuskript_segmentation.pt

  • Type: Segmentation model
  • Classes: 21 classes including:
    • Border, Table, Diagram, Music
    • Main script black/coloured
    • Variant script black/coloured
    • Plain initial (coloured/highlighted/black)
    • Historiated, Inhabited, Embellished
    • Page Number, Quire Mark, Running header, Catchword, Gloss, Illustrations

2. best_catmus.pt

  • Type: Segmentation model
  • Classes: 19 classes including:
    • DefaultLine, InterlinearLine
    • MainZone, MarginTextZone
    • DropCapitalZone, GraphicZone, MusicZone
    • NumberingZone, QuireMarksZone, RunningTitleZone
    • StampZone, TitlePageZone

3. best_zone_detection.pt

  • Type: Detection model
  • Classes: 11 zone classes:
    • MainZone, MarginTextZone
    • DropCapitalZone, GraphicZone, MusicZone
    • NumberingZone, QuireMarksZone, RunningTitleZone
    • StampZone, TitlePageZone, DigitizationArtefactZone

How It Works

Step 1: Run Model Predictions

Each model is run independently on the input image:

```python
from ultralytics import YOLO

# Emanuskript model (class 19 is not requested)
emanuskript_model = YOLO("best_emanuskript_segmentation.pt")
emanuskript_results = emanuskript_model.predict(
    image_path,
    classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
             12, 13, 14, 15, 16, 17, 18, 20],
)

# Catmus model: DefaultLine and InterlinearLine only
catmus_results = YOLO("best_catmus.pt").predict(image_path, classes=[1, 7])

# Zone model: all classes
zone_results = YOLO("best_zone_detection.pt").predict(image_path)
```

Predictions are saved to JSON files in separate folders.

Step 2: Combine Predictions (ImageBatch Class)

The ImageBatch class (utils/image_batch_classes.py) handles:

  1. Loading Images: Loads the image and gets dimensions

  2. Loading Annotations: Loads predictions from all 3 JSON files

  3. Unifying Names: Maps class names using catmus_zones_mapping:

    • DefaultLine → Main script black
    • InterlinearLine → Gloss
    • MainZone → Column
    • DropCapitalZone → Plain initial- coloured
    • etc.
  4. Filtering Annotations:

    • Removes overlapping annotations based on spatial indexing
    • Uses overlap thresholds (0.3-0.8 depending on class)
    • Handles conflicts between different model predictions
  5. COCO Format Conversion: Converts to COCO JSON format
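
The overlap filtering in step 4 can be illustrated with a simplified sketch. This is not the actual ImageBatch implementation (which uses spatial indexing); `bbox_iou` and `filter_overlaps` are hypothetical helpers, and the `thresholds` dict stands in for the per-class values (0.3-0.8) mentioned above:

```python
def bbox_iou(a, b):
    """Intersection-over-union of two [x, y, width, height] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def filter_overlaps(annotations, thresholds, default=0.5):
    """Greedy filter: when two annotations of the same class overlap
    beyond the class threshold, keep the higher-confidence one."""
    kept = []
    for ann in sorted(annotations, key=lambda a: -a["score"]):
        thr = thresholds.get(ann["class"], default)
        if all(ann["class"] != k["class"]
               or bbox_iou(ann["bbox"], k["bbox"]) < thr
               for k in kept):
            kept.append(ann)
    return kept
```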

Step 3: Filter to coco_class_mapping

Only annotations with classes in coco_class_mapping are kept (25 classes total).
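
A minimal sketch of this final filter (the mapping below is an illustrative stand-in for the project's coco_class_mapping, which defines 25 classes):

```python
# Illustrative subset; the real coco_class_mapping has 25 entries.
COCO_CLASS_MAPPING = {"Main script black": 0, "Gloss": 1, "Column": 2}

def filter_to_mapping(annotations, mapping=COCO_CLASS_MAPPING):
    """Drop any annotation whose class name is not in the mapping."""
    return [a for a in annotations if a["class"] in mapping]
```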

Key Functions

predict_annotations() (in utils/data.py)

  • Runs a single model on an image
  • Saves predictions to JSON
  • Used by Celery tasks for async processing
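
A rough sketch of this function's shape (hypothetical names; the real predict_annotations() in utils/data.py may differ). Serialization is split out so it can be shown without loading a model:

```python
import json
from pathlib import Path

def results_to_records(names, detections):
    """Turn (class_id, bbox, score) tuples into JSON-ready dicts."""
    return [{"class": names[c], "bbox": list(b), "score": s}
            for c, b, s in detections]

def save_predictions(records, image_path, out_dir):
    """Write one image's prediction records as <image stem>.json."""
    out = Path(out_dir) / (Path(image_path).stem + ".json")
    out.write_text(json.dumps(records, indent=2))
    return out
```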

unify_predictions() (in utils/data.py)

  • Combines predictions from all three models
  • Uses ImageBatch to process and filter
  • Returns COCO format JSON
  • Imports annotations into database

ImageBatch class (in utils/image_batch_classes.py)

  • Main class for combining predictions
  • Methods:
    • load_images(): Load image files
    • load_annotations(): Load predictions from JSON files
    • unify_names(): Map class names to coco_class_mapping
    • filter_annotations(): Remove overlapping annotations
    • return_coco_file(): Generate COCO JSON

Usage Example

```python
from ultralytics import YOLO
from utils.image_batch_classes import ImageBatch

# 1. Run models (or use the predict_annotations function)
# ... save predictions to JSON files ...

# 2. Combine predictions
image_batch = ImageBatch(
    image_folder="path/to/images",
    catmus_labels_folder="path/to/catmus/predictions",
    emanuskript_labels_folder="path/to/emanuskript/predictions",
    zone_labels_folder="path/to/zone/predictions",
)

image_batch.load_images()
image_batch.load_annotations()
image_batch.unify_names()
image_batch.filter_annotations()

# 3. Get COCO format
coco_json = image_batch.return_coco_file()
```

Running the Test Script

```bash
python3 test_combined_models.py
```

This will:

  1. Run all three models on bnf-naf-10039__page-001-of-004.jpg
  2. Combine and filter predictions
  3. Save results to combined_predictions.json
  4. Print a summary of detected classes

Output Format

The final output is a COCO-format JSON file with:

  • images: Image metadata (id, width, height, filename)
  • categories: List of category definitions (25 classes from coco_class_mapping)
  • annotations: List of annotations with:
    • id: Annotation ID
    • image_id: Associated image ID
    • category_id: Class ID from coco_class_mapping
    • segmentation: Polygon coordinates
    • bbox: Bounding box [x, y, width, height]
    • area: Polygon area
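
For concreteness, a minimal example of this output shape (all ids, names, and coordinates below are made up):

```python
example_coco = {
    "images": [
        {"id": 1, "width": 2480, "height": 3508, "file_name": "page-001.jpg"}
    ],
    "categories": [
        {"id": 3, "name": "Gloss"}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 3,
            # Rectangle as a flat polygon: x1, y1, x2, y2, ...
            "segmentation": [[100, 200, 400, 200, 400, 260, 100, 260]],
            "bbox": [100, 200, 300, 60],  # [x, y, width, height]
            "area": 18000,
        }
    ],
}
```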

Class Mapping

The catmus_zones_mapping in image_batch_classes.py maps:

  • Catmus/Zone model classes → coco_class_mapping classes
  • Example: DefaultLine → Main script black
  • Example: MainZone → Column

Only classes that map to coco_class_mapping are included in the final output.
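
A sketch of how the mapping is applied during name unification (the dict below is a small illustrative subset of the real catmus_zones_mapping):

```python
# Illustrative subset of catmus_zones_mapping in image_batch_classes.py.
CATMUS_ZONES_MAPPING = {
    "DefaultLine": "Main script black",
    "InterlinearLine": "Gloss",
    "MainZone": "Column",
    "DropCapitalZone": "Plain initial- coloured",
}

def unify_names(annotations, mapping=CATMUS_ZONES_MAPPING):
    """Rename mapped classes; names already in unified form pass through."""
    return [{**a, "class": mapping.get(a["class"], a["class"])}
            for a in annotations]
```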