# Model Combination Guide
## Overview
This guide explains how to combine predictions from three YOLO models to produce a unified COCO-format output with only the classes defined in `coco_class_mapping`.
## The Three Models
### 1. **best_emanuskript_segmentation.pt**
- **Type**: Segmentation model
- **Classes**: 21 classes, including:
  - Border, Table, Diagram, Music
  - Main script black/coloured
  - Variant script black/coloured
  - Plain initial (coloured/highlighted/black)
  - Historiated, Inhabited, Embellished
  - Page Number, Quire Mark, Running header, Catchword, Gloss, Illustrations
### 2. **best_catmus.pt**
- **Type**: Segmentation model
- **Classes**: 19 classes, including:
  - DefaultLine, InterlinearLine
  - MainZone, MarginTextZone
  - DropCapitalZone, GraphicZone, MusicZone
  - NumberingZone, QuireMarksZone, RunningTitleZone
  - StampZone, TitlePageZone
### 3. **best_zone_detection.pt**
- **Type**: Detection model
- **Classes**: 11 zone classes:
  - MainZone, MarginTextZone
  - DropCapitalZone, GraphicZone, MusicZone
  - NumberingZone, QuireMarksZone, RunningTitleZone
  - StampZone, TitlePageZone, DigitizationArtefactZone
## How It Works
### Step 1: Run Model Predictions
Each model is run independently on the input image:
```python
from ultralytics import YOLO

# Emanuskript model (all classes except 19)
emanuskript_results = YOLO("best_emanuskript_segmentation.pt").predict(
    image_path, classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20])

# Catmus model (only DefaultLine and InterlinearLine)
catmus_results = YOLO("best_catmus.pt").predict(image_path, classes=[1, 7])

# Zone model (all classes)
zone_results = YOLO("best_zone_detection.pt").predict(image_path)
```
Predictions are saved to JSON files in separate folders.
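A minimal sketch of this save step, assuming detections are serialized as plain dicts. The helper names and the JSON schema below are illustrative, not the exact format `predict_annotations()` writes:

```python
import json
from pathlib import Path

def results_to_detections(results):
    """Convert Ultralytics-style results into plain, JSON-serializable dicts."""
    detections = []
    for r in results:
        for box, cls, conf in zip(r.boxes.xyxy.tolist(),
                                  r.boxes.cls.tolist(),
                                  r.boxes.conf.tolist()):
            detections.append({
                "class_id": int(cls),
                "class_name": r.names[int(cls)],
                "bbox_xyxy": [float(v) for v in box],
                "confidence": float(conf),
            })
    return detections

def save_predictions_json(detections, out_folder, image_name):
    """Write detections to <out_folder>/<image_stem>.json and return the path."""
    out_file = Path(out_folder) / (Path(image_name).stem + ".json")
    out_file.write_text(json.dumps(detections, indent=2))
    return out_file
```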
### Step 2: Combine Predictions (ImageBatch Class)
The `ImageBatch` class (`utils/image_batch_classes.py`) handles:
1. **Loading Images**: Loads the image and gets dimensions
2. **Loading Annotations**: Loads predictions from all 3 JSON files
3. **Unifying Names**: Maps class names using `catmus_zones_mapping`:
   - `DefaultLine` → `Main script black`
   - `InterlinearLine` → `Gloss`
   - `MainZone` → `Column`
   - `DropCapitalZone` → `Plain initial- coloured`
   - etc.
4. **Filtering Annotations**:
   - Removes overlapping annotations based on spatial indexing
   - Uses overlap thresholds (0.3-0.8, depending on class)
   - Handles conflicts between predictions from different models
5. **COCO Format Conversion**: Converts to COCO JSON format
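The overlap filtering in step 4 can be sketched with a plain IoU check. The real `filter_annotations()` uses spatial indexing and per-class thresholds; this quadratic version (with hypothetical field names) simply keeps the higher-confidence annotation when two of the same class overlap:

```python
def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def filter_overlaps(annotations, threshold=0.5):
    """Keep the higher-confidence annotation when two same-class boxes overlap."""
    kept = []
    for ann in sorted(annotations, key=lambda a: -a.get("confidence", 0.0)):
        if all(ann["class_name"] != k["class_name"]
               or iou(ann["bbox"], k["bbox"]) < threshold
               for k in kept):
            kept.append(ann)
    return kept
```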
### Step 3: Filter to `coco_class_mapping`
Only annotations with classes in `coco_class_mapping` are kept (25 classes total).
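A sketch of this final filter, assuming `coco_class_mapping` maps class names to COCO category IDs (the excerpt below is illustrative; the real mapping defines 25 classes):

```python
# Illustrative excerpt of coco_class_mapping (class name -> category ID).
coco_class_mapping = {"Main script black": 1, "Gloss": 2, "Column": 3}

def filter_to_coco_classes(annotations, mapping):
    """Keep only annotations whose class is in the mapping, attaching its category ID."""
    kept = []
    for ann in annotations:
        if ann["class_name"] in mapping:
            ann["category_id"] = mapping[ann["class_name"]]
            kept.append(ann)
    return kept
```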
## Key Functions
### `predict_annotations()` (in `utils/data.py`)
- Runs a single model on an image
- Saves predictions to JSON
- Used by Celery tasks for async processing
### `unify_predictions()` (in `utils/data.py`)
- Combines predictions from all three models
- Uses `ImageBatch` to process and filter
- Returns COCO format JSON
- Imports annotations into database
### `ImageBatch` class (in `utils/image_batch_classes.py`)
- Main class for combining predictions
- Methods:
- `load_images()`: Load image files
- `load_annotations()`: Load predictions from JSON files
- `unify_names()`: Map class names to `coco_class_mapping` classes
- `filter_annotations()`: Remove overlapping annotations
- `return_coco_file()`: Generate COCO JSON
## Usage Example
```python
from ultralytics import YOLO
from utils.image_batch_classes import ImageBatch

# 1. Run models (or use the predict_annotations function)
# ... save predictions to JSON files ...

# 2. Combine predictions
image_batch = ImageBatch(
    image_folder="path/to/images",
    catmus_labels_folder="path/to/catmus/predictions",
    emanuskript_labels_folder="path/to/emanuskript/predictions",
    zone_labels_folder="path/to/zone/predictions",
)
image_batch.load_images()
image_batch.load_annotations()
image_batch.unify_names()
image_batch.filter_annotations()

# 3. Get COCO format
coco_json = image_batch.return_coco_file()
```
## Running the Test Script
```bash
python3 test_combined_models.py
```
This will:
1. Run all three models on `bnf-naf-10039__page-001-of-004.jpg`
2. Combine and filter predictions
3. Save results to `combined_predictions.json`
4. Print a summary of detected classes
## Output Format
The final output is a COCO-format JSON file with:
- **images**: Image metadata (id, width, height, filename)
- **categories**: List of category definitions (25 classes from coco_class_mapping)
- **annotations**: List of annotations, each with:
  - `id`: Annotation ID
  - `image_id`: Associated image ID
  - `category_id`: Class ID from `coco_class_mapping`
  - `segmentation`: Polygon coordinates
  - `bbox`: Bounding box `[x, y, width, height]`
  - `area`: Polygon area
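The `bbox` and `area` fields can be derived from the `segmentation` polygon. A sketch, assuming COCO's flat `[x1, y1, x2, y2, ...]` polygon layout and using the shoelace formula for the area:

```python
def polygon_to_bbox_area(polygon):
    """Given a flat COCO polygon [x1, y1, x2, y2, ...], return
    ([x, y, width, height], area), with area via the shoelace formula."""
    xs, ys = polygon[0::2], polygon[1::2]
    x, y = min(xs), min(ys)
    bbox = [x, y, max(xs) - x, max(ys) - y]
    n = len(xs)
    area = abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                   for i in range(n))) / 2.0
    return bbox, area
```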
## Class Mapping
The `catmus_zones_mapping` in `image_batch_classes.py` maps:
- Catmus/Zone model classes → `coco_class_mapping` classes
- Example: `DefaultLine` → `Main script black`
- Example: `MainZone` → `Column`
Only classes that map to `coco_class_mapping` are included in the final output.
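A minimal sketch of how such a mapping is applied. The mapping excerpt repeats the examples above; the full `catmus_zones_mapping` table lives in `image_batch_classes.py`:

```python
# Mapping excerpt (examples from above); the full table is in
# utils/image_batch_classes.py.
catmus_zones_mapping = {
    "DefaultLine": "Main script black",
    "InterlinearLine": "Gloss",
    "MainZone": "Column",
    "DropCapitalZone": "Plain initial- coloured",
}

def unify_names(annotations, mapping):
    """Rename each annotation's class via the mapping; unmapped names pass through."""
    for ann in annotations:
        ann["class_name"] = mapping.get(ann["class_name"], ann["class_name"])
    return annotations
```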