# Model Combination Guide
## Overview
This guide explains how to combine predictions from three YOLO models to produce a unified COCO-format output with only the classes defined in `coco_class_mapping`.
## The Three Models
### 1. **best_emanuskript_segmentation.pt**
- **Type**: Segmentation model
- **Classes**: 21 classes, including:
  - Border, Table, Diagram, Music
  - Main script black/coloured
  - Variant script black/coloured
  - Plain initial (coloured/highlighted/black)
  - Historiated, Inhabited, Embellished
  - Page Number, Quire Mark, Running header, Catchword, Gloss, Illustrations
### 2. **best_catmus.pt**
- **Type**: Segmentation model
- **Classes**: 19 classes, including:
  - DefaultLine, InterlinearLine
  - MainZone, MarginTextZone
  - DropCapitalZone, GraphicZone, MusicZone
  - NumberingZone, QuireMarksZone, RunningTitleZone
  - StampZone, TitlePageZone
### 3. **best_zone_detection.pt**
- **Type**: Detection model
- **Classes**: 11 zone classes:
  - MainZone, MarginTextZone
  - DropCapitalZone, GraphicZone, MusicZone
  - NumberingZone, QuireMarksZone, RunningTitleZone
  - StampZone, TitlePageZone, DigitizationArtefactZone
## How It Works
### Step 1: Run Model Predictions
Each model is run independently on the input image:
```python
from ultralytics import YOLO

# Emanuskript model (all classes except 19)
emanuskript_results = YOLO("best_emanuskript_segmentation.pt").predict(
    image_path, classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20])

# Catmus model (only DefaultLine and InterlinearLine)
catmus_results = YOLO("best_catmus.pt").predict(image_path, classes=[1, 7])

# Zone model (all classes)
zone_results = YOLO("best_zone_detection.pt").predict(image_path)
```
Predictions are saved to JSON files in separate folders.
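A minimal sketch of this save step, assuming detections are serialized as plain dicts. The helper names and the JSON schema below are illustrative, not the exact format `predict_annotations()` writes:

```python
import json
from pathlib import Path

def results_to_detections(results):
    """Convert Ultralytics-style results into plain, JSON-serializable dicts."""
    detections = []
    for r in results:
        for box, cls, conf in zip(r.boxes.xyxy.tolist(),
                                  r.boxes.cls.tolist(),
                                  r.boxes.conf.tolist()):
            detections.append({
                "class_id": int(cls),
                "class_name": r.names[int(cls)],
                "bbox_xyxy": [float(v) for v in box],
                "confidence": float(conf),
            })
    return detections

def save_predictions_json(detections, out_folder, image_name):
    """Write detections to <out_folder>/<image_stem>.json and return the path."""
    out_file = Path(out_folder) / (Path(image_name).stem + ".json")
    out_file.write_text(json.dumps(detections, indent=2))
    return out_file
```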
### Step 2: Combine Predictions (ImageBatch Class)
The `ImageBatch` class (`utils/image_batch_classes.py`) handles:
1. **Loading Images**: Loads the image and gets dimensions
2. **Loading Annotations**: Loads predictions from all 3 JSON files
3. **Unifying Names**: Maps class names using `catmus_zones_mapping`:
   - `DefaultLine` → `Main script black`
   - `InterlinearLine` → `Gloss`
   - `MainZone` → `Column`
   - `DropCapitalZone` → `Plain initial- coloured`
   - etc.
4. **Filtering Annotations**:
   - Removes overlapping annotations based on spatial indexing
   - Uses overlap thresholds (0.3-0.8, depending on class)
   - Handles conflicts between predictions from different models
5. **COCO Format Conversion**: Converts to COCO JSON format
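The overlap filtering in step 4 can be sketched with a plain IoU check. The real `filter_annotations()` uses spatial indexing and per-class thresholds; this quadratic version (with hypothetical field names) simply keeps the higher-confidence annotation when two of the same class overlap:

```python
def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def filter_overlaps(annotations, threshold=0.5):
    """Keep the higher-confidence annotation when two same-class boxes overlap."""
    kept = []
    for ann in sorted(annotations, key=lambda a: -a.get("confidence", 0.0)):
        if all(ann["class_name"] != k["class_name"]
               or iou(ann["bbox"], k["bbox"]) < threshold
               for k in kept):
            kept.append(ann)
    return kept
```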
### Step 3: Filter to `coco_class_mapping`
Only annotations with classes in `coco_class_mapping` are kept (25 classes total).
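A sketch of this final filter, assuming `coco_class_mapping` maps class names to COCO category IDs (the excerpt below is illustrative; the real mapping defines 25 classes):

```python
# Illustrative excerpt of coco_class_mapping (class name -> category ID).
coco_class_mapping = {"Main script black": 1, "Gloss": 2, "Column": 3}

def filter_to_coco_classes(annotations, mapping):
    """Keep only annotations whose class is in the mapping, attaching its category ID."""
    kept = []
    for ann in annotations:
        if ann["class_name"] in mapping:
            ann["category_id"] = mapping[ann["class_name"]]
            kept.append(ann)
    return kept
```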
## Key Functions
### `predict_annotations()` (in `utils/data.py`)
- Runs a single model on an image
- Saves predictions to JSON
- Used by Celery tasks for async processing
### `unify_predictions()` (in `utils/data.py`)
- Combines predictions from all three models
- Uses `ImageBatch` to process and filter
- Returns COCO format JSON
- Imports annotations into database
### `ImageBatch` class (in `utils/image_batch_classes.py`)
- Main class for combining predictions
- Methods:
- `load_images()`: Load image files
- `load_annotations()`: Load predictions from JSON files
- `unify_names()`: Map class names to `coco_class_mapping` classes
- `filter_annotations()`: Remove overlapping annotations
- `return_coco_file()`: Generate COCO JSON
## Usage Example
```python
from ultralytics import YOLO
from utils.image_batch_classes import ImageBatch

# 1. Run models (or use the predict_annotations function)
# ... save predictions to JSON files ...

# 2. Combine predictions
image_batch = ImageBatch(
    image_folder="path/to/images",
    catmus_labels_folder="path/to/catmus/predictions",
    emanuskript_labels_folder="path/to/emanuskript/predictions",
    zone_labels_folder="path/to/zone/predictions",
)
image_batch.load_images()
image_batch.load_annotations()
image_batch.unify_names()
image_batch.filter_annotations()

# 3. Get COCO format
coco_json = image_batch.return_coco_file()
```
## Running the Test Script
```bash
python3 test_combined_models.py
```
This will:
1. Run all three models on `bnf-naf-10039__page-001-of-004.jpg`
2. Combine and filter predictions
3. Save results to `combined_predictions.json`
4. Print a summary of detected classes
## Output Format
The final output is a COCO-format JSON file with:
- **images**: Image metadata (id, width, height, filename)
- **categories**: List of category definitions (25 classes from coco_class_mapping)
- **annotations**: List of annotations, each with:
  - `id`: Annotation ID
  - `image_id`: Associated image ID
  - `category_id`: Class ID from `coco_class_mapping`
  - `segmentation`: Polygon coordinates
  - `bbox`: Bounding box `[x, y, width, height]`
  - `area`: Polygon area
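The `bbox` and `area` fields can be derived from the `segmentation` polygon. A sketch, assuming COCO's flat `[x1, y1, x2, y2, ...]` polygon layout and using the shoelace formula for the area:

```python
def polygon_to_bbox_area(polygon):
    """Given a flat COCO polygon [x1, y1, x2, y2, ...], return
    ([x, y, width, height], area), with area via the shoelace formula."""
    xs, ys = polygon[0::2], polygon[1::2]
    x, y = min(xs), min(ys)
    bbox = [x, y, max(xs) - x, max(ys) - y]
    n = len(xs)
    area = abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                   for i in range(n))) / 2.0
    return bbox, area
```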
## Class Mapping
The `catmus_zones_mapping` in `image_batch_classes.py` maps:
- Catmus/Zone model classes → `coco_class_mapping` classes
- Example: `DefaultLine` → `Main script black`
- Example: `MainZone` → `Column`
Only classes that map to `coco_class_mapping` are included in the final output.
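A minimal sketch of how such a mapping is applied. The mapping excerpt repeats the examples above; the full `catmus_zones_mapping` table lives in `image_batch_classes.py`:

```python
# Mapping excerpt (examples from above); the full table is in
# utils/image_batch_classes.py.
catmus_zones_mapping = {
    "DefaultLine": "Main script black",
    "InterlinearLine": "Gloss",
    "MainZone": "Column",
    "DropCapitalZone": "Plain initial- coloured",
}

def unify_names(annotations, mapping):
    """Rename each annotation's class via the mapping; unmapped names pass through."""
    for ann in annotations:
        ann["class_name"] = mapping.get(ann["class_name"], ann["class_name"])
    return annotations
```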