Model Comparison Scripts

This directory contains scripts that compare predictions from the old models and the new models against ground-truth annotations.

Files

  • original_annotations.py: Parses CVAT XML annotations and converts them to COCO format
  • old_models.py: Runs the old models (Line, Border, Zones) and converts their predictions to COCO format
  • new_models.py: Runs the new models (emanuskript, catmus, zone) and converts their predictions to COCO format
  • compare.py: Main script that orchestrates the comparison and calculates metrics
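original_annotations.py is described as converting CVAT XML to COCO. The sketch below shows that conversion for bounding boxes only. It assumes the CVAT "for images 1.1" XML layout, where each <image> element has <box> children with xtl/ytl/xbr/ybr corners. The function name cvat_boxes_to_coco is hypothetical, and masks (RLE) are deliberately not handled here.

```python
import xml.etree.ElementTree as ET


def cvat_boxes_to_coco(xml_text):
    """Hypothetical helper: convert CVAT <box> elements to a COCO-style dict.

    Assumes the CVAT for images 1.1 XML layout; masks/polygons are ignored.
    """
    root = ET.fromstring(xml_text)
    coco = {"images": [], "annotations": [], "categories": []}
    cat_ids = {}  # label name -> COCO category id (assigned in order of first appearance)
    ann_id = 1
    for img_id, image in enumerate(root.iter("image"), start=1):
        coco["images"].append({
            "id": img_id,
            "file_name": image.get("name"),
            "width": int(image.get("width")),
            "height": int(image.get("height")),
        })
        for box in image.iter("box"):
            label = box.get("label")
            if label not in cat_ids:
                cat_ids[label] = len(cat_ids) + 1
                coco["categories"].append({"id": cat_ids[label], "name": label})
            # CVAT stores corners; COCO expects [x, y, width, height]
            xtl, ytl = float(box.get("xtl")), float(box.get("ytl"))
            xbr, ybr = float(box.get("xbr")), float(box.get("ybr"))
            w, h = xbr - xtl, ybr - ytl
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[label],
                "bbox": [xtl, ytl, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco
```

The resulting dict can be written out with json.dump to obtain a file shaped like results/ground_truth.json. Whether the real script numbers images and categories this way is not stated in this README.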

Setup

  1. Install required dependencies:
     pip install pycocotools numpy pillow matplotlib ultralytics
  2. Ensure model files are in the project root:
    • Old models: best_line_detection_yoloe (1).pt, border_model_weights.pt, zones_model_weights.pt
    • New models: best_emanuskript_segmentation.pt, best_catmus.pt, best_zone_detection.pt
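Before starting a long run, it can help to confirm that all six weight files are present. The sketch below uses the file names listed above. The helper name missing_weights is hypothetical, and you pass in the project root yourself because this README does not give its absolute path.

```python
from pathlib import Path

# File names as listed in this README (note the space and "(1)" in the first one).
OLD_MODELS = [
    "best_line_detection_yoloe (1).pt",
    "border_model_weights.pt",
    "zones_model_weights.pt",
]
NEW_MODELS = [
    "best_emanuskript_segmentation.pt",
    "best_catmus.pt",
    "best_zone_detection.pt",
]


def missing_weights(project_root):
    """Hypothetical pre-flight check: return expected weight files not found in project_root."""
    root = Path(project_root)
    return [name for name in OLD_MODELS + NEW_MODELS if not (root / name).is_file()]
```

For example, missing_weights("/path/to/project/root") returns an empty list when every file is in place.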

Usage

Run the main comparison script:

cd /home/hasan/layout/compare/data
python compare.py

The script will:

  1. Load ground truth annotations from Aleyna 1 (2024)/Annotations/annotations.xml
  2. Run old models on all images in Aleyna 1 (2024)/Images
  3. Run new models on all images
  4. Calculate metrics (mAP@50, mAP@[.50:.95], Precision, Recall, F1)
  5. Create side-by-side visualizations for each image
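Step 4 only yields meaningful numbers if every prediction from steps 2 and 3 uses the same image_id as the corresponding ground-truth image. A minimal sketch follows. It assumes images are matched by file name (basename). This README does not say how compare.py actually matches them, and pair_images_with_gt is a hypothetical helper.

```python
from pathlib import Path


def pair_images_with_gt(image_paths, gt_coco):
    """Match image files to ground-truth image ids by basename.

    Returns (matched, unmatched): matched is a list of (path, image_id) pairs,
    unmatched lists the paths that have no ground-truth entry.
    """
    lookup = {Path(img["file_name"]).name: img["id"] for img in gt_coco["images"]}
    matched, unmatched = [], []
    for p in image_paths:
        name = Path(p).name
        if name in lookup:
            matched.append((p, lookup[name]))
        else:
            unmatched.append(p)
    return matched, unmatched
```

Images in the unmatched list have no annotations to compare against, so reporting them early avoids silently skewed metrics.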

Output

Results are saved to the results/ directory:

  • ground_truth.json: Ground truth in COCO format
  • old_models_merged.json: Merged predictions from the old models
  • new_models_merged.json: Merged predictions from the new models
  • metrics.json: Calculated metrics for both model sets
  • visualizations/: Side-by-side comparison images
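The *_merged.json files combine predictions from several models into one COCO results list. The standard COCO results format is a list of entries with image_id, category_id, bbox ([x, y, width, height]) and score. The sketch below shows one way to build such a list. It assumes each model's raw output has already been reduced to dicts with image_id, label, bbox and score, and it aligns labels to ground-truth categories by exact name. That intermediate schema and the name-based matching are assumptions, and merge_predictions is a hypothetical helper.

```python
def merge_predictions(per_model, gt_categories):
    """Merge per-model predictions into one COCO results list.

    per_model: dict of model name -> list of dicts with keys
               image_id, label, bbox ([x, y, w, h]), score (assumed schema).
    gt_categories: the "categories" list from the ground-truth COCO file.
    Predictions whose label has no matching ground-truth category are dropped.
    """
    name_to_id = {c["name"]: c["id"] for c in gt_categories}
    merged = []
    for preds in per_model.values():
        for p in preds:
            cat_id = name_to_id.get(p["label"])
            if cat_id is None:
                continue  # no ground-truth counterpart; cannot be scored
            merged.append({
                "image_id": p["image_id"],
                "category_id": cat_id,
                "bbox": [float(v) for v in p["bbox"]],
                "score": float(p["score"]),
            })
    return merged
```

Writing the returned list with json.dump produces a file that pycocotools' COCO.loadRes can read. Segmentation outputs would also need a "segmentation" field per entry.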

Metrics

The comparison calculates:

  • mAP@50: Mean Average Precision at IoU=0.50
  • mAP@[.50:.95]: Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05
  • Precision: Approximated from mAP@50
  • Recall: Maximum recall given at most 100 detections per image (COCO AR@100)
  • F1 Score: Harmonic mean of Precision and Recall
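The metrics above are built on a few simple pieces: box IoU, the ten IoU thresholds behind mAP@[.50:.95], and the F1 formula. The sketch below shows only these pieces. It is not a full AP computation. Given the pycocotools dependency, the reported mAP values presumably come from its COCOeval, which also interpolates precision-recall curves.

```python
def box_iou(a, b):
    """IoU of two boxes given in COCO [x, y, width, height] format."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged in mAP@[.50:.95]: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, two 10x10 boxes offset by 5 pixels horizontally overlap in 50 pixels out of a 150-pixel union, giving an IoU of about 0.33. Such a match counts as a miss at every threshold in IOU_THRESHOLDS.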

Notes

  • The CVAT XML parser handles masks stored in RLE (run-length encoding) format
  • Predicted categories are automatically aligned to the ground-truth categories
  • Images are processed sequentially, so a run over the full image set may take some time
  • Each visualization shows three panels side by side: Original+GT | Old Models | New Models
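The sketch below shows how a CVAT mask RLE might be decoded. It assumes the CVAT XML convention in which a <mask> element carries an rle string of comma-separated run lengths plus left/top/width/height for its bounding region. It further assumes that runs alternate starting with background (0) and are laid out row by row. Both are assumptions about the export format, not details confirmed by this README, and decode_cvat_rle is a hypothetical helper.

```python
def decode_cvat_rle(rle, left, top, width, height, img_w, img_h):
    """Decode a CVAT-style mask RLE into a full-image binary mask (list of rows).

    Assumptions: runs alternate 0/1 starting with 0, and they cover the
    width x height box at (left, top) in row-major order.
    """
    counts = [int(c) for c in rle.split(",") if c.strip()]
    flat, value = [], 0
    for n in counts:
        flat.extend([value] * n)
        value = 1 - value  # switch between background and foreground runs
    if len(flat) != width * height:
        raise ValueError(f"RLE covers {len(flat)} pixels, expected {width * height}")
    mask = [[0] * img_w for _ in range(img_h)]
    for i, v in enumerate(flat):
        if v:
            y, x = divmod(i, width)  # position inside the box, row by row
            mask[top + y][left + x] = 1
    return mask
```

To feed such a mask to pycocotools, it would typically be turned into a uint8 NumPy array and encoded with pycocotools.mask.encode, which expects Fortran-ordered input (numpy.asfortranarray).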