Update README.md

README.md (CHANGED)

@@ -7,72 +7,121 @@ tags:
  - yolov10
  - faster-rcnn
  - pytorch
- - bdd100k
- - pascal-voc
- - kitti
  - autonomous-driving
  - hallucination-mitigation
  - out-of-distribution
-
-
  pipeline_tag: object-detection
  datasets:
  - bdd100k
  - pascal-voc
-
- widget:
- - src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bounding-boxes-sample.png
-   example_title: "Sample Image"
  model-index:
- - name: m-hood
    results:
    - task:
        type: object-detection
      dataset:
-       type:
-       name: BDD 100K
      metrics:
-     - type:
-
-
  ---

- # M-Hood:

-

-

  ## 🎯 Key Features

- - **
- - **
- - **
- - **

-

  ### YOLOv10 Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
- | **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB |
- | **yolov10-
- | **yolov10-
- | **yolov10-
- | **yolov10-voc-finetune.pt** | Pascal VOC | Fine-tuned | 94MB | OOD-robust general object detection | [Download](./yolov10-voc-finetune.pt) |
- | **yolov10-kitti-finetune.pt** | KITTI | Fine-tuned | 52MB | OOD-robust autonomous driving detection | [Download](./yolov10-kitti-finetune.pt) |

  ### Faster R-CNN Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
- | **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy autonomous driving
- | **faster-rcnn-
- | **faster-rcnn-
- | **faster-rcnn-
-
-

  ## 🚀 Quick Start

@@ -81,11 +130,11 @@ The collection includes both **vanilla models** (trained from scratch) and **fin

  ```python
  from ultralytics import YOLO

- # Load
- model = YOLO('yolov10-bdd-

  # Run inference
- results = model('path/to/image.jpg')

  # Process results
  for result in results:

@@ -98,12 +147,18 @@ for result in results:

  ```python
  import torch
-
-
-
  model.eval()

- # Run inference
  with torch.no_grad():
      predictions = model(image_tensor)

@@ -113,89 +168,21 @@ scores = predictions[0]['scores']

  labels = predictions[0]['labels']
  ```

- ##

- The **fine-tuned models** in this collection have been specifically trained to **mitigate hallucination on out-of-distribution (OOD) data**. This means:

- - **Improved Robustness**: Better performance when encountering images different from training distribution
- - **Reduced False Positives**: Lower tendency to detect objects that aren't actually present
- - **Enhanced Reliability**: More trustworthy predictions in real-world deployment scenarios

- ## 📁 Dataset Information

- ### BDD 100K (Berkeley DeepDrive)
- - **100,000+** driving images with diverse weather and lighting conditions
- - **Object Classes**: car, truck, bus, motorcycle, bicycle, person, traffic light, traffic sign, train, rider
- - **Application**: Autonomous driving scenarios

- ### Pascal VOC (Visual Object Classes)
- - Standard benchmark dataset for object detection
- - **20 Object Classes**: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor
- - **Application**: General computer vision applications

- ### KITTI Object Detection
- - Real-world autonomous driving dataset
- - **Object Classes**: car, pedestrian, cyclist
- - **Application**: Autonomous driving with focus on urban scenarios

- ## 🏗️ Architecture Comparison

- ### YOLOv10 (Real-time Detection)
- - **Type**: Single-stage detector
- - **Speed**: High (real-time inference)
- - **Accuracy**: Good
- - **Use Case**: Real-time applications, edge deployment

- ### Faster R-CNN (High-accuracy Detection)
- - **Type**: Two-stage detector
- - **Speed**: Moderate
- - **Accuracy**: High
- - **Use Case**: High-accuracy requirements, research applications

-
-
- | Use Case | Recommended Model | Reason |
- |----------|-------------------|---------|
- | **Real-time autonomous driving** | `yolov10-bdd-finetune.pt` | Fast + OOD robust + driving-specific |
- | **High-accuracy autonomous driving** | `faster-rcnn-bdd-finetune.pth` | High accuracy + OOD robust + driving-specific |
- | **General object detection (fast)** | `yolov10-voc-finetune.pt` | Fast + OOD robust + general purpose |
- | **General object detection (accurate)** | `faster-rcnn-voc-finetune.pth` | High accuracy + OOD robust + general purpose |
- | **Research/Baseline** | Any vanilla model | Standard training baseline |

- ## 🔬 Research Applications

- This model collection is particularly useful for research in:
- - **Out-of-distribution detection**
- - **Domain adaptation**
- - **Robust object detection**
- - **Autonomous driving perception**
- - **Multi-dataset learning**

- ## 📄 Citations

- If you use these models in your research, please cite:

  ```bibtex
- @
- title={
- author={
-
- year={
- }
-
- @article{ren2015faster,
-   title={Faster r-cnn: Towards real-time object detection with region proposal networks},
-   author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
-   journal={Advances in neural information processing systems},
-   volume={28},
-   year={2015}
  }
  ```

-

-
-
- ## 🏷️ Keywords

-
  - yolov10
  - faster-rcnn
  - pytorch
  - autonomous-driving
  - hallucination-mitigation
  - out-of-distribution
+ - ood-detection
+ - proximal-ood
+ - benchmark-analysis
+ - bdd100k
+ - pascal-voc
  pipeline_tag: object-detection
  datasets:
  - bdd100k
  - pascal-voc
+ - openimages
  model-index:
+ - name: m-hood-yolov10-bdd-finetuned
    results:
    - task:
        type: object-detection
      dataset:
+       type: bdd100k
+       name: BDD 100K
      metrics:
+     - type: mAP@50-95
+       value: 0.34
+     - type: hallucination_reduction_near_ood
+       name: Hallucination Reduction (Near-OoD)
+       value: "79.5%"
+ - name: m-hood-faster-rcnn-bdd-finetuned
+   results:
+   - task:
+       type: object-detection
+     dataset:
+       type: bdd100k
+       name: BDD 100K
+     metrics:
+     - type: mAP@50
+       value: 0.252
+     - type: hallucination_reduction_near_ood
+       name: Hallucination Reduction (Near-OoD)
+       value: "84.8%"
  ---

+ # M-Hood: Models for Mitigating Hallucinations in Object Detection

+ [](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood) <!-- Replace with actual paper link when available -->
+ [](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood)

+ This repository contains the official models from the paper **"Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection"**. It provides a collection of YOLOv10 and Faster R-CNN models, including both standard ("vanilla") baselines and our novel **fine-tuned models** designed to significantly reduce false positive detections on out-of-distribution (OoD) data.
+
+ Our work first identifies critical flaws in existing OoD benchmarks and then introduces a novel fine-tuning strategy that teaches detectors to be less overconfident. By training on specially curated **"proximal OoD"** samples, our models learn to suppress objectness scores for unfamiliar objects, leading to an **88% reduction in overall hallucination error** on the BDD-100K benchmark when combined with an OoD filter.

  ## 🎯 Key Features

+ - **Novel Fine-Tuning for Hallucination Mitigation**: Models specifically fine-tuned to reduce false positives on novel objects, enhancing safety and reliability.
+ - **Vanilla vs. Fine-tuned Comparison**: Provides both baseline and improved models to clearly demonstrate the effectiveness of our approach.
+ - **Dual Architecture Support**: Includes both the real-time **YOLOv10** and the high-accuracy **Faster R-CNN**.
+ - **Multi-Dataset Scope**: Models trained on **BDD 100K** for autonomous driving and **Pascal VOC** for general object detection.
+ - **Corrected Benchmark Datasets**: Accompanied by new, carefully curated OoD test sets (**Near-OoD**, **Far-OoD**) that address the flaws in previous benchmarks.
+
+ ## 🔬 The M-Hood Approach: How It Works

+ Object detectors often "hallucinate" and produce high-confidence false predictions when faced with novel objects not seen during training. Our research tackles this in two ways:
+
+ 1. **Benchmarking Revisited**: We found existing OoD test sets were flawed, with ~13% of supposedly "OoD" images containing in-distribution (ID) objects, which leads to inaccurate performance evaluation. We created new, clean OoD benchmarks for reliable assessment.
+
+ 2. **Proximal OoD Fine-Tuning**: Our core contribution is a fine-tuning strategy that makes the detector itself more robust.
+    - We create a dataset of **"proximal OoD"** samples: objects that are semantically similar to the training classes but are not part of them (e.g., using 'deer' as a proximal OoD sample for a model trained on 'horse' and 'cow').
+    - We fine-tune the original models on a combined dataset of original ID data and these new proximal OoD samples.
+    - During this process, the proximal OoD samples are treated as **background**, effectively teaching the model to suppress its predictions for these and other similar novel objects.
+
+ The result is a model that is inherently more conservative and less likely to hallucinate, without significantly compromising its performance on the original task.
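The fine-tuning recipe above reduces to one data-preparation step: proximal-OoD images enter the training set with their annotations stripped, so the detector learns to treat those objects as background. A minimal sketch of that mixing step, assuming toy `(image, boxes)` tuples; the `build_finetune_set` helper and the sample file names are illustrative, not part of the released code:

```python
# Illustrative sketch: combine ID data with proximal-OoD images
# whose boxes are dropped, so the detector sees them as background.

def build_finetune_set(id_samples, proximal_ood_samples):
    """Mix ID samples (annotations kept) with proximal-OoD samples
    (annotations removed -> treated as pure background)."""
    combined = []
    for img, boxes in id_samples:
        combined.append((img, boxes))   # keep original ID annotations
    for img, _boxes in proximal_ood_samples:
        combined.append((img, []))      # drop boxes: background only
    return combined

# toy usage with hypothetical samples
id_data = [("id_001.jpg", [("car", 10, 20, 50, 60)])]
ood_data = [("ood_001.jpg", [("deer", 5, 5, 40, 40)])]
train_set = build_finetune_set(id_data, ood_data)
```

The fine-tuned model is then trained on `train_set` with the standard detection loss; no architectural change is needed.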

+ ## 📊 Performance Highlights
+
+ Our fine-tuning method drastically reduces the number of hallucinated detections on OoD datasets. The tables below show the number of false predictions (lower is better) on our **Near-OoD** benchmark.
+
+ #### YOLOv10 on BDD-100K (Near-OoD Hallucination Counts)
+
+ | Model Configuration | Hallucination Count | Reduction |
+ |---------------------|---------------------|-----------|
+ | Original (Vanilla) | 708 | - |
+ | **Ours (Fine-tuned)** | **145** | **-79.5%** |
+ | Original + KNN Filter | 297 | -58.1% |
+ | **Ours + KNN Filter** | **78** | **-89.0%** |
+
+ #### Faster R-CNN on BDD-100K (Near-OoD Hallucination Counts)
+
+ | Model Configuration | Hallucination Count | Reduction |
+ |---------------------|---------------------|-----------|
+ | Original (Vanilla) | 2,595 | - |
+ | **Ours (Fine-tuned)** | **395** | **-84.8%** |
+ | Original + KNN Filter | 1,272 | -51.0% |
+ | **Ours + KNN Filter** | **270** | **-89.6%** |
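The Reduction column follows directly from the hallucination counts, measured against each architecture's vanilla baseline. A quick sanity check:

```python
# Reduction relative to the vanilla baseline: (baseline - count) / baseline.
def reduction(baseline, count):
    return round(100 * (baseline - count) / baseline, 1)

# YOLOv10: vanilla baseline = 708 hallucinations
assert reduction(708, 145) == 79.5   # fine-tuned
assert reduction(708, 78) == 89.0    # fine-tuned + KNN filter

# Faster R-CNN: vanilla baseline = 2595 hallucinations
assert reduction(2595, 395) == 84.8  # fine-tuned
assert reduction(2595, 270) == 89.6  # fine-tuned + KNN filter
```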

+ ## 🗂️ Model Collection

  ### YOLOv10 Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
+ | **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB | Baseline real-time model for autonomous driving. | [Download](./yolov10-bdd-vanilla.pt) |
+ | **yolov10-bdd-finetune.pt** | BDD 100K | **Fine-tuned** | 62MB | **Robust** model with reduced OoD hallucinations. | [Download](./yolov10-bdd-finetune.pt) |
+ | **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63MB | Baseline model for general-purpose object detection. | [Download](./yolov10-voc-vanilla.pt) |
+ | **yolov10-voc-finetune.pt** | Pascal VOC | **Fine-tuned** | 94MB | **Robust** general-purpose model for OoD scenarios. | [Download](./yolov10-voc-finetune.pt) |

  ### Faster R-CNN Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
+ | **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy baseline for autonomous driving. | [Download](./faster-rcnn-bdd-vanilla.pth) |
+ | **faster-rcnn-bdd-finetune.pth** | BDD 100K | **Fine-tuned** | 158MB | **Robust** high-accuracy model for OoD scenarios. | [Download](./faster-rcnn-bdd-finetune.pth) |
+ | **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315MB | High-accuracy baseline for general object detection. | [Download](./faster-rcnn-voc-vanilla.pth) |
+ | **faster-rcnn-voc-finetune.pth** | Pascal VOC | **Fine-tuned** | 158MB | **Robust** high-accuracy general-purpose model. | [Download](./faster-rcnn-voc-finetune.pth) |
+
+ *(Note: the KITTI models listed in the previous README are not covered here; this collection focuses on the BDD 100K and Pascal VOC models central to the paper's experiments.)*

  ## 🚀 Quick Start

  ```python
  from ultralytics import YOLO

+ # Load our robust, fine-tuned YOLOv10 model
+ model = YOLO('yolov10-bdd-finetune.pt')

  # Run inference
+ results = model('path/to/your/image.jpg')

  # Process results
  for result in results:

  ```python
  import torch
+ import torchvision
+ from torchvision.models.detection import fasterrcnn_resnet50_fpn

+ # NOTE: The provided .pth files are state_dicts.
+ # You need to load them into a model instance.
+ # Example for a vanilla VOC model:
+ num_classes = 21  # 20 classes + background
+ model = fasterrcnn_resnet50_fpn(weights=None, num_classes=num_classes)
+ model.load_state_dict(torch.load('faster-rcnn-voc-vanilla.pth'))
  model.eval()

+ # Run inference on a pre-processed image tensor
  with torch.no_grad():
      predictions = model(image_tensor)

  labels = predictions[0]['labels']
  ```
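After extracting `boxes`, `scores`, and `labels`, detections are usually filtered by a confidence threshold before use. A small illustrative sketch; plain lists stand in for the tensors above, and `filter_detections` is a hypothetical helper, not part of the repository:

```python
# Keep only detections whose score clears a confidence threshold.
def filter_detections(boxes, scores, labels, threshold=0.5):
    return [(b, s, l) for b, s, l in zip(boxes, scores, labels)
            if s >= threshold]

# toy detections mimicking the torchvision output format
dets = filter_detections(
    boxes=[[0, 0, 10, 10], [5, 5, 20, 20]],
    scores=[0.92, 0.30],
    labels=[1, 3],
)
# only the 0.92-confidence detection survives the 0.5 threshold
```

For the fine-tuned models, lower thresholds remain usable because hallucinated low-quality detections are already suppressed by training.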

+ ## 📄 Citation

+ If you use our models, datasets, or methodology in your research, please cite our paper:

  ```bibtex
+ @inproceedings{he2025mitigating,
+   title={{Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection}},
+   author={He, Weicheng and Wu, Changshun and Cheng, Chih-Hong and Huang, Xiaowei and Bensalem, Saddek},
+   booktitle={To Be Published},
+   year={2025}
  }
  ```

+ Please also consider citing the original works for the model architectures and datasets used.

+ ## 📜 License

+ This work is released under the MIT License.