|
|
--- |
|
|
license: mit |
|
|
library_name: ultralytics |
|
|
tags: |
|
|
- object-detection |
|
|
- computer-vision |
|
|
- yolov10 |
|
|
- faster-rcnn |
|
|
- pytorch |
|
|
- autonomous-driving |
|
|
- hallucination-mitigation |
|
|
- out-of-distribution |
|
|
- ood-detection |
|
|
- proximal-ood |
|
|
- benchmark-analysis |
|
|
- bdd100k |
|
|
- pascal-voc |
|
|
pipeline_tag: object-detection |
|
|
datasets: |
|
|
- bdd100k |
|
|
- pascal-voc |
|
|
- openimages |
|
|
model-index: |
|
|
- name: m-hood-yolov10-bdd-finetuned |
|
|
results: |
|
|
- task: |
|
|
type: object-detection |
|
|
dataset: |
|
|
type: bdd100k |
|
|
name: BDD 100K |
|
|
metrics: |
|
|
- type: mAP@50-95 |
|
|
value: 0.34 |
|
|
- type: hallucination_reduction_near_ood |
|
|
name: Hallucination Reduction (Near-OoD) |
|
|
value: "79.5%" |
|
|
- name: m-hood-faster-rcnn-bdd-finetuned |
|
|
results: |
|
|
- task: |
|
|
type: object-detection |
|
|
dataset: |
|
|
type: bdd100k |
|
|
name: BDD 100K |
|
|
metrics: |
|
|
- type: mAP@50 |
|
|
value: 0.252 |
|
|
- type: hallucination_reduction_near_ood |
|
|
name: Hallucination Reduction (Near-OoD) |
|
|
value: "84.8%" |
|
|
--- |
|
|
|
|
|
# M-Hood: Models for Mitigating Hallucinations in Object Detection |
|
|
|
|
|
[Paper](https://arxiv.org/pdf/2503.07330) · [Code](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood) · [Dataset](https://huggingface.co/datasets/HugoHE/m-hood-dataset)
|
|
|
|
|
This repository contains the official models from the paper **"Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection"**. It provides a collection of YOLOv10 and Faster R-CNN models, including both standard ("vanilla") baselines and our novel **fine-tuned models** designed to significantly reduce false positive detections on out-of-distribution (OoD) data. |
|
|
|
|
|
Our work first identifies critical flaws in existing OoD benchmarks and then introduces a novel fine-tuning strategy that teaches detectors to be less overconfident. By training on specially curated **"proximal OoD"** samples, our models learn to suppress objectness scores for unfamiliar objects, leading to an **88% reduction in overall hallucination error** on the BDD-100K benchmark when combined with an OoD filter. |
|
|
|
|
|
## 🎯 Key Features
|
|
|
|
|
- **Novel Fine-Tuning for Hallucination Mitigation**: Models specifically fine-tuned to reduce false positives on novel objects, enhancing safety and reliability. |
|
|
- **Vanilla vs. Fine-tuned Comparison**: Provides both baseline and improved models to clearly demonstrate the effectiveness of our approach. |
|
|
- **Dual Architecture Support**: Includes both the real-time **YOLOv10** and the high-accuracy **Faster R-CNN**. |
|
|
- **Multi-Dataset Scope**: Models trained on **BDD 100K** for autonomous driving and **Pascal VOC** for general object detection. |
|
|
- **Corrected Benchmark Datasets**: Accompanied by new, carefully curated OoD test sets (**Near-OoD**, **Far-OoD**) that address the flaws in previous benchmarks. |
|
|
|
|
|
## 🔬 The M-Hood Approach: How It Works
|
|
|
|
|
Object detectors often "hallucinate" and produce high-confidence false predictions when faced with novel objects not seen during training. Our research tackles this in two ways: |
|
|
|
|
|
1. **Benchmarking Revisited**: We found existing OoD test sets were flawed, with ~13% of supposedly "OoD" images containing in-distribution (ID) objects. This leads to inaccurate performance evaluation. We created new, clean OoD benchmarks for reliable assessment. |
|
|
|
|
|
2. **Proximal OoD Fine-Tuning**: Our core contribution is a fine-tuning strategy that makes the detector itself more robust. |
|
|
- We create a dataset of **"proximal OoD"** samplesโobjects that are semantically similar to the training classes but are not part of them (e.g., using 'deer' as a proximal OoD sample for a model trained on 'horse' and 'cow'). |
|
|
- We fine-tune the original models on a combined dataset of original ID data and these new proximal OoD samples. |
|
|
  - During this process, the proximal OoD samples are treated as **background**, effectively teaching the model to suppress its predictions for these and other similar novel objects (see the sketch below).
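
As an illustration of this last step, one simple way to realize "proximal OoD as background" in a YOLO-style training setup is to give every proximal OoD image an *empty* label file, since an image with no annotations is treated as pure background. The sketch below is a minimal illustration under that assumption, with a hypothetical directory layout; it is not the authors' exact pipeline.

```python
from pathlib import Path
import shutil

def merge_proximal_ood(id_root: Path, ood_images: Path, out_root: Path) -> None:
    """Build a fine-tuning set: ID data unchanged + proximal OoD as background."""
    (out_root / "images/train").mkdir(parents=True, exist_ok=True)
    (out_root / "labels/train").mkdir(parents=True, exist_ok=True)

    # 1. Copy the original in-distribution images and labels unchanged.
    for f in (id_root / "images/train").iterdir():
        shutil.copy(f, out_root / "images/train" / f.name)
    for f in (id_root / "labels/train").iterdir():
        shutil.copy(f, out_root / "labels/train" / f.name)

    # 2. Add proximal OoD images with EMPTY label files: in YOLO format an
    #    empty .txt means "no objects", so every proximal OoD object is
    #    learned as background and its objectness is suppressed.
    for f in ood_images.iterdir():
        shutil.copy(f, out_root / "images/train" / f"ood_{f.name}")
        (out_root / "labels/train" / f"ood_{f.name}").with_suffix(".txt").touch()
```

Fine-tuning then proceeds with the standard trainer on the merged set, e.g. `YOLO('yolov10-bdd-vanilla.pt').train(data='merged.yaml', epochs=...)`.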
|
|
|
|
|
The result is a model that is inherently more conservative and less likely to hallucinate, without significantly compromising its performance on the original task. |
|
|
|
|
|
## 📊 Performance Highlights
|
|
|
|
|
Our fine-tuning method drastically reduces the number of hallucinated detections on OoD datasets. The tables below show the number of false predictions (lower is better) on our **Near-OoD** benchmark. |
|
|
|
|
|
#### YOLOv10 on BDD-100K (Near-OoD Hallucination Counts) |
|
|
|
|
|
| Model Configuration | Hallucination Count | Reduction | |
|
|
|---------------------|---------------------|-----------| |
|
|
| Original (Vanilla) | 708 | - | |
|
|
| **Ours (Fine-tuned)** | **145** | **-79.5%**| |
|
|
| Original + KNN Filter | 297 | -58.1% | |
|
|
| **Ours + KNN Filter** | **78** | **-89.0%**| |
|
|
|
|
|
#### Faster R-CNN on BDD-100K (Near-OoD Hallucination Counts) |
|
|
|
|
|
| Model Configuration | Hallucination Count | Reduction | |
|
|
|---------------------|---------------------|-----------| |
|
|
| Original (Vanilla) | 2,595 | - | |
|
|
| **Ours (Fine-tuned)** | **395** | **-84.8%**| |
|
|
| Original + KNN Filter | 1,272 | -51.0% | |
|
|
| **Ours + KNN Filter** | **270** | **-89.6%**| |
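
The "+ KNN Filter" rows apply a distance-based OoD filter on top of the detector's predictions: features of candidate detections are compared against a bank of in-distribution features, and a detection is rejected when its k-th nearest-neighbor distance is too large. The sketch below illustrates this idea; the `k` and `threshold` values are illustrative assumptions, not the paper's calibrated settings.

```python
import torch
import torch.nn.functional as F

class KNNFilter:
    """Keep detections whose features lie close to an in-distribution bank.

    `bank` holds L2-normalized features collected from ID training detections;
    the threshold would typically be calibrated on held-out ID data.
    """

    def __init__(self, bank: torch.Tensor, k: int = 10, threshold: float = 0.5):
        self.bank = F.normalize(bank, dim=1)
        self.k = k
        self.threshold = threshold

    def keep_mask(self, feats: torch.Tensor) -> torch.Tensor:
        feats = F.normalize(feats, dim=1)
        dists = torch.cdist(feats, self.bank)                  # (n_det, n_bank)
        kth = dists.topk(self.k, largest=False).values[:, -1]  # k-th NN distance
        return kth < self.threshold                            # True = keep
```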
|
|
|
|
|
## 🏗️ Model Collection
|
|
|
|
|
### YOLOv10 Models |
|
|
|
|
|
| Model | Dataset | Training Type | Size | Description | Download | |
|
|
|-------|---------|---------------|------|-------------|----------| |
|
|
| **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB | Baseline real-time model for autonomous driving. | [Download](./yolov10-bdd-vanilla.pt) | |
|
|
| **yolov10-bdd-finetune.pt** | BDD 100K | **Fine-tuned** | 62MB | **Robust** model with reduced OoD hallucinations. | [Download](./yolov10-bdd-finetune.pt) | |
|
|
| **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63MB | Baseline model for general purpose object detection. | [Download](./yolov10-voc-vanilla.pt) | |
|
|
| **yolov10-voc-finetune.pt** | Pascal VOC | **Fine-tuned** | 94MB | **Robust** general purpose model for OoD scenarios. | [Download](./yolov10-voc-finetune.pt) | |
|
|
|
|
|
### Faster R-CNN Models |
|
|
|
|
|
| Model | Dataset | Training Type | Size | Description | Download | |
|
|
|-------|---------|---------------|------|-------------|----------| |
|
|
| **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy baseline for autonomous driving. | [Download](./faster-rcnn-bdd-vanilla.pth) | |
|
|
| **faster-rcnn-bdd-finetune.pth**| BDD 100K | **Fine-tuned** | 158MB | **Robust** high-accuracy model for OoD scenarios. | [Download](./faster-rcnn-bdd-finetune.pth) | |
|
|
| **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315MB | High-accuracy baseline for general object detection. | [Download](./faster-rcnn-voc-vanilla.pth) | |
|
|
| **faster-rcnn-voc-finetune.pth**| Pascal VOC | **Fine-tuned** | 158MB | **Robust** high-accuracy general purpose model. | [Download](./faster-rcnn-voc-finetune.pth) | |
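
The weight files can also be fetched programmatically with `huggingface_hub`. The `repo_id` below is a placeholder; substitute this repository's actual id.

```python
from huggingface_hub import hf_hub_download

# repo_id is a placeholder for this model repository's id.
ckpt_path = hf_hub_download(repo_id="HugoHE/m-hood", filename="yolov10-bdd-finetune.pt")
print(ckpt_path)  # local path to the cached checkpoint
```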
|
|
|
|
|
|
|
|
|
|
## 🚀 Quick Start
|
|
|
|
|
### YOLOv10 Usage |
|
|
|
|
|
```python |
|
|
from ultralytics import YOLO |
|
|
|
|
|
# Load our robust, fine-tuned YOLOv10 model |
|
|
model = YOLO('yolov10-bdd-finetune.pt') |
|
|
|
|
|
# Run inference |
|
|
results = model('path/to/your/image.jpg') |
|
|
|
|
|
# Process results |
|
|
for result in results: |
|
|
boxes = result.boxes.xyxy # bounding boxes |
|
|
scores = result.boxes.conf # confidence scores |
|
|
classes = result.boxes.cls # class predictions |
|
|
``` |
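
If residual false positives remain, the standard `conf` argument of the Ultralytics predictor can be raised to filter low-confidence detections; the value below is illustrative.

```python
# Stricter confidence threshold (default is 0.25); 0.5 here is illustrative.
results = model('path/to/your/image.jpg', conf=0.5)
```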
|
|
|
|
|
### Faster R-CNN Usage |
|
|
|
|
|
```python
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import convert_image_dtype

# NOTE: The provided .pth files are state_dicts, so they must be loaded
# into a model instance with a matching classification head.
# Example for a vanilla VOC model:
num_classes = 21  # 20 VOC classes + background
# (For the BDD 100K models, set num_classes to the BDD label count + background.)
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=num_classes)
model.load_state_dict(torch.load('faster-rcnn-voc-vanilla.pth', map_location='cpu'))
model.eval()

# Load an image and convert it to a float tensor in [0, 1]
image_tensor = convert_image_dtype(read_image('path/to/your/image.jpg'), torch.float)

# Run inference (torchvision detection models expect a list of 3xHxW tensors)
with torch.no_grad():
    predictions = model([image_tensor])

# Process results
boxes = predictions[0]['boxes']    # bounding boxes (xyxy)
scores = predictions[0]['scores']  # confidence scores
labels = predictions[0]['labels']  # class labels
```
|
|
|
|
|
## 📝 Citation
|
|
|
|
|
If you use our models, datasets, or methodology in your research, please cite our papers. |
|
|
|
|
|
For the IROS 2025 conference version, which focuses on YOLO-based detectors, please cite:
|
|
``` |
|
|
@inproceedings{he2025mitigating, |
|
|
title={Mitigating Hallucinations in YOLO-based Object Detection Models: A Revisit to Out-of-Distribution Detection}, |
|
|
author={Weicheng He and Changshun Wu and Chih-Hong Cheng and Xiaowei Huang and Saddek Bensalem}, |
|
|
booktitle={Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}, |
|
|
year={2025}, |
|
|
note={Accepted to IROS 2025}, |
|
|
eprint={2503.07330v2}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CV}, |
|
|
url={https://arxiv.org/abs/2503.07330v2} |
|
|
} |
|
|
``` |
|
|
For the journal version, which extends the methodology to Faster R-CNN and RT-DETR, adds an automated data curation pipeline, and provides an in-depth analysis of the approach, please cite:
|
|
``` |
|
|
@misc{wu2025revisiting,
|
|
title={Revisiting Out-of-Distribution Detection in Real-time Object Detection: From Benchmark Pitfalls to a New Mitigation Paradigm}, |
|
|
author={Changshun Wu and Weicheng He and Chih-Hong Cheng and Xiaowei Huang and Saddek Bensalem}, |
|
|
year={2025}, |
|
|
eprint={2503.07330v3}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CV}, |
|
|
url={https://arxiv.org/abs/2503.07330v3}, |
|
|
} |
|
|
``` |
|
|
|
|
|
Please also consider citing the original works for the model architectures and datasets used. |
|
|
|
|
|
## 📄 License
|
|
|
|
|
This work is released under the MIT License. |