---
license: mit
library_name: ultralytics
tags:
- object-detection
- computer-vision
- yolov10
- faster-rcnn
- pytorch
- autonomous-driving
- hallucination-mitigation
- out-of-distribution
- ood-detection
- proximal-ood
- benchmark-analysis
- bdd100k
- pascal-voc
pipeline_tag: object-detection
datasets:
- bdd100k
- pascal-voc
- openimages
model-index:
- name: m-hood-yolov10-bdd-finetuned
  results:
  - task:
      type: object-detection
    dataset:
      type: bdd100k
      name: BDD 100K
    metrics:
    - type: mAP@50-95
      value: 0.34
    - type: hallucination_reduction_near_ood
      name: Hallucination Reduction (Near-OoD)
      value: "79.5%"
- name: m-hood-faster-rcnn-bdd-finetuned
  results:
  - task:
      type: object-detection
    dataset:
      type: bdd100k
      name: BDD 100K
    metrics:
    - type: mAP@50
      value: 0.252
    - type: hallucination_reduction_near_ood
      name: Hallucination Reduction (Near-OoD)
      value: "84.8%"
---

# M-Hood: Models for Mitigating Hallucinations in Object Detection

[![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://arxiv.org/pdf/2503.07330)
[![Code](https://img.shields.io/badge/Code-GitLab-orange)](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood)
[![Dataset](https://img.shields.io/badge/Dataset-HuggingFace-blue)](https://huggingface.co/datasets/HugoHE/m-hood-dataset)

This repository contains the official models from the paper **"Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection"**. It provides a collection of YOLOv10 and Faster R-CNN models, including both standard ("vanilla") baselines and our novel **fine-tuned models** designed to significantly reduce false positive detections on out-of-distribution (OoD) data.

Our work first identifies critical flaws in existing OoD benchmarks and then introduces a novel fine-tuning strategy that teaches detectors to be less overconfident. By training on specially curated **"proximal OoD"** samples, our models learn to suppress objectness scores for unfamiliar objects, leading to an **88% reduction in overall hallucination error** on the BDD-100K benchmark when combined with an OoD filter.

## 🎯 Key Features

- **Novel Fine-Tuning for Hallucination Mitigation**: Models specifically fine-tuned to reduce false positives on novel objects, enhancing safety and reliability.
- **Vanilla vs. Fine-tuned Comparison**: Provides both baseline and improved models to clearly demonstrate the effectiveness of our approach.
- **Dual Architecture Support**: Includes both the real-time **YOLOv10** and the high-accuracy **Faster R-CNN**.
- **Multi-Dataset Scope**: Models trained on **BDD 100K** for autonomous driving and **Pascal VOC** for general object detection.
- **Corrected Benchmark Datasets**: Accompanied by new, carefully curated OoD test sets (**Near-OoD**, **Far-OoD**) that address the flaws in previous benchmarks.

## 🔬 The M-Hood Approach: How It Works

Object detectors often "hallucinate" and produce high-confidence false predictions when faced with novel objects not seen during training. Our research tackles this in two ways:

1.  **Benchmarking Revisited**: We found existing OoD test sets were flawed, with ~13% of supposedly "OoD" images containing in-distribution (ID) objects. This leads to inaccurate performance evaluation. We created new, clean OoD benchmarks for reliable assessment.

2.  **Proximal OoD Fine-Tuning**: Our core contribution is a fine-tuning strategy that makes the detector itself more robust.
    - We create a dataset of **"proximal OoD"** samples—objects that are semantically similar to the training classes but are not part of them (e.g., using 'deer' as a proximal OoD sample for a model trained on 'horse' and 'cow').
    - We fine-tune the original models on a combined dataset of original ID data and these new proximal OoD samples.
    - During this process, the proximal OoD samples are treated as **background**, effectively teaching the model to suppress its predictions for these and other similar novel objects.

The result is a model that is inherently more conservative and less likely to hallucinate, without significantly compromising its performance on the original task.
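Conceptually, the fine-tuning data can be built by simply dropping proximal-OoD boxes from the labels, so any detection the model produces on those regions is penalized as a false positive during training. The sketch below illustrates this idea; the class names and annotation structure are illustrative assumptions, not the paper's actual data format.

```python
# Hypothetical sketch: build fine-tuning labels in which proximal-OoD boxes are
# removed, so the detector learns to treat those objects as background.
# ID_CLASSES / PROXIMAL_OOD and the annotation layout are illustrative only.

ID_CLASSES = {'car', 'bus', 'person'}   # in-distribution classes (example)
PROXIMAL_OOD = {'deer', 'moose'}        # semantically close, but not ID

def to_finetune_labels(annotations):
    """Keep ID boxes; silently discard proximal-OoD boxes (-> background)."""
    finetune = []
    for image_id, boxes in annotations:
        kept = [(cls, box) for cls, box in boxes if cls in ID_CLASSES]
        # Proximal-OoD boxes are intentionally absent from the labels: any
        # prediction the model makes there now counts as a false positive.
        finetune.append((image_id, kept))
    return finetune

labels = [('img_001', [('car', (10, 10, 50, 40)), ('deer', (60, 20, 90, 70))])]
finetune_labels = to_finetune_labels(labels)
```

Here the 'deer' box vanishes from the training signal rather than being mapped to a special OoD class, which is what teaches the model to suppress objectness on similar novel objects.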

## 📊 Performance Highlights

Our fine-tuning method drastically reduces the number of hallucinated detections on OoD datasets. The tables below show the number of false predictions (lower is better) on our **Near-OoD** benchmark.

#### YOLOv10 on BDD-100K (Near-OoD Hallucination Counts)

| Model Configuration | Hallucination Count | Reduction |
|---------------------|---------------------|-----------|
| Original (Vanilla)  | 708                 | -         |
| **Ours (Fine-tuned)** | **145**             | **-79.5%**|
| Original + KNN Filter | 297                 | -58.1%    |
| **Ours + KNN Filter** | **78**              | **-89.0%**|

#### Faster R-CNN on BDD-100K (Near-OoD Hallucination Counts)

| Model Configuration | Hallucination Count | Reduction |
|---------------------|---------------------|-----------|
| Original (Vanilla)  | 2,595               | -         |
| **Ours (Fine-tuned)** | **395**             | **-84.8%**|
| Original + KNN Filter | 1,272               | -51.0%    |
| **Ours + KNN Filter** | **270**             | **-89.6%**|
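The "+ KNN Filter" rows combine the detector with a post-hoc OoD filter over feature embeddings. A minimal sketch of such a filter is shown below; the feature source, distance metric, and threshold calibration here are assumptions for illustration, and the paper's actual filter may differ.

```python
# Hypothetical sketch of a KNN-based OoD filter over detection features:
# score each detection by its distance to the k-th nearest in-distribution
# feature, and reject detections whose score exceeds a calibrated threshold.
import numpy as np

def knn_ood_score(bank, feats, k=5):
    """Distance to the k-th nearest ID feature in the bank (higher = more OoD)."""
    dists = np.linalg.norm(feats[:, None, :] - bank[None, :, :], axis=-1)
    return np.sort(dists, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
bank = rng.normal(0.0, 1.0, size=(500, 16))    # ID feature bank (toy data)
ood_dets = rng.normal(6.0, 1.0, size=(4, 16))  # detections far from the bank

# Calibrate a threshold on the bank itself, then reject detections above it.
threshold = np.quantile(knn_ood_score(bank, bank, k=5), 0.95)
is_hallucination = knn_ood_score(bank, ood_dets, k=5) > threshold
```

Because the filter is post-hoc, it composes with either the vanilla or the fine-tuned detector, which is why the tables report both combinations.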

## 🗂️ Model Collection

### YOLOv10 Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB | Baseline real-time model for autonomous driving. | [Download](./yolov10-bdd-vanilla.pt) |
| **yolov10-bdd-finetune.pt** | BDD 100K | **Fine-tuned** | 62MB | **Robust** model with reduced OoD hallucinations. | [Download](./yolov10-bdd-finetune.pt) |
| **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63MB | Baseline model for general purpose object detection. | [Download](./yolov10-voc-vanilla.pt) |
| **yolov10-voc-finetune.pt** | Pascal VOC | **Fine-tuned** | 94MB | **Robust** general purpose model for OoD scenarios. | [Download](./yolov10-voc-finetune.pt) |

### Faster R-CNN Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy baseline for autonomous driving. | [Download](./faster-rcnn-bdd-vanilla.pth) |
| **faster-rcnn-bdd-finetune.pth**| BDD 100K | **Fine-tuned** | 158MB | **Robust** high-accuracy model for OoD scenarios. | [Download](./faster-rcnn-bdd-finetune.pth) |
| **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315MB | High-accuracy baseline for general object detection. | [Download](./faster-rcnn-voc-vanilla.pth) |
| **faster-rcnn-voc-finetune.pth**| Pascal VOC | **Fine-tuned** | 158MB | **Robust** high-accuracy general purpose model. | [Download](./faster-rcnn-voc-finetune.pth) |

*(Note: KITTI models mentioned in earlier versions of this repository are not detailed in the paper; this card covers the BDD 100K and Pascal VOC models central to the reported experiments.)*

## 🚀 Quick Start

### YOLOv10 Usage

```python
from ultralytics import YOLO

# Load our robust, fine-tuned YOLOv10 model
model = YOLO('yolov10-bdd-finetune.pt')

# Run inference
results = model('path/to/your/image.jpg')

# Process results
for result in results:
    boxes = result.boxes.xyxy   # bounding boxes
    scores = result.boxes.conf  # confidence scores
    classes = result.boxes.cls  # class predictions
```

### Faster R-CNN Usage

```python
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# NOTE: The provided .pth files are state_dicts, so they must be loaded
# into a model instance. Example for a vanilla VOC model:
num_classes = 21  # 20 VOC classes + background
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=num_classes)
model.load_state_dict(torch.load('faster-rcnn-voc-vanilla.pth', map_location='cpu'))
model.eval()

# Prepare the input: a float tensor in [0, 1] with shape (C, H, W)
image_tensor = read_image('path/to/your/image.jpg').float() / 255.0

# Run inference (torchvision detection models expect a list of image tensors)
with torch.no_grad():
    predictions = model([image_tensor])

# Process results
boxes = predictions[0]['boxes']    # (N, 4) boxes in xyxy format
scores = predictions[0]['scores']  # confidence scores
labels = predictions[0]['labels']  # class indices
```

## 📄 Citation

If you use our models, datasets, or methodology in your research, please cite our papers.

For the IROS 2025 conference version, which focuses primarily on YOLO models, please cite:
```
@inproceedings{he2025mitigating,
      title={Mitigating Hallucinations in YOLO-based Object Detection Models: A Revisit to Out-of-Distribution Detection},
      author={Weicheng He and Changshun Wu and Chih-Hong Cheng and Xiaowei Huang and Saddek Bensalem},
      booktitle={Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
      year={2025},
      note={Accepted to IROS 2025},
      eprint={2503.07330v2},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.07330v2}
}
```
For the journal version, which expands the methodology to Faster-RCNN and RT-DETR, includes an automated data curation pipeline, and provides an in-depth analysis of the approach, please cite:
```
@misc{wu2025revisiting,
      title={Revisiting Out-of-Distribution Detection in Real-time Object Detection: From Benchmark Pitfalls to a New Mitigation Paradigm},
      author={Changshun Wu and Weicheng He and Chih-Hong Cheng and Xiaowei Huang and Saddek Bensalem},
      year={2025},
      eprint={2503.07330v3},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.07330v3},
}
```

Please also consider citing the original works for the model architectures and datasets used.

## 📜 License

This work is released under the MIT License.