Update README.md

README.md (CHANGED)

@@ -7,72 +7,121 @@ tags:
  - yolov10
  - faster-rcnn
  - pytorch
- - bdd100k
- - pascal-voc
- - kitti
  - autonomous-driving
  - hallucination-mitigation
  - out-of-distribution
-
-
  pipeline_tag: object-detection
  datasets:
  - bdd100k
  - pascal-voc
-
- widget:
- - src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bounding-boxes-sample.png
-   example_title: "Sample Image"
  model-index:
- - name: m-hood
    results:
    - task:
        type: object-detection
      dataset:
-       type:
-       name: BDD 100K
      metrics:
-     - type:
-
-
  ---

- # M-Hood:

-

-

  ## 🎯 Key Features

- - **
- - **
- - **
- - **

-

  ### YOLOv10 Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
- | **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB |
- | **yolov10-
- | **yolov10-
- | **yolov10-
- | **yolov10-voc-finetune.pt** | Pascal VOC | Fine-tuned | 94MB | OOD-robust general object detection | [Download](./yolov10-voc-finetune.pt) |
- | **yolov10-kitti-finetune.pt** | KITTI | Fine-tuned | 52MB | OOD-robust autonomous driving detection | [Download](./yolov10-kitti-finetune.pt) |

  ### Faster R-CNN Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
- | **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy autonomous driving
- | **faster-rcnn-
- | **faster-rcnn-
- | **faster-rcnn-
-
-

  ## 🚀 Quick Start

@@ -81,11 +130,11 @@ The collection includes both **vanilla models** (trained from scratch) and **fin

  ```python
  from ultralytics import YOLO

- # Load
- model = YOLO('yolov10-bdd-

  # Run inference
- results = model('path/to/image.jpg')

  # Process results
  for result in results:

@@ -98,12 +147,18 @@ for result in results:

  ```python
  import torch
-
-
-
  model.eval()

- # Run inference
  with torch.no_grad():
      predictions = model(image_tensor)

@@ -113,89 +168,21 @@ scores = predictions[0]['scores']

  labels = predictions[0]['labels']
  ```

- ##

- The **fine-tuned models** in this collection have been specifically trained to **mitigate hallucination on out-of-distribution (OOD) data**. This means:

- - **Improved Robustness**: Better performance when encountering images different from training distribution
- - **Reduced False Positives**: Lower tendency to detect objects that aren't actually present
- - **Enhanced Reliability**: More trustworthy predictions in real-world deployment scenarios

- ## 📁 Dataset Information

- ### BDD 100K (Berkeley DeepDrive)
- - **100,000+** driving images with diverse weather and lighting conditions
- - **Object Classes**: car, truck, bus, motorcycle, bicycle, person, traffic light, traffic sign, train, rider
- - **Application**: Autonomous driving scenarios

- ### Pascal VOC (Visual Object Classes)
- - Standard benchmark dataset for object detection
- - **20 Object Classes**: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor
- - **Application**: General computer vision applications

- ### KITTI Object Detection
- - Real-world autonomous driving dataset
- - **Object Classes**: car, pedestrian, cyclist
- - **Application**: Autonomous driving with focus on urban scenarios

- ## 🏗️ Architecture Comparison

- ### YOLOv10 (Real-time Detection)
- - **Type**: Single-stage detector
- - **Speed**: High (real-time inference)
- - **Accuracy**: Good
- - **Use Case**: Real-time applications, edge deployment

- ### Faster R-CNN (High-accuracy Detection)
- - **Type**: Two-stage detector
- - **Speed**: Moderate
- - **Accuracy**: High
- - **Use Case**: High-accuracy requirements, research applications

-
-
- | Use Case | Recommended Model | Reason |
- |----------|-------------------|---------|
- | **Real-time autonomous driving** | `yolov10-bdd-finetune.pt` | Fast + OOD robust + driving-specific |
- | **High-accuracy autonomous driving** | `faster-rcnn-bdd-finetune.pth` | High accuracy + OOD robust + driving-specific |
- | **General object detection (fast)** | `yolov10-voc-finetune.pt` | Fast + OOD robust + general purpose |
- | **General object detection (accurate)** | `faster-rcnn-voc-finetune.pth` | High accuracy + OOD robust + general purpose |
- | **Research/Baseline** | Any vanilla model | Standard training baseline |

- ## 🔬 Research Applications

- This model collection is particularly useful for research in:
- - **Out-of-distribution detection**
- - **Domain adaptation**
- - **Robust object detection**
- - **Autonomous driving perception**
- - **Multi-dataset learning**

- ## 📄 Citations

- If you use these models in your research, please cite:

  ```bibtex
- @
- title={
- author={
-
- year={
- }
-
- @article{ren2015faster,
-   title={Faster r-cnn: Towards real-time object detection with region proposal networks},
-   author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
-   journal={Advances in neural information processing systems},
-   volume={28},
-   year={2015}
  }
  ```

-

-
-
- ## 🏷️ Keywords

-
  - yolov10
  - faster-rcnn
  - pytorch
  - autonomous-driving
  - hallucination-mitigation
  - out-of-distribution
+ - ood-detection
+ - proximal-ood
+ - benchmark-analysis
+ - bdd100k
+ - pascal-voc
  pipeline_tag: object-detection
  datasets:
  - bdd100k
  - pascal-voc
+ - openimages
  model-index:
+ - name: m-hood-yolov10-bdd-finetuned
    results:
    - task:
        type: object-detection
      dataset:
+       type: bdd100k
+       name: BDD 100K
      metrics:
+     - type: mAP@50-95
+       value: 0.34
+     - type: hallucination_reduction_near_ood
+       name: Hallucination Reduction (Near-OoD)
+       value: "79.5%"
+ - name: m-hood-faster-rcnn-bdd-finetuned
+   results:
+   - task:
+       type: object-detection
+     dataset:
+       type: bdd100k
+       name: BDD 100K
+     metrics:
+     - type: mAP@50
+       value: 0.252
+     - type: hallucination_reduction_near_ood
+       name: Hallucination Reduction (Near-OoD)
+       value: "84.8%"
  ---

+ # M-Hood: Models for Mitigating Hallucinations in Object Detection

+ [](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood) <!-- Replace with actual paper link when available -->
+ [](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood)

+ This repository contains the official models from the paper **"Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection"**. It provides a collection of YOLOv10 and Faster R-CNN models, including both standard ("vanilla") baselines and our novel **fine-tuned models** designed to significantly reduce false positive detections on out-of-distribution (OoD) data.
+
+ Our work first identifies critical flaws in existing OoD benchmarks and then introduces a novel fine-tuning strategy that teaches detectors to be less overconfident. By training on specially curated **"proximal OoD"** samples, our models learn to suppress objectness scores for unfamiliar objects, leading to an **88% reduction in overall hallucination error** on the BDD-100K benchmark when combined with an OoD filter.

  ## 🎯 Key Features

+ - **Novel Fine-Tuning for Hallucination Mitigation**: Models specifically fine-tuned to reduce false positives on novel objects, enhancing safety and reliability.
+ - **Vanilla vs. Fine-tuned Comparison**: Provides both baseline and improved models to clearly demonstrate the effectiveness of our approach.
+ - **Dual Architecture Support**: Includes both the real-time **YOLOv10** and the high-accuracy **Faster R-CNN**.
+ - **Multi-Dataset Scope**: Models trained on **BDD 100K** for autonomous driving and **Pascal VOC** for general object detection.
+ - **Corrected Benchmark Datasets**: Accompanied by new, carefully curated OoD test sets (**Near-OoD**, **Far-OoD**) that address the flaws in previous benchmarks.
+
+ ## 🔬 The M-Hood Approach: How It Works

+ Object detectors often "hallucinate" and produce high-confidence false predictions when faced with novel objects not seen during training. Our research tackles this in two ways:
+
+ 1. **Benchmarking Revisited**: We found existing OoD test sets were flawed, with ~13% of supposedly "OoD" images containing in-distribution (ID) objects, which leads to inaccurate performance evaluation. We created new, clean OoD benchmarks for reliable assessment.
+
+ 2. **Proximal OoD Fine-Tuning**: Our core contribution is a fine-tuning strategy that makes the detector itself more robust.
+    - We create a dataset of **"proximal OoD"** samples: objects that are semantically similar to the training classes but are not part of them (e.g., using 'deer' as a proximal OoD sample for a model trained on 'horse' and 'cow').
+    - We fine-tune the original models on a combined dataset of original ID data and these new proximal OoD samples.
+    - During this process, the proximal OoD samples are treated as **background**, effectively teaching the model to suppress its predictions for these and other similar novel objects.
+
+ The result is a model that is inherently more conservative and less likely to hallucinate, without significantly compromising its performance on the original task.
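The fine-tuning recipe above reduces to one data-preparation step: proximal-OoD images enter the training set with their annotations stripped, so the detector learns to treat those objects as background. A minimal sketch of that mixing step, assuming toy `(image, boxes)` tuples; the `build_finetune_set` helper and the sample file names are illustrative, not part of the released code:

```python
# Illustrative sketch: combine ID data with proximal-OoD images
# whose boxes are dropped, so the detector sees them as background.

def build_finetune_set(id_samples, proximal_ood_samples):
    """Mix ID samples (annotations kept) with proximal-OoD samples
    (annotations removed -> treated as pure background)."""
    combined = []
    for img, boxes in id_samples:
        combined.append((img, boxes))   # keep original ID annotations
    for img, _boxes in proximal_ood_samples:
        combined.append((img, []))      # drop boxes: background only
    return combined

# toy usage with hypothetical samples
id_data = [("id_001.jpg", [("car", 10, 20, 50, 60)])]
ood_data = [("ood_001.jpg", [("deer", 5, 5, 40, 40)])]
train_set = build_finetune_set(id_data, ood_data)
```

The fine-tuned model is then trained on `train_set` with the standard detection loss; no architectural change is needed.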

+ ## 📊 Performance Highlights
+
+ Our fine-tuning method drastically reduces the number of hallucinated detections on OoD datasets. The tables below show the number of false predictions (lower is better) on our **Near-OoD** benchmark.
+
+ #### YOLOv10 on BDD-100K (Near-OoD Hallucination Counts)
+
+ | Model Configuration | Hallucination Count | Reduction |
+ |---------------------|---------------------|-----------|
+ | Original (Vanilla) | 708 | - |
+ | **Ours (Fine-tuned)** | **145** | **-79.5%** |
+ | Original + KNN Filter | 297 | -58.1% |
+ | **Ours + KNN Filter** | **78** | **-89.0%** |
+
+ #### Faster R-CNN on BDD-100K (Near-OoD Hallucination Counts)
+
+ | Model Configuration | Hallucination Count | Reduction |
+ |---------------------|---------------------|-----------|
+ | Original (Vanilla) | 2,595 | - |
+ | **Ours (Fine-tuned)** | **395** | **-84.8%** |
+ | Original + KNN Filter | 1,272 | -51.0% |
+ | **Ours + KNN Filter** | **270** | **-89.6%** |
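The Reduction column follows directly from the hallucination counts, measured against each architecture's vanilla baseline. A quick sanity check:

```python
# Reduction relative to the vanilla baseline: (baseline - count) / baseline.
def reduction(baseline, count):
    return round(100 * (baseline - count) / baseline, 1)

# YOLOv10: vanilla baseline = 708 hallucinations
assert reduction(708, 145) == 79.5   # fine-tuned
assert reduction(708, 78) == 89.0    # fine-tuned + KNN filter

# Faster R-CNN: vanilla baseline = 2595 hallucinations
assert reduction(2595, 395) == 84.8  # fine-tuned
assert reduction(2595, 270) == 89.6  # fine-tuned + KNN filter
```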

+ ## 🗂️ Model Collection

  ### YOLOv10 Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
+ | **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB | Baseline real-time model for autonomous driving. | [Download](./yolov10-bdd-vanilla.pt) |
+ | **yolov10-bdd-finetune.pt** | BDD 100K | **Fine-tuned** | 62MB | **Robust** model with reduced OoD hallucinations. | [Download](./yolov10-bdd-finetune.pt) |
+ | **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63MB | Baseline model for general-purpose object detection. | [Download](./yolov10-voc-vanilla.pt) |
+ | **yolov10-voc-finetune.pt** | Pascal VOC | **Fine-tuned** | 94MB | **Robust** general-purpose model for OoD scenarios. | [Download](./yolov10-voc-finetune.pt) |

  ### Faster R-CNN Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
+ | **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy baseline for autonomous driving. | [Download](./faster-rcnn-bdd-vanilla.pth) |
+ | **faster-rcnn-bdd-finetune.pth** | BDD 100K | **Fine-tuned** | 158MB | **Robust** high-accuracy model for OoD scenarios. | [Download](./faster-rcnn-bdd-finetune.pth) |
+ | **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315MB | High-accuracy baseline for general object detection. | [Download](./faster-rcnn-voc-vanilla.pth) |
+ | **faster-rcnn-voc-finetune.pth** | Pascal VOC | **Fine-tuned** | 158MB | **Robust** high-accuracy general-purpose model. | [Download](./faster-rcnn-voc-finetune.pth) |
+
+ *(Note: the KITTI models listed in the previous README are not covered here; this collection focuses on the BDD 100K and Pascal VOC models central to the paper's experiments.)*

  ## 🚀 Quick Start

  ```python
  from ultralytics import YOLO

+ # Load our robust, fine-tuned YOLOv10 model
+ model = YOLO('yolov10-bdd-finetune.pt')

  # Run inference
+ results = model('path/to/your/image.jpg')

  # Process results
  for result in results:

  ```python
  import torch
+ import torchvision
+ from torchvision.models.detection import fasterrcnn_resnet50_fpn

+ # NOTE: The provided .pth files are state_dicts.
+ # You need to load them into a model instance.
+ # Example for a vanilla VOC model:
+ num_classes = 21  # 20 classes + background
+ model = fasterrcnn_resnet50_fpn(weights=None, num_classes=num_classes)
+ model.load_state_dict(torch.load('faster-rcnn-voc-vanilla.pth'))
  model.eval()

+ # Run inference on a pre-processed image tensor
  with torch.no_grad():
      predictions = model(image_tensor)

  labels = predictions[0]['labels']
  ```
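After extracting `boxes`, `scores`, and `labels`, detections are usually filtered by a confidence threshold before use. A small illustrative sketch; plain lists stand in for the tensors above, and `filter_detections` is a hypothetical helper, not part of the repository:

```python
# Keep only detections whose score clears a confidence threshold.
def filter_detections(boxes, scores, labels, threshold=0.5):
    return [(b, s, l) for b, s, l in zip(boxes, scores, labels)
            if s >= threshold]

# toy detections mimicking the torchvision output format
dets = filter_detections(
    boxes=[[0, 0, 10, 10], [5, 5, 20, 20]],
    scores=[0.92, 0.30],
    labels=[1, 3],
)
# only the 0.92-confidence detection survives the 0.5 threshold
```

For the fine-tuned models, lower thresholds remain usable because hallucinated low-quality detections are already suppressed by training.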

+ ## 📄 Citation

+ If you use our models, datasets, or methodology in your research, please cite our paper:

  ```bibtex
+ @inproceedings{he2025mitigating,
+   title={{Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection}},
+   author={He, Weicheng and Wu, Changshun and Cheng, Chih-Hong and Huang, Xiaowei and Bensalem, Saddek},
+   booktitle={To Be Published},
+   year={2025}
  }
  ```

+ Please also consider citing the original works for the model architectures and datasets used.

+ ## 📜 License

+ This work is released under the MIT License.