HugoHE committed on
Commit 90628da · verified · 1 Parent(s): 9f28304

Update README.md

Files changed (1):
  README.md +107 -120

README.md CHANGED
@@ -7,72 +7,121 @@ tags:
  - yolov10
  - faster-rcnn
  - pytorch
- - bdd100k
- - pascal-voc
- - kitti
  - autonomous-driving
  - hallucination-mitigation
  - out-of-distribution
- - BDD 100K
- - Pascal-VOC
  pipeline_tag: object-detection
  datasets:
  - bdd100k
  - pascal-voc
- - kitti
- widget:
- - src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bounding-boxes-sample.png
- example_title: "Sample Image"
  model-index:
- - name: m-hood
  results:
  - task:
  type: object-detection
  dataset:
- type: multi-dataset
- name: BDD 100K, Pascal VOC, KITTI
  metrics:
- - type: mean_average_precision
- name: mAP
- value: "TBD"
  ---

- # M-Hood: Multi-Dataset Object Detection Model Collection

- **M-Hood** is a comprehensive collection of object detection models trained on multiple datasets using different architectures and training strategies. This unified repository contains both **YOLOv10** and **Faster R-CNN** models trained on **BDD 100K**, **Pascal VOC**, and **KITTI** datasets.

- The collection includes both **vanilla models** (trained from scratch) and **fine-tuned models** specifically designed to **mitigate hallucination on out-of-distribution data**.

  ## 🎯 Key Features

- - **Dual Architecture Support**: Both YOLOv10 and Faster R-CNN models
- - **Multi-Dataset Training**: BDD 100K, Pascal VOC, and KITTI datasets
- - **Hallucination Mitigation**: Fine-tuned models for robust out-of-distribution performance
- - **Real-world Applications**: Autonomous driving and general object detection

- ## 📊 Model Performance Overview

  ### YOLOv10 Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
- | **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB | Real-time detection for autonomous driving | [Download](./yolov10-bdd-vanilla.pt) |
- | **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63MB | General purpose object detection | [Download](./yolov10-voc-vanilla.pt) |
- | **yolov10-kitti-vanilla.pt** | KITTI | Vanilla | 16MB | Lightweight autonomous driving detection | [Download](./yolov10-kitti-vanilla.pt) |
- | **yolov10-bdd-finetune.pt** | BDD 100K | Fine-tuned | 62MB | OOD-robust autonomous driving detection | [Download](./yolov10-bdd-finetune.pt) |
- | **yolov10-voc-finetune.pt** | Pascal VOC | Fine-tuned | 94MB | OOD-robust general object detection | [Download](./yolov10-voc-finetune.pt) |
- | **yolov10-kitti-finetune.pt** | KITTI | Fine-tuned | 52MB | OOD-robust autonomous driving detection | [Download](./yolov10-kitti-finetune.pt) |

  ### Faster R-CNN Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
- | **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy autonomous driving detection | [Download](./faster-rcnn-bdd-vanilla.pth) |
- | **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315MB | High-accuracy general object detection | [Download](./faster-rcnn-voc-vanilla.pth) |
- | **faster-rcnn-kitti-vanilla.pth** | KITTI | Vanilla | 315MB | High-accuracy autonomous driving detection | [Download](./faster-rcnn-kitti-vanilla.pth) |
- | **faster-rcnn-bdd-finetune.pth** | BDD 100K | Fine-tuned | 158MB | OOD-robust high-accuracy detection | [Download](./faster-rcnn-bdd-finetune.pth) |
- | **faster-rcnn-voc-finetune.pth** | Pascal VOC | Fine-tuned | 158MB | OOD-robust high-accuracy detection | [Download](./faster-rcnn-voc-finetune.pth) |
- | **faster-rcnn-kitti-finetune.pth** | KITTI | Fine-tuned | 158MB | OOD-robust high-accuracy detection | [Download](./faster-rcnn-kitti-finetune.pth) |

  ## 🚀 Quick Start

@@ -81,11 +130,11 @@ The collection includes both **vanilla models** (trained from scratch) and **fin
  ```python
  from ultralytics import YOLO

- # Load a vanilla YOLOv10 model
- model = YOLO('yolov10-bdd-vanilla.pt')

  # Run inference
- results = model('path/to/image.jpg')

  # Process results
  for result in results:
@@ -98,12 +147,18 @@ for result in results:

  ```python
  import torch
-
- # Load a Faster R-CNN model
- model = torch.load('faster-rcnn-bdd-vanilla.pth')
  model.eval()

- # Run inference
  with torch.no_grad():
  predictions = model(image_tensor)

@@ -113,89 +168,21 @@ scores = predictions[0]['scores']
  labels = predictions[0]['labels']
  ```

- ## 🎯 Fine-tuning Objective
-
- The **fine-tuned models** in this collection have been specifically trained to **mitigate hallucination on out-of-distribution (OOD) data**. This means:
-
- - **Improved Robustness**: Better performance when encountering images different from training distribution
- - **Reduced False Positives**: Lower tendency to detect objects that aren't actually present
- - **Enhanced Reliability**: More trustworthy predictions in real-world deployment scenarios
-
- ## 📁 Dataset Information
-
- ### BDD 100K (Berkeley DeepDrive)
- - **100,000+** driving images with diverse weather and lighting conditions
- - **Object Classes**: car, truck, bus, motorcycle, bicycle, person, traffic light, traffic sign, train, rider
- - **Application**: Autonomous driving scenarios
-
- ### Pascal VOC (Visual Object Classes)
- - Standard benchmark dataset for object detection
- - **20 Object Classes**: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor
- - **Application**: General computer vision applications
-
- ### KITTI Object Detection
- - Real-world autonomous driving dataset
- - **Object Classes**: car, pedestrian, cyclist
- - **Application**: Autonomous driving with focus on urban scenarios
-
- ## 🏗️ Architecture Comparison
-
- ### YOLOv10 (Real-time Detection)
- - **Type**: Single-stage detector
- - **Speed**: High (real-time inference)
- - **Accuracy**: Good
- - **Use Case**: Real-time applications, edge deployment
-
- ### Faster R-CNN (High-accuracy Detection)
- - **Type**: Two-stage detector
- - **Speed**: Moderate
- - **Accuracy**: High
- - **Use Case**: High-accuracy requirements, research applications

- ## 📈 Model Selection Guide
-
- | Use Case | Recommended Model | Reason |
- |----------|-------------------|---------|
- | **Real-time autonomous driving** | `yolov10-bdd-finetune.pt` | Fast + OOD robust + driving-specific |
- | **High-accuracy autonomous driving** | `faster-rcnn-bdd-finetune.pth` | High accuracy + OOD robust + driving-specific |
- | **General object detection (fast)** | `yolov10-voc-finetune.pt` | Fast + OOD robust + general purpose |
- | **General object detection (accurate)** | `faster-rcnn-voc-finetune.pth` | High accuracy + OOD robust + general purpose |
- | **Research/Baseline** | Any vanilla model | Standard training baseline |
-
- ## 🔬 Research Applications
-
- This model collection is particularly useful for research in:
- - **Out-of-distribution detection**
- - **Domain adaptation**
- - **Robust object detection**
- - **Autonomous driving perception**
- - **Multi-dataset learning**
-
- ## 📄 Citations
-
- If you use these models in your research, please cite:

  ```bibtex
- @article{yolov10,
- title={YOLOv10: Real-Time End-to-End Object Detection},
- author={Wang, Ao and Chen, Hui and Liu, Lihao and Chen, Kai and Lin, Zijia and Han, Jungong and Ding, Guiguang},
- journal={arXiv preprint arXiv:2405.14458},
- year={2024}
- }
-
- @article{ren2015faster,
- title={Faster r-cnn: Towards real-time object detection with region proposal networks},
- author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
- journal={Advances in neural information processing systems},
- volume={28},
- year={2015}
  }
  ```

- ## 📜 License

- This model collection is released under the MIT License.
-
- ## 🏷️ Keywords

- Object Detection, Computer Vision, YOLOv10, Faster R-CNN, BDD 100K, Pascal-VOC, KITTI, Autonomous Driving, Hallucination Mitigation, Out-of-Distribution, Deep Learning, PyTorch

  - yolov10
  - faster-rcnn
  - pytorch
  - autonomous-driving
  - hallucination-mitigation
  - out-of-distribution
+ - ood-detection
+ - proximal-ood
+ - benchmark-analysis
+ - bdd100k
+ - pascal-voc
  pipeline_tag: object-detection
  datasets:
  - bdd100k
  - pascal-voc
+ - openimages
  model-index:
+ - name: m-hood-yolov10-bdd-finetuned
  results:
  - task:
  type: object-detection
  dataset:
+ type: bdd100k
+ name: BDD 100K
  metrics:
+ - type: mAP@50-95
+ value: 0.34
+ - type: hallucination_reduction_near_ood
+ name: Hallucination Reduction (Near-OoD)
+ value: "79.5%"
+ - name: m-hood-faster-rcnn-bdd-finetuned
+ results:
+ - task:
+ type: object-detection
+ dataset:
+ type: bdd100k
+ name: BDD 100K
+ metrics:
+ - type: mAP@50
+ value: 0.252
+ - type: hallucination_reduction_near_ood
+ name: Hallucination Reduction (Near-OoD)
+ value: "84.8%"
  ---

+ # M-Hood: Models for Mitigating Hallucinations in Object Detection

+ [![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood) <!-- Replace with actual paper link when available -->
+ [![Code & Datasets](https://img.shields.io/badge/Code%20%26%20Datasets-GitLab-orange)](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood)

+ This repository contains the official models from the paper **"Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection"**. It provides a collection of YOLOv10 and Faster R-CNN models, including both standard ("vanilla") baselines and our novel **fine-tuned models** designed to significantly reduce false positive detections on out-of-distribution (OoD) data.
+
+ Our work first identifies critical flaws in existing OoD benchmarks and then introduces a novel fine-tuning strategy that teaches detectors to be less overconfident. By training on specially curated **"proximal OoD"** samples, our models learn to suppress objectness scores for unfamiliar objects, leading to an **88% reduction in overall hallucination error** on the BDD-100K benchmark when combined with an OoD filter.

  ## 🎯 Key Features

+ - **Novel Fine-Tuning for Hallucination Mitigation**: Models specifically fine-tuned to reduce false positives on novel objects, enhancing safety and reliability.
+ - **Vanilla vs. Fine-tuned Comparison**: Provides both baseline and improved models to clearly demonstrate the effectiveness of our approach.
+ - **Dual Architecture Support**: Includes both the real-time **YOLOv10** and the high-accuracy **Faster R-CNN**.
+ - **Multi-Dataset Scope**: Models trained on **BDD 100K** for autonomous driving and **Pascal VOC** for general object detection.
+ - **Corrected Benchmark Datasets**: Accompanied by new, carefully curated OoD test sets (**Near-OoD**, **Far-OoD**) that address the flaws in previous benchmarks.
+
+ ## 🔬 The M-Hood Approach: How It Works

+ Object detectors often "hallucinate" and produce high-confidence false predictions when faced with novel objects not seen during training. Our research tackles this in two ways:
+
+ 1. **Benchmarking Revisited**: We found existing OoD test sets were flawed, with ~13% of supposedly "OoD" images containing in-distribution (ID) objects. This leads to inaccurate performance evaluation. We created new, clean OoD benchmarks for reliable assessment.
+
+ 2. **Proximal OoD Fine-Tuning**: Our core contribution is a fine-tuning strategy that makes the detector itself more robust.
+    - We create a dataset of **"proximal OoD"** samples: objects that are semantically similar to the training classes but are not part of them (e.g., using 'deer' as a proximal OoD sample for a model trained on 'horse' and 'cow').
+    - We fine-tune the original models on a combined dataset of original ID data and these new proximal OoD samples.
+    - During this process, the proximal OoD samples are treated as **background**, effectively teaching the model to suppress its predictions for these and other similar novel objects.
+
+ The result is a model that is inherently more conservative and less likely to hallucinate, without significantly compromising its performance on the original task.
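The relabeling step described above can be sketched as follows. This is a minimal illustration of the idea rather than the authors' actual pipeline; the annotation format (a list of `{"label": ..., "bbox": ...}` dicts) and the class sets are assumptions chosen for the example:

```python
# Illustrative relabeling for proximal-OoD fine-tuning (assumed annotation format;
# the class sets below are example values, not the paper's exact lists).
ID_CLASSES = {"car", "person", "traffic light"}   # in-distribution classes (example)
PROXIMAL_OOD = {"deer", "moose"}                  # semantically close but novel (example)

def relabel_for_finetuning(annotations):
    """Keep ID boxes; drop proximal-OoD boxes so those regions count as background."""
    kept = []
    for box in annotations:
        if box["label"] in ID_CLASSES:
            kept.append(box)
        # Proximal-OoD boxes get no label: during fine-tuning, the detector's loss
        # then treats those regions as background, teaching it to suppress
        # objectness scores on similar novel objects.
    return kept

sample = [
    {"label": "car", "bbox": [10, 10, 80, 60]},
    {"label": "deer", "bbox": [120, 40, 200, 110]},
]
print(relabel_for_finetuning(sample))  # only the 'car' box survives
```

Because no new "OoD class" is introduced, the detector's head and class list stay unchanged; only the supervision signal for proximal-OoD regions shifts to background.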

+ ## 📊 Performance Highlights
+
+ Our fine-tuning method drastically reduces the number of hallucinated detections on OoD datasets. The tables below show the number of false predictions (lower is better) on our **Near-OoD** benchmark.
+
+ #### YOLOv10 on BDD-100K (Near-OoD Hallucination Counts)
+
+ | Model Configuration | Hallucination Count | Reduction |
+ |---------------------|---------------------|-----------|
+ | Original (Vanilla) | 708 | - |
+ | **Ours (Fine-tuned)** | **145** | **-79.5%** |
+ | Original + KNN Filter | 297 | -58.1% |
+ | **Ours + KNN Filter** | **78** | **-89.0%** |
+
+ #### Faster R-CNN on BDD-100K (Near-OoD Hallucination Counts)
+
+ | Model Configuration | Hallucination Count | Reduction |
+ |---------------------|---------------------|-----------|
+ | Original (Vanilla) | 2,595 | - |
+ | **Ours (Fine-tuned)** | **395** | **-84.8%** |
+ | Original + KNN Filter | 1,272 | -51.0% |
+ | **Ours + KNN Filter** | **270** | **-89.6%** |
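Each Reduction figure is computed relative to the vanilla baseline of the same table, i.e. `(baseline - count) / baseline`. A quick check of the numbers above:

```python
# Verify the "Reduction" columns of both tables against the raw counts.
def reduction(baseline, count):
    """Percent reduction in hallucination count vs. the vanilla baseline."""
    return round(100 * (baseline - count) / baseline, 1)

# YOLOv10 on BDD-100K (vanilla baseline: 708 hallucinations)
assert reduction(708, 145) == 79.5   # Ours (Fine-tuned)
assert reduction(708, 297) == 58.1   # Original + KNN Filter
assert reduction(708, 78) == 89.0    # Ours + KNN Filter

# Faster R-CNN on BDD-100K (vanilla baseline: 2,595 hallucinations)
assert reduction(2595, 395) == 84.8   # Ours (Fine-tuned)
assert reduction(2595, 1272) == 51.0  # Original + KNN Filter
assert reduction(2595, 270) == 89.6   # Ours + KNN Filter
```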
+
+ ## 🗂️ Model Collection

  ### YOLOv10 Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
+ | **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB | Baseline real-time model for autonomous driving. | [Download](./yolov10-bdd-vanilla.pt) |
+ | **yolov10-bdd-finetune.pt** | BDD 100K | **Fine-tuned** | 62MB | **Robust** model with reduced OoD hallucinations. | [Download](./yolov10-bdd-finetune.pt) |
+ | **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63MB | Baseline model for general purpose object detection. | [Download](./yolov10-voc-vanilla.pt) |
+ | **yolov10-voc-finetune.pt** | Pascal VOC | **Fine-tuned** | 94MB | **Robust** general purpose model for OoD scenarios. | [Download](./yolov10-voc-finetune.pt) |

  ### Faster R-CNN Models

  | Model | Dataset | Training Type | Size | Description | Download |
  |-------|---------|---------------|------|-------------|----------|
+ | **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy baseline for autonomous driving. | [Download](./faster-rcnn-bdd-vanilla.pth) |
+ | **faster-rcnn-bdd-finetune.pth** | BDD 100K | **Fine-tuned** | 158MB | **Robust** high-accuracy model for OoD scenarios. | [Download](./faster-rcnn-bdd-finetune.pth) |
+ | **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315MB | High-accuracy baseline for general object detection. | [Download](./faster-rcnn-voc-vanilla.pth) |
+ | **faster-rcnn-voc-finetune.pth** | Pascal VOC | **Fine-tuned** | 158MB | **Robust** high-accuracy general purpose model. | [Download](./faster-rcnn-voc-finetune.pth) |
+
+ *(Note: The KITTI models from the previous release are not covered by the paper's experiments, which focus on BDD 100K and Pascal VOC.)*

  ## 🚀 Quick Start

  ```python
  from ultralytics import YOLO

+ # Load our robust, fine-tuned YOLOv10 model
+ model = YOLO('yolov10-bdd-finetune.pt')

  # Run inference
+ results = model('path/to/your/image.jpg')

  # Process results
  for result in results:
  ```python
  import torch
+ import torchvision
+ from torchvision.models.detection import fasterrcnn_resnet50_fpn
+
+ # NOTE: The provided .pth files are state_dicts.
+ # You need to load them into a model instance.
+ # Example for a vanilla VOC model:
+ num_classes = 21  # 20 classes + background
+ model = fasterrcnn_resnet50_fpn(weights=None, num_classes=num_classes)
+ model.load_state_dict(torch.load('faster-rcnn-voc-vanilla.pth'))
  model.eval()

+ # Run inference on a pre-processed image tensor
  with torch.no_grad():
  predictions = model(image_tensor)

  labels = predictions[0]['labels']
  ```
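The raw detection output usually contains many low-confidence boxes. A common post-processing step (illustrative and not specific to these checkpoints; the 0.5 threshold is an assumed value) is to filter by score, shown here on plain lists as obtained via `.tolist()` on the output tensors:

```python
# Filter torchvision-style detections by confidence (lists via tensor.tolist()).
def filter_by_score(boxes, scores, labels, threshold=0.5):
    """Keep only detections whose confidence meets the threshold (0.5 is illustrative)."""
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    return ([boxes[i] for i in kept],
            [scores[i] for i in kept],
            [labels[i] for i in kept])

# Example with dummy values shaped like torchvision's detection output
boxes = [[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 20.0, 20.0]]
scores = [0.9, 0.3]
labels = [1, 2]
b, s, l = filter_by_score(boxes, scores, labels)
print(b)  # only the 0.9-confidence box remains
```

An OoD filter such as the KNN filter evaluated above would plug into the same place, rejecting detections whose features look unfamiliar rather than merely low-scoring.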

+ ## 📄 Citation

+ If you use our models, datasets, or methodology in your research, please cite our paper:

  ```bibtex
+ @inproceedings{he2025mitigating,
+ title={{Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection}},
+ author={He, Weicheng and Wu, Changshun and Cheng, Chih-Hong and Huang, Xiaowei and Bensalem, Saddek},
+ booktitle={To Be Published},
+ year={2025}
  }
  ```

+ Please also consider citing the original works for the model architectures and datasets used.

+ ## 📜 License

+ This work is released under the MIT License.