	---
	license: apache-2.0
	language: en
	tags:
	- object-detection
	- yolo
	- yolo11
	- rtdetr
	- rfdetr
	- pytorch
	- urban-traffic
	datasets:
	- BMD-45
	pipeline_tag: object-detection
	pretty_name: BMD-45
	---

	<div align="center">

	<!-- Banner Image -->
	<img width="50%" src="banner.png" alt="BMD-45 Banner">

	<div align="center">
	<a href="https://arxiv.org/abs/2511.02563" ><img src="assets/arxiv-logomark-small.svg" height="16" width="11.96" style="display: inline-block; vertical-align: middle; margin: 2px;"> <b style="display: inline-block;"> ArXiv </b></a> |
	<a href="https://huggingface.co/datasets/iisc-aim/BMD-45"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> Dataset </b></a>
	</div>

	<!-- Performance Graphic -->
	<img width="95%" src="assets/Figure-3.png" alt="Performance on BMD-45">

	<p><em>
	Models trained on BMD-45 deliver up to <b>2.5x</b> performance improvement over UA-DETRAC baselines
	</em></p>

	</div>

	# BMD-45 Vehicle Detection Models — AIM@IISc

	High-quality object detection models built for Indian road traffic — where vehicle appearance, traffic density, and scene complexity differ significantly from Western datasets like COCO.

	These models are trained on the BMD-45 dataset, featuring:

	- 13 road-relevant vehicle categories
	- Real urban environments across India
	- Diverse viewpoints, lighting, occlusion & density variations
	- Multi-user labeled data with consensus filtering (MV / ST variants)

	We currently release five state-of-the-art detector variants trained on the dataset:

	| Model Family | Sizes | Strengths |
	| --- | --- | --- |
	| YOLOv12 | S, X | Fast + lightweight deployment |
	| RT-DETRv2 | X | High-accuracy, transformer-based real-time detection |
	| RF-DETR | X | Roboflow's real-time DETR variant with strong small-object detection |
	| D-FINE | X | Fine-grained detection with iterative refinement |

	> Designed for Indian mobility — adaptable to real city surveillance, roadside cameras, safety monitoring, and ITS applications.

	Training dataset: [iisc-aim/BMD-45](https://huggingface.co/datasets/iisc-aim/BMD-45)

	---

	## Attribution

	<!-- More technical details about the dataset and models are available in our [Technical Report available on arXiv](https://arxiv.org/abs/2511.02563).
	If you use these datasets or models, kindly cite the following:
	The BMD-45 Dataset and Models: Towards Image Annotations and Accurate Vision Models for Indian Traffic, Dataset Release, BMD-45-v1.0,
	Akash Sharma, Chinmay Mhatre, Sankalp Gawali, Ruthvik Bokkasam, Brij Kishore, Vishwajeet Pattanaik, Tarun Rambha, Abdul R. Pinjari, Vijay Kovvali, Anirban Chakraborty, Punit Rathore, Raghu Krishnapuram and Yogesh Simmhan,
	Technical Report, Indian Institute of Science, [arXiv:2511.02563](https://arxiv.org/abs/2511.02563), Nov, 2025. -->


	```bibtex
	to be added
	```

	---

	## Repository Structure

	- README.md – This file
	- bmd_classes.txt – 13 object classes (one per line)
	- configs/ – Model configuration files
	  - YOLOv12-S/
	    - `config.yaml` – Training hyperparameters
	    - `data.yaml` – Dataset paths and class names
	  - YOLOv12-X/
	    - `config.yaml` – Training hyperparameters
	    - `data.yaml` – Dataset paths and class names
	  - RT-DETRv2/
	    - `bmd-45-dataset.yaml` – Dataset configuration
	    - `rtdetrv2_r101vd_6x_bmd-45.yaml` – Model + training configuration
	  - RF-DETR/
	    - `config.yaml` – Training hyperparameters
	  - D-FINE/
	    - `bmd-45-dataset.yaml` – Dataset configuration
	    - `dfine_hgnetv2_x_bmd-45.yaml` – Model + training configuration
	- weights/ – Trained model weights
	  - YOLOv12-S/ – `best.pt`
	  - YOLOv12-X/ – `best.pt`
	  - RT-DETRv2/ – `best.pth`
	  - RF-DETR/ – `checkpoint_best_total.pth`
	  - D-FINE/ – `best_stg1.pth`
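
The layout above can be navigated programmatically. The following is a minimal, stdlib-only sketch that maps each model directory to the checkpoint file named in the listing. The `WEIGHT_FILES` table and the `weights_path` helper are hypothetical names introduced here for illustration, and `repo_root` is assumed to point at a local copy of this repository.

```python
from pathlib import Path

# Checkpoint file names, copied from the repository listing above.
WEIGHT_FILES = {
    "YOLOv12-S": "best.pt",
    "YOLOv12-X": "best.pt",
    "RT-DETRv2": "best.pth",
    "RF-DETR": "checkpoint_best_total.pth",
    "D-FINE": "best_stg1.pth",
}


def weights_path(repo_root, model):
    """Return the expected checkpoint path for `model` under `repo_root`.

    Hypothetical helper: it only builds the path. It does not check
    that the file exists and does not load the checkpoint.
    """
    if model not in WEIGHT_FILES:
        raise KeyError(f"unknown model {model!r}; expected one of {sorted(WEIGHT_FILES)}")
    return Path(repo_root) / "weights" / model / WEIGHT_FILES[model]
```

Loading the returned file still requires the training framework that produced it, since the checkpoints come from different codebases.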


	---

	## Classes

	The file `bmd_classes.txt` lists all 13 object categories, one per line:

	| ID | Class Name | Description |
	| --- | --- | --- |
	| 1 | Hatchback | Small passenger cars without a protruding rear boot (“dickey”). |
	| 2 | Sedan | Passenger cars with a low-slung design and a separate protruding rear boot (“dickey”). |
	| 3 | SUV | Car-like vehicles with high ground clearance, a sturdy body, and no protruding boot. |
	| 4 | MUV | Large vehicles with three seating rows, combining passenger and cargo functionality. |
	| 5 | Bus | Large passenger vehicles used for public or private transport, including office shuttles and intercity buses. |
	| 6 | Truck | Heavy goods carriers with a front cabin and a rear cargo compartment. |
	| 7 | Three-wheeler | Compact vehicles with one front wheel and two rear wheels, featuring a covered passenger cabin. |
	| 8 | Two-wheeler | Motorbikes and scooters for single or double riders. Bounding boxes include both vehicle and rider. |
	| 9 | LCV | Lightweight goods carriers used for short- to medium-distance transport. |
	| 10 | Mini-bus | Shorter, compact buses with fewer seats; larger than a Tempo Traveller, often featuring a flat front. |
	| 11 | Tempo-traveller | Medium-sized passenger vans with tall roofs and side windows; larger than vans but smaller than minibuses, with a protruding front. |
	| 12 | Bicycle | Non-motorized, manually pedalled vehicles including geared, non-geared, women’s, and children’s cycles. Bounding boxes include both vehicle and rider. |
	| 13 | Van | Medium-sized vehicles for transporting goods or people, typically with a flat front and sliding side doors; smaller than Tempo Travellers. |
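
The table uses 1-based IDs, whereas many detection frameworks report 0-based class indices. The sketch below shows one way to read the class file and build the 1-based lookup. It assumes that the line order in `bmd_classes.txt` matches the table order, which this README does not state explicitly. `load_class_names` and `table_id_to_name` are hypothetical helper names.

```python
from pathlib import Path


def load_class_names(path):
    """Read one class name per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def table_id_to_name(names):
    """Map the 1-based IDs used in the table above to class names."""
    return {i: name for i, name in enumerate(names, start=1)}
```

If a model emits a 0-based index `k`, the matching table ID under this assumption is `k + 1`.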

	---

	## Training Hyperparameters and Architecture

	_All models were trained on the BMD-45 dataset for 100 epochs with a batch size of 16 and the AdamW optimizer. Learning rate, schedule, warmup, and augmentation settings differ by model family, as listed below._

	| Setting | YOLOv12-S | YOLOv12-X | RT-DETRv2-X | D-FINE-X | RF-DETR-X |
	| --- | --- | --- | --- | --- | --- |
	| Batch Size | 16 | 16 | 16 | 16 | 16 |
	| Epochs | 100 | 100 | 100 | 100 | 100 |
	| Learning Rate | 0.01 | 0.01 | 1×10⁻⁴ | 2.5×10⁻⁴ | 1×10⁻⁴ |
	| Optimizer | AdamW | AdamW | AdamW | AdamW | AdamW |
	| Weight Decay | 5×10⁻⁴ | 5×10⁻⁴ | 1×10⁻⁴ | 1.25×10⁻⁴ | 1×10⁻⁴ |
	| AdamW Betas | (0.937, 0.999) | (0.937, 0.999) | (0.9, 0.999) | (0.9, 0.999) | (0.9, 0.999) |
	| LR Policy | Cosine | Cosine | MultiStep | MultiStep | Step LR |
	| Warmup | 3 epochs | 3 epochs | 2000-iteration linear warmup | 500-step linear warmup | None |
	| Warmup Details | momentum=0.8; bias LR=0.1 | momentum=0.8; bias LR=0.1 | momentum untouched; uniform LR ramp | no bias/momentum overrides | warmup disabled |
	| Augmentation Summary | HSV, translate=0.1, scale=0.5, flip=0.5, erase=0.4; no mosaic/mixup | HSV, translate=0.1, scale=0.5, flip=0.5, erase=0.4; no mosaic/mixup | Photometric, ZoomOut, IoU crop; ops disabled after epoch 151 | Photometric, ZoomOut, IoU crop, flip, sanitize, resize | Flip + multi-scale RandomResize/Crop + normalize |
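
To make the YOLOv12 columns concrete, the sketch below combines a 3-epoch linear warmup with cosine decay over 100 epochs, starting from the base learning rate of 0.01 in the table. It is a generic textbook formulation, not the exact schedule implemented by the training framework: real trainers typically warm up per iteration and apply separate bias and momentum ramps. The `final_lr_fraction` parameter is a hypothetical value chosen for illustration and does not appear in this README.

```python
import math


def lr_at_epoch(epoch, base_lr=0.01, total_epochs=100, warmup_epochs=3,
                final_lr_fraction=0.01):
    """Generic linear-warmup + cosine-decay learning rate for a 0-based epoch."""
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr on the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Progress through the cosine phase, from 0.0 to 1.0.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs - 1)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # final_lr_fraction is an illustrative assumption, not a documented setting.
    min_lr = base_lr * final_lr_fraction
    return min_lr + (base_lr - min_lr) * cosine
```

With these defaults the rate ramps up over epochs 0 to 2, holds at `base_lr` at epoch 3, and decays smoothly to `base_lr * final_lr_fraction` by the final epoch.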

	---

	## License

	- This repository (models, weights, configs) is released under the Apache License 2.0.
	- _Note:_ The underlying YOLO-family implementations (e.g., YOLOv12, which builds on the Ultralytics codebase) are distributed under the GNU AGPL v3.0 license.

	---