Improve model card: Add pipeline tag, links, usage, and expand details (#3)
- Improve model card: Add pipeline tag, links, usage, and expand details (9d67d876d0fda7a54e542fce73039cb4bb304973)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md
CHANGED
|
@@ -1,36 +1,48 @@
|
|
| 1 |
---
|
| 2 |
-
library_name: transformers
|
| 3 |
-
license: apache-2.0
|
| 4 |
base_model: facebook/deformable-detr-box-supervised
|
| 5 |
-
tags:
|
| 6 |
-
- generated_from_trainer
|
| 7 |
datasets:
|
| 8 |
- Voxel51/fisheye8k
|
| 9 |
model-index:
|
| 10 |
- name: fisheye8k_facebook_deformable-detr-box-supervised
|
| 11 |
  results: []
|
| 12 |
---
|
| 13 |
|
| 14 |
-
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
| 15 |
-
should probably proofread and complete it, then remove this comment. -->
|
| 16 |
-
|
| 17 |
# fisheye8k_facebook_deformable-detr-box-supervised
|
| 18 |
|
| 19 |
-
This model is a fine-tuned version of [facebook/deformable-detr-box-supervised](https://huggingface.co/facebook/deformable-detr-box-supervised) on the
|
| 20 |
It achieves the following results on the evaluation set:
|
| 21 |
- Loss: 3.5085
|
| 22 |
|
| 23 |
## Model description
|
| 24 |
|
| 25 |
-
|
| 26 |
|
| 27 |
## Intended uses & limitations
|
| 28 |
|
| 29 |
-
|
| 30 |
|
| 31 |
## Training and evaluation data
|
| 32 |
|
| 33 |
-
More
|
| 34 |
|
| 35 |
## Training procedure
|
| 36 |
|
|
@@ -49,17 +61,17 @@ The following hyperparameters were used during training:
|
|
| 49 |
### Training results
|
| 50 |
|
| 51 |
| Training Loss | Epoch | Step | Validation Loss |
|
| 52 |
-
|:-------------:|:-----:|:-----:|:---------------:|
|
| 53 |
-
| 2.551 | 1.0 | 5288 | 2.9515 |
|
| 54 |
-
| 2.4989 | 2.0 | 10576 | 2.9100 |
|
| 55 |
-
| 2.2642 | 3.0 | 15864 | 2.9280 |
|
| 56 |
-
| 5.2218 | 4.0 | 21152 | 7.3972 |
|
| 57 |
-
| 3.69 | 5.0 | 26440 | 2.8083 |
|
| 58 |
-
| 3.3462 | 6.0 | 31728 | 5.0976 |
|
| 59 |
-
| 2.5944 | 7.0 | 37016 | 4.1669 |
|
| 60 |
-
| 2.5709 | 8.0 | 42304 | 3.6812 |
|
| 61 |
-
| 2.6956 | 9.0 | 47592 | 4.0466 |
|
| 62 |
-
| 2.5195 | 10.0 | 52880 | 3.5085 |
|
| 63 |
|
| 64 |
|
| 65 |
### Framework versions
|
|
@@ -69,4 +81,56 @@ The following hyperparameters were used during training:
|
|
| 69 |
- Datasets 3.2.0
|
| 70 |
- Tokenizers 0.21.0
|
| 71 |
|
| 72 |
-
|
| 1 |
---
|
| 2 |
base_model: facebook/deformable-detr-box-supervised
|
| 3 |
datasets:
|
| 4 |
- Voxel51/fisheye8k
|
| 5 |
+
library_name: transformers
|
| 6 |
+
license: mit
|
| 7 |
+
tags:
|
| 8 |
+
- generated_from_trainer
|
| 9 |
+
- computer-vision
|
| 10 |
+
- autonomous-driving
|
| 11 |
+
- data-centric-ai
|
| 12 |
+
- open-vocabulary
|
| 13 |
+
- deformable-detr
|
| 14 |
+
pipeline_tag: object-detection
|
| 15 |
model-index:
|
| 16 |
- name: fisheye8k_facebook_deformable-detr-box-supervised
|
| 17 |
  results: []
|
| 18 |
---
|
| 19 |
|
| 20 |
# fisheye8k_facebook_deformable-detr-box-supervised
|
| 21 |
|
| 22 |
+
This model is a fine-tuned version of [facebook/deformable-detr-box-supervised](https://huggingface.co/facebook/deformable-detr-box-supervised) on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). It was developed within the framework of the **Mcity Data Engine** project.
|
| 23 |
+
|
| 24 |
+
The **Mcity Data Engine** provides modules for the complete data-based development cycle for AI algorithms, especially focusing on identifying rare and novel classes through an open-vocabulary data selection process within Intelligent Transportation Systems (ITS). This model is a practical application of the data engine for improving object detection of vulnerable road users and other transportation-related entities.
|
| 25 |
+
|
| 26 |
+
- **Paper**: [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614)
|
| 27 |
+
- **Project Page**: [Mcity Data Engine Documentation](https://mcity.github.io/mcity_data_engine/)
|
| 28 |
+
- **GitHub Repository**: [mcity/mcity_data_engine](https://github.com/mcity/mcity_data_engine)
|
| 29 |
+
|
| 30 |
It achieves the following results on the evaluation set:
|
| 31 |
- Loss: 3.5085
|
| 32 |
|
| 33 |
## Model description
|
| 34 |
|
| 35 |
+
This model is designed for object detection in traffic scenarios, particularly for identifying classes like `Bus`, `Bike`, `Car`, `Pedestrian`, and `Truck` in fisheye camera imagery. It leverages the Deformable DETR architecture and is fine-tuned using the iterative data improvement methodology proposed in the Mcity Data Engine project. Its goal is to improve the detection of long-tail and novel classes in large amounts of unlabeled data, which is especially challenging in Intelligent Transportation Systems.
|
| 36 |
|
| 37 |
## Intended uses & limitations
|
| 38 |
|
| 39 |
+
This model is intended for research and development in autonomous driving and intelligent transportation systems, specifically for improving the detection of long-tail and rare classes within the Mcity Data Engine's iterative model improvement pipeline.
|
| 40 |
+
|
| 41 |
+
The model was trained only on fisheye camera data, so it may not generalize to other camera types or environments without further fine-tuning. The surrounding workflow focuses on open-vocabulary data selection: performance on common, standard objects may be comparable to other models, while the intended strength lies in identifying more challenging or rare instances.
|
| 42 |
|
| 43 |
## Training and evaluation data
|
| 44 |
|
| 45 |
+
The model was trained on the [Voxel51/fisheye8k](https://huggingface.co/datasets/Voxel51/fisheye8k) dataset. This dataset is used as part of the Mcity Data Engine's workflow, specifically for demonstrating "Embedding Selection" to determine both representative and rare samples for iterative model improvement. More details about the data curation and selection process can be found in the associated paper and the Mcity Data Engine GitHub repository.
|
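As a rough illustration of what selecting "representative and rare" samples from embeddings can mean, the sketch below ranks samples by their distance to the embedding centroid. This is an assumption-laden toy, not the Mcity Data Engine's implementation: the function names, the centroid-distance criterion, and the 2-D embeddings are all hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def select_samples(embeddings, k=1):
    """Return (representative_ids, rare_ids).

    Hypothetical criterion: samples closest to the centroid count as
    representative, samples farthest from it count as rare.
    """
    center = centroid(list(embeddings.values()))
    ranked = sorted(embeddings, key=lambda name: math.dist(embeddings[name], center))
    return ranked[:k], ranked[-k:]


# Toy 2-D embeddings, made up for illustration: three similar frames and one outlier.
embeddings = {
    "frame_a": [0.0, 0.0],
    "frame_b": [0.1, 0.0],
    "frame_c": [0.0, 0.2],
    "frame_outlier": [5.0, 5.0],
}

representative, rare = select_samples(embeddings, k=1)
print("representative:", representative)
print("rare:", rare)
```

In practice the embeddings would come from an image embedding model and the selection rule would likely be more elaborate; the associated paper and repository describe the actual procedure.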
| 46 |
|
| 47 |
## Training procedure
|
| 48 |
|
|
|
|
| 61 |
### Training results
|
| 62 |
|
| 63 |
| Training Loss | Epoch | Step | Validation Loss |
|
| 64 |
+
|:-------------:|:-----:|:-----:|:---------------:|
|
| 65 |
+
| 2.551 | 1.0 | 5288 | 2.9515 |
|
| 66 |
+
| 2.4989 | 2.0 | 10576 | 2.9100 |
|
| 67 |
+
| 2.2642 | 3.0 | 15864 | 2.9280 |
|
| 68 |
+
| 5.2218 | 4.0 | 21152 | 7.3972 |
|
| 69 |
+
| 3.69 | 5.0 | 26440 | 2.8083 |
|
| 70 |
+
| 3.3462 | 6.0 | 31728 | 5.0976 |
|
| 71 |
+
| 2.5944 | 7.0 | 37016 | 4.1669 |
|
| 72 |
+
| 2.5709 | 8.0 | 42304 | 3.6812 |
|
| 73 |
+
| 2.6956 | 9.0 | 47592 | 4.0466 |
|
| 74 |
+
| 2.5195 | 10.0 | 52880 | 3.5085 |
|
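The validation loss in this table fluctuates noticeably, and the reported evaluation loss (3.5085) is the value from the final epoch rather than the lowest one. A minimal sketch for locating the best epoch, with the rows copied from the table; whether a checkpoint from that epoch was saved is not stated in the card:

```python
# (epoch, training_loss, validation_loss) rows copied from the training results table.
rows = [
    (1, 2.551, 2.9515),
    (2, 2.4989, 2.9100),
    (3, 2.2642, 2.9280),
    (4, 5.2218, 7.3972),
    (5, 3.69, 2.8083),
    (6, 3.3462, 5.0976),
    (7, 2.5944, 4.1669),
    (8, 2.5709, 3.6812),
    (9, 2.6956, 4.0466),
    (10, 2.5195, 3.5085),
]

# Epoch with the lowest validation loss versus the last epoch.
best = min(rows, key=lambda row: row[2])
final = rows[-1]

print(f"lowest validation loss: {best[2]} at epoch {best[0]}")
print(f"final validation loss:  {final[2]} at epoch {final[0]}")
```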
| 75 |
|
| 76 |
|
| 77 |
### Framework versions
|
|
|
|
| 81 |
- Datasets 3.2.0
|
| 82 |
- Tokenizers 0.21.0
|
| 83 |
|
| 84 |
+
## Sample Usage
|
| 85 |
+
|
| 86 |
+
You can use this model directly with the Hugging Face `transformers` library for object detection:
|
| 87 |
+
|
| 88 |
+
```python
|
| 89 |
+
from transformers import AutoImageProcessor, DeformableDetrForObjectDetection
|
| 90 |
+
import torch
|
| 91 |
+
from PIL import Image
|
| 92 |
+
import requests
|
| 93 |
+
|
| 94 |
+
# Load image (replace with your image path or URL)
|
| 95 |
+
url = "http://images.cocodataset.org/val2017/000000039769.jpg" # Example image from COCO
|
| 96 |
+
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
|
| 97 |
+
|
| 98 |
+
# Load the image processor and model
|
| 99 |
+
image_processor = AutoImageProcessor.from_pretrained("mcity-data-engine/fisheye8k_facebook_deformable-detr-box-supervised")
|
| 100 |
+
model = DeformableDetrForObjectDetection.from_pretrained("mcity-data-engine/fisheye8k_facebook_deformable-detr-box-supervised")
|
| 101 |
+
|
| 102 |
+
# Prepare inputs
|
| 103 |
+
inputs = image_processor(images=image, return_tensors="pt")
|
| 104 |
+
|
| 105 |
+
# Perform inference
|
| 106 |
+
with torch.no_grad():
|
| 107 |
+
    outputs = model(**inputs)
|
| 108 |
+
|
| 109 |
+
# You can further process the outputs (logits, boxes, etc.) for visualization or evaluation.
|
| 110 |
+
# For example, to get predicted bounding boxes:
|
| 111 |
+
target_sizes = torch.tensor([image.size[::-1]])
|
| 112 |
+
results = image_processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.5)[0]
|
| 113 |
+
|
| 114 |
+
print(f"Detected objects for image of size {image.size}:")
|
| 115 |
+
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
|
| 116 |
+
    box = [round(i, 2) for i in box.tolist()]
|
| 117 |
+
    print(
|
| 118 |
+
        f" Detected {model.config.id2label[label.item()]} with confidence "
|
| 119 |
+
        f"{round(score.item(), 3)} at location {box}"
|
| 120 |
+
    )
|
| 121 |
+
```
|
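The `target_sizes` argument in the snippet above is built from `image.size[::-1]` because PIL reports `(width, height)` while the post-processing step expects `(height, width)`. DETR-style models predict boxes as normalized `(center_x, center_y, width, height)`, and post-processing rescales them to absolute `(xmin, ymin, xmax, ymax)` pixel coordinates. The helper below is a hypothetical, dependency-free sketch of that conversion for a single box, not the library's own code:

```python
def cxcywh_to_xyxy(box, image_height, image_width):
    """Convert one normalized (cx, cy, w, h) box to absolute (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


# PIL-style size is (width, height); reversing it gives (height, width).
pil_size = (640, 480)
height, width = pil_size[::-1]

# A made-up box centered in the image, a quarter of its width and half of its height.
pixel_box = cxcywh_to_xyxy((0.5, 0.5, 0.25, 0.5), height, width)
print(pixel_box)
```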
| 122 |
+
|
| 123 |
+
## Acknowledgements
|
| 124 |
+
Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn’t have done it without their tremendous support!
|
| 125 |
+
|
| 126 |
+
## Citation
|
| 127 |
+
If you use the Mcity Data Engine in your research, feel free to cite the project:
|
| 128 |
+
|
| 129 |
+
```bibtex
|
| 130 |
+
@article{bogdoll2025mcitydataengine,
|
| 131 |
+
title={Mcity Data Engine},
|
| 132 |
+
author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
|
| 133 |
+
journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
|
| 134 |
+
year={2025}
|
| 135 |
+
}
|
| 136 |
+
```
|