danielbogdoll and nielsr (HF Staff) committed
Commit 5dd0ed0 · verified · 1 Parent(s): a2b9e7f

Improve model card: Add pipeline tag, links, usage, and expand details (#3)

- Improve model card: Add pipeline tag, links, usage, and expand details (9d67d876d0fda7a54e542fce73039cb4bb304973)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +87 -23
README.md CHANGED
@@ -1,36 +1,48 @@
1
  ---
2
- library_name: transformers
3
- license: apache-2.0
4
  base_model: facebook/deformable-detr-box-supervised
5
- tags:
6
- - generated_from_trainer
7
  datasets:
8
  - Voxel51/fisheye8k
9
  model-index:
10
  - name: fisheye8k_facebook_deformable-detr-box-supervised
11
    results: []
12
  ---
13
 
14
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
- should probably proofread and complete it, then remove this comment. -->
16
-
17
  # fisheye8k_facebook_deformable-detr-box-supervised
18
 
19
- This model is a fine-tuned version of [facebook/deformable-detr-box-supervised](https://huggingface.co/facebook/deformable-detr-box-supervised) on the generator dataset.
20
  It achieves the following results on the evaluation set:
21
  - Loss: 3.5085
22
 
23
  ## Model description
24
 
25
- More information needed
26
 
27
  ## Intended uses & limitations
28
 
29
- More information needed
 
 
30
 
31
  ## Training and evaluation data
32
 
33
- More information needed
34
 
35
  ## Training procedure
36
 
@@ -49,17 +61,17 @@ The following hyperparameters were used during training:
49
  ### Training results
50
 
51
  | Training Loss | Epoch | Step | Validation Loss |
52
- |:-------------:|:-----:|:-----:|:---------------:|
53
- | 2.551 | 1.0 | 5288 | 2.9515 |
54
- | 2.4989 | 2.0 | 10576 | 2.9100 |
55
- | 2.2642 | 3.0 | 15864 | 2.9280 |
56
- | 5.2218 | 4.0 | 21152 | 7.3972 |
57
- | 3.69 | 5.0 | 26440 | 2.8083 |
58
- | 3.3462 | 6.0 | 31728 | 5.0976 |
59
- | 2.5944 | 7.0 | 37016 | 4.1669 |
60
- | 2.5709 | 8.0 | 42304 | 3.6812 |
61
- | 2.6956 | 9.0 | 47592 | 4.0466 |
62
- | 2.5195 | 10.0 | 52880 | 3.5085 |
63
 
64
 
65
  ### Framework versions
@@ -69,4 +81,56 @@ The following hyperparameters were used during training:
69
  - Datasets 3.2.0
70
  - Tokenizers 0.21.0
71
 
72
- Mcity Data Engine: https://arxiv.org/abs/2504.21614
 
1
  ---
 
 
2
  base_model: facebook/deformable-detr-box-supervised
 
 
3
  datasets:
4
  - Voxel51/fisheye8k
5
+ library_name: transformers
6
+ license: mit
7
+ tags:
8
+ - generated_from_trainer
9
+ - computer-vision
10
+ - autonomous-driving
11
+ - data-centric-ai
12
+ - open-vocabulary
13
+ - deformable-detr
14
+ pipeline_tag: object-detection
15
  model-index:
16
  - name: fisheye8k_facebook_deformable-detr-box-supervised
17
    results: []
18
  ---
19
 
 
 
 
20
  # fisheye8k_facebook_deformable-detr-box-supervised
21
 
22
+ This model is a fine-tuned version of [facebook/deformable-detr-box-supervised](https://huggingface.co/facebook/deformable-detr-box-supervised) on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). It was developed within the framework of the **Mcity Data Engine** project.
23
+
24
+ The **Mcity Data Engine** provides modules for the complete data-based development cycle for AI algorithms, especially focusing on identifying rare and novel classes through an open-vocabulary data selection process within Intelligent Transportation Systems (ITS). This model is a practical application of the data engine for improving object detection of vulnerable road users and other transportation-related entities.
25
+
26
+ - **Paper**: [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614)
27
+ - **Project Page**: [Mcity Data Engine Documentation](https://mcity.github.io/mcity_data_engine/)
28
+ - **GitHub Repository**: [mcity/mcity_data_engine](https://github.com/mcity/mcity_data_engine)
29
+
30
  It achieves the following results on the evaluation set:
31
  - Loss: 3.5085
32
 
33
  ## Model description
34
 
35
+ This model is designed for object detection in traffic scenarios, particularly for identifying classes like `Bus`, `Bike`, `Car`, `Pedestrian`, and `Truck` in fisheye camera imagery. It leverages the Deformable DETR architecture and is fine-tuned using the iterative data improvement methodology proposed in the Mcity Data Engine project. Its goal is to improve the detection of long-tail and novel classes in large amounts of unlabeled data, which is especially challenging in Intelligent Transportation Systems.
36
 
37
  ## Intended uses & limitations
38
 
39
+ This model is intended for research and development in autonomous driving and intelligent transportation systems, specifically for improving the detection of long-tail and rare classes within the Mcity Data Engine's iterative model improvement pipeline.
40
+
41
+ Limitations include its training on specific fisheye camera data, which may affect generalization to other camera types or environments without further fine-tuning. The training process focuses on open-vocabulary data selection, meaning its performance on very common, standard objects might be comparable to other models, but its strength lies in identifying more challenging or rare instances.
42
 
43
  ## Training and evaluation data
44
 
45
+ The model was trained on the [Voxel51/fisheye8k](https://huggingface.co/datasets/Voxel51/fisheye8k) dataset. This dataset is used as part of the Mcity Data Engine's workflow, specifically for demonstrating "Embedding Selection" to determine both representative and rare samples for iterative model improvement. More details about the data curation and selection process can be found in the associated paper and the Mcity Data Engine GitHub repository.
46
 
47
  ## Training procedure
48
 
 
61
  ### Training results
62
 
63
  | Training Loss | Epoch | Step | Validation Loss |
64
+ |:-------------:|:-----:|:-----:|:---------------:|\
65
+ | 2.551 | 1.0 | 5288 | 2.9515 |\
66
+ | 2.4989 | 2.0 | 10576 | 2.9100 |\
67
+ | 2.2642 | 3.0 | 15864 | 2.9280 |\
68
+ | 5.2218 | 4.0 | 21152 | 7.3972 |\
69
+ | 3.69 | 5.0 | 26440 | 2.8083 |\
70
+ | 3.3462 | 6.0 | 31728 | 5.0976 |\
71
+ | 2.5944 | 7.0 | 37016 | 4.1669 |\
72
+ | 2.5709 | 8.0 | 42304 | 3.6812 |\
73
+ | 2.6956 | 9.0 | 47592 | 4.0466 |\
74
+ | 2.5195 | 10.0 | 52880 | 3.5085 |\
75
 
76
 
77
  ### Framework versions
 
81
  - Datasets 3.2.0
82
  - Tokenizers 0.21.0
83
 
84
+ ## Sample Usage
85
+
86
+ You can use this model directly with the Hugging Face `transformers` library for object detection:
87
+
88
+ ```python
89
+ from transformers import AutoImageProcessor, DeformableDetrForObjectDetection
90
+ import torch
91
+ from PIL import Image
92
+ import requests
93
+
94
+ # Load image (replace with your image path or URL)
95
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg" # Example image from COCO
96
+ image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
97
+
98
+ # Load the image processor and model
99
+ image_processor = AutoImageProcessor.from_pretrained("mcity-data-engine/fisheye8k_facebook_deformable-detr-box-supervised")
100
+ model = DeformableDetrForObjectDetection.from_pretrained("mcity-data-engine/fisheye8k_facebook_deformable-detr-box-supervised")
101
+
102
+ # Prepare inputs
103
+ inputs = image_processor(images=image, return_tensors="pt")
104
+
105
+ # Perform inference
106
+ with torch.no_grad():
107
+     outputs = model(**inputs)
108
+
109
+ # You can further process the outputs (logits, boxes, etc.) for visualization or evaluation.
110
+ # For example, to get predicted bounding boxes:
111
+ target_sizes = torch.tensor([image.size[::-1]])
112
+ results = image_processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.5)[0]
113
+
114
+ print(f"Detected objects for image of size {image.size}:")
115
+ for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
116
+     box = [round(i, 2) for i in box.tolist()]
117
+     print(
118
+         f" Detected {model.config.id2label[label.item()]} with confidence "
119
+         f"{round(score.item(), 3)} at location {box}"
120
+     )
121
+ ```
122
+
123
+ ## Acknowledgements
124
+ Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn’t have done it without their tremendous support!
125
+
126
+ ## Citation
127
+ If you use the Mcity Data Engine in your research, feel free to cite the project:
128
+
129
+ ```bibtex
130
+ @article{bogdoll2025mcitydataengine,
131
+ title={Mcity Data Engine},
132
+ author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
133
+ journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
134
+ year={2025}
135
+ }
136
+ ```
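
The training results table in the diff above shows the validation loss fluctuating across epochs, and the reported evaluation loss (3.5085) matches the epoch-10 row rather than the lowest value in the table. A minimal sketch for reading that off the table follows; the rows are copied from the diff, while the names `ROWS` and `best_epoch` are illustrative and not part of the model card or the Mcity Data Engine code. The card does not state which checkpoint was published, so this only summarizes the logged numbers.

```python
# Rows copied from the "Training results" table in the diff:
# (training_loss, epoch, step, validation_loss)
ROWS = [
    (2.551, 1, 5288, 2.9515),
    (2.4989, 2, 10576, 2.9100),
    (2.2642, 3, 15864, 2.9280),
    (5.2218, 4, 21152, 7.3972),
    (3.69, 5, 26440, 2.8083),
    (3.3462, 6, 31728, 5.0976),
    (2.5944, 7, 37016, 4.1669),
    (2.5709, 8, 42304, 3.6812),
    (2.6956, 9, 47592, 4.0466),
    (2.5195, 10, 52880, 3.5085),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss (hypothetical helper)."""
    return min(rows, key=lambda row: row[3])


best = best_epoch(ROWS)
final = ROWS[-1]
print(f"Lowest validation loss: {best[3]} at epoch {best[1]} (step {best[2]})")
print(f"Final validation loss:  {final[3]} at epoch {final[1]} (step {final[2]})")
```

Comparing the two rows makes the gap between the best logged epoch and the final epoch explicit, which is useful context when interpreting the single loss figure quoted in the card.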