prasadsachin committed on
Commit
6bed36f
·
verified ·
1 Parent(s): d1115bb

Update README.md with new model card content

Files changed (1)
  1. README.md +446 -41
README.md CHANGED
@@ -1,44 +1,449 @@
1
  ---
2
  library_name: keras-hub
3
  ---
4
- This is a [`DFine` model](https://keras.io/api/keras_hub/models/d_fine) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
5
- This model is related to a `ObjectDetector` task.
6
-
7
- Model config:
8
- * **name:** d_fine_backbone
9
- * **trainable:** True
10
- * **backbone:** {'module': 'keras_hub.src.models.hgnetv2.hgnetv2_backbone', 'class_name': 'HGNetV2Backbone', 'config': {'name': 'hg_net_v2_backbone', 'trainable': True, 'depths': [3, 4, 6, 3], 'embedding_size': 32, 'hidden_sizes': [128, 256, 512, 1024], 'stem_channels': [3, 16, 16], 'hidden_act': 'relu', 'use_learnable_affine_block': True, 'stackwise_stage_filters': [[16, 16, 64, 1, 3, 3], [64, 32, 256, 1, 3, 3], [256, 64, 512, 2, 3, 5], [512, 128, 1024, 1, 3, 5]], 'apply_downsample': [False, True, True, True], 'use_lightweight_conv_block': [False, False, True, True], 'image_shape': [None, None, 3], 'out_features': ['stage2', 'stage3', 'stage4'], 'data_format': 'channels_last'}, 'registered_name': 'keras_hub>HGNetV2Backbone'}
11
- * **decoder_in_channels:** [256, 256, 256]
12
- * **encoder_hidden_dim:** 256
13
- * **num_labels:** 80
14
- * **num_denoising:** 100
15
- * **learn_initial_query:** False
16
- * **num_queries:** 300
17
- * **anchor_image_size:** [640, 640]
18
- * **feat_strides:** [8, 16, 32]
19
- * **num_feature_levels:** 3
20
- * **hidden_dim:** 256
21
- * **encoder_in_channels:** [256, 512, 1024]
22
- * **encode_proj_layers:** [2]
23
- * **num_attention_heads:** 8
24
- * **encoder_ffn_dim:** 1024
25
- * **num_encoder_layers:** 1
26
- * **hidden_expansion:** 0.5
27
- * **depth_multiplier:** 0.34
28
- * **eval_idx:** -1
29
- * **box_noise_scale:** 1.0
30
- * **label_noise_ratio:** 0.5
31
- * **labels:** None
32
- * **num_decoder_layers:** 3
33
- * **decoder_attention_heads:** 8
34
- * **decoder_ffn_dim:** 1024
35
- * **decoder_method:** default
36
- * **decoder_n_points:** [3, 6, 3]
37
- * **lqe_hidden_dim:** 64
38
- * **num_lqe_layers:** 2
39
- * **seed:** 0
40
- * **image_shape:** [None, None, 3]
41
- * **data_format:** channels_last
42
- * **out_features:** ['stage2', 'stage3', 'stage4']
43
-
44
- This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
1
  ---
2
  library_name: keras-hub
3
  ---
4
+ ## Model Overview
5
+ ### Model Summary
6
+
7
+ D-FINE is a family of lightweight, real-time object detection models built on the DETR (DEtection TRansformer) architecture. It improves localization precision by recasting bounding box regression as the iterative refinement of probability distributions rather than the direct prediction of fixed coordinates. The checkpoints are pretrained on COCO and Objects365, and the family spans several sizes that trade accuracy against compute, which makes it suitable for both research and real-time deployment.
8
+
9
+ Key Features:
10
+
11
+ * Transformer-based Architecture: A modern, efficient design based on the DETR framework for direct set prediction of objects.
12
+ * Open Source Code: Code is publicly available, promoting accessibility and innovation.
13
+ * Strong Performance: Achieves state-of-the-art results on object detection benchmarks like COCO for its size.
14
+ * Multiple Sizes: Comes in several sizes (Nano, Small, Medium, Large, X-Large) to fit different hardware capabilities.
15
+ * Advanced Bounding Box Refinement: Instead of predicting fixed coordinates, it iteratively refines probability distributions for precise object localization using Fine-grained Distribution Refinement (FDR).
16
+
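The distribution-based refinement described in the last feature can be illustrated with a small, self-contained sketch. This is a simplified illustration and not D-FINE's actual implementation: the uniform bin spacing, the `max_offset` parameter, and all function names are assumptions made for this example. Each box edge gets a vector of logits over candidate offsets, later layers add residual logits, and the edge moves by the expected offset under the softmax of the accumulated logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def expected_offset(logits, max_offset=1.0):
    """Expected edge offset under a distribution over offset bins.

    Assumption: bins are evenly spaced in [-max_offset, max_offset].
    """
    num_bins = len(logits)
    step = 2.0 * max_offset / (num_bins - 1)
    bin_values = [-max_offset + index * step for index in range(num_bins)]
    probabilities = softmax(logits)
    return sum(p * v for p, v in zip(probabilities, bin_values))


def refine_edge(edge, layer_logits, max_offset=1.0):
    """Accumulate residual logits layer by layer, then decode the edge.

    Every layer decodes relative to the same initial edge, so the result
    reflects the distribution after the last layer's refinement.
    """
    accumulated = [0.0] * len(layer_logits[0])
    refined = edge
    for logits in layer_logits:
        accumulated = [a + b for a, b in zip(accumulated, logits)]
        refined = edge + expected_offset(accumulated, max_offset)
    return refined
```

A flat distribution leaves the edge where it was, while logits concentrated on the last bin push it by almost `max_offset`.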
17
+ Training Strategies:
18
+
19
+ D-FINE is pretrained on large and diverse datasets such as COCO and Objects365. Training uses Global Optimal Localization Self-Distillation (GO-LSD), a bidirectional optimization strategy that transfers localization knowledge from the refined distributions in deeper layers to shallower layers. This accelerates convergence and improves overall model performance.
20
+
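As a rough illustration of the self-distillation idea, the sketch below treats the deepest layer's distribution as the teacher and penalizes each shallower layer by its KL divergence from that teacher. It is a minimal sketch under stated assumptions, not the loss used to train D-FINE: it uses a plain temperature-scaled KL term, the `temperature` value is arbitrary, and the function names are hypothetical. Any additional weighting applied by the published method is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for two discrete distributions."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher_probs, student_probs)
    )


def self_distillation_loss(layer_logits, temperature=5.0):
    """Mean KL from the last layer's distribution to each earlier layer's.

    `layer_logits` holds one list of bin logits per layer, shallowest first.
    """
    teacher = softmax(layer_logits[-1], temperature)
    losses = [
        kl_divergence(teacher, softmax(logits, temperature))
        for logits in layer_logits[:-1]
    ]
    if not losses:  # A single layer has nothing to distill into.
        return 0.0
    return sum(losses) / len(losses)
```

The loss is zero when every layer already agrees with the final one and grows as shallow layers diverge from it.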
21
+ Weights are released under the [Apache 2.0 License](https://github.com/Peterande/D-FINE/blob/main/LICENSE).
22
+
23
+ ## Links
24
+
25
+ * [D-FINE Quickstart Notebook](https://www.kaggle.com/code/harshaljanjani/d-fine-quickstart-notebook)
26
+ * [D-FINE API Documentation](https://keras.io/keras_hub/api/models/d_fine/)
27
+ * [D-FINE Paper (arXiv:2410.13842)](https://arxiv.org/abs/2410.13842)
28
+ * [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
29
+ * [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
30
+
31
+ ## Installation
32
+
33
+ Keras and KerasHub can be installed with:
34
+
35
+ ```
36
+ pip install -U -q keras-hub
37
+ pip install -U -q keras
38
+ ```
39
+
40
+ JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
41
+
42
+ ## Available D-FINE Presets
43
+ The following model checkpoints are provided by the Keras team. The code examples below use `dfine_small_coco`; pass any other preset name to `from_preset()` to load a different checkpoint.
44
+ | Preset | Parameters | Description |
45
+ |--------|------------|-------------|
46
+ | dfine_nano_coco | 3.79M | D-FINE Nano model, the smallest variant in the family, pretrained on the COCO dataset. Ideal for applications where computational resources are limited. |
47
+ | dfine_small_coco | 10.33M | D-FINE Small model pretrained on the COCO dataset. Offers a balance between performance and computational efficiency. |
48
+ | dfine_medium_coco | 19.62M | D-FINE Medium model pretrained on the COCO dataset. A solid baseline with strong performance for general-purpose object detection. |
49
+ | dfine_large_coco | 31.34M | D-FINE Large model pretrained on the COCO dataset. Provides high accuracy and is suitable for more demanding tasks. |
50
+ | dfine_xlarge_coco | 62.83M | D-FINE X-Large model, the largest COCO-pretrained variant, designed for state-of-the-art performance where accuracy is the top priority. |
51
+ | dfine_small_obj365 | 10.62M | D-FINE Small model pretrained on the large-scale Objects365 dataset, enhancing its ability to recognize a wider variety of objects. |
52
+ | dfine_medium_obj365 | 19.99M | D-FINE Medium model pretrained on the Objects365 dataset. Benefits from a larger and more diverse pretraining corpus. |
53
+ | dfine_large_obj365 | 31.86M | D-FINE Large model pretrained on the Objects365 dataset for improved generalization and performance on diverse object categories. |
54
+ | dfine_xlarge_obj365 | 63.35M | D-FINE X-Large model pretrained on the Objects365 dataset, offering maximum performance by leveraging a vast number of object categories during pretraining. |
55
+ | dfine_small_obj2coco | 10.33M | D-FINE Small model first pretrained on Objects365 and then fine-tuned on COCO, combining broad feature learning with benchmark-specific adaptation. |
56
+ | dfine_medium_obj2coco | 19.62M | D-FINE Medium model using a two-stage training process: pretraining on Objects365 followed by fine-tuning on COCO. |
57
+ | dfine_large_obj2coco_e25 | 31.34M | D-FINE Large model pretrained on Objects365 and then fine-tuned on COCO for 25 epochs. A high-performance model with specialized tuning. |
58
+ | dfine_xlarge_obj2coco | 62.83M | D-FINE X-Large model, pretrained on Objects365 and fine-tuned on COCO, representing the most powerful model in this series for COCO-style tasks. |
59
+
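When the deployment target has a fixed parameter budget, the table can be turned into a small lookup. The sketch below is only an illustration: the parameter counts are copied from the COCO rows of the table above, while the helper name `largest_preset_within` is hypothetical and not part of KerasHub.

```python
# Parameter counts (in millions) copied from the COCO rows of the preset table.
COCO_PRESETS = {
    "dfine_nano_coco": 3.79,
    "dfine_small_coco": 10.33,
    "dfine_medium_coco": 19.62,
    "dfine_large_coco": 31.34,
    "dfine_xlarge_coco": 62.83,
}


def largest_preset_within(budget_millions, presets=COCO_PRESETS):
    """Return the largest preset that fits the budget, or None if none fits."""
    fitting = {
        name: size for name, size in presets.items() if size <= budget_millions
    }
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(largest_preset_within(20))  # dfine_medium_coco
```

The returned name can be passed directly to `from_preset()`.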
60
+ ## Example Usage
61
+ ### Imports
62
+ ```python
63
+ import keras
64
+ import keras_hub
65
+ import numpy as np
66
+ from keras_hub.models import DFineBackbone
67
+ from keras_hub.models import DFineObjectDetector
68
+ from keras_hub.models import HGNetV2Backbone
69
+ ```
70
+
71
+ ### Load a Pretrained Model
72
+ Use `from_preset()` to load a D-FINE model with pretrained weights.
73
+ ```python
74
+ object_detector = DFineObjectDetector.from_preset(
75
+ "dfine_small_coco"
76
+ )
77
+ ```
78
+
79
+ ### Make a Prediction
80
+ Call `predict()` on a batch of images. The images will be automatically preprocessed.
81
+ ```python
82
+ # Create a random image.
83
+ image = np.random.uniform(size=(1, 256, 256, 3)).astype("float32")
84
+
85
+ # Make predictions.
86
+ predictions = object_detector.predict(image)
87
+
88
+ # The output is a dictionary containing boxes, labels, confidence scores,
89
+ # and the number of detections.
90
+ print(predictions["boxes"].shape)
91
+ print(predictions["labels"].shape)
92
+ print(predictions["confidence"].shape)
93
+ print(predictions["num_detections"])
94
+ ```
95
+
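The prediction dictionary can be reduced to a list of confident detections with a small helper. This is a sketch under assumptions: it uses the four keys printed above, but it assumes that each array has a leading batch dimension, that valid detections come first, and that `num_detections` holds one count per image. The helper name `filter_detections` and the sample values are hypothetical.

```python
def filter_detections(predictions, image_index=0, threshold=0.5):
    """Keep detections for one image whose confidence meets the threshold.

    Works with nested lists or NumPy arrays, since it only uses indexing.
    Assumption: entries past `num_detections` are padding and are ignored.
    """
    count = int(predictions["num_detections"][image_index])
    kept = []
    for i in range(count):
        score = float(predictions["confidence"][image_index][i])
        if score >= threshold:
            kept.append(
                {
                    "box": [float(v) for v in predictions["boxes"][image_index][i]],
                    "label": int(predictions["labels"][image_index][i]),
                    "confidence": score,
                }
            )
    return kept


# Hypothetical output for a single image with two valid detections.
sample = {
    "boxes": [[[10.0, 20.0, 50.0, 60.0], [5.0, 5.0, 15.0, 15.0], [-1.0, -1.0, -1.0, -1.0]]],
    "labels": [[3, 7, -1]],
    "confidence": [[0.9, 0.3, 0.0]],
    "num_detections": [2],
}
print(filter_detections(sample, threshold=0.5))
```

With the sample data, only the first detection (label 3, confidence 0.9) survives the 0.5 threshold.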
96
+ ### Fine-Tune a Pretrained Model
97
+ You can load a pretrained backbone and attach a new detection head for a different number of classes.
98
+ ```python
99
+ # Load a pretrained backbone.
100
+ backbone = DFineBackbone.from_preset(
101
+ "dfine_small_coco"
102
+ )
103
+
104
+ # Create a new detector with a different number of classes for fine-tuning.
105
+ finetuning_detector = DFineObjectDetector(
106
+ backbone=backbone,
107
+ num_classes=10 # Example: fine-tuning on a new dataset with 10 classes
108
+ )
109
+
110
+ # The `finetuning_detector` is now ready to be compiled and trained on a new dataset.
111
+ ```
112
+
113
+ ### Create a Model From Scratch
114
+ You can also build a D-FINE detector by first creating its components, such as the underlying `HGNetV2Backbone`.
115
+ ```python
116
+ # 1. Define a base backbone for feature extraction.
117
+ hgnetv2_backbone = HGNetV2Backbone(
118
+ stem_channels=[3, 16, 16],
119
+ stackwise_stage_filters=[
120
+ [16, 16, 64, 1, 3, 3],
121
+ [64, 32, 256, 1, 3, 3],
122
+ [256, 64, 512, 2, 3, 5],
123
+ [512, 128, 1024, 1, 3, 5],
124
+ ],
125
+ apply_downsample=[False, True, True, True],
126
+ use_lightweight_conv_block=[False, False, True, True],
127
+ depths=[1, 1, 2, 1],
128
+ hidden_sizes=[64, 256, 512, 1024],
129
+ embedding_size=16,
130
+ image_shape=(256, 256, 3),
131
+ out_features=["stage3", "stage4"],
132
+ )
133
+
134
+ # 2. Create the D-FINE backbone, which includes the hybrid encoder and decoder.
135
+ d_fine_backbone = DFineBackbone(
136
+ backbone=hgnetv2_backbone,
137
+ decoder_in_channels=[128, 128],
138
+ encoder_hidden_dim=128,
139
+ num_denoising=0, # Denoising is off
140
+ num_labels=80,
141
+ hidden_dim=128,
142
+ learn_initial_query=False,
143
+ num_queries=300,
144
+ anchor_image_size=(256, 256),
145
+ feat_strides=[16, 32],
146
+ num_feature_levels=2,
147
+ encoder_in_channels=[512, 1024],
148
+ encode_proj_layers=[1],
149
+ num_attention_heads=8,
150
+ encoder_ffn_dim=512,
151
+ num_encoder_layers=1,
152
+ hidden_expansion=0.34,
153
+ depth_multiplier=0.5,
154
+ eval_idx=-1,
155
+ num_decoder_layers=3,
156
+ decoder_attention_heads=8,
157
+ decoder_ffn_dim=512,
158
+ decoder_n_points=[6, 6],
159
+ lqe_hidden_dim=64,
160
+ num_lqe_layers=2,
161
+ image_shape=(256, 256, 3),
162
+ )
163
+
164
+ # 3. Create the final object detector model.
165
+ object_detector_scratch = DFineObjectDetector(
166
+ backbone=d_fine_backbone,
167
+ num_classes=80,
168
+ bounding_box_format="yxyx",
169
+ )
170
+ ```
171
+
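The `feat_strides` and `anchor_image_size` arguments together determine how large each feature map is. The arithmetic can be checked with a short sketch; the helper names are hypothetical, and rounding up for sizes that are not divisible by the stride is an assumption that does not matter for the values used here.

```python
import math


def feature_map_shapes(image_size, feat_strides):
    """Spatial size of each feature level for a given input size."""
    height, width = image_size
    return [
        (math.ceil(height / stride), math.ceil(width / stride))
        for stride in feat_strides
    ]


def total_positions(image_size, feat_strides):
    """Number of spatial positions summed over all feature levels."""
    return sum(h * w for h, w in feature_map_shapes(image_size, feat_strides))


print(feature_map_shapes((256, 256), [16, 32]))  # [(16, 16), (8, 8)]
print(total_positions((256, 256), [16, 32]))  # 320
```

For comparison, a 640x640 input with strides `[8, 16, 32]` gives 80x80, 40x40, and 20x20 maps, or 8400 positions in total.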
172
+ ### Train the Model
173
+ Call `fit()` on a batch of images and ground truth bounding boxes. The `compute_loss` method from the detector handles the complex loss calculations.
174
+ ```python
175
+ # Prepare sample training data.
176
+ images = np.random.uniform(
177
+ low=0, high=255, size=(2, 256, 256, 3)
178
+ ).astype("float32")
179
+ bounding_boxes = {
180
+ "boxes": [
181
+ np.array([[0.1, 0.1, 0.3, 0.3], [0.5, 0.5, 0.8, 0.8]], dtype="float32"),
182
+ np.array([[0.2, 0.2, 0.4, 0.4]], dtype="float32"),
183
+ ],
184
+ "labels": [
185
+ np.array([1, 10], dtype="int32"),
186
+ np.array([20], dtype="int32"),
187
+ ],
188
+ }
189
+
190
+ # Compile the model with the built-in loss function.
191
+ object_detector_scratch.compile(
192
+ optimizer="adam",
193
+ loss=object_detector_scratch.compute_loss,
194
+ )
195
+
196
+ # Train the model.
197
+ object_detector_scratch.fit(x=images, y=bounding_boxes, epochs=1)
198
+ ```
199
+
200
+ ### Train with Contrastive Denoising
201
+ To enable contrastive denoising for training, provide ground truth `labels` when initializing the `DFineBackbone`.
202
+ ```python
203
+ # Sample ground truth labels for initializing the denoising generator.
204
+ labels_for_denoising = [
205
+ {
206
+ "boxes": np.array([[0.5, 0.5, 0.2, 0.2]]), "labels": np.array([1])
207
+ },
208
+ {
209
+ "boxes": np.array([[0.6, 0.6, 0.3, 0.3]]), "labels": np.array([2])
210
+ },
211
+ ]
212
+
213
+ # Create a D-FINE backbone with denoising enabled.
214
+ d_fine_backbone_denoising = DFineBackbone(
215
+ backbone=hgnetv2_backbone, # Using the hgnetv2_backbone from before
216
+ decoder_in_channels=[128, 128],
217
+ encoder_hidden_dim=128,
218
+ num_denoising=100, # Number of denoising queries
219
+ label_noise_ratio=0.5,
220
+ box_noise_scale=1.0,
221
+ labels=labels_for_denoising, # Provide labels at initialization
222
+ num_labels=80,
223
+ hidden_dim=128,
224
+ learn_initial_query=False,
225
+ num_queries=300,
226
+ anchor_image_size=(256, 256),
227
+ feat_strides=[16, 32],
228
+ num_feature_levels=2,
229
+ encoder_in_channels=[512, 1024],
230
+ encode_proj_layers=[1],
231
+ num_attention_heads=8,
232
+ encoder_ffn_dim=512,
233
+ num_encoder_layers=1,
234
+ hidden_expansion=0.34,
235
+ depth_multiplier=0.5,
236
+ eval_idx=-1,
237
+ num_decoder_layers=3,
238
+ decoder_attention_heads=8,
239
+ decoder_ffn_dim=512,
240
+ decoder_n_points=[6, 6],
241
+ lqe_hidden_dim=64,
242
+ num_lqe_layers=2,
243
+ image_shape=(256, 256, 3),
244
+ )
245
+
246
+ # Create the final detector.
247
+ object_detector_denoising = DFineObjectDetector(
248
+ backbone=d_fine_backbone_denoising,
249
+ num_classes=80
250
+ )
251
+
252
+ # This model can now be compiled and trained as shown in the previous example.
253
+ ```
254
+
255
+ ## Example Usage with Hugging Face URI
256
+
257
+ ### Imports
258
+ ```python
259
+ import keras
260
+ import keras_hub
261
+ import numpy as np
262
+ from keras_hub.models import DFineBackbone
263
+ from keras_hub.models import DFineObjectDetector
264
+ from keras_hub.models import HGNetV2Backbone
265
+ ```
266
+
267
+ ### Load a Pretrained Model
268
+ Use `from_preset()` to load a D-FINE model with pretrained weights.
269
+ ```python
270
+ object_detector = DFineObjectDetector.from_preset(
271
+ "hf://keras/dfine_small_coco"
272
+ )
273
+ ```
274
+
275
+ ### Make a Prediction
276
+ Call `predict()` on a batch of images. The images will be automatically preprocessed.
277
+ ```python
278
+ # Create a random image.
279
+ image = np.random.uniform(size=(1, 256, 256, 3)).astype("float32")
280
+
281
+ # Make predictions.
282
+ predictions = object_detector.predict(image)
283
+
284
+ # The output is a dictionary containing boxes, labels, confidence scores,
285
+ # and the number of detections.
286
+ print(predictions["boxes"].shape)
287
+ print(predictions["labels"].shape)
288
+ print(predictions["confidence"].shape)
289
+ print(predictions["num_detections"])
290
+ ```
291
+
292
+ ### Fine-Tune a Pretrained Model
293
+ You can load a pretrained backbone and attach a new detection head for a different number of classes.
294
+ ```python
295
+ # Load a pretrained backbone.
296
+ backbone = DFineBackbone.from_preset(
297
+ "hf://keras/dfine_small_coco"
298
+ )
299
+
300
+ # Create a new detector with a different number of classes for fine-tuning.
301
+ finetuning_detector = DFineObjectDetector(
302
+ backbone=backbone,
303
+ num_classes=10 # Example: fine-tuning on a new dataset with 10 classes
304
+ )
305
+
306
+ # The `finetuning_detector` is now ready to be compiled and trained on a new dataset.
307
+ ```
308
+
309
+ ### Create a Model From Scratch
310
+ You can also build a D-FINE detector by first creating its components, such as the underlying `HGNetV2Backbone`.
311
+ ```python
312
+ # 1. Define a base backbone for feature extraction.
313
+ hgnetv2_backbone = HGNetV2Backbone(
314
+ stem_channels=[3, 16, 16],
315
+ stackwise_stage_filters=[
316
+ [16, 16, 64, 1, 3, 3],
317
+ [64, 32, 256, 1, 3, 3],
318
+ [256, 64, 512, 2, 3, 5],
319
+ [512, 128, 1024, 1, 3, 5],
320
+ ],
321
+ apply_downsample=[False, True, True, True],
322
+ use_lightweight_conv_block=[False, False, True, True],
323
+ depths=[1, 1, 2, 1],
324
+ hidden_sizes=[64, 256, 512, 1024],
325
+ embedding_size=16,
326
+ image_shape=(256, 256, 3),
327
+ out_features=["stage3", "stage4"],
328
+ )
329
+
330
+ # 2. Create the D-FINE backbone, which includes the hybrid encoder and decoder.
331
+ d_fine_backbone = DFineBackbone(
332
+ backbone=hgnetv2_backbone,
333
+ decoder_in_channels=[128, 128],
334
+ encoder_hidden_dim=128,
335
+ num_denoising=0, # Denoising is off
336
+ num_labels=80,
337
+ hidden_dim=128,
338
+ learn_initial_query=False,
339
+ num_queries=300,
340
+ anchor_image_size=(256, 256),
341
+ feat_strides=[16, 32],
342
+ num_feature_levels=2,
343
+ encoder_in_channels=[512, 1024],
344
+ encode_proj_layers=[1],
345
+ num_attention_heads=8,
346
+ encoder_ffn_dim=512,
347
+ num_encoder_layers=1,
348
+ hidden_expansion=0.34,
349
+ depth_multiplier=0.5,
350
+ eval_idx=-1,
351
+ num_decoder_layers=3,
352
+ decoder_attention_heads=8,
353
+ decoder_ffn_dim=512,
354
+ decoder_n_points=[6, 6],
355
+ lqe_hidden_dim=64,
356
+ num_lqe_layers=2,
357
+ image_shape=(256, 256, 3),
358
+ )
359
+
360
+ # 3. Create the final object detector model.
361
+ object_detector_scratch = DFineObjectDetector(
362
+ backbone=d_fine_backbone,
363
+ num_classes=80,
364
+ bounding_box_format="yxyx",
365
+ )
366
+ ```
367
+
368
+ ### Train the Model
369
+ Call `fit()` on a batch of images and ground truth bounding boxes. The `compute_loss` method from the detector handles the complex loss calculations.
370
+ ```python
371
+ # Prepare sample training data.
372
+ images = np.random.uniform(
373
+ low=0, high=255, size=(2, 256, 256, 3)
374
+ ).astype("float32")
375
+ bounding_boxes = {
376
+ "boxes": [
377
+ np.array([[0.1, 0.1, 0.3, 0.3], [0.5, 0.5, 0.8, 0.8]], dtype="float32"),
378
+ np.array([[0.2, 0.2, 0.4, 0.4]], dtype="float32"),
379
+ ],
380
+ "labels": [
381
+ np.array([1, 10], dtype="int32"),
382
+ np.array([20], dtype="int32"),
383
+ ],
384
+ }
385
+
386
+ # Compile the model with the built-in loss function.
387
+ object_detector_scratch.compile(
388
+ optimizer="adam",
389
+ loss=object_detector_scratch.compute_loss,
390
+ )
391
+
392
+ # Train the model.
393
+ object_detector_scratch.fit(x=images, y=bounding_boxes, epochs=1)
394
+ ```
395
+
396
+ ### Train with Contrastive Denoising
397
+ To enable contrastive denoising for training, provide ground truth `labels` when initializing the `DFineBackbone`.
398
+ ```python
399
+ # Sample ground truth labels for initializing the denoising generator.
400
+ labels_for_denoising = [
401
+ {
402
+ "boxes": np.array([[0.5, 0.5, 0.2, 0.2]]), "labels": np.array([1])
403
+ },
404
+ {
405
+ "boxes": np.array([[0.6, 0.6, 0.3, 0.3]]), "labels": np.array([2])
406
+ },
407
+ ]
408
+
409
+ # Create a D-FINE backbone with denoising enabled.
410
+ d_fine_backbone_denoising = DFineBackbone(
411
+ backbone=hgnetv2_backbone, # Using the hgnetv2_backbone from before
412
+ decoder_in_channels=[128, 128],
413
+ encoder_hidden_dim=128,
414
+ num_denoising=100, # Number of denoising queries
415
+ label_noise_ratio=0.5,
416
+ box_noise_scale=1.0,
417
+ labels=labels_for_denoising, # Provide labels at initialization
418
+ num_labels=80,
419
+ hidden_dim=128,
420
+ learn_initial_query=False,
421
+ num_queries=300,
422
+ anchor_image_size=(256, 256),
423
+ feat_strides=[16, 32],
424
+ num_feature_levels=2,
425
+ encoder_in_channels=[512, 1024],
426
+ encode_proj_layers=[1],
427
+ num_attention_heads=8,
428
+ encoder_ffn_dim=512,
429
+ num_encoder_layers=1,
430
+ hidden_expansion=0.34,
431
+ depth_multiplier=0.5,
432
+ eval_idx=-1,
433
+ num_decoder_layers=3,
434
+ decoder_attention_heads=8,
435
+ decoder_ffn_dim=512,
436
+ decoder_n_points=[6, 6],
437
+ lqe_hidden_dim=64,
438
+ num_lqe_layers=2,
439
+ image_shape=(256, 256, 3),
440
+ )
441
+
442
+ # Create the final detector.
443
+ object_detector_denoising = DFineObjectDetector(
444
+ backbone=d_fine_backbone_denoising,
445
+ num_classes=80
446
+ )
447
+
448
+ # This model can now be compiled and trained as shown in the previous example.
449
+ ```