keras
/

dfine_small_coco

KerasHub

prasadsachin committed on Feb 26

Commit

6bed36f

verified ·

1 Parent(s): d1115bb

Update README.md with new model card content

README.md +446 -41

@@ -1,44 +1,449 @@
 ---
 library_name: keras-hub
 ---
-This is a [`DFine` model](https://keras.io/api/keras_hub/models/d_fine) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
-This model is related to a `ObjectDetector` task.
-Model config:
-* **name:** d_fine_backbone
-* **trainable:** True
-* **backbone:** {'module': 'keras_hub.src.models.hgnetv2.hgnetv2_backbone', 'class_name': 'HGNetV2Backbone', 'config': {'name': 'hg_net_v2_backbone', 'trainable': True, 'depths': [3, 4, 6, 3], 'embedding_size': 32, 'hidden_sizes': [128, 256, 512, 1024], 'stem_channels': [3, 16, 16], 'hidden_act': 'relu', 'use_learnable_affine_block': True, 'stackwise_stage_filters': [[16, 16, 64, 1, 3, 3], [64, 32, 256, 1, 3, 3], [256, 64, 512, 2, 3, 5], [512, 128, 1024, 1, 3, 5]], 'apply_downsample': [False, True, True, True], 'use_lightweight_conv_block': [False, False, True, True], 'image_shape': [None, None, 3], 'out_features': ['stage2', 'stage3', 'stage4'], 'data_format': 'channels_last'}, 'registered_name': 'keras_hub>HGNetV2Backbone'}
-* **decoder_in_channels:** [256, 256, 256]
-* **encoder_hidden_dim:** 256
-* **num_labels:** 80
-* **num_denoising:** 100
-* **learn_initial_query:** False
-* **num_queries:** 300
-* **anchor_image_size:** [640, 640]
-* **feat_strides:** [8, 16, 32]
-* **num_feature_levels:** 3
-* **hidden_dim:** 256
-* **encoder_in_channels:** [256, 512, 1024]
-* **encode_proj_layers:** [2]
-* **num_attention_heads:** 8
-* **encoder_ffn_dim:** 1024
-* **num_encoder_layers:** 1
-* **hidden_expansion:** 0.5
-* **depth_multiplier:** 0.34
-* **eval_idx:** -1
-* **box_noise_scale:** 1.0
-* **label_noise_ratio:** 0.5
-* **labels:** None
-* **num_decoder_layers:** 3
-* **decoder_attention_heads:** 8
-* **decoder_ffn_dim:** 1024
-* **decoder_method:** default
-* **decoder_n_points:** [3, 6, 3]
-* **lqe_hidden_dim:** 64
-* **num_lqe_layers:** 2
-* **seed:** 0
-* **image_shape:** [None, None, 3]
-* **data_format:** channels_last
-* **out_features:** ['stage2', 'stage3', 'stage4']
-This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.

 ---
 library_name: keras-hub
 ---
+## Model Overview
+### Model Summary
+D-FINE is a family of lightweight, real-time object detection models built on the DETR (DEtection TRansformer) architecture. It improves localization precision by redefining the bounding box regression task: rather than predicting fixed coordinates, it iteratively refines probability distributions over the box edges. The models are pretrained on COCO and Objects365 and balance accuracy against computational cost, which makes them suitable for both research and real-time deployment.
+Key Features:
+  * Transformer-based Architecture: A modern, efficient design based on the DETR framework for direct set prediction of objects.
+  * Open Source Code: Code is publicly available, promoting accessibility and innovation.
+  * Strong Performance: Achieves state-of-the-art results for its size on object detection benchmarks such as COCO.
+  * Multiple Sizes: Comes in several sizes (Nano, Small, Medium, Large, X-Large) to fit different hardware capabilities.
+  * Advanced Bounding Box Refinement: Instead of predicting fixed coordinates, it iteratively refines probability distributions for precise object localization using Fine-grained Distribution Refinement (FDR).
+Training Strategies:
+D-FINE is pre-trained on large and diverse datasets like COCO and Objects365. The training process utilizes Global Optimal Localization Self-Distillation (GO-LSD), a bidirectional optimization strategy that transfers localization knowledge from refined distributions in deeper layers to shallower layers. This accelerates convergence and improves the overall performance of the model.
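The refinement idea behind FDR can be illustrated with a small, dependency-free sketch. It is not the KerasHub implementation: the uniformly spaced `bin_offsets`, the single scalar `scale`, and the function names are assumptions chosen for illustration. Each decoder layer adds residual logits to the running logits for one box edge, and the edge position is the initial offset plus the scaled expected value of the resulting distribution.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def refine_edge(initial_offset, layer_residuals, bin_offsets, scale):
    """Return the edge offset after each decoder layer.

    Every layer adds its residual logits to the running logits. The edge
    offset is the initial offset plus the scaled expected bin value.
    """
    logits = [0.0] * len(bin_offsets)
    offsets = []
    for residual in layer_residuals:
        logits = [old + delta for old, delta in zip(logits, residual)]
        probs = softmax(logits)
        expected = sum(p * b for p, b in zip(probs, bin_offsets))
        offsets.append(initial_offset + scale * expected)
    return offsets


# Hypothetical values: 5 uniformly spaced bins and 2 decoder layers.
bin_offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]
layer_residuals = [
    [0.0, 0.0, 0.0, 0.0, 0.0],  # Layer 1: uniform distribution, no correction.
    [0.0, 0.0, 0.0, 2.0, 0.0],  # Layer 2: probability mass moves toward +0.5.
]
offsets = refine_edge(
    initial_offset=10.0,
    layer_residuals=layer_residuals,
    bin_offsets=bin_offsets,
    scale=4.0,
)
print(offsets)
```

The first layer leaves the edge at its initial position, and the second layer nudges it in the direction where the distribution has gained mass. Because each layer only adds residual logits, later layers make fine corrections to the earlier estimate rather than predicting the edge from scratch.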
+Weights are released under the [Apache 2.0 License](https://github.com/Peterande/D-FINE/blob/main/LICENSE).
+## Links
+  * [D-FINE Quickstart Notebook](https://www.kaggle.com/code/harshaljanjani/d-fine-quickstart-notebook)
+  * [D-FINE API Documentation](https://keras.io/keras_hub/api/models/d_fine/)
+  * [D-FINE Paper (arXiv)](https://arxiv.org/abs/2410.13842)
+  * [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
+  * [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
+## Installation
+Keras and KerasHub can be installed with:
+```
+pip install -U -q keras-hub
+pip install -U -q keras
+```
+JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
+## Available D-FINE Presets
+The following model checkpoints are provided by the Keras team. The code examples below use `dfine_small_coco`; the other preset names can be passed to `from_preset()` in the same way.
+| Preset | Parameters | Description |
+|--------|------------|-------------|
+| dfine_nano_coco | 3.79M | D-FINE Nano model, the smallest variant in the family, pretrained on the COCO dataset. Ideal for applications where computational resources are limited. |
+| dfine_small_coco | 10.33M | D-FINE Small model pretrained on the COCO dataset. Offers a balance between performance and computational efficiency. |
+| dfine_medium_coco | 19.62M | D-FINE Medium model pretrained on the COCO dataset. A solid baseline with strong performance for general-purpose object detection. |
+| dfine_large_coco | 31.34M | D-FINE Large model pretrained on the COCO dataset. Provides high accuracy and is suitable for more demanding tasks. |
+| dfine_xlarge_coco | 62.83M | D-FINE X-Large model, the largest COCO-pretrained variant, designed for state-of-the-art performance where accuracy is the top priority. |
+| dfine_small_obj365 | 10.62M | D-FINE Small model pretrained on the large-scale Objects365 dataset, enhancing its ability to recognize a wider variety of objects. |
+| dfine_medium_obj365 | 19.99M | D-FINE Medium model pretrained on the Objects365 dataset. Benefits from a larger and more diverse pretraining corpus. |
+| dfine_large_obj365 | 31.86M | D-FINE Large model pretrained on the Objects365 dataset for improved generalization and performance on diverse object categories. |
+| dfine_xlarge_obj365 | 63.35M | D-FINE X-Large model pretrained on the Objects365 dataset, offering maximum performance by leveraging a vast number of object categories during pretraining. |
+| dfine_small_obj2coco | 10.33M | D-FINE Small model first pretrained on Objects365 and then fine-tuned on COCO, combining broad feature learning with benchmark-specific adaptation. |
+| dfine_medium_obj2coco | 19.62M | D-FINE Medium model using a two-stage training process: pretraining on Objects365 followed by fine-tuning on COCO. |
+| dfine_large_obj2coco_e25 | 31.34M | D-FINE Large model pretrained on Objects365 and then fine-tuned on COCO for 25 epochs. A high-performance model with specialized tuning. |
+| dfine_xlarge_obj2coco | 62.83M | D-FINE X-Large model, pretrained on Objects365 and fine-tuned on COCO, representing the most powerful model in this series for COCO-style tasks. |
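As a rough illustration of how the table might be used, the sketch below picks the largest COCO-pretrained preset that fits a parameter budget. The parameter counts are copied from the table; the `COCO_PRESETS` dictionary and the `largest_preset_within` helper are hypothetical names and are not part of KerasHub.

```python
# Parameter counts (in millions) copied from the COCO rows of the table.
COCO_PRESETS = {
    "dfine_nano_coco": 3.79,
    "dfine_small_coco": 10.33,
    "dfine_medium_coco": 19.62,
    "dfine_large_coco": 31.34,
    "dfine_xlarge_coco": 62.83,
}


def largest_preset_within(budget_millions, presets=COCO_PRESETS):
    """Return the name of the largest preset that fits the parameter budget."""
    fitting = {name: size for name, size in presets.items() if size <= budget_millions}
    if not fitting:
        raise ValueError(f"No preset fits within {budget_millions}M parameters")
    return max(fitting, key=fitting.get)


preset_name = largest_preset_within(20)
print(preset_name)
```

The returned name is a plain string, so it can be passed directly to `from_preset()`.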
+## Example Usage
+### Imports
+```python
+import keras
+import keras_hub
+import numpy as np
+from keras_hub.models import DFineBackbone
+from keras_hub.models import DFineObjectDetector
+from keras_hub.models import HGNetV2Backbone
+```
+### Load a Pretrained Model
+Use `from_preset()` to load a D-FINE model with pretrained weights.
+```python
+object_detector = DFineObjectDetector.from_preset(
+    "dfine_small_coco"
+)
+```
+### Make a Prediction
+Call `predict()` on a batch of images. The images will be automatically preprocessed.
+```python
+# Create a random image.
+image = np.random.uniform(size=(1, 256, 256, 3)).astype("float32")
+# Make predictions.
+predictions = object_detector.predict(image)
+# The output is a dictionary containing boxes, labels, confidence scores,
+# and the number of detections.
+print(predictions["boxes"].shape)
+print(predictions["labels"].shape)
+print(predictions["confidence"].shape)
+print(predictions["num_detections"])
+```
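A common next step is to keep only the confident detections. The sketch below shows one way to do this using the four dictionary keys printed above. It runs on a hand-written stand-in for the `predict()` output, and it assumes that `num_detections` holds one count per image and that entries past that count are padding (shown here as `-1`); both points should be checked against the real output. The `filter_detections` helper is a hypothetical name.

```python
def filter_detections(predictions, image_index=0, min_confidence=0.5):
    """Collect (box, label, score) tuples at or above a confidence threshold."""
    # Assumption: entries past `num_detections` are padding and can be dropped.
    count = int(predictions["num_detections"][image_index])
    boxes = predictions["boxes"][image_index][:count]
    labels = predictions["labels"][image_index][:count]
    scores = predictions["confidence"][image_index][:count]
    return [
        (list(box), int(label), float(score))
        for box, label, score in zip(boxes, labels, scores)
        if score >= min_confidence
    ]


# Hand-written stand-in for the dictionary returned by `predict()`.
mock_predictions = {
    "boxes": [[[10, 20, 110, 220], [30, 40, 90, 100], [-1, -1, -1, -1]]],
    "labels": [[3, 17, -1]],
    "confidence": [[0.92, 0.31, -1.0]],
    "num_detections": [2],
}
kept = filter_detections(mock_predictions, min_confidence=0.5)
print(kept)
```

The helper only uses indexing and slicing, so it should work unchanged on NumPy arrays as well as on the nested lists used here.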
+### Fine-Tune a Pre-trained Model
+You can load a pretrained backbone and attach a new detection head for a different number of classes.
+```python
+# Load a pretrained backbone.
+backbone = DFineBackbone.from_preset(
+    "dfine_small_coco"
+)
+# Create a new detector with a different number of classes for fine-tuning.
+finetuning_detector = DFineObjectDetector(
+    backbone=backbone,
+    num_classes=10  # Example: fine-tuning on a new dataset with 10 classes
+)
+# The `finetuning_detector` is now ready to be compiled and trained on a new dataset.
+```
+### Create a Model From Scratch
+You can also build a D-FINE detector by first creating its components, such as the underlying `HGNetV2Backbone`.
+```python
+# 1. Define a base backbone for feature extraction.
+hgnetv2_backbone = HGNetV2Backbone(
+    stem_channels=[3, 16, 16],
+    stackwise_stage_filters=[
+        [16, 16, 64, 1, 3, 3],
+        [64, 32, 256, 1, 3, 3],
+        [256, 64, 512, 2, 3, 5],
+        [512, 128, 1024, 1, 3, 5],
+    ],
+    apply_downsample=[False, True, True, True],
+    use_lightweight_conv_block=[False, False, True, True],
+    depths=[1, 1, 2, 1],
+    hidden_sizes=[64, 256, 512, 1024],
+    embedding_size=16,
+    image_shape=(256, 256, 3),
+    out_features=["stage3", "stage4"],
+)
+# 2. Create the D-FINE backbone, which includes the hybrid encoder and decoder.
+d_fine_backbone = DFineBackbone(
+    backbone=hgnetv2_backbone,
+    decoder_in_channels=[128, 128],
+    encoder_hidden_dim=128,
+    num_denoising=0,  # Denoising is off
+    num_labels=80,
+    hidden_dim=128,
+    learn_initial_query=False,
+    num_queries=300,
+    anchor_image_size=(256, 256),
+    feat_strides=[16, 32],
+    num_feature_levels=2,
+    encoder_in_channels=[512, 1024],
+    encode_proj_layers=[1],
+    num_attention_heads=8,
+    encoder_ffn_dim=512,
+    num_encoder_layers=1,
+    hidden_expansion=0.34,
+    depth_multiplier=0.5,
+    eval_idx=-1,
+    num_decoder_layers=3,
+    decoder_attention_heads=8,
+    decoder_ffn_dim=512,
+    decoder_n_points=[6, 6],
+    lqe_hidden_dim=64,
+    num_lqe_layers=2,
+    image_shape=(256, 256, 3),
+)
+# 3. Create the final object detector model.
+object_detector_scratch = DFineObjectDetector(
+    backbone=d_fine_backbone,
+    num_classes=80,
+    bounding_box_format="yxyx",
+)
+```
+### Train the Model
+Call `fit()` on a batch of images and ground truth bounding boxes. The `compute_loss` method from the detector handles the complex loss calculations.
+```python
+# Prepare sample training data.
+images = np.random.uniform(
+    low=0, high=255, size=(2, 256, 256, 3)
+).astype("float32")
+bounding_boxes = {
+    "boxes": [
+        np.array([[0.1, 0.1, 0.3, 0.3], [0.5, 0.5, 0.8, 0.8]], dtype="float32"),
+        np.array([[0.2, 0.2, 0.4, 0.4]], dtype="float32"),
+    ],
+    "labels": [
+        np.array([1, 10], dtype="int32"),
+        np.array([20], dtype="int32"),
+    ],
+}
+# Compile the model with the built-in loss function.
+object_detector_scratch.compile(
+    optimizer="adam",
+    loss=object_detector_scratch.compute_loss,
+)
+# Train the model.
+object_detector_scratch.fit(x=images, y=bounding_boxes, epochs=1)
+```
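The detector above was created with `bounding_box_format="yxyx"`, which in Keras orders each box as `[top, left, bottom, right]`. Annotations stored in the COCO style of `[x_min, y_min, width, height]` need reordering first. The sketch below shows that conversion in plain Python; `xywh_to_yxyx` is a hypothetical helper and the sample boxes are made up. It only changes the coordinate order and does not rescale or normalize the values.

```python
def xywh_to_yxyx(box):
    """Convert [x_min, y_min, width, height] to [y_min, x_min, y_max, x_max]."""
    x_min, y_min, width, height = box
    return [y_min, x_min, y_min + height, x_min + width]


# Made-up COCO-style boxes for illustration.
coco_style_boxes = [[10.0, 20.0, 100.0, 50.0], [0.0, 0.0, 32.0, 64.0]]
yxyx_boxes = [xywh_to_yxyx(box) for box in coco_style_boxes]
print(yxyx_boxes)
```

Converting once while preparing the dataset keeps the box format consistent with the one the detector was constructed with.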
+### Train with Contrastive Denoising
+To enable contrastive denoising for training, provide ground truth `labels` when initializing the `DFineBackbone`.
+```python
+# Sample ground truth labels for initializing the denoising generator.
+labels_for_denoising = [
+    {
+        "boxes": np.array([[0.5, 0.5, 0.2, 0.2]]), "labels": np.array([1])
+    },
+    {
+        "boxes": np.array([[0.6, 0.6, 0.3, 0.3]]), "labels": np.array([2])
+    },
+]
+# Create a D-FINE backbone with denoising enabled.
+d_fine_backbone_denoising = DFineBackbone(
+    backbone=hgnetv2_backbone,  # Using the hgnetv2_backbone from before
+    decoder_in_channels=[128, 128],
+    encoder_hidden_dim=128,
+    num_denoising=100,  # Number of denoising queries
+    label_noise_ratio=0.5,
+    box_noise_scale=1.0,
+    labels=labels_for_denoising,  # Provide labels at initialization
+    num_labels=80,
+    hidden_dim=128,
+    learn_initial_query=False,
+    num_queries=300,
+    anchor_image_size=(256, 256),
+    feat_strides=[16, 32],
+    num_feature_levels=2,
+    encoder_in_channels=[512, 1024],
+    encode_proj_layers=[1],
+    num_attention_heads=8,
+    encoder_ffn_dim=512,
+    num_encoder_layers=1,
+    hidden_expansion=0.34,
+    depth_multiplier=0.5,
+    eval_idx=-1,
+    num_decoder_layers=3,
+    decoder_attention_heads=8,
+    decoder_ffn_dim=512,
+    decoder_n_points=[6, 6],
+    lqe_hidden_dim=64,
+    num_lqe_layers=2,
+    image_shape=(256, 256, 3),
+)
+# Create the final detector.
+object_detector_denoising = DFineObjectDetector(
+    backbone=d_fine_backbone_denoising,
+    num_classes=80
+)
+# This model can now be compiled and trained as shown in the previous example.
+```
+## Example Usage with Hugging Face URI
+### Imports
+```python
+import keras
+import keras_hub
+import numpy as np
+from keras_hub.models import DFineBackbone
+from keras_hub.models import DFineObjectDetector
+from keras_hub.models import HGNetV2Backbone
+```
+### Load a Pretrained Model
+Use `from_preset()` with an `hf://` handle to load a D-FINE model with pretrained weights from the Hugging Face Hub.
+```python
+object_detector = DFineObjectDetector.from_preset(
+    "hf://keras/dfine_small_coco"
+)
+```
+### Make a Prediction
+Call `predict()` on a batch of images. The images will be automatically preprocessed.
+```python
+# Create a random image.
+image = np.random.uniform(size=(1, 256, 256, 3)).astype("float32")
+# Make predictions.
+predictions = object_detector.predict(image)
+# The output is a dictionary containing boxes, labels, confidence scores,
+# and the number of detections.
+print(predictions["boxes"].shape)
+print(predictions["labels"].shape)
+print(predictions["confidence"].shape)
+print(predictions["num_detections"])
+```
+### Fine-Tune a Pre-trained Model
+You can load a pretrained backbone and attach a new detection head for a different number of classes.
+```python
+# Load a pretrained backbone.
+backbone = DFineBackbone.from_preset(
+    "hf://keras/dfine_small_coco"
+)
+# Create a new detector with a different number of classes for fine-tuning.
+finetuning_detector = DFineObjectDetector(
+    backbone=backbone,
+    num_classes=10  # Example: fine-tuning on a new dataset with 10 classes
+)
+# The `finetuning_detector` is now ready to be compiled and trained on a new dataset.
+```
+### Create a Model From Scratch
+You can also build a D-FINE detector by first creating its components, such as the underlying `HGNetV2Backbone`.
+```python
+# 1. Define a base backbone for feature extraction.
+hgnetv2_backbone = HGNetV2Backbone(
+    stem_channels=[3, 16, 16],
+    stackwise_stage_filters=[
+        [16, 16, 64, 1, 3, 3],
+        [64, 32, 256, 1, 3, 3],
+        [256, 64, 512, 2, 3, 5],
+        [512, 128, 1024, 1, 3, 5],
+    ],
+    apply_downsample=[False, True, True, True],
+    use_lightweight_conv_block=[False, False, True, True],
+    depths=[1, 1, 2, 1],
+    hidden_sizes=[64, 256, 512, 1024],
+    embedding_size=16,
+    image_shape=(256, 256, 3),
+    out_features=["stage3", "stage4"],
+)
+# 2. Create the D-FINE backbone, which includes the hybrid encoder and decoder.
+d_fine_backbone = DFineBackbone(
+    backbone=hgnetv2_backbone,
+    decoder_in_channels=[128, 128],
+    encoder_hidden_dim=128,
+    num_denoising=0,  # Denoising is off
+    num_labels=80,
+    hidden_dim=128,
+    learn_initial_query=False,
+    num_queries=300,
+    anchor_image_size=(256, 256),
+    feat_strides=[16, 32],
+    num_feature_levels=2,
+    encoder_in_channels=[512, 1024],
+    encode_proj_layers=[1],
+    num_attention_heads=8,
+    encoder_ffn_dim=512,
+    num_encoder_layers=1,
+    hidden_expansion=0.34,
+    depth_multiplier=0.5,
+    eval_idx=-1,
+    num_decoder_layers=3,
+    decoder_attention_heads=8,
+    decoder_ffn_dim=512,
+    decoder_n_points=[6, 6],
+    lqe_hidden_dim=64,
+    num_lqe_layers=2,
+    image_shape=(256, 256, 3),
+)
+# 3. Create the final object detector model.
+object_detector_scratch = DFineObjectDetector(
+    backbone=d_fine_backbone,
+    num_classes=80,
+    bounding_box_format="yxyx",
+)
+```
+### Train the Model
+Call `fit()` on a batch of images and ground truth bounding boxes. The `compute_loss` method from the detector handles the complex loss calculations.
+```python
+# Prepare sample training data.
+images = np.random.uniform(
+    low=0, high=255, size=(2, 256, 256, 3)
+).astype("float32")
+bounding_boxes = {
+    "boxes": [
+        np.array([[0.1, 0.1, 0.3, 0.3], [0.5, 0.5, 0.8, 0.8]], dtype="float32"),
+        np.array([[0.2, 0.2, 0.4, 0.4]], dtype="float32"),
+    ],
+    "labels": [
+        np.array([1, 10], dtype="int32"),
+        np.array([20], dtype="int32"),
+    ],
+}
+# Compile the model with the built-in loss function.
+object_detector_scratch.compile(
+    optimizer="adam",
+    loss=object_detector_scratch.compute_loss,
+)
+# Train the model.
+object_detector_scratch.fit(x=images, y=bounding_boxes, epochs=1)
+```
+### Train with Contrastive Denoising
+To enable contrastive denoising for training, provide ground truth `labels` when initializing the `DFineBackbone`.
+```python
+# Sample ground truth labels for initializing the denoising generator.
+labels_for_denoising = [
+    {
+        "boxes": np.array([[0.5, 0.5, 0.2, 0.2]]), "labels": np.array([1])
+    },
+    {
+        "boxes": np.array([[0.6, 0.6, 0.3, 0.3]]), "labels": np.array([2])
+    },
+]
+# Create a D-FINE backbone with denoising enabled.
+d_fine_backbone_denoising = DFineBackbone(
+    backbone=hgnetv2_backbone,  # Using the hgnetv2_backbone from before
+    decoder_in_channels=[128, 128],
+    encoder_hidden_dim=128,
+    num_denoising=100,  # Number of denoising queries
+    label_noise_ratio=0.5,
+    box_noise_scale=1.0,
+    labels=labels_for_denoising,  # Provide labels at initialization
+    num_labels=80,
+    hidden_dim=128,
+    learn_initial_query=False,
+    num_queries=300,
+    anchor_image_size=(256, 256),
+    feat_strides=[16, 32],
+    num_feature_levels=2,
+    encoder_in_channels=[512, 1024],
+    encode_proj_layers=[1],
+    num_attention_heads=8,
+    encoder_ffn_dim=512,
+    num_encoder_layers=1,
+    hidden_expansion=0.34,
+    depth_multiplier=0.5,
+    eval_idx=-1,
+    num_decoder_layers=3,
+    decoder_attention_heads=8,
+    decoder_ffn_dim=512,
+    decoder_n_points=[6, 6],
+    lqe_hidden_dim=64,
+    num_lqe_layers=2,
+    image_shape=(256, 256, 3),
+)
+# Create the final detector.
+object_detector_denoising = DFineObjectDetector(
+    backbone=d_fine_backbone_denoising,
+    num_classes=80
+)
+# This model can now be compiled and trained as shown in the previous example.
+```