Update README.md
README.md
```python
print(model.num_parameters())  # 7751525
```

## Code to export to ONNX
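
The script below exports the model to ONNX. It patches `model.forward` to return only `logits` and `pred_boxes`, monkey-patches `torch.Tensor.__ior__` to work around an operator (`aten::__ior_`) that ONNX opset 16 cannot export, and declares the text sequence length and the number of queries as dynamic axes.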

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
from transformers.models.grounding_dino.modeling_grounding_dino import (
    GroundingDinoObjectDetectionOutput,
)

# torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::__ior_' to ONNX opset version 16 is not supported.
# Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.
torch.Tensor.__ior__ = lambda self, other: self.__or__(other)

# model_id = "IDEA-Research/grounding-dino-tiny"
model_id = "hf-internal-testing/tiny-random-GroundingDinoForObjectDetection"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

# Wrap the forward pass so the exported graph has a fixed, minimal output signature
old_forward = model.forward
def new_forward(*args, **kwargs):
    output = old_forward(*args, **kwargs, return_dict=True)
    # Only return the logits and pred_boxes
    return GroundingDinoObjectDetectionOutput(
        logits=output.logits, pred_boxes=output.pred_boxes
    )
model.forward = new_forward

image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw).resize((800, 800))
text = "a cat."  # NB: the text query needs to be lowercased and end with a dot

# Run the PyTorch model
inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[image.size[::-1]],
)

# Mark the text sequence length and the number of queries as dynamic axes
text_axes = {
    "input_ids": {1: "sequence_length"},
    "token_type_ids": {1: "sequence_length"},
    "attention_mask": {1: "sequence_length"},
}
image_axes = {}
output_axes = {
    "logits": {1: "num_queries"},
    "pred_boxes": {1: "num_queries"},
}
input_names = [
    "pixel_values",
    "input_ids",
    "token_type_ids",
    "attention_mask",
    "pixel_mask",
]

# Input to the model
x = tuple(inputs[key] for key in input_names)

# Export the model
torch.onnx.export(
    model,                     # model being run
    x,                         # model input (or a tuple for multiple inputs)
    "model.onnx",              # where to save the model (can be a file or file-like object)
    export_params=True,        # store the trained parameter weights inside the model file
    opset_version=16,          # the ONNX version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=input_names,
    output_names=list(output_axes.keys()),
    dynamic_axes={
        **text_axes,
        **image_axes,
        **output_axes,
    },
)
```
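
To sanity-check the export, you can load `model.onnx` with ONNX Runtime and compare its outputs to the PyTorch outputs computed above. This is a minimal sketch, assuming `onnxruntime` is installed and that it runs in the same session as the export script (it reuses `inputs`, `input_names`, and `outputs` from there); the `1e-4` tolerance is an arbitrary choice:

```python
import numpy as np
import onnxruntime as ort

# Load the exported graph
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# ONNX Runtime takes numpy arrays keyed by the input names used at export time
ort_inputs = {name: inputs[name].numpy() for name in input_names}
ort_logits, ort_pred_boxes = session.run(["logits", "pred_boxes"], ort_inputs)

# Compare against the eager PyTorch outputs from the export script
np.testing.assert_allclose(outputs.logits.numpy(), ort_logits, atol=1e-4)
np.testing.assert_allclose(outputs.pred_boxes.numpy(), ort_pred_boxes, atol=1e-4)
print("ONNX Runtime outputs match PyTorch within tolerance")
```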

## Model Details