---
library_name: transformers
tags:
- object-detection
- grounding
- vision
- custom-dataset
- groundingdino
license: mit
pipeline_tag: object-detection
---

# Custom GroundingDINO Model

This is a custom-trained GroundingDINO model for object detection and grounding, compatible with the Hugging Face Transformers library.

## Model Details

- **Model Type**: GroundingDINO
- **Number of Classes**: 1180
- **Training Dataset**: Custom dataset with 1180 object classes
- **Architecture**: GroundingDINO with Swin-T backbone
- **Transformers Compatible**: ✅ Yes

## Usage with Transformers

```python
import json

import torch
from PIL import Image
from transformers import AutoConfig, AutoModel

# Load model and config; trust_remote_code is required because the model class
# is defined in the custom modeling_groundingdino.py shipped with this repo
model = AutoModel.from_pretrained("your_username/your_model_name", trust_remote_code=True)
config = AutoConfig.from_pretrained("your_username/your_model_name", trust_remote_code=True)

# Load label map
with open("label_map.json", "r") as f:
    label_map = json.load(f)

# Prepare text prompt (first 100 class names, separated by ". ")
text_prompt = ". ".join(list(label_map.values())[:100]) + "."

# Load and preprocess image
image = Image.open("your_image.jpg").convert("RGB")
# Add your image preprocessing here

# Run inference
with torch.no_grad():
    outputs = model(images=image, text_prompts=[text_prompt])
    logits = outputs.logits
    boxes = outputs.boxes
```
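The snippet above leaves `logits` and `boxes` in raw form. Below is a minimal post-processing sketch, assuming the standard GroundingDINO output layout (`logits` of shape `[num_queries, num_text_tokens]` after squeezing the batch dimension, `boxes` of shape `[num_queries, 4]` in normalized `cx, cy, w, h`); the threshold and the dummy tensors are illustrative only, so verify the actual shapes against `modeling_groundingdino.py`.

```python
import torch

def decode_detections(logits, boxes, box_threshold=0.35):
    """Keep queries whose best text-token score clears the threshold.

    Assumes logits: [num_queries, num_text_tokens] and boxes: [num_queries, 4]
    in normalized (cx, cy, w, h) -- the usual GroundingDINO layout.
    """
    scores = logits.sigmoid().max(dim=-1).values  # best text-token match per query
    keep = scores > box_threshold
    return scores[keep], boxes[keep]

# Dummy tensors shaped like a single-image forward pass (900 queries, 256 text tokens)
dummy_logits = torch.randn(900, 256)
dummy_boxes = torch.rand(900, 4)
scores, kept_boxes = decode_detections(dummy_logits, dummy_boxes)
print(f"{scores.numel()} boxes above threshold")
```

Mapping a kept box back to a specific class phrase requires knowing which token span of the prompt each phrase occupies, which depends on the tokenizer; check the custom modeling code for a helper before writing your own.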
## Usage with Original Implementation

```python
from model_loader import ModelLoader, quick_inference

# Quick inference
results = quick_inference('your_image.jpg')

# Or load model manually
model = ModelLoader.load_model(
    checkpoint_path='pytorch_model.bin',
    config_path='original_config.py',
    device='cuda'
)
label_map = ModelLoader.load_label_map('label_map.json')
```

## Model Files

- `pytorch_model.bin`: Model weights (transformers format)
- `config.json`: Transformers configuration
- `modeling_groundingdino.py`: Custom model class
- `tokenizer_config.json`: Tokenizer configuration
- `label_map.json`: Class label mapping (1180 classes)
- `original_config.py`: Original training configuration

## Classes

This model can detect 1180 unique object classes, including:

- blue and purple polka dot block
- blue and purple polka dot bowl
- blue and purple polka dot container
- blue and purple polka dot cross
- blue and purple polka dot diamond
- blue and purple polka dot flower
- blue and purple polka dot frame
- blue and purple polka dot heart
- blue and purple polka dot hexagon
- blue and purple polka dot l-shaped block
- blue and purple polka dot letter a
- blue and purple polka dot letter e
- blue and purple polka dot letter g
- blue and purple polka dot letter m
- blue and purple polka dot letter r
- blue and purple polka dot letter t
- blue and purple polka dot letter v
- blue and purple polka dot line
- blue and purple polka dot pallet
- blue and purple polka dot pan

... and 1160 more classes.

## Installation

```bash
pip install transformers torch torchvision
```

## Example Classes

The model can detect objects with various:

- **Colors**: blue, red, green, yellow, purple, etc.
- **Patterns**: polka dot, stripe, paisley, swirl, checkerboard
- **Shapes**: block, bowl, container, cross, diamond, flower
- **Combinations**: "blue and purple polka dot block", "red stripe heart"

## Performance

- **Model Size**: ~1.1 GB
- **Parameters**: ~172M
- **Training**: 12 epochs on custom dataset
- **Memory Usage**: ~2-4 GB GPU memory during inference

## Citation

If you use this model, please cite the original GroundingDINO paper:

```bibtex
@article{liu2023grounding,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}
```

## License

This model is released under the MIT License.
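## Prompting All 1180 Classes

GroundingDINO's text encoder can only attend to a limited number of tokens per prompt, which is why the Transformers usage example above includes only the first 100 class names. A minimal sketch of covering all 1180 classes by splitting the label map across several prompts is shown below; the chunk size of 80 is an arbitrary assumption, not a tuned value.

```python
import json

# Load all class names shipped with this repo
with open("label_map.json", "r") as f:
    class_names = list(json.load(f).values())

# Assumed chunk size; choose it so each prompt stays under the text encoder's token limit
chunk_size = 80
prompts = [
    ". ".join(class_names[i:i + chunk_size]) + "."
    for i in range(0, len(class_names), chunk_size)
]

# Each prompt can then be run through the model as in the Transformers example:
# for prompt in prompts:
#     outputs = model(images=image, text_prompts=[prompt])
```

Detections from the individual passes then need to be merged, for example with non-maximum suppression across chunks.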