metadata
title: WeDetect Open-Vocabulary Detection
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: false
license: gpl-3.0
suggested_hardware: t4-medium
models:
  - fushh7/WeDetect
tags:
  - object-detection
  - zero-shot
  - open-vocabulary
  - computer-vision
  - chinese

🔍 WeDetect: Open-Vocabulary Object Detection

Paper GitHub Models

WeDetect is a fast, open-vocabulary object detection model that can detect arbitrary objects specified via text prompts. This demo provides an interactive interface for testing the model.

✨ Features

  • Open-Vocabulary Detection: Detect any object by simply specifying its name
  • English & Chinese Support: Enter class names in English (auto-translated) or Chinese directly
  • Editable Translation: Review and correct auto-translations before detection
  • Multiple Model Sizes: Choose between Tiny (fast), Base (balanced), or Large (best quality)
  • Adjustable Threshold: Fine-tune detection sensitivity

🚀 How to Use

  1. Upload an Image: Click the upload area or drag-and-drop an image
  2. Choose Input Language: Select English or Chinese (中文)
  3. Enter Class Names: Type the objects you want to detect, separated by commas
    • English example: person, car, dog, cat
    • Chinese example: 人, 车, 狗, 猫
  4. Review Translation: If using English, check the Chinese preview and edit if needed
  5. Adjust Threshold: Lower values return more detections, including low-confidence ones; higher values keep only the more confident detections
  6. Click Detect: Press the "🔍 Detect Objects" button
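
Steps 3 and 5 above can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: the `Detection` class and both helper functions are hypothetical names introduced here, and the scores are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one predicted box; not the demo's actual type."""
    label: str
    score: float  # confidence in [0, 1]


def parse_class_names(text: str) -> list[str]:
    """Split comma-separated input into clean names, dropping blanks and duplicates."""
    # Accept both the ASCII comma and the full-width comma common in Chinese input.
    parts = text.replace(",", ",").split(",")
    names: list[str] = []
    for part in parts:
        name = part.strip()
        if name and name not in names:
            names.append(name)
    return names


def filter_by_threshold(detections: list[Detection], threshold: float) -> list[Detection]:
    """Keep only detections whose score is at least the threshold."""
    return [d for d in detections if d.score >= threshold]


names = parse_class_names("person, car, , dog,cat")
print(names)  # ['person', 'car', 'dog', 'cat']

# Made-up scores, used only to show the effect of the threshold.
dets = [Detection("person", 0.92), Detection("car", 0.41), Detection("dog", 0.18)]
print([d.label for d in filter_by_threshold(dets, 0.3)])  # ['person', 'car']
print([d.label for d in filter_by_threshold(dets, 0.6)])  # ['person']
```

Raising the threshold from 0.3 to 0.6 drops the lower-scoring `car` box, which is the trade-off described in step 5.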

📊 Model Information

| Model | Parameters | Speed | Quality | GPU Memory |
| --- | --- | --- | --- | --- |
| WeDetect-Tiny | ~28M | ⚡⚡⚡ Fastest | Good | ~4-6 GB |
| WeDetect-Base | ~89M | ⚡⚡ Fast | Better | ~8-10 GB |
| WeDetect-Large | ~198M | ⚡ Moderate | Best | ~12-16 GB |
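
One way to read the memory column is as a rough guide for picking a model size. The sketch below is an assumption-laden illustration: `pick_model` is a hypothetical helper, not part of the demo, and it treats the upper end of each approximate range as the memory the model needs.

```python
from typing import Optional

# Upper end of the approximate GPU memory ranges listed above, in GB.
# Ordered from smallest to largest model.
MODEL_MEMORY_GB = {
    "WeDetect-Tiny": 6,
    "WeDetect-Base": 10,
    "WeDetect-Large": 16,
}


def pick_model(available_gb: float) -> Optional[str]:
    """Return the largest model whose upper memory estimate fits, or None."""
    fitting = [name for name, need in MODEL_MEMORY_GB.items() if need <= available_gb]
    # The dict is ordered smallest to largest, so the last fitting entry is the biggest.
    return fitting[-1] if fitting else None


print(pick_model(16))  # WeDetect-Large
print(pick_model(12))  # WeDetect-Base
print(pick_model(4))   # None
```

Actual memory use depends on image size and batch settings, so treat these numbers as a starting point only.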

📚 Common Class Names

| English | Chinese | English | Chinese | English | Chinese |
| --- | --- | --- | --- | --- | --- |
| person | 人 | car | 车 | dog | 狗 |
| cat | 猫 | bird | 鸟 | bicycle | 自行车 |
| chair | 椅子 | table | 桌子 | bed | 床 |
| phone | 手机 | laptop | 笔记本电脑 | book | 书 |
| bottle | 瓶子 | cup | 杯子 | shoe | 鞋 |

Note: WeDetect is trained on Chinese data, so it works best with Chinese class names. The built-in dictionary covers ~200 common objects.

⚠️ Important Notes

  • Chinese Model: WeDetect uses Chinese class names internally. English inputs are auto-translated using a dictionary of ~200 common objects.
  • Unknown Words: If a word isn't in the dictionary, it will be passed through unchanged. Check the Chinese preview to verify translations.
  • GPU Required: This demo requires GPU acceleration. If you encounter memory errors, try using a smaller model.
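
The dictionary lookup with pass-through for unknown words can be sketched as follows. This is a simplified illustration, not the demo's implementation: `EN_TO_ZH` holds only four entries taken from the Common Class Names section (the real dictionary of ~200 entries is not reproduced here), and `translate_class_names` is a hypothetical name.

```python
# A tiny excerpt of an English-to-Chinese lookup, using entries from this README.
EN_TO_ZH = {
    "person": "人",
    "car": "车",
    "dog": "狗",
    "cat": "猫",
}


def translate_class_names(names: list[str]) -> tuple[list[str], list[str]]:
    """Translate known names; pass unknown ones through unchanged.

    Returns (translated, unknown) so a UI could flag words that need manual review.
    """
    translated: list[str] = []
    unknown: list[str] = []
    for name in names:
        key = name.strip().lower()
        if key in EN_TO_ZH:
            translated.append(EN_TO_ZH[key])
        else:
            translated.append(name)  # passed through unchanged
            unknown.append(name)
    return translated, unknown


translated, unknown = translate_class_names(["Person", "car", "skateboard"])
print(translated)  # ['人', '车', 'skateboard']
print(unknown)     # ['skateboard']
```

Here `skateboard` reaches the model untranslated, which is why the Chinese preview is worth checking before running detection.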

🔧 Technical Details

This Space uses:

  • Gradio 5.50.0+ - Compatible with huggingface_hub 1.x
  • huggingface_hub 1.x - Latest HF Hub API
  • MMDetection 3.3.0 - Object detection framework
  • @spaces.GPU decorator for GPU acceleration
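
For readers unfamiliar with the `@spaces.GPU` decorator, the sketch below shows the general pattern of wrapping an inference function with it. The `detect` function, its parameters, and its default threshold are hypothetical placeholders, not the demo's actual code; the fallback branch exists only so the snippet also runs where the `spaces` package is not installed.

```python
try:
    import spaces  # available on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:
    # Fallback so this sketch also runs locally without the `spaces` package.
    def gpu(func):
        return func


@gpu
def detect(image_path: str, class_names: list[str], threshold: float = 0.3) -> list[dict]:
    """Hypothetical placeholder for the demo's inference function."""
    # Real code would run the WeDetect model here; this stub returns no detections.
    return []


print(detect("example.jpg", ["人", "车"]))  # []
```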

📖 Citation

If you use WeDetect in your research, please cite:

@article{fu2025wedetect,
  title={WeDetect: Fast Open-Vocabulary Object Detection as Retrieval},
  author={Fu, Shenghao and Su, Yukun and Rao, Fengyun and LYU, Jing and Xie, Xiaohua and Zheng, Wei-Shi},
  journal={arXiv preprint arXiv:2512.12309},
  year={2025}
}

📄 License

This demo uses the WeDetect model, which is licensed under GPL-3.0.

🙏 Acknowledgments