---
title: VisionQuery
emoji: 🔍
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
short_description: SigLIP-based zero-shot image classification
tags:
  - vision
  - zero-shot
  - siglip
  - taipy
  - image-classification
  - transformers
---

# VisionQuery

### Zero-Shot Image Understanding with Google SigLIP + Taipy

## Problem Statement

Traditional image classification systems demand:

- **Thousands of labeled images** per category
- **Expensive GPU training pipelines**
- **Re-training** every time you add a new category
- **ML expertise** to build and maintain

This makes vision AI inaccessible for most real-world use cases.

## Solution

**VisionQuery AI** uses **SigLIP** (Sigmoid Loss for Language-Image Pre-Training by Google DeepMind) to deliver **zero-shot image classification**:

- Describe what you're looking for in **plain English**
- No training data or fine-tuning required
- Add **unlimited categories** on the fly
- Multilingual: SigLIP's multilingual checkpoints support **100+ languages** (the default checkpoint used here is trained on English text)

## How to Use

1. **Upload** any image (JPG, PNG, WebP)
2. **Enter text labels** as comma-separated descriptions, e.g. `a cat, a dog, a person walking, a sunset`
3. Click **Analyze Image**
4. See **similarity scores** for every label

## How SigLIP Works

```
Image ──► ViT Encoder  ──► Image Embedding ──┐
                                             ├──► Sigmoid Score per pair
Text  ──► Text Encoder ──► Text Embedding  ──┘
```

Unlike CLIP's softmax loss (which normalises scores across all image-text pairs in a batch), SigLIP uses a **sigmoid loss**: each image-text pair is scored independently.
This gives:

- Better calibration
- True multi-label support (several labels can score high for the same image)
- Stronger zero-shot accuracy

**Model used:** `google/siglip-base-patch16-224`

## Tech Stack

| Layer | Technology |
|---|---|
| Vision-Language Model | Google SigLIP via 🤗 Transformers |
| GUI Framework | [Taipy](https://github.com/Avaiga/taipy) |
| Charts | Plotly |
| Deployment | Hugging Face Spaces (Docker) |
| Backend | PyTorch |

## Applications

| Domain | Use Case |
|---|---|
| 🏥 Healthcare | Describe symptoms → find matching scan types |
| 🛒 E-Commerce | Natural language visual product search |
| 🔒 Security | Detect unusual scenes with text descriptions |
| 🎨 Asset Management | Auto-tag image libraries |
| ♿ Accessibility | Auto-describe images for visually impaired users |
| 🔬 Research | Classify microscopy / satellite imagery |

## Local Setup

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/visionquery-ai
cd visionquery-ai
pip install -r requirements.txt
python app.py
```

The app runs at `http://localhost:7860`.

## Citation

```
@article{zhai2023sigmoid,
  title     = {Sigmoid Loss for Language Image Pre-Training},
  author    = {Zhai, Xiaohua and others},
  journal   = {arXiv:2303.15343},
  year      = {2023},
  publisher = {Google DeepMind}
}
```
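
## Example: Sigmoid vs. Softmax Scoring

To make the per-pair scoring described under "How SigLIP Works" concrete, here is a minimal sketch in plain Python. It does not load the model: the logits are made-up numbers standing in for the image-text logits a SigLIP checkpoint would produce, and `parse_labels`, `sigmoid_scores`, and `softmax_scores` are hypothetical helper names, not functions taken from `app.py` or from Transformers.

```python
import math


def parse_labels(raw: str) -> list[str]:
    """Split a comma-separated label string, trimming whitespace and dropping blanks."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def sigmoid_scores(logits: list[float]) -> list[float]:
    """Score each label independently (SigLIP-style); results need not sum to 1."""
    scores = []
    for z in logits:
        if z >= 0:
            scores.append(1.0 / (1.0 + math.exp(-z)))
        else:
            # Equivalent form that avoids overflow for very negative logits.
            ez = math.exp(z)
            scores.append(ez / (1.0 + ez))
    return scores


def softmax_scores(logits: list[float]) -> list[float]:
    """Normalise across all labels (CLIP-style); results always sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = parse_labels("a cat, a dog, a person walking, a sunset")

# Assumption: illustrative logits only, NOT real model output.
logits = [2.0, 1.5, -3.0, -4.0]

sig = sigmoid_scores(logits)
soft = softmax_scores(logits)

for label, s, p in zip(labels, sig, soft):
    print(f"{label:<18} sigmoid={s:.3f}  softmax={p:.3f}")
```

With these illustrative logits, both `a cat` and `a dog` score above 0.5 under the sigmoid, and the sigmoid scores do not sum to 1. Softmax instead makes the labels compete for a fixed total of 1, so adding or removing a label changes every other label's score.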