---
title: VisionQuery 
emoji: 🔍
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
short_description: SigLIP based zero-shot image classification
tags:
  - vision
  - zero-shot
  - siglip
  - taipy
  - image-classification
  - transformers
---

# VisionQuery 
### Zero-Shot Image Understanding with Google SigLIP + Taipy


## Problem Statement

Traditional image classification systems demand:
- **Thousands of labeled images** per category  
- **Expensive GPU training pipelines**  
- **Re-training** every time you add a new category  
- **ML expertise** to build and maintain

These requirements put vision AI out of reach for many small teams and one-off use cases.


## Solution

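**VisionQuery** uses **SigLIP** (Sigmoid Loss for Language-Image Pre-Training, from Google DeepMind) for **zero-shot image classification**:

- Describe what you're looking for in **plain English**
- No training data or fine-tuning required
- Add or change categories on the fly: labels are plain text, so there is no fixed class list
- Multilingual labels need a multilingual checkpoint such as `google/siglip-base-patch16-256-multilingual`; the default checkpoint named under "How SigLIP Works" is trained mainly on English text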


## How to Use

1. **Upload** any image (JPG, PNG, WebP)
2. **Enter text labels** as comma-separated descriptions  
   e.g. `a cat, a dog, a person walking, a sunset`
3. Click **Analyze Image**
4. Read the **similarity score** shown for each label
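
Step 2 can be sketched in Python as follows. This is a minimal illustration, not the app's actual code: `parse_labels` is a hypothetical helper, and trimming whitespace, dropping empty entries, and removing case-insensitive duplicates are assumptions about reasonable input handling.

```python
def parse_labels(raw: str) -> list[str]:
    """Split comma-separated text into clean, de-duplicated labels.

    Hypothetical helper: the real app may parse its input differently.
    """
    labels = []
    seen = set()
    for part in raw.split(","):
        label = part.strip()      # drop surrounding whitespace
        key = label.lower()       # compare case-insensitively
        if label and key not in seen:
            seen.add(key)
            labels.append(label)  # keep the first spelling, in input order
    return labels


print(parse_labels("a cat, a dog, , a person walking, A cat"))
```

Cleaning the list before inference avoids scoring empty strings or the same label twice.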


## How SigLIP Works

```
Image ──► ViT Encoder ──► Image Embedding ──┐
                                             ├──► Sigmoid Score per pair
Text  ──► Text Encoder ──► Text Embedding ──┘
```

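Unlike CLIP's softmax loss, which normalises scores across all image-text pairs in a batch, SigLIP uses a **sigmoid loss**: each image-text pair is scored independently as a binary match. In practice this gives:
- Independent scores: each label gets its own probability, and the scores need not sum to 1
- Multi-label support: several labels can score high for the same image
- Zero-shot accuracy: the SigLIP paper reports gains over softmax-trained baselines, most clearly at smaller training batch sizes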
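
The difference between the two scoring schemes can be illustrated with a small, dependency-free sketch. The logits are made-up numbers chosen for illustration, not model outputs, and `sigmoid_scores` and `softmax_scores` are hypothetical helper names.

```python
import math


def sigmoid_scores(logits):
    """Score every label on its own: each value lies in (0, 1) independently."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax_scores(logits):
    """Score labels against each other: the values always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three labels; the first two both match the image well.
logits = [2.0, 1.5, -3.0]
sig = sigmoid_scores(logits)
soft = softmax_scores(logits)

print("sigmoid:", [round(s, 3) for s in sig])
print("softmax:", [round(s, 3) for s in soft])

# Adding a fourth label changes every softmax score but no sigmoid score.
print(sigmoid_scores(logits + [4.0])[:3] == sig)
```

With sigmoid scoring, both well-matching labels stay above 0.5, whereas softmax forces them to share a total of 1.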

**Model used:** `google/siglip-base-patch16-224`
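
A minimal sketch of scoring one image against a list of labels with this checkpoint through 🤗 Transformers is shown below. It is an illustration under assumptions rather than the app's actual code: `score_labels` is a hypothetical function name, the model is reloaded on every call for brevity, and `torch`, `transformers`, and `Pillow` are assumed to be installed.

```python
MODEL_ID = "google/siglip-base-patch16-224"


def score_labels(image_path, labels, model_id=MODEL_ID):
    """Return {label: sigmoid score} for one image (hypothetical helper)."""
    # Heavy imports live inside the function so importing this module stays cheap.
    import torch
    from PIL import Image
    from transformers import AutoModel, AutoProcessor

    # A real app would load these once at startup instead of on every call.
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    # SigLIP was trained with text padded to a fixed length,
    # so padding="max_length" is used at inference time as well.
    inputs = processor(
        text=labels, images=image, padding="max_length", return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_image has shape (1, num_labels); sigmoid scores each pair on its own.
    probs = torch.sigmoid(outputs.logits_per_image)[0]
    return {label: float(p) for label, p in zip(labels, probs)}
```

Calling `score_labels("photo.jpg", ["a cat", "a dog"])`, where `photo.jpg` is a placeholder path, would return one score per label. Because the scores are independent, they do not have to sum to 1.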


## Tech Stack

| Layer | Technology |
|---|---|
| Vision-Language Model | Google SigLIP via 🤗 Transformers |
| GUI Framework | [Taipy](https://github.com/Avaiga/taipy) |
| Charts | Plotly |
| Deployment | Hugging Face Spaces (Docker) |
| Backend | PyTorch |


## Applications

| Domain | Use Case |
|---|---|
| 🏥 Healthcare | Describe symptoms → find matching scan types |
| 🛒 E-Commerce | Natural language visual product search |
| 🔒 Security | Detect unusual scenes with text descriptions |
| 🎨 Asset Management | Auto-tag image libraries |
| ♿ Accessibility | Auto-describe images for visually impaired users |
| 🔬 Research | Classify microscopy / satellite imagery |


## Local Setup

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/visionquery-ai
cd visionquery-ai
pip install -r requirements.txt
python app.py
```

The app then runs at `http://localhost:7860`.


## Citation

```bibtex
@article{zhai2023sigmoid,
  title   = {Sigmoid Loss for Language Image Pre-Training},
  author  = {Zhai, Xiaohua and Mustafa, Basil and Kolesnikov, Alexander and Beyer, Lucas},
  journal = {arXiv preprint arXiv:2303.15343},
  year    = {2023}
}
```