metadata
title: VisionQuery
emoji: π
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
short_description: SigLIP based zero-shot image classification
tags:
- vision
- zero-shot
- siglip
- taipy
- image-classification
- transformers
VisionQuery
Zero-Shot Image Understanding with Google SigLIP + Taipy
Problem Statement
Traditional image classification systems demand:
- Thousands of labeled images per category
- Expensive GPU training pipelines
- Re-training every time you add a new category
- ML expertise to build and maintain
This puts vision AI out of reach for many real-world use cases.
Solution
VisionQuery AI uses SigLIP (Sigmoid Loss for Language-Image Pre-Training by Google DeepMind) to deliver zero-shot image classification:
- Describe what you're looking for in plain English
- No training data or fine-tuning, ever
- Add unlimited categories on the fly
- Multilingual: SigLIP's multilingual checkpoints support 100+ languages (the base checkpoint used here is trained mainly on English text)
How to Use
- Upload any image (JPG, PNG, WebP)
- Enter text labels as comma-separated descriptions, e.g. a cat, a dog, a person walking, a sunset
- Click Analyze Image
- Instantly see similarity scores for every label
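The label field in the steps above has to be turned into a list before scoring. A minimal sketch of that step, assuming a hypothetical helper named `parse_labels` (the actual parsing in app.py is not shown here and may differ):

```python
def parse_labels(raw: str) -> list[str]:
    """Split a comma-separated string into clean, de-duplicated labels."""
    seen = set()
    labels = []
    for part in raw.split(","):
        label = part.strip()
        key = label.lower()  # treat "A Cat" and "a cat" as the same label
        if label and key not in seen:
            seen.add(key)
            labels.append(label)
    return labels


labels = parse_labels("a cat, a dog, , a person walking, a sunset, A Cat")
print(labels)
```

Empty entries and case-insensitive duplicates are dropped so that each label is scored only once.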
How SigLIP Works
Image ──► ViT Encoder      ──► Image Embedding ──┐
                                                 ├──► Sigmoid Score per pair
Text  ──► Text Transformer ──► Text Embedding  ──┘
Unlike CLIP's softmax loss, which normalises scores across all image-text pairs in a batch, SigLIP uses a sigmoid loss: each image-text pair is scored independently. This gives:
- Better calibration
- True multi-label support: several labels can score high for the same image
- Stronger zero-shot accuracy
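To make the difference concrete, the sketch below applies softmax and sigmoid to the same set of logits. The logit values are made up for illustration and are not real model outputs.

```python
import math


def softmax(logits):
    """Normalise logits so that the scores compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Score each logit on its own; the scores need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


labels = ["a cat", "a dog", "a sunset"]
logits = [2.0, 1.5, -4.0]  # illustrative values only

softmax_scores = softmax(logits)
sigmoid_scores = sigmoid(logits)

for label, sm, sg in zip(labels, softmax_scores, sigmoid_scores):
    print(f"{label:10s} softmax={sm:.3f} sigmoid={sg:.3f}")
```

With softmax, "a cat" and "a dog" split a fixed budget of 1.0, so a strong second label pulls the first one down. With sigmoid, both can score high at once, which is what multi-label use needs.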
Model used: google/siglip-base-patch16-224
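A minimal sketch of scoring one image against a list of labels with this checkpoint through Transformers. The function names `score_labels` and `rank_scores` are hypothetical and the code in app.py may be organised differently. It assumes `torch`, `transformers`, and `Pillow` are installed, and that the model weights can be downloaded on first use.

```python
MODEL_ID = "google/siglip-base-patch16-224"


def score_labels(image_path, labels, model_id=MODEL_ID):
    """Return {label: sigmoid score} for a single image (hypothetical helper)."""
    # Heavy imports live inside the function so importing this module stays cheap.
    import torch
    from PIL import Image
    from transformers import AutoModel, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    # padding="max_length" matches how the SigLIP checkpoints were trained.
    inputs = processor(
        text=labels, images=image, padding="max_length", return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_image has shape (1, num_labels); sigmoid scores each pair independently.
    probs = torch.sigmoid(outputs.logits_per_image)[0]
    return {label: float(p) for label, p in zip(labels, probs)}


def rank_scores(scores):
    """Sort (label, score) pairs from highest to lowest score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Example call (the image path is a placeholder):
# rank_scores(score_labels("photo.jpg", ["a cat", "a dog", "a sunset"]))
```

In a long-running app, the processor and model would normally be loaded once at startup rather than on every call.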
Tech Stack
| Layer | Technology |
|---|---|
| Vision-Language Model | Google SigLIP via 🤗 Transformers |
| GUI Framework | Taipy |
| Charts | Plotly |
| Deployment | Hugging Face Spaces (Docker) |
| Backend | PyTorch |
Applications
| Domain | Use Case |
|---|---|
| Healthcare | Describe symptoms to find matching scan types |
| E-Commerce | Natural language visual product search |
| Security | Detect unusual scenes with text descriptions |
| Asset Management | Auto-tag image libraries |
| Accessibility | Auto-describe images for visually impaired users |
| Research | Classify microscopy / satellite imagery |
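For the auto-tagging use case, independent sigmoid scores allow a simple threshold rule: keep every label that clears a cutoff. The sketch below uses a hypothetical `select_tags` helper, made-up scores, and an assumed cutoff of 0.5. A suitable cutoff depends on the labels and images and should be tuned on real data.

```python
def select_tags(scores, threshold=0.5):
    """Return labels whose score is at least `threshold`, highest score first."""
    kept = [(label, s) for label, s in scores.items() if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]


# Illustrative scores only, not real model outputs.
example_scores = {"a sunset": 0.78, "a beach": 0.91, "a dog": 0.04}
tags = select_tags(example_scores)
print(tags)
```

Because the scores do not compete with each other, an image can receive several tags or none at all.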
Local Setup
git clone https://huggingface.co/spaces/YOUR_USERNAME/visionquery-ai
cd visionquery-ai
pip install -r requirements.txt
python app.py
App runs at http://localhost:7860
Citation
@article{zhai2023sigmoid,
title = {Sigmoid Loss for Language Image Pre-Training},
author = {Zhai, Xiaohua and others},
journal = {arXiv:2303.15343},
year = {2023},
publisher = {Google DeepMind}
}