Saptadip Saha

Update readme

faf9430 3 months ago

metadata

title: VisionQuery
emoji: 🔍
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
short_description: SigLIP based zero-shot image classification
tags:
  - vision
  - zero-shot
  - siglip
  - taipy
  - image-classification
  - transformers

VisionQuery

Zero-Shot Image Understanding with Google SigLIP + Taipy

Problem Statement

Traditional image classification systems demand:

Thousands of labeled images per category
Expensive GPU training pipelines
Re-training every time you add a new category
ML expertise to build and maintain

This makes vision AI inaccessible for most real-world use cases.

Solution

VisionQuery uses SigLIP (Sigmoid Loss for Language-Image Pre-Training, from Google DeepMind) to deliver zero-shot image classification:

Describe what you're looking for in plain English
No training data or fine-tuning — ever
Add unlimited categories on the fly
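Multilingual: SigLIP's multilingual checkpoints cover 100+ languages (the default checkpoint listed under "Model used" is trained on English text)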

How to Use

Upload any image (JPG, PNG, WebP)
Enter text labels as comma-separated descriptions
e.g. a cat, a dog, a person walking, a sunset
Click Analyze Image
See a similarity score between 0 and 1 for every label (scores are independent and need not sum to 1)
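
The label-entry step can be sketched as below. This is a minimal illustration of turning the comma-separated input into a clean list, not the app's actual code; `parse_labels` is a hypothetical helper name, and dropping empty entries and case-insensitive duplicates is an assumption about sensible behaviour.

```python
def parse_labels(raw: str) -> list[str]:
    """Split a comma-separated string into clean, de-duplicated labels.

    Hypothetical helper: the real app may parse its input differently.
    """
    seen = set()
    labels = []
    for part in raw.split(","):
        label = part.strip()
        key = label.lower()
        # Skip empty entries (e.g. "a cat, , a dog") and repeated labels.
        if label and key not in seen:
            seen.add(key)
            labels.append(label)
    return labels


print(parse_labels("a cat, a dog, , A Cat, a sunset"))
# ['a cat', 'a dog', 'a sunset']
```

Original casing is kept for display; only the duplicate check is case-insensitive.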

How SigLIP Works

Image ──► ViT Encoder ──► Image Embedding ──┐
                                             ├──► Sigmoid Score per pair
Text  ──► Text Encoder ──► Text Embedding ──┘

Unlike CLIP's softmax loss, which normalises scores across all image-text pairs in a batch, SigLIP uses a sigmoid loss that scores each image-text pair independently. This gives:

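Scores that read as independent per-label probabilities
True multi-label support: several labels can score high for the same image
Strong zero-shot accuracy without any task-specific training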
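
The difference between the two scoring schemes can be seen on a toy example. The sketch below uses hand-picked logits (made-up values, not real model output) and plain Python; it only illustrates why sigmoid scores allow several labels to be "on" at once, while softmax forces labels to compete.

```python
import math


def softmax(logits):
    """Normalise logits jointly: the outputs always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Score each logit on its own: outputs are independent of each other."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


# Illustrative image-text logits for one image (assumed values).
labels = ["a cat", "a sofa", "a spaceship"]
logits = [2.0, 1.5, -6.0]

for label, sm, sg in zip(labels, softmax(logits), sigmoid(logits)):
    print(f"{label:12s} softmax={sm:.3f}  sigmoid={sg:.3f}")
```

With softmax, "a cat" and "a sofa" split the probability mass between them; with sigmoid, both can score high for an image that shows a cat on a sofa.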

Model used: google/siglip-base-patch16-224
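
For readers who want to call the model directly, here is a minimal sketch using 🤗 Transformers. It is not taken from this Space's `app.py`; `score_labels` and `rank` are hypothetical helper names, and the sketch assumes `torch`, `transformers`, and `Pillow` are installed (the SigLIP tokenizer typically also needs `sentencepiece`).

```python
MODEL_ID = "google/siglip-base-patch16-224"


def score_labels(image, labels, model_id=MODEL_ID):
    """Return {label: sigmoid score} for one PIL image.

    Hypothetical helper. A real app would load the model once and reuse it
    instead of reloading it on every call.
    """
    # Heavy imports are deferred so this module loads without them.
    import torch
    from transformers import AutoModel, AutoProcessor

    model = AutoModel.from_pretrained(model_id)
    processor = AutoProcessor.from_pretrained(model_id)

    # SigLIP was trained with fixed-length text padding, hence "max_length".
    inputs = processor(text=labels, images=image,
                       padding="max_length", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # One independent sigmoid score per (image, label) pair.
    probs = torch.sigmoid(outputs.logits_per_image)[0]
    return {label: float(p) for label, p in zip(labels, probs)}


def rank(scores):
    """Sort (label, score) pairs from highest to lowest score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Example usage (downloads the checkpoint on first run; "photo.jpg" is a
# placeholder path):
# from PIL import Image
# scores = score_labels(Image.open("photo.jpg"), ["a cat", "a dog"])
# for label, p in rank(scores):
#     print(f"{p:.1%}  {label}")
```

Because the scores are sigmoid outputs, they are reported per label and are not renormalised across the label list.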

Tech Stack

Layer	Technology
Vision-Language Model	Google SigLIP via 🤗 Transformers
GUI Framework	Taipy
Charts	Plotly
Deployment	Hugging Face Spaces (Docker)
Backend	PyTorch

Applications

Domain	Use Case
🏥 Healthcare	Describe symptoms → find matching scan types
🛒 E-Commerce	Natural language visual product search
🔒 Security	Detect unusual scenes with text descriptions
🎨 Asset Management	Auto-tag image libraries
♿ Accessibility	Auto-describe images for visually impaired users
🔬 Research	Classify microscopy / satellite imagery

Local Setup

git clone https://huggingface.co/spaces/YOUR_USERNAME/visionquery-ai
cd visionquery-ai
pip install -r requirements.txt
python app.py

App runs at http://localhost:7860

Citation

@article{zhai2023sigmoid,
  title     = {Sigmoid Loss for Language Image Pre-Training},
  author    = {Zhai, Xiaohua and others},
  journal   = {arXiv:2303.15343},
  year      = {2023},
  publisher = {Google DeepMind}
}