Saptadip Saha
metadata
title: VisionQuery
emoji: 🔍
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
short_description: SigLIP-based zero-shot image classification
tags:
  - vision
  - zero-shot
  - siglip
  - taipy
  - image-classification
  - transformers

VisionQuery

Zero-Shot Image Understanding with Google SigLIP + Taipy

Problem Statement

Traditional image classification systems demand:

  • Thousands of labeled images per category
  • Expensive GPU training pipelines
  • Re-training every time you add a new category
  • ML expertise to build and maintain

These requirements put vision AI out of reach for many real-world use cases.

Solution

VisionQuery AI uses SigLIP (Sigmoid Loss for Language-Image Pre-Training by Google DeepMind) to deliver zero-shot image classification:

  • Describe what you're looking for in plain English
  • No training data or fine-tuning, ever
  • Add unlimited categories on the fly
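  • Multilingual labels via SigLIP's multilingual checkpoint (100+ languages); the default checkpoint listed under "Model used" is English-focused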

How to Use

  1. Upload any image (JPG, PNG, WebP)
  2. Enter text labels as comma-separated descriptions
    e.g. a cat, a dog, a person walking, a sunset
  3. Click Analyze Image
  4. Instantly see similarity scores for every label
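
Step 2 implies turning one comma-separated string into a clean list of labels before scoring. A minimal sketch of that parsing step follows; the function name `parse_labels` and the choice to drop empty and duplicate entries are assumptions for illustration, not necessarily what the app does.

```python
def parse_labels(raw: str) -> list[str]:
    """Split a comma-separated string into trimmed, unique, non-empty labels."""
    seen = set()
    labels = []
    for part in raw.split(","):
        label = part.strip()
        key = label.lower()  # compare case-insensitively when de-duplicating
        if label and key not in seen:
            seen.add(key)
            labels.append(label)
    return labels


labels = parse_labels("a cat, a dog, a person walking, a sunset")
print(labels)
```

Keeping the original order means the scores can be shown in the same order the user typed the labels.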

How SigLIP Works

Image ──► ViT Encoder ──► Image Embedding ──┐
                                            ├──► Sigmoid Score per pair
Text  ──► Text Encoder ──► Text Embedding ──┘

Unlike CLIP's softmax loss, which normalises scores across all image-text pairs in a batch, SigLIP uses a sigmoid loss: each image-text pair is scored independently. This gives:

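  • Better calibration: each score can be read on its own, not relative to the other labels
  • True multi-label support: several labels can score high for the same image
  • Strong zero-shot accuracy; the SigLIP paper reports gains over softmax-based training, most visibly at smaller batch sizes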
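
The practical difference between softmax and sigmoid scoring can be seen with a few numbers. The sketch below uses made-up logits for one image against three labels (the values are hypothetical, not model output) and shows that adding a label shifts every softmax score but leaves the existing sigmoid scores untouched.

```python
import math


def softmax_scores(logits):
    """Normalise logits across all labels so the scores sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - peak) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def sigmoid_scores(logits):
    """Score each label on its own; other labels have no influence."""
    return {label: 1.0 / (1.0 + math.exp(-z)) for label, z in logits.items()}


# Hypothetical logits: two labels fit the image, one does not.
logits = {"a cat": 2.0, "a pet indoors": 1.5, "a truck": -4.0}
softmax_before = softmax_scores(logits)
sigmoid_before = sigmoid_scores(logits)

# Add a fourth label that also fits the image.
extended = {**logits, "an animal": 2.5}
softmax_after = softmax_scores(extended)
sigmoid_after = sigmoid_scores(extended)

for label in logits:
    print(
        f"{label:15s} softmax {softmax_before[label]:.3f} -> {softmax_after[label]:.3f}"
        f" | sigmoid {sigmoid_before[label]:.3f} -> {sigmoid_after[label]:.3f}"
    )
```

With softmax, the two fitting labels compete for a shared budget of 1; with sigmoid, both can score high at once, which is what makes multi-label queries workable.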

Model used: google/siglip-base-patch16-224
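
For reference, scoring an image against a list of labels with this checkpoint might look like the sketch below. It follows the usage pattern shown on the model card (`AutoModel`, `AutoProcessor`, `padding="max_length"`, sigmoid over `logits_per_image`); the helper names `score_image` and `rank_labels` are hypothetical, and the app's actual code may differ. It assumes `torch`, `transformers`, and `Pillow` are installed.

```python
def rank_labels(labels, probs):
    """Pair each label with its score and sort from most to least similar."""
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


def score_image(image_path, labels, model_id="google/siglip-base-patch16-224"):
    """Return (label, probability) pairs for one image, highest first."""
    # Heavy imports live inside the function so the helpers above stay importable
    # even where the ML dependencies are not installed.
    import torch
    from PIL import Image
    from transformers import AutoModel, AutoProcessor

    model = AutoModel.from_pretrained(model_id)
    processor = AutoProcessor.from_pretrained(model_id)
    image = Image.open(image_path).convert("RGB")

    # padding="max_length" matches how the checkpoint was trained (per the model card).
    inputs = processor(
        text=labels, images=image, padding="max_length", return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)

    # One independent sigmoid probability per (image, label) pair.
    probs = torch.sigmoid(outputs.logits_per_image)[0].tolist()
    return rank_labels(labels, probs)
```

Calling `score_image("photo.jpg", ["a cat", "a dog"])` (with `photo.jpg` as a placeholder path) would return the labels ordered by score. In a long-running app, loading the model and processor once at startup rather than on every call would avoid repeated load time.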

Tech Stack

| Layer | Technology |
|---|---|
| Vision-Language Model | Google SigLIP via 🤗 Transformers |
| GUI Framework | Taipy |
| Charts | Plotly |
| Deployment | Hugging Face Spaces (Docker) |
| Backend | PyTorch |

Applications

| Domain | Use Case |
|---|---|
| 🏥 Healthcare | Describe symptoms → find matching scan types |
| 🛒 E-Commerce | Natural language visual product search |
| 🔒 Security | Detect unusual scenes with text descriptions |
| 🎨 Asset Management | Auto-tag image libraries |
| ♿ Accessibility | Auto-describe images for visually impaired users |
| 🔬 Research | Classify microscopy / satellite imagery |

Local Setup

git clone https://huggingface.co/spaces/YOUR_USERNAME/visionquery-ai
cd visionquery-ai
pip install -r requirements.txt
python app.py

App runs at http://localhost:7860

Citation

@article{zhai2023sigmoid,
  title     = {Sigmoid Loss for Language Image Pre-Training},
  author    = {Zhai, Xiaohua and others},
  journal   = {arXiv:2303.15343},
  year      = {2023},
  publisher = {Google DeepMind}
}