---
title: CLIP Zero-Shot Classifier
emoji: 🖼️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.24.0
app_file: app.py
pinned: false
---

πŸ–ΌοΈ CLIP Zero-Shot Classifier

This interactive web app demonstrates a zero-shot image classification system using OpenAI's CLIP model (ViT-B/32) and a custom Gradio interface.

## 🚀 What It Does

CLIP embeds images and text in the same space, so an image can be compared directly against arbitrary text labels. With this app, you can:

  • Upload an image
  • Enter any number of labels (comma-separated)
  • Get predictions on how likely the image matches each label β€” even without training!

## 💡 How It Works

  1. The input image is preprocessed and encoded using CLIP.
  2. Your custom labels are tokenized and also encoded.
  3. The cosine similarity between image and text embeddings is computed.
  4. The results are displayed with a probability score and a visual bar indicator.
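Steps 3 and 4 can be sketched in plain Python with toy embeddings (real CLIP/ViT-B/32 embeddings are 512-dimensional; the vectors, labels, and temperature below are made up for illustration, not taken from app.py):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def softmax(scores, temperature):
    """Turn similarity scores into probabilities; CLIP divides by a
    learned temperature (roughly 0.01) before the softmax."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-d embeddings standing in for CLIP outputs.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "a cat": [0.8, 0.2, 0.1],
    "a dog": [0.1, 0.9, 0.3],
}

scores = [cosine_similarity(image_embedding, v) for v in label_embeddings.values()]
probs = softmax(scores, temperature=0.1)  # softer temperature for the toy example
for label, p in zip(label_embeddings, probs):
    print(f"{label}: {p:.2%}")
```

Because the image vector points in nearly the same direction as the "a cat" vector, "a cat" gets the larger probability; the probabilities always sum to 1.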

## 📦 Technologies Used

  • Gradio β€” for the interactive web interface
  • OpenAI CLIP β€” the core model for zero-shot classification
  • PyTorch β€” model backend
  • Hugging Face Spaces β€” for easy and free deployment

## 📷 Example Use Cases

  • Test if an image matches multiple tags
  • Quickly validate custom labels
  • Educational demos for multimodal ML

πŸ› οΈ How to Use

  1. Upload an image.
  2. Type in comma-separated labels, e.g. `a cat, a dog, a diagram, a spacecraft`
  3. Click Classify.
  4. See prediction probabilities and visual bars for each label.
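Behind a flow like the one above, the app has to split the comma-separated label string into a clean list and render each probability as a bar. A minimal sketch of those two pieces (helper names and the text-bar rendering are hypothetical, not taken from app.py):

```python
def parse_labels(raw: str) -> list[str]:
    """Split a comma-separated label string, dropping blanks and extra whitespace."""
    return [label.strip() for label in raw.split(",") if label.strip()]

def probability_bar(prob: float, width: int = 20) -> str:
    """Render a probability in [0, 1] as a fixed-width text bar."""
    filled = round(prob * width)
    return "#" * filled + "-" * (width - filled)

labels = parse_labels("a cat, a dog, , a diagram,a spacecraft ")
print(labels)                 # ['a cat', 'a dog', 'a diagram', 'a spacecraft']
print(probability_bar(0.75))  # '###############-----'
```

Filtering out empty fragments means a trailing comma or a double comma in the input never produces a blank label.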

πŸ“ Notes

  • You can enter any text labels β€” even abstract or creative ones!
  • Works best on natural images (e.g., animals, objects, scenes)

## 📓 Notebook

You can explore the companion Jupyter notebook here: 📘 Open notebook.ipynb


## 👤 About Me

I'm Nikko, a Machine Learning Engineer and AI enthusiast with a Master's degree in Artificial Intelligence from the University of the Philippines Diliman. With over a decade of experience in ICT consulting and telecommunications, I now specialize in vision-language models, LLMs, and generative AI applications.

I'm passionate about creating systems where AI and humans can collaborate seamlessly, working toward a future where smart cities and intelligent automation become reality.

Feel free to connect with me on LinkedIn.


Made with ❤️ using CLIP + Gradio