---
title: CLIP Zero-Shot Classifier
emoji: 🖼️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.24.0
app_file: app.py
pinned: false
---

πŸ–ΌοΈ CLIP Zero-Shot Classifier

This interactive web app demonstrates a zero-shot image classification system using OpenAI's CLIP model (ViT-B/32) and a custom Gradio interface.

## 🚀 What It Does

CLIP embeds images and text in the same space, so an image can be compared directly against arbitrary text labels. With this app, you can:

  • Upload an image
  • Enter any number of labels (comma-separated)
  • Get predictions on how likely the image matches each label β€” even without training!

## 💡 How It Works

  1. The input image is preprocessed and encoded using CLIP.
  2. Your custom labels are tokenized and also encoded.
  3. The cosine similarity between image and text embeddings is computed.
  4. The results are displayed with a probability score and a visual bar indicator.
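Steps 3 and 4 can be sketched in plain Python with toy embeddings (real CLIP/ViT-B/32 embeddings are 512-dimensional; the vectors, labels, and temperature below are made up for illustration, not taken from app.py):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def softmax(scores, temperature):
    """Turn similarity scores into probabilities; CLIP divides by a
    learned temperature (roughly 0.01) before the softmax."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-d embeddings standing in for CLIP outputs.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "a cat": [0.8, 0.2, 0.1],
    "a dog": [0.1, 0.9, 0.3],
}

scores = [cosine_similarity(image_embedding, v) for v in label_embeddings.values()]
probs = softmax(scores, temperature=0.1)  # softer temperature for the toy example
for label, p in zip(label_embeddings, probs):
    print(f"{label}: {p:.2%}")
```

Because the image vector points in nearly the same direction as the "a cat" vector, "a cat" gets the larger probability; the probabilities always sum to 1.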

## 📦 Technologies Used

  • Gradio β€” for the interactive web interface
  • OpenAI CLIP β€” the core model for zero-shot classification
  • PyTorch β€” model backend
  • Hugging Face Spaces β€” for easy and free deployment

## 📷 Example Use Cases

  • Test if an image matches multiple tags
  • Quickly validate custom labels
  • Educational demos for multimodal ML

πŸ› οΈ How to Use

  1. Upload an image.
  2. Type in comma-separated labels, e.g. `a cat, a dog, a diagram, a spacecraft`
  3. Click Classify.
  4. See prediction probabilities and visual bars for each label.
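Behind a flow like the one above, the app has to split the comma-separated label string into a clean list and render each probability as a bar. A minimal sketch of those two pieces (helper names and the text-bar rendering are hypothetical, not taken from app.py):

```python
def parse_labels(raw: str) -> list[str]:
    """Split a comma-separated label string, dropping blanks and extra whitespace."""
    return [label.strip() for label in raw.split(",") if label.strip()]

def probability_bar(prob: float, width: int = 20) -> str:
    """Render a probability in [0, 1] as a fixed-width text bar."""
    filled = round(prob * width)
    return "#" * filled + "-" * (width - filled)

labels = parse_labels("a cat, a dog, , a diagram,a spacecraft ")
print(labels)                 # ['a cat', 'a dog', 'a diagram', 'a spacecraft']
print(probability_bar(0.75))  # '###############-----'
```

Filtering out empty fragments means a trailing comma or a double comma in the input never produces a blank label.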

πŸ“ Notes

  • You can enter any text labels β€” even abstract or creative ones!
  • Works best on natural images (e.g., animals, objects, scenes)

## 📓 Notebook

You can explore the companion Jupyter notebook here: 📘 Open notebook.ipynb


## 👤 About Me

I'm Nikko, a Machine Learning Engineer and AI enthusiast with a Master's degree in Artificial Intelligence from the University of the Philippines Diliman. With over a decade of experience in ICT consulting and telecommunications, I now specialize in vision-language models, LLMs, and generative AI applications.

I'm passionate about creating systems where AI and humans can collaborate seamlessly, working toward a future where smart cities and intelligent automation become reality.

Feel free to connect with me on LinkedIn.


Made with ❤️ using CLIP + Gradio