Upload 8 files
- .gitattributes +2 -0
- README.md +66 -6
- app.py +103 -0
- images/boy.jpg +0 -0
- images/boy_dog.jpg +0 -0
- images/dog.jpg +0 -0
- notebook/CLIP.png +3 -0
- notebook/clip_architecture.png +3 -0
- notebook/clip_inspect.ipynb +0 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+notebook/clip_architecture.png filter=lfs diff=lfs merge=lfs -text
+notebook/CLIP.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,12 +1,72 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
+title: CLIP Zero-Shot Classifier
+emoji: 🖼️
+colorFrom: indigo
+colorTo: green
 sdk: gradio
-sdk_version:
+sdk_version: "4.24.0"
 app_file: app.py
 pinned: false
 ---
 
-
+# 🖼️ CLIP Zero-Shot Classifier
+
+This interactive web app demonstrates **zero-shot image classification** with **OpenAI's CLIP model** (`ViT-B/32`) behind a custom Gradio interface.
+
+## 🚀 What It Does
+
+CLIP maps images and text into a shared embedding space. With this app, you can:
+- Upload an image
+- Enter any number of labels (comma-separated)
+- Get predictions for how well the image matches each label — **even without training!**
+
+## 💡 How It Works
+
+1. The input image is preprocessed and encoded by CLIP's image encoder.
+2. Your custom labels are tokenized and encoded by CLIP's text encoder.
+3. The cosine similarity between the image embedding and each text embedding is computed.
+4. The results are displayed with a probability score and a visual bar indicator.
+
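In code, these four steps are standard CLIP usage. Below is a minimal standalone sketch, assuming the `clip` package from the setup line in `app.py` and using one of the example images shipped in this commit:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["a cat", "a dog", "a diagram"]
image = preprocess(Image.open("images/dog.jpg")).unsqueeze(0).to(device)  # step 1
text = clip.tokenize(labels).to(device)                                   # step 2

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Step 3: cosine similarity = dot product of L2-normalized embeddings.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    sims = (image_features @ text_features.T)[0]
    # Step 4: scale by CLIP's learned temperature, then softmax into probabilities.
    probs = (model.logit_scale.exp() * sims).softmax(dim=-1)

for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.2%}")
```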
+## 📦 Technologies Used
+
+- [Gradio](https://www.gradio.app/) — for the interactive web interface
+- [OpenAI CLIP](https://github.com/openai/CLIP) — the core model for zero-shot classification
+- PyTorch — model backend
+- Hugging Face Spaces — for easy and free deployment
+
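On Spaces, Python dependencies are installed from a `requirements.txt` at the repository root; Gradio itself is supplied by the `sdk_version` pinned in the front matter. This commit does not add such a file, so the following is a purely hypothetical sketch mirroring the pip line at the top of `app.py`:

```text
torch
ftfy
regex
tqdm
git+https://github.com/openai/CLIP.git
```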
+## 📷 Example Use Cases
+
+- Test whether an image matches multiple tags
+- Quickly validate custom labels
+- Educational demos for multimodal ML
+
+## 🛠️ How to Use
+
+1. Upload an image.
+2. Type in labels like: `a cat, a dog, a diagram`
+3. Click **Classify**.
+4. See prediction probabilities and visual bars for each label.
+
+## 📍 Notes
+
+- You can enter *any text labels* — even abstract or creative ones!
+- Works best on natural images (e.g., animals, objects, scenes)
+
+## 📓 Notebook
+
+You can explore the companion Jupyter notebook here:
+[📘 Open clip_inspect.ipynb](./notebook/clip_inspect.ipynb)
+
+---
+
+## 👤 About Me
+
+I'm **Nikko**, a Machine Learning Engineer and AI enthusiast with a Master's degree in Artificial Intelligence from the University of the Philippines Diliman. With over a decade of experience in ICT consulting and telecommunications, I now specialize in **vision-language models**, **LLMs**, and **generative AI applications**.
+
+I'm passionate about creating systems where AI and humans can collaborate seamlessly — working toward a future where **smart cities** and intelligent automation become reality.
+
+Feel free to connect with me on [LinkedIn](https://www.linkedin.com/in/nikkoyabut/).
+
+---
+
+Made with ❤️ using CLIP + Gradio
app.py ADDED
@@ -0,0 +1,103 @@
# app.py

# 🛠️ Setup
# pip install -q gradio torch ftfy regex tqdm git+https://github.com/openai/CLIP.git matplotlib

# 📦 Imports
import gradio as gr
import torch
import clip
from PIL import Image
import numpy as np
from typing import List

# 🚀 Load CLIP Model
device: str = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)


def predict(image: Image.Image, label_text: str) -> List[List[str]]:
    """
    Perform zero-shot classification using the CLIP model.

    Args:
        image (PIL.Image.Image): Input image.
        label_text (str): Comma-separated labels to classify against.

    Returns:
        List[List[str]]: One row per label: the label, its probability, and a confidence bar as HTML.
    """
    labels: List[str] = [label.strip() for label in label_text.split(",") if label.strip()]
    if image is None or not labels:
        return []

    # Preprocess inputs
    image_input: torch.Tensor = preprocess(image).unsqueeze(0).to(device)
    text_inputs: torch.Tensor = clip.tokenize(labels).to(device)

    # Run model: CLIP's forward pass encodes both inputs, L2-normalizes the
    # embeddings, and returns temperature-scaled cosine similarities as logits.
    with torch.no_grad():
        logits_per_image, _ = model(image_input, text_inputs)
        probs: np.ndarray = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

    # Build one table row per label, with an HTML bar for visual comparison
    results: List[List[str]] = []
    for label, prob in zip(labels, probs):
        bar_html: str = (
            f'<div style="background-color:#4caf50;width:{prob * 100:.1f}%;height:20px;"></div>'
        )
        results.append([label, f"{prob * 100:.2f}%", bar_html])

    return results

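For a quick sanity check of `predict` outside the UI, a hypothetical local smoke test (assuming the dependencies are installed and the example images from this commit are on disk) might look like:

```python
# Hypothetical smoke test, run from the repository root (not part of the Space).
from PIL import Image
from app import predict  # importing app builds the Blocks UI but does not launch it

img = Image.open("images/boy_dog.jpg")
for label, prob, _bar in predict(img, "a boy with a dog, a girl with a cat"):
    print(f"{label}: {prob}")
```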
# 🎨 Gradio Interface
with gr.Blocks() as demo:
    gr.Markdown("## CLIP Zero-Shot Classifier")

    with gr.Row():
        image = gr.Image(type="pil", label="Upload Image")
        label_text = gr.Textbox(
            lines=2,
            label="Enter comma-separated labels",
            placeholder="e.g., a cat, a dog, a diagram"
        )

    # Image Examples
    with gr.Row():
        gr.Examples(
            examples=[
                ["images/boy.jpg"],
                ["images/dog.jpg"],
                ["images/boy_dog.jpg"]
            ],
            inputs=[image],
            label="🖼️ Click to select example image"
        )

    # Label Text Examples
    gr.Examples(
        examples=[
            ["boy, girl, dog, cat"],
            ["a boy with a dog, a boy with a cat, a girl with a dog, a girl with a cat"],
            ["a cat, a dog, a diagram"]
        ],
        inputs=[label_text],
        label="📝 Click to autofill example labels"
    )

    submit = gr.Button("Classify")

    output = gr.Dataframe(
        headers=["Label", "Probability", "Confidence Bar"],
        datatype=["str", "str", "html"],
        row_count=5,
        interactive=False
    )

    submit.click(fn=predict, inputs=[image, label_text], outputs=output)

if __name__ == "__main__":
    # share=True only applies to local runs; on Spaces the app is already hosted
    # and Gradio ignores the flag with a warning.
    demo.launch(share=True)
images/boy.jpg ADDED
images/boy_dog.jpg ADDED
images/dog.jpg ADDED
notebook/CLIP.png ADDED (stored with Git LFS)
notebook/clip_architecture.png ADDED (stored with Git LFS)
notebook/clip_inspect.ipynb ADDED (diff too large to render; see raw file)