Spaces: adisaljusi/computer_vision_classification_model_comparison


adisaljusi committed on Apr 7

Commit 3e6a4eb (1 parent: 21b4eed)

Refactor code structure for improved readability


Files changed (13)

.gitattributes +1 -1
.gitignore +8 -0
README.md +56 -2
app.py +113 -0
example_images/airplane.jpg +0 -0
example_images/automobile.jpg +0 -0
example_images/cat.jpg +0 -0
example_images/dog.jpg +0 -0
example_images/horse.jpg +0 -0
example_images/ship.jpg +0 -0
requirements-dev.txt +8 -0
requirements.txt +115 -0
train.ipynb +0 -0

.gitattributes CHANGED

@@ -32,4 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

@@ -0,0 +1,8 @@
+requirements.md
+__pycache__/
+*.pyc
+.env
+*.pt
+*.pth
+checkpoint-*/
+cifar10-vit/

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 title: Computer Vision Classification Model Comparison
-emoji: 📊
+emoji: "\U0001F4CA"
 colorFrom: purple
 colorTo: gray
 sdk: gradio
@@ -10,4 +10,58 @@ pinned: false
 short_description: 'Block 2 '
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# CIFAR-10 Image Classification — Model Comparison
+Compare three classification approaches on CIFAR-10 images:
+- **Fine-tuned ViT** ([adisaljusi/vit-base-cifar10](https://huggingface.co/adisaljusi/vit-base-cifar10)) — transfer learning model trained on CIFAR-10
+- **CLIP Zero-Shot** (`openai/clip-vit-large-patch14`) — open-source zero-shot classification
+- **OpenAI GPT-4.1-mini** — closed-source vision model via API
+## Dataset
+**CIFAR-10** — 60,000 32x32 color images in 10 classes (6,000 images per class):
+| Split | Images |
+|-------|--------|
+| Train | 50,000 |
+| Test  | 10,000 |
+**Classes:** airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
+Source: [Hugging Face `uoft-cs/cifar10`](https://huggingface.co/datasets/uoft-cs/cifar10)
+## Preprocessing
+- Resize from 32x32 to 224x224 (ViT input size)
+- Normalize pixel values with mean=0.5, std=0.5 per channel
+- Convert all images to RGB
+Applied using `AutoImageProcessor` from `google/vit-base-patch16-224`.
+## Model & Evaluation
+**Base model:** [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)
+**Transfer learning approach:** All layers frozen except the final classification head (10 outputs for CIFAR-10 classes). Only 7,690 of 85.8M parameters are trainable.
+**Training config:** 5 epochs, batch size 16, learning rate 3e-4, AdamW optimizer.
+### Training Results
+| Epoch | Training Loss | Validation Loss | Accuracy |
+|------:|--------------:|----------------:|---------:|
+| _To be filled after training_ | | | |
+## Links
+- **Model:** [adisaljusi/vit-base-cifar10](https://huggingface.co/adisaljusi/vit-base-cifar10)
+- **App:** [adisaljusi/computer-vision-classification-model-comparison](https://huggingface.co/spaces/adisaljusi/computer-vision-classification-model-comparison)
+## Comparison Results
+Results on example images comparing all three models:
+| Image | True Class | ViT Top-1 (score) | CLIP Top-1 (score) | OpenAI (label, confidence) |
+|-------|-----------|-------------------|-------------------|---------------------------|
+| _To be filled after running the app_ | | | | |
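
Two numbers in the README above can be checked by hand: the trainable-parameter count of the classification head and the value range produced by the normalization step. The sketch below is only an arithmetic check, not code from the commit. It assumes the standard ViT-Base hidden size of 768 (the README does not state it) and that pixels are rescaled to [0, 1] before `(x - mean) / std` is applied. The function names are hypothetical.

```python
# Sanity check for two numbers quoted in the README.
# Assumption: ViT-Base uses a hidden size of 768 (not stated in the README).


def head_parameters(hidden_size: int, num_classes: int) -> int:
    """Parameters of one linear layer: weight matrix plus bias vector."""
    return hidden_size * num_classes + num_classes


def normalize_pixel(value: int, mean: float = 0.5, std: float = 0.5) -> float:
    """Rescale an 8-bit pixel to [0, 1], then apply (x - mean) / std."""
    return (value / 255.0 - mean) / std


if __name__ == "__main__":
    print(head_parameters(768, 10))
    print(normalize_pixel(0), normalize_pixel(255))
```

Under these assumptions `head_parameters(768, 10)` gives 7690, which matches the "7,690 trainable parameters" figure, and the normalization maps the 8-bit range 0..255 onto -1.0..1.0.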

app.py ADDED

@@ -0,0 +1,113 @@
+import base64
+import json
+import os
+import gradio as gr
+from dotenv import load_dotenv
+from openai import OpenAI
+from transformers import pipeline
+load_dotenv()
+OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4.1-mini")
+OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
+openai_client = OpenAI(api_key=OPENAI_API_KEY) if OPENAI_API_KEY else None
+# Load models
+vit_classifier = pipeline("image-classification", model="adisaljusi/vit-base-cifar10")
+clip_detector = pipeline(
+    model="openai/clip-vit-large-patch14",
+    task="zero-shot-image-classification",
+)
+labels_cifar10 = [
+    "airplane", "automobile", "bird", "cat", "deer",
+    "dog", "frog", "horse", "ship", "truck",
+]
+def encode_image(image_path):
+    with open(image_path, "rb") as image_file:
+        return base64.b64encode(image_file.read()).decode("utf-8")
+def classify_with_openai(image_path):
+    if openai_client is None:
+        return {
+            "error": "Missing OPENAI_API_KEY. Add it to your environment or .env file to enable OpenAI classification."
+        }
+    prompt = (
+        "Classify the object in this image. Choose the best matching label from this list: "
+        f"{', '.join(labels_cifar10)}. "
+        "Return valid JSON with exactly these keys: "
+        "label, confidence, reasoning. "
+        "The confidence must be a number between 0 and 1."
+    )
+    base64_image = encode_image(image_path)
+    response = openai_client.responses.create(
+        model=OPENAI_MODEL,
+        input=[
+            {
+                "role": "user",
+                "content": [
+                    {"type": "input_text", "text": prompt},
+                    {
+                        "type": "input_image",
+                        "image_url": f"data:image/jpeg;base64,{base64_image}",
+                    },
+                ],
+            }
+        ],
+    )
+    try:
+        parsed_response = json.loads(response.output_text)
+    except json.JSONDecodeError:
+        parsed_response = {
+            "raw_response": response.output_text,
+            "warning": "OpenAI response was not valid JSON.",
+        }
+    return parsed_response
+def classify_image(image):
+    vit_results = vit_classifier(image)
+    vit_output = {result["label"]: result["score"] for result in vit_results}
+    clip_results = clip_detector(image, candidate_labels=labels_cifar10)
+    clip_output = {result["label"]: result["score"] for result in clip_results}
+    openai_output = classify_with_openai(image)
+    return {
+        "ViT Classification": vit_output,
+        "CLIP Zero-Shot Classification": clip_output,
+        "OpenAI Vision Classification": openai_output,
+    }
+example_images = [
+    ["example_images/airplane.jpg"],
+    ["example_images/automobile.jpg"],
+    ["example_images/cat.jpg"],
+    ["example_images/dog.jpg"],
+    ["example_images/horse.jpg"],
+    ["example_images/ship.jpg"],
+]
+iface = gr.Interface(
+    fn=classify_image,
+    inputs=gr.Image(type="filepath"),
+    outputs=gr.JSON(),
+    title="CIFAR-10 Classification Comparison",
+    description=(
+        "Upload an image and compare classification results from three models: "
+        "a fine-tuned ViT model, a zero-shot CLIP model, and OpenAI GPT-4.1-mini vision."
+    ),
+    examples=example_images,
+)
+iface.launch()
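
Two spots in `classify_with_openai` look worth hardening. The data URL is always labeled `image/jpeg`, even though `gr.Image(type="filepath")` may hand over a PNG or another format. The JSON fallback will also trigger whenever the model wraps its answer in a Markdown code fence. The following is a minimal sketch of one way to handle both using only the standard library. The helper names `to_data_url`, `strip_code_fence`, and `parse_model_json` are hypothetical and not part of the commit, and whether the model actually returns fenced JSON is an assumption.

```python
import base64
import json
import mimetypes

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example fence-safe


def to_data_url(image_path, default_mime="image/jpeg"):
    """Build a base64 data URL whose MIME type is guessed from the file extension."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime  # fall back to the type the app currently hardcodes
    with open(image_path, "rb") as image_file:
        payload = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime};base64,{payload}"


def strip_code_fence(text):
    """Remove a surrounding Markdown fence (and optional 'json' tag), if present."""
    cleaned = text.strip()
    if len(cleaned) >= 2 * len(FENCE) and cleaned.startswith(FENCE) and cleaned.endswith(FENCE):
        cleaned = cleaned[len(FENCE):-len(FENCE)]
        if cleaned.lower().startswith("json"):
            cleaned = cleaned[len("json"):]
    return cleaned.strip()


def parse_model_json(text):
    """Parse the model reply as JSON, keeping the same fallback shape as app.py."""
    try:
        return json.loads(strip_code_fence(text))
    except json.JSONDecodeError:
        return {
            "raw_response": text,
            "warning": "OpenAI response was not valid JSON.",
        }
```

If adopted, `to_data_url(image_path)` would replace the hand-built f-string passed as `image_url`, and `parse_model_json(response.output_text)` would replace the `try`/`except` block. The fallback dictionary is unchanged, so the Gradio JSON output keeps its current shape.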

example_images/airplane.jpg ADDED

example_images/automobile.jpg ADDED

example_images/cat.jpg ADDED

example_images/dog.jpg ADDED

example_images/horse.jpg ADDED

example_images/ship.jpg ADDED

requirements-dev.txt ADDED

@@ -0,0 +1,8 @@
+-r requirements.txt
+ipykernel
+datasets
+evaluate
+matplotlib
+numpy
+huggingface-hub
+ipywidgets

requirements.txt ADDED

@@ -0,0 +1,115 @@
+annotated-doc==0.0.4
+    # via typer
+annotated-types==0.7.0
+    # via pydantic
+anyio==4.13.0
+    # via
+    #   httpx
+    #   openai
+certifi==2026.2.25
+    # via
+    #   httpcore
+    #   httpx
+click==8.3.2
+    # via typer
+distro==1.9.0
+    # via openai
+filelock==3.25.2
+    # via
+    #   huggingface-hub
+    #   torch
+fsspec==2026.3.0
+    # via
+    #   huggingface-hub
+    #   torch
+h11==0.16.0
+    # via httpcore
+hf-xet==1.4.3
+    # via huggingface-hub
+httpcore==1.0.9
+    # via httpx
+httpx==0.28.1
+    # via
+    #   huggingface-hub
+    #   openai
+huggingface-hub==1.9.0
+    # via
+    #   tokenizers
+    #   transformers
+idna==3.11
+    # via
+    #   anyio
+    #   httpx
+jinja2==3.1.6
+    # via torch
+jiter==0.13.0
+    # via openai
+markdown-it-py==4.0.0
+    # via rich
+markupsafe==3.0.3
+    # via jinja2
+mdurl==0.1.2
+    # via markdown-it-py
+mpmath==1.3.0
+    # via sympy
+networkx==3.6.1
+    # via torch
+numpy==2.4.4
+    # via transformers
+openai==2.30.0
+    # via -r requirements.txt
+packaging==26.0
+    # via
+    #   huggingface-hub
+    #   transformers
+pydantic==2.12.5
+    # via openai
+pydantic-core==2.41.5
+    # via pydantic
+pygments==2.20.0
+    # via rich
+python-dotenv==1.2.2
+    # via -r requirements.txt
+pyyaml==6.0.3
+    # via
+    #   huggingface-hub
+    #   transformers
+regex==2026.4.4
+    # via transformers
+rich==14.3.3
+    # via typer
+safetensors==0.7.0
+    # via transformers
+setuptools==81.0.0
+    # via torch
+shellingham==1.5.4
+    # via typer
+sniffio==1.3.1
+    # via openai
+sympy==1.14.0
+    # via torch
+tokenizers==0.22.2
+    # via transformers
+torch==2.11.0
+    # via -r requirements.txt
+tqdm==4.67.3
+    # via
+    #   huggingface-hub
+    #   openai
+    #   transformers
+transformers==5.5.0
+    # via -r requirements.txt
+typer==0.24.1
+    # via
+    #   huggingface-hub
+    #   transformers
+typing-extensions==4.15.0
+    # via
+    #   huggingface-hub
+    #   openai
+    #   pydantic
+    #   pydantic-core
+    #   torch
+    #   typing-inspection
+typing-inspection==0.4.2
+    # via pydantic

train.ipynb ADDED

The diff for this file is too large to render.