title: Assignment 3
emoji: 馃
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.15.2
python_version: '3.13'
app_file: app.py
pinned: false
short_description: Bird - recommender

Smart Bird Tracker App

Video

Description

For this assignment, I selected the JotDe/birds dataset.

  • Source: Hugging Face Datasets (https://huggingface.co/datasets/JotDe/birds).
  • Size: The complete dataset contains 11,788 bird images, divided into train and test splits. For the computational steps (embeddings and clustering), a balanced subset of 3,000 images was used to preserve the dataset's structure while remaining computationally efficient.
  • Features: The dataset provides visual and textual data, making it ideal for multi-modal embeddings. The key features include:
    • image: The visual data (PIL Image object).
    • label: An integer ID representing the bird species (there are 200 distinct species, such as Yellow-breasted Chat, Albatross, etc.).
    • description: A textual description detailing the visual characteristics of the bird.
    • file_name: The original file name of the image.
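The README does not say how the balanced 3,000-image subset was drawn. One possible approach, sketched here under the assumption of an equal quota per species (3,000 / 200 = 15), is to group row indices by label and sample from each group. The function name `balanced_subset` and the toy labels are illustrative and not taken from the project code.

```python
import random
from collections import defaultdict


def balanced_subset(labels, per_class, seed=0):
    """Return sorted row indices with at most `per_class` rows per label."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = []
    for label in sorted(by_label):
        indices = by_label[label]
        take = min(per_class, len(indices))  # small classes keep all their rows
        chosen.extend(rng.sample(indices, take))
    return sorted(chosen)


# Toy stand-in for the real label column: 200 species, 30 images each.
toy_labels = [species for species in range(200) for _ in range(30)]
subset = balanced_subset(toy_labels, per_class=15)
print(len(subset))  # 200 species * 15 images = 3000
```

With a Hugging Face `Dataset`, the resulting indices could then be passed to `Dataset.select` to materialize the subset.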

EDA

Visual Data Inspection:

We use the matplotlib library to render the PIL Image objects of the first 3 birds in the dataset side by side. Because the application revolves around the vision modality, we must verify that the images load correctly, are not corrupted, and actually contain birds.

[Screenshot 2026-06-06 at 19.06.03]

As the screenshot shows, the images clearly display birds (Label 0), confirming that the dataset is healthy and ready for the vision-based recommendation model.
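A minimal sketch of this kind of side-by-side check is shown here. It assumes the images are available as a list and uses random RGB arrays as stand-ins for the real photos; `show_first_images` is a hypothetical helper, not the notebook's actual code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def show_first_images(images, labels, n=3):
    """Plot the first `n` images side by side, titled with their labels."""
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 4), squeeze=False)
    for ax, image, label in zip(axes[0], images[:n], labels[:n]):
        ax.imshow(image)  # accepts PIL images as well as arrays
        ax.set_title(f"Label {label}")
        ax.axis("off")
    return fig


# Random RGB arrays stand in for the real bird photos.
rng = np.random.default_rng(0)
fake_images = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(3)]
fig = show_first_images(fake_images, labels=[0, 0, 0])
```

A corrupted or unreadable image would typically fail at load time or render as noise, which is what the visual check is meant to catch.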

Train/Test Split Statistics

We extract the exact number of rows (images) in the training and testing splits and plot a quick bar chart to visualize the split. It is important to know how much data is available for building the embeddings versus evaluating them, because a massive imbalance between the splits could affect how we subset the data later.

[Screenshot 2026-06-06 at 19.06.20]

The output shows a healthy, well-balanced split (roughly 6,000 train and 5,800 test images), leaving plenty of data both for building the embeddings and for evaluating them.
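The split check comes down to simple proportions. The sketch here uses the rounded sizes quoted above rather than exact counts, and assumes the splits are keyed as `"train"` and `"test"`; `split_summary` is a hypothetical helper name.

```python
def split_summary(split_sizes):
    """Return each split's share of the total number of rows."""
    total = sum(split_sizes.values())
    return {name: size / total for name, size in split_sizes.items()}


# Rounded sizes from the README; with the real data these would come from
# len(dataset["train"]) and len(dataset["test"]).
sizes = {"train": 6000, "test": 5800}
shares = split_summary(sizes)
for name, share in shares.items():
    print(f"{name}: {sizes[name]} rows ({share:.1%})")
# train: 6000 rows (50.8%)
# test: 5800 rows (49.2%)
```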

Label Distribution Analysis

We extract all the species labels from the training split, count how many images belong to each species, and plot the Top 20 most frequent species in a bar chart. This checks whether the dataset is balanced: if one species had 1,000 images and another only 2, the model could become biased toward the most common species. Understanding this distribution also helps explain why the recommendation algorithm performs the way it does.

[Screenshot 2026-06-06 at 19.06.36]

The output reveals a balanced distribution (30 images per species), so the recommendation algorithm should not become biased toward any specific type of bird because of class frequency.
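The counting step can be sketched with the standard library alone. The toy labels here mimic the balanced result described above; the helper name `label_distribution` and the max/min ratio as an imbalance measure are assumptions for illustration.

```python
from collections import Counter


def label_distribution(labels, top_n=20):
    """Return the `top_n` most common labels and the max/min count ratio."""
    counts = Counter(labels)
    top = counts.most_common(top_n)
    imbalance = max(counts.values()) / min(counts.values())
    return top, imbalance


# Toy labels: 200 species with 30 images each, mirroring the result above.
toy_labels = [species for species in range(200) for _ in range(30)]
top20, imbalance = label_distribution(toy_labels)
print(len(top20), top20[0], imbalance)  # 20 (0, 30) 1.0
```

A ratio of 1.0 means every species has the same number of images; a value far above 1 would signal the kind of imbalance the analysis is looking for.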

Image Dimension Analysis

We take a random sample of 200 images from the training set, extract their width and height in pixels, and plot them on a scatter plot. (I used 200 images so that the platform I am working on, Google Colab, runs smoothly.) Image models, like the CLIP model we use for embeddings, often require images of a certain size or aspect ratio. Analyzing the dimensions shows whether the dataset contains mostly uniform squares, tall rectangles, or wide panoramas, which tells us how much reshaping the model will have to do.

[Screenshot 2026-06-06 at 19.06.55]

The output shows that almost all images have at least one dimension capped at exactly 500 pixels, meaning the CLIP embedding model can consistently resize them into its required format without causing heavy visual distortion.
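The dimension check can be summarized numerically as well as plotted. This sketch assumes sizes are available as `(width, height)` pairs, which is what a PIL image's `size` attribute returns; the sample values and the helper `dimension_summary` are hypothetical.

```python
import random


def dimension_summary(sizes):
    """Summarize (width, height) pairs: longest side and aspect ratio."""
    longest = [max(w, h) for w, h in sizes]
    ratios = [w / h for w, h in sizes]
    return {
        "share_longest_side_500": sum(side == 500 for side in longest) / len(sizes),
        "min_aspect_ratio": min(ratios),
        "max_aspect_ratio": max(ratios),
    }


# Hypothetical (width, height) pairs; real ones would come from `image.size`.
all_sizes = [(500, 375), (500, 333), (375, 500), (500, 500), (480, 360), (500, 400)]
sample = random.Random(0).sample(all_sizes, k=4)  # the README samples 200 images
print(dimension_summary(sample))
```

Aspect ratios close to 1 mean little is lost when a model resizes and crops to a square input; very wide or very tall images are the ones at risk of distortion or cropping.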

Embeddings

Visualization: K-Means Clustering

We use the K-Means algorithm to group our bird embeddings into 10 clusters based on their similarity in embedding space. We then color the points of our 2D PCA projection by cluster so the groupings can be inspected visually. Embeddings are just raw numbers; applying K-Means forces the birds into groups, and visualizing those groups is a check that the embeddings work: if the model captured the visual data, birds that look similar should share a cluster, and therefore a color, on the chart.

[Screenshot 2026-06-06 at 19.07.18]
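A minimal sketch of the clustering and projection steps with scikit-learn follows. It assumes K-Means runs on the full embeddings and PCA is used only for plotting; the random vectors, the 512-dimensional size, and the helper `cluster_and_project` are stand-ins, not the project's actual data or code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_and_project(embeddings, n_clusters=10, seed=0):
    """Cluster in the full embedding space, then project to 2D for plotting."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_ids = kmeans.fit_predict(embeddings)
    coords_2d = PCA(n_components=2, random_state=seed).fit_transform(embeddings)
    return cluster_ids, coords_2d


# Random vectors stand in for the CLIP image embeddings (512 dims is an assumption).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(300, 512))
cluster_ids, coords_2d = cluster_and_project(fake_embeddings)
print(cluster_ids.shape, coords_2d.shape)  # (300,) (300, 2)
```

The chart itself could then be drawn with something like `plt.scatter(coords_2d[:, 0], coords_2d[:, 1], c=cluster_ids, cmap="tab10")`, so that each of the 10 clusters gets its own color.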

Cluster Coherency & Reasoning

We select a specific cluster (in this case, Cluster 3) and plot 5 bird images that the model grouped into it. The goal is to check that the embeddings are coherent: if the CLIP model captured visual features, the birds in Cluster 3 should share obvious visual traits, such as all being water birds, all having yellow feathers, or all having long beaks. A human look at the images verifies that the math makes intuitive sense.

[Screenshot 2026-06-06 at 19.07.41]

The output confirms high coherency: despite belonging to different species, all five birds share the distinct visual trait of a bright yellow belly, which indicates that the embedding model captured actual physical characteristics.
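The README does not state how the five images were picked from Cluster 3. One reasonable option, sketched here as an assumption, is to take the members closest to the cluster centroid, since those are the most typical examples. The toy 2D data and the helper `representative_members` are illustrative only.

```python
import numpy as np


def representative_members(embeddings, cluster_ids, cluster, k=5):
    """Return indices of the `k` members of `cluster` nearest its centroid."""
    members = np.flatnonzero(cluster_ids == cluster)
    centroid = embeddings[members].mean(axis=0)
    distances = np.linalg.norm(embeddings[members] - centroid, axis=1)
    return members[np.argsort(distances)[:k]]


# Toy data: two well-separated 2D blobs labelled as clusters 0 and 3.
rng = np.random.default_rng(0)
embeddings = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
cluster_ids = np.array([0] * 20 + [3] * 20)
picked = representative_members(embeddings, cluster_ids, cluster=3)
print(picked)  # five row indices, all drawn from cluster 3
```

The returned indices can be used to look up the corresponding images for plotting. Choosing the first five members or a random five would work too, but centroid-nearest members tend to show the shared trait most clearly.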