
---
title: AI Visual Album Recommender
emoji: 💿
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---

## 📺 Project Demo & Technical Walkthrough

You can watch the full video presentation, including the EDA analysis and the model demonstration, here: Watch Demo on Loom

# 🎵 AI Visual Album Recommender: A Computer Vision Approach to Music Discovery

๐Ÿง Introduction

This project explores the intersection of Computer Vision and Musicology by developing a recommendation system based on the visual aesthetics of album cover art. The core premise is that an album's visual presentation is a reflection of its sonic identity and genre.


## 📊 Data Selection & Exploratory Data Analysis (EDA)

The research began with the selection of a robust visual dataset: `20k-Album-Covers-within-20-Genres`.

- **Data Composition:** The dataset comprises 20,000 high-quality album cover images, categorized into 20 distinct musical genres.
- **Integrity & Quality:** EDA confirmed a clean data structure, with zero missing values and valid labels throughout.
- **Class Distribution:** The dataset is perfectly balanced, with exactly 1,000 samples per genre.



๐Ÿ› ๏ธ Technical Methodology

### 1. Feature Extraction & Embedding Generation

The technical core involves transforming visual art into numerical vectors:

- **Vision Transformer (ViT):** We used a pretrained Vision Transformer to extract high-dimensional features, transforming each cover into a 768-dimensional embedding vector. These vectors represent the "visual DNA" of the music.
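As a minimal sketch of this step, the snippet below shows how per-token ViT states are pooled into a single normalized embedding. A seeded random array stands in for the backbone's output (in the real pipeline a pretrained ViT, e.g. via `transformers.ViTModel`, would produce it), so only the pooling logic is shown, not the model itself.

```python
import numpy as np

HIDDEN_DIM = 768   # ViT-Base hidden size, matching the 768-d embeddings above
NUM_TOKENS = 197   # 196 patch tokens + 1 [CLS] token for a 224x224 input

def pool_embedding(token_states: np.ndarray) -> np.ndarray:
    """Reduce per-token ViT states (197, 768) to one L2-normalized vector."""
    cls_vector = token_states[0]               # [CLS] token summarizes the image
    return cls_vector / np.linalg.norm(cls_vector)

rng = np.random.default_rng(0)
fake_vit_output = rng.normal(size=(NUM_TOKENS, HIDDEN_DIM))  # stand-in for model output
embedding = pool_embedding(fake_vit_output)
print(embedding.shape)  # (768,)
```

L2-normalizing here is a deliberate choice: it makes the later cosine-similarity lookup a plain dot product.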

### 2. Dimensionality Reduction & Clustering

To validate the quality of our embeddings, we employed:

- **UMAP:** Used for dimensionality reduction, projecting the 768-dimensional vectors into a 2-D space for visualization.
- **K-Means Clustering:** An unsupervised algorithm that groups the covers into 20 clusters based solely on their visual embeddings.
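The clustering step above can be sketched as follows. Seeded random vectors stand in for the real 768-d ViT embeddings, and the UMAP projection (from the `umap-learn` package) is shown only as a comment to keep the sketch dependency-light; the K-Means call mirrors the 20-cluster setup described above.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 768))   # stand-in for 200 album-cover embeddings

# 2-D projection for visualization (requires umap-learn):
# coords_2d = umap.UMAP(n_components=2).fit_transform(X)

# Group covers into 20 visual clusters, one per hypothesized genre
kmeans = KMeans(n_clusters=20, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)    # cluster id in [0, 20) per album
print(labels.shape)  # (200,)
```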

## 🔬 Validation and Results Analysis

Comparing the visualization of the ground-truth genre labels with the unsupervised K-Means clusters reveals a strong alignment between the two.


This indicates that the ViT-based vectors captured the "visual language" of music: visually similar genres clustered together in the embedding space, which supports using Cosine Similarity over these vectors for visual recommendations.
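A minimal sketch of such a cosine-similarity lookup is shown below. The `recommend` helper and the random embedding matrix are illustrative stand-ins, not the project's actual code; one row is duplicated so the expected nearest neighbor is known.

```python
import numpy as np

def recommend(embeddings: np.ndarray, query_index: int, top_k: int = 3) -> list[int]:
    """Return indices of the top_k covers most cosine-similar to the query."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed @ normed[query_index]   # cosine similarity to the query
    scores[query_index] = -np.inf           # never recommend the query itself
    return np.argsort(scores)[::-1][:top_k].tolist()

rng = np.random.default_rng(7)
E = rng.normal(size=(50, 768))
E[10] = E[3]                                # cover 10 is a visual twin of cover 3
print(recommend(E, query_index=3))          # index 10 ranks first
```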


## 🎮 Interactive Discovery (Gradio Interface)

The final model is deployed via an interactive interface where users can input an album index to discover visually similar music.
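A hypothetical sketch of how such an app could be wired up is below. The `recommend_by_index` function and the embedding matrix are stand-ins (the real `app.py` presumably loads precomputed ViT embeddings for the 20k covers), and the Gradio wiring sits under the main guard so the recommendation logic stays testable without a UI.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBEDDINGS = rng.normal(size=(100, 768))  # stand-in for the precomputed matrix

def recommend_by_index(index: int, top_k: int = 3) -> str:
    """Return a short text listing of the covers most similar to `index`."""
    normed = EMBEDDINGS / np.linalg.norm(EMBEDDINGS, axis=1, keepdims=True)
    scores = normed @ normed[int(index)]
    scores[int(index)] = -np.inf          # exclude the query album
    best = np.argsort(scores)[::-1][:top_k]
    return "Visually similar albums: " + ", ".join(str(i) for i in best)

if __name__ == "__main__":
    import gradio as gr  # UI layer, only needed when serving the app
    demo = gr.Interface(
        fn=recommend_by_index,
        inputs=gr.Number(label="Album index", precision=0),
        outputs=gr.Textbox(label="Recommendations"),
        title="AI Visual Album Recommender",
    )
    demo.launch()
```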

๐Ÿ” Recommended Test Cases:

- **Index 2700 (Rock):** Showcases high-energy, iconic rock aesthetics.
- **Index 500 (Blues):** Explores the melancholic and warm visual tones of the blues.
- **Index 4697 (Hip-Hop):** Highlights urban art, graffiti-style graphics, and vibrant contrasts.

## 🎓 Conclusion

This project demonstrates that visual embeddings can effectively facilitate semantic discovery without manual tagging. By bridging Computer Vision and Recommendation Systems, we have created a tool that understands the "vibe" of music through its art.