
---
title: AI Visual Album Recommender
emoji: 💿
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---

## 📺 Project Demo & Technical Walkthrough

You can watch the full video presentation, including the EDA analysis and the model demonstration, here: Watch Demo on Loom

# 🎵 AI Visual Album Recommender: A Computer Vision Approach to Music Discovery

๐Ÿง Introduction

This project explores the intersection of Computer Vision and Musicology by developing a recommendation system based on the visual aesthetics of album cover art. The core premise is that an album's visual presentation is a reflection of its sonic identity and genre.


## 📊 Data Selection & Exploratory Data Analysis (EDA)

The research began with the selection of a robust visual dataset: `20k-Album-Covers-within-20-Genres`.

- **Data Composition:** The dataset comprises 20,000 high-quality album cover images, categorized into 20 distinct musical genres.
- **Integrity & Quality:** EDA confirmed a clean data structure, with zero missing values and valid labels throughout.
- **Class Distribution:** The dataset is perfectly balanced, with exactly 1,000 samples per genre.



๐Ÿ› ๏ธ Technical Methodology

### 1. Feature Extraction & Embedding Generation

The technical core involves transforming visual art into numerical vectors:

- **Vision Transformer (ViT):** We used a pretrained Vision Transformer to extract high-dimensional features, transforming each cover into a 768-dimensional embedding vector. These vectors represent the "visual DNA" of the music.
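As a minimal sketch of this step, the snippet below shows how per-token ViT states are pooled into a single normalized embedding. A seeded random array stands in for the backbone's output (in the real pipeline a pretrained ViT, e.g. via `transformers.ViTModel`, would produce it), so only the pooling logic is shown, not the model itself.

```python
import numpy as np

HIDDEN_DIM = 768   # ViT-Base hidden size, matching the 768-d embeddings above
NUM_TOKENS = 197   # 196 patch tokens + 1 [CLS] token for a 224x224 input

def pool_embedding(token_states: np.ndarray) -> np.ndarray:
    """Reduce per-token ViT states (197, 768) to one L2-normalized vector."""
    cls_vector = token_states[0]               # [CLS] token summarizes the image
    return cls_vector / np.linalg.norm(cls_vector)

rng = np.random.default_rng(0)
fake_vit_output = rng.normal(size=(NUM_TOKENS, HIDDEN_DIM))  # stand-in for model output
embedding = pool_embedding(fake_vit_output)
print(embedding.shape)  # (768,)
```

L2-normalizing here is a deliberate choice: it makes the later cosine-similarity lookup a plain dot product.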

### 2. Dimensionality Reduction & Clustering

To validate the quality of our embeddings, we employed:

- **UMAP:** Used for dimensionality reduction, projecting the 768-dimensional vectors into a 2-D space for visualization.
- **K-Means Clustering:** An unsupervised algorithm that groups the covers into 20 clusters based solely on their visual embeddings.
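The clustering step above can be sketched as follows. Seeded random vectors stand in for the real 768-d ViT embeddings, and the UMAP projection (from the `umap-learn` package) is shown only as a comment to keep the sketch dependency-light; the K-Means call mirrors the 20-cluster setup described above.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 768))   # stand-in for 200 album-cover embeddings

# 2-D projection for visualization (requires umap-learn):
# coords_2d = umap.UMAP(n_components=2).fit_transform(X)

# Group covers into 20 visual clusters, one per hypothesized genre
kmeans = KMeans(n_clusters=20, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)    # cluster id in [0, 20) per album
print(labels.shape)  # (200,)
```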

## 🔬 Validation and Results Analysis

Comparing the visualization of the ground-truth genre labels with the unsupervised K-Means clusters reveals a strong alignment between the two.


This indicates that the ViT-based vectors captured the "visual language" of music: visually similar genres clustered together in the embedding space, which supports using Cosine Similarity over these vectors for visual recommendations.
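A minimal sketch of such a cosine-similarity lookup is shown below. The `recommend` helper and the random embedding matrix are illustrative stand-ins, not the project's actual code; one row is duplicated so the expected nearest neighbor is known.

```python
import numpy as np

def recommend(embeddings: np.ndarray, query_index: int, top_k: int = 3) -> list[int]:
    """Return indices of the top_k covers most cosine-similar to the query."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed @ normed[query_index]   # cosine similarity to the query
    scores[query_index] = -np.inf           # never recommend the query itself
    return np.argsort(scores)[::-1][:top_k].tolist()

rng = np.random.default_rng(7)
E = rng.normal(size=(50, 768))
E[10] = E[3]                                # cover 10 is a visual twin of cover 3
print(recommend(E, query_index=3))          # index 10 ranks first
```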


## 🎮 Interactive Discovery (Gradio Interface)

The final model is deployed via an interactive interface where users can input an album index to discover visually similar music.
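A hypothetical sketch of how such an app could be wired up is below. The `recommend_by_index` function and the embedding matrix are stand-ins (the real `app.py` presumably loads precomputed ViT embeddings for the 20k covers), and the Gradio wiring sits under the main guard so the recommendation logic stays testable without a UI.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBEDDINGS = rng.normal(size=(100, 768))  # stand-in for the precomputed matrix

def recommend_by_index(index: int, top_k: int = 3) -> str:
    """Return a short text listing of the covers most similar to `index`."""
    normed = EMBEDDINGS / np.linalg.norm(EMBEDDINGS, axis=1, keepdims=True)
    scores = normed @ normed[int(index)]
    scores[int(index)] = -np.inf          # exclude the query album
    best = np.argsort(scores)[::-1][:top_k]
    return "Visually similar albums: " + ", ".join(str(i) for i in best)

if __name__ == "__main__":
    import gradio as gr  # UI layer, only needed when serving the app
    demo = gr.Interface(
        fn=recommend_by_index,
        inputs=gr.Number(label="Album index", precision=0),
        outputs=gr.Textbox(label="Recommendations"),
        title="AI Visual Album Recommender",
    )
    demo.launch()
```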

๐Ÿ” Recommended Test Cases:

- **Index 2700 (Rock):** Showcases high-energy, iconic rock aesthetics.
- **Index 500 (Blues):** Explores the melancholic and warm visual tones of the blues.
- **Index 4697 (Hip-Hop):** Highlights urban art, graffiti-style graphics, and vibrant contrasts.

## 🎓 Conclusion

This project demonstrates that visual embeddings can effectively facilitate semantic discovery without manual tagging. By bridging Computer Vision and Recommendation Systems, we have created a tool that understands the "vibe" of music through its art.