---
title: AI Visual Album Recommender
emoji: 💿
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---
## 📺 Project Demo & Technical Walkthrough

You can watch the full video presentation, including the EDA analysis and the model demonstration, here: Watch Demo on Loom
# 🎵 AI Visual Album Recommender: A Computer Vision Approach to Music Discovery
## 🎧 Introduction
This project explores the intersection of Computer Vision and Musicology by developing a recommendation system based on the visual aesthetics of album cover art. The core premise is that an album's visual presentation is a reflection of its sonic identity and genre.
## 📊 Data Selection & Exploratory Data Analysis (EDA)
The research began with the selection of a robust visual dataset: 20k-Album-Covers-within-20-Genres.
- Data Composition: The dataset comprises 20,000 high-quality images of album covers, categorized into 20 distinct musical genres.
- Integrity & Quality: The EDA phase confirmed a clean dataset: zero missing values and valid labels throughout.
- Class Distribution: The dataset is perfectly balanced, with exactly 1,000 samples per genre.
## 🛠️ Technical Methodology
### 1. Feature Extraction & Embedding Generation
The technical core involves transforming visual art into numerical vectors:
- Vision Transformer (ViT): We utilized a Vision Transformer (ViT) model to extract high-dimensional features. Each cover was transformed into a 768-dimensional embedding vector. These vectors represent the "visual DNA" of the music.
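As a sketch of this step (the original does not specify which ViT checkpoint or preprocessing was used, so those details are assumptions), the CLS-token embedding of an image can be extracted like this. The model here is randomly initialized so the snippet runs without downloading weights; a real pipeline would load pretrained weights, e.g. `google/vit-base-patch16-224`:

```python
import torch
from transformers import ViTConfig, ViTModel

# Randomly initialized ViT for a self-contained demo; a real pipeline would use
# ViTModel.from_pretrained("google/vit-base-patch16-224") (assumed checkpoint).
model = ViTModel(ViTConfig())  # defaults: hidden_size=768, image_size=224
model.eval()

# Stand-in for a preprocessed album cover: one 224x224 RGB tensor.
pixel_values = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

# The CLS token of the final layer serves as the 768-dimensional "visual DNA" vector.
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)  # torch.Size([1, 768])
```

In practice the image would first pass through the checkpoint's image processor (resize, normalize) rather than being fed as raw tensors.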
### 2. Dimensionality Reduction & Clustering
To validate the quality of our embeddings, we employed:
- UMAP: Used for dimensionality reduction to project the 768D vectors into a 2D space.
- K-Means Clustering: An unsupervised algorithm used to group the data into 20 clusters based solely on their visual embeddings.
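A minimal sketch of the clustering step with scikit-learn, using random vectors in place of the real ViT embeddings (the UMAP projection, done with the umap-learn package, is omitted so the snippet stays self-contained):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the real 768-dimensional ViT embeddings (20k covers in the project).
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(400, 768))

# Group covers into 20 clusters purely from their visual embeddings,
# mirroring the 20 genres in the dataset.
kmeans = KMeans(n_clusters=20, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(embeddings)

print(cluster_labels.shape)      # (400,)
print(len(set(cluster_labels)))  # 20
```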
## 🔬 Validation and Results Analysis
By comparing the visualization of the ground-truth genres with the unsupervised K-Means clusters, we observed a high degree of correlation.
This suggests that our ViT-based vectors successfully captured the "visual language" of music: visually similar genres naturally clustered together in the vector space, making cosine similarity over these embeddings an effective basis for visual recommendations.
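The comparison above is qualitative; one standard way to quantify agreement between genre labels and cluster assignments (an illustration added here, not a figure from the project) is the adjusted Rand index:

```python
from sklearn.metrics import adjusted_rand_score

# Toy ground-truth genres and cluster assignments for four albums.
# Cluster IDs are arbitrary: ARI only measures whether the grouping matches.
genres   = [0, 0, 1, 1]
clusters = [1, 1, 0, 0]

print(adjusted_rand_score(genres, clusters))  # 1.0 (identical grouping)

# A deliberately mismatched assignment scores at or below chance level (~0).
print(adjusted_rand_score([0, 0, 1, 1], [0, 1, 0, 1]))
```

An ARI close to 1 over the full 20,000 covers would confirm the visual correspondence numerically.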
## 🎮 Interactive Discovery (Gradio Interface)
The final model is deployed via an interactive interface where users can input an album index to discover visually similar music.
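A minimal sketch of how such an index-based lookup might be wired up (the embedding array, titles, and Gradio components here are illustrative assumptions, not the Space's actual app.py):

```python
import numpy as np

# Stand-ins for the precomputed ViT embeddings and album titles
# (in the real Space these would be loaded from disk).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 768))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
titles = [f"Album {i}" for i in range(100)]

def recommend(index, k=5):
    """Return the k albums whose covers are most cosine-similar to the query."""
    index = int(index)
    sims = embeddings @ embeddings[index]  # cosine similarity (unit vectors)
    order = np.argsort(-sims)
    order = order[order != index][:k]      # exclude the query itself
    return [titles[i] for i in order]

def launch_demo():
    """Wire the lookup into a Gradio interface (requires the gradio package)."""
    import gradio as gr
    gr.Interface(
        fn=recommend,
        inputs=gr.Number(label="Album index"),
        outputs=gr.JSON(label="Visually similar albums"),
    ).launch()

# Call launch_demo() to serve the interface.
```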
### 📌 Recommended Test Cases
- Index 2700 (Rock): Showcases high-energy, iconic rock aesthetics.
- Index 500 (Blues): Explores the melancholic and warm visual tones of the blues.
- Index 4697 (Hip-Hop): Highlights urban art, graffiti-style graphics, and vibrant contrasts.
## 🏁 Conclusion
This project demonstrates that visual embeddings can effectively facilitate semantic discovery without manual tagging. By bridging Computer Vision and Recommendation Systems, we have created a tool that understands the "vibe" of music through its art.

