ConquestAce committed on
Commit c633121 · verified · 1 Parent(s): 787d41e

Update README.md

  1. README.md +160 -3
README.md CHANGED
@@ -7,11 +7,165 @@ datasets:
7
  pipeline_tag: audio-classification
8
  tags:
9
  - music
10
  ---
11
 
12
- This model is extremely weak. I am not good at data science
13
 
14
- # Iterations
15
  **null**:
16
 
17
  <details>
@@ -90,4 +244,7 @@ with torch.no_grad():
90
 
91
  ```
92
 
93
- </details>
7
  pipeline_tag: audio-classification
8
  tags:
9
  - music
10
+ - spotify
11
+ - machine-learning
12
+ - music-prediction
13
+ - data-science
14
+ - regression
15
+ - classification
16
+ - popularity-analysis
17
  ---
18
 
19
+ # 🎵 Spotify Song Popularity Prediction
20
 
21
+ Predict the popularity of a song based on its audio features and estimate potential Spotify royalties.
22
+
23
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
24
+
25
+ ---
26
+
27
+ ## 📖 Project Overview
28
+
29
+ This project explores machine learning models to predict the popularity of songs using publicly available features such as danceability, energy, tempo, and valence. It also demonstrates a prototype pricing tool that estimates potential Spotify revenue based on predicted popularity.
30
+
31
+ Popularity is hard to forecast accurately because it shifts over time. Even so, our models suggest that a song's **minimum popularity and expected revenue can be estimated** with machine learning techniques.
32
+
33
+ ---
34
+
35
+ ## 📊 Dataset
36
+
37
+ - **Source**:
38
+   - Spotify Web API
39
+   - Original dataset (~114,000 songs) expanded to **~2 million songs**
40
+ - **Features**:
41
+   - Acoustic features (energy, danceability, valence, etc.)
42
+   - Target variable: `popularity` (integer from 0–100)
43
+
44
+ ---
45
+
46
+ ## 🔬 Methods
47
+
48
+ - **Data Cleaning and Preparation**:
49
+   - Removed zero-popularity entries, duplicates (~8% of rows), and outliers
50
+   - Standardized genres using clustering
51
+ - **Exploratory Data Analysis (EDA)**:
52
+   - Analyzed distributions, correlations, and cumulative trends
53
+ - **Modeling**:
54
+   - Linear Regression, Ridge Regression
55
+   - Decision Tree, Random Forest, AdaBoost (best recall: 86% on popular songs)
56
+   - XGBoost (binning) and Neural Networks
57
+ - **Revenue Estimation**:
58
+   - Quadratic regression fit between predicted popularity and play counts
58
+   - Prototype pricing tool predicting Spotify revenue for songs
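
The cleaning step listed above might look roughly like the following. This is a minimal sketch in plain Python, not the project's actual pipeline: the `clean_tracks` helper, the `track_id` deduplication key, and the sample rows are all hypothetical, and outlier removal and genre clustering are left out.

```python
def clean_tracks(rows):
    """Drop zero-popularity rows and duplicate tracks (hypothetical helper)."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["popularity"] == 0:
            continue  # zero-popularity entries are removed
        if row["track_id"] in seen_ids:
            continue  # keep only the first occurrence of each track
        seen_ids.add(row["track_id"])
        cleaned.append(row)
    return cleaned


# Made-up sample rows, for illustration only.
sample = [
    {"track_id": "a", "popularity": 55, "energy": 0.8},
    {"track_id": "a", "popularity": 55, "energy": 0.8},  # duplicate
    {"track_id": "b", "popularity": 0, "energy": 0.4},   # zero popularity
    {"track_id": "c", "popularity": 12, "energy": 0.6},
]
cleaned = clean_tracks(sample)
print([row["track_id"] for row in cleaned])
```

Deduplicating on an identifier rather than on the full row is a design choice made here for brevity; the README does not say which key the authors used.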
60
+
61
+ ---
62
+
63
+ ## 🏆 Results
64
+
65
+ | Model | Highlights |
66
+ |-------------------------|----------------------------------------------|
67
+ | Linear/Ridge Regression | Poor fit due to complex, noisy data |
68
+ | Random Forest | Most stable overall (recall on popular songs) |
69
+ | AdaBoost (weighted) | **Best performance**: 86% recall for popular songs |
70
+ | Neural Networks | Struggled because the "popularity" metric is unstable over time |
71
+
72
+ - Predicted revenue for a song with **popularity 55** ≈ **\$357,000 CAD**.
73
+ - Pricing tool demonstrated practical viability despite prediction limitations.
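
For reference, the recall figure quoted above is the share of truly popular songs that a classifier also flags as popular. The sketch below computes it in plain Python. The threshold of 50 used to label a song "popular" and the sample values are assumptions for illustration, since the cutoff the authors used is not stated here.

```python
POPULAR_THRESHOLD = 50  # assumed cutoff; not taken from the project


def recall_on_popular(true_popularity, predicted_popularity,
                      threshold=POPULAR_THRESHOLD):
    """Recall = true positives / (true positives + false negatives)."""
    true_pos = 0
    false_neg = 0
    for actual, predicted in zip(true_popularity, predicted_popularity):
        if actual >= threshold:  # only truly popular songs count toward recall
            if predicted >= threshold:
                true_pos += 1
            else:
                false_neg += 1
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


# Made-up popularity scores, for illustration only.
actual = [80, 62, 55, 30, 10, 71]
predicted = [75, 40, 58, 52, 5, 66]
recall = recall_on_popular(actual, predicted)
print(f"Recall on popular songs: {recall:.2f}")
```

Note that recall ignores false positives (the unpopular song predicted at 52 does not lower the score), so a high recall alone says nothing about precision.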
74
+
75
+ ---
76
+
77
+ ## 📈 Example
78
+
79
+ Predicting a song’s revenue based on its feature vector:
80
+
81
+ ```python
82
+ # Example (simplified)
83
+ predicted_popularity = model.predict(features)
84
+ predicted_revenue = pricing_function(predicted_popularity)
85
+ ```
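
One way to read the snippet above: `pricing_function` maps predicted popularity to play counts through a quadratic fit, then converts plays to revenue. The following is a rough, self-contained sketch of that idea, not the authors' implementation. The `fit_quadratic` and `make_pricing_function` helpers, the synthetic play counts, and the `PAYOUT_PER_STREAM` rate are all assumptions for illustration.

```python
PAYOUT_PER_STREAM = 0.004  # hypothetical rate per play, not a Spotify figure


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Power sums s[k] = sum(x^k) and moment sums t[k] = sum(y * x^k).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for row in range(2, -1, -1):
        rest = sum(m[row][k] * coeffs[k] for k in range(row + 1, 3))
        coeffs[row] = (m[row][3] - rest) / m[row][row]
    return tuple(coeffs)  # (a, b, c)


def make_pricing_function(popularities, play_counts, payout=PAYOUT_PER_STREAM):
    """Build a popularity -> revenue function from a quadratic play-count fit."""
    a, b, c = fit_quadratic(popularities, play_counts)

    def pricing_function(popularity):
        plays = a * popularity ** 2 + b * popularity + c
        return max(plays, 0.0) * payout  # clamp so revenue is never negative

    return pricing_function


# Synthetic data that follows an exact quadratic, for illustration only.
popularities = [10, 20, 30, 40, 50, 60]
play_counts = [2 * p ** 2 + 3 * p + 5 for p in popularities]

pricing_function = make_pricing_function(popularities, play_counts)
estimate = pricing_function(55)
print(f"Estimated revenue at popularity 55: {estimate:.2f}")
```

With real data the fit would not be exact, and the revenue estimate is only as good as both the popularity prediction and the assumed payout rate.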
86
+
87
+ ---
88
+
89
+ ## 🚀 How to Run
90
+
91
+ ```bash
92
+ # Clone this repo
93
+ git clone https://huggingface.co/username/spotify-popularity-prediction
94
+
95
+ # Install dependencies
96
+ pip install -r requirements.txt
97
+
98
+ # Train or evaluate models
99
+ python train_models.py
100
+ python evaluate_models.py
101
+
102
+ # Predict song revenue
103
+ python pricing_tool.py
104
+ ```
105
+
106
+ (The scripts can be adapted for different model types: AdaBoost, Random Forest, Neural Network.)
107
+
108
+ ---
109
+
110
+ ## 🤔 Limitations
111
+
112
+ - Song features alone are **not sufficient** for high-accuracy predictions.
113
+ - "Popularity" is a **time-dependent** and **dynamic** metric.
114
+ - Genre diversity (>5000 unique genres) complicated modeling.
115
+
116
+ ---
117
+
118
+ ## 🧠 Future Work
119
+
120
+ - Predict **play count** directly instead of popularity.
121
+ - Fine-tune **XGBoost** and **deep neural networks** on larger datasets.
122
+ - Integrate **time-evolution models** for dynamic popularity changes.
123
+ - Improve genre classification with unsupervised learning (e.g., genre embeddings).
124
+
125
+ ---
126
+
127
+ ## 📚 Citation
128
+
129
+ If you use this project, please cite:
130
+
131
+ ```bibtex
132
+ @misc{bhuiyan2024spotify,
133
+   title={Spotify Song Popularity Prediction},
134
+   author={Ashiful Bhuiyan and Blanca Fernández Méndez and Nazanin Ghelichi and Pavle Curcin},
135
+   year={2024},
136
+   institution={York University},
137
+ }
138
+ ```
139
+
140
+ ---
141
+
142
+ ## 🧑‍💻 Authors
143
+
144
+ - Ashiful Bhuiyan
145
+ - Blanca Elvira Fernández Méndez
146
+ - Nazanin Ghelichi
147
+ - Pavle Curcin
148
+
149
+ ---
150
+ ## 📄 License
151
+
152
+ This project is licensed under the [MIT License](LICENSE).
153
+
154
+ ---
155
+
163
+
164
+ # `popularity_predictor.pth`
165
+
166
+ This neural network model is extremely weak. I was not good at data science when I made this.
167
+
168
+ ## Iterations
169
  **null**:
170
 
171
  <details>
 
244
 
245
  ```
246
 
247
+ </details>
248
+
249
+ # 🏷 Tags
250
+ `#spotify` `#machine-learning` `#music-prediction` `#data-science` `#regression` `#classification` `#popularity-analysis`