## 📱 Mobile Price Prediction (NPR) | Quick Summary
**Objective:** Predict smartphone prices in Nepal using a regression pipeline built on messy scraped data.
### 🛠️ The Tech Stack
* **Data:** 109 cleaned records (from 127 raw) of mobile specs in Nepal.
* **Features:** RAM, Storage, 5G, "Ultra/Pro" status, Foldable status, and a custom **Premium Score**.
* **Top Model:** **Gradient Boosting Regressor** ($R^2 = 0.7487$, MAE $\approx$ NPR 31k).
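For context, the reported metrics can be reproduced on a held-out split roughly as below. This is a generic sketch on synthetic stand-in data; the real features, prices, split, and hyperparameters are not given in this summary.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 109-row dataset (features and prices are made up)
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(109, 4))
y = 100_000 * X[:, 0] + 50_000 * X[:, 1] + rng.normal(0, 15_000, size=109)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = GradientBoostingRegressor(random_state=42).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Same two metrics as reported above: R^2 and mean absolute error in NPR
print(f"R^2 = {r2_score(y_te, pred):.4f}, MAE = NPR {mean_absolute_error(y_te, pred):,.0f}")
```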
### 🔑 Key Findings
1. **Brand is King:** Apple and Samsung (Ultra/Fold) drive the highest price premiums.
2. **Interaction Matters:** The interaction term `RAM × Storage` predicts price better than either feature alone.
3. **Tiered Logic:** The model successfully categorizes phones from Budget (<20k) to Flagship (≥120k).
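The tiered logic in finding 3 can be expressed with `pd.cut`. Only the endpoints (<20k Budget, ≥120k Flagship) come from the summary; the intermediate cut points and tier names here are illustrative assumptions:

```python
import numpy as np
import pandas as pd

prices_npr = pd.Series([15_000, 45_000, 95_000, 150_000])

# Endpoints (<20k Budget, >=120k Flagship) are from the summary;
# the Mid-range/Premium boundaries are assumed for illustration
tiers = pd.cut(
    prices_npr,
    bins=[0, 20_000, 60_000, 120_000, np.inf],
    labels=["Budget", "Mid-range", "Premium", "Flagship"],
    right=False,  # left-inclusive bins, so exactly 120k lands in Flagship
)
print(tiers.tolist())  # ['Budget', 'Mid-range', 'Premium', 'Flagship']
```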
### 🏃 Use It in 3 Lines
```python
import joblib
bundle = joblib.load('mobile_price_model.pkl')
# sample_df must contain the columns: [RAM, Storage, 5G, Ultra, Pro, Foldable, Interaction, Log_Store, Premium, Brand]
price = bundle['model'].predict(sample_df[bundle['feature_cols']])
```
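A runnable end-to-end sketch of the same bundle pattern, using a tiny stand-in model and a reduced feature set (the real `.pkl`, its full feature list, and its prices are not reproduced here; the dict layout `{'model': ..., 'feature_cols': [...]}` is inferred from the snippet above):

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Stand-in bundle mirroring the assumed pkl structure
feature_cols = ["RAM", "Storage", "Interaction"]
train = pd.DataFrame({
    "RAM": [4, 6, 8, 12],
    "Storage": [64, 128, 128, 256],
})
train["Interaction"] = train["RAM"] * train["Storage"]
prices_npr = [25_000, 45_000, 70_000, 150_000]  # made-up prices

model = GradientBoostingRegressor(random_state=42).fit(train[feature_cols], prices_npr)
bundle = {"model": model, "feature_cols": feature_cols}

# Predict for one phone; selecting columns via feature_cols preserves training order
sample_df = pd.DataFrame([{"RAM": 8, "Storage": 128, "Interaction": 8 * 128}])
price = bundle["model"].predict(sample_df[bundle["feature_cols"]])[0]
print(f"Predicted price: NPR {price:,.0f}")
```

Selecting columns with `bundle['feature_cols']` rather than hard-coding them is the useful part of this pattern: it guarantees the prediction-time DataFrame matches the training column order.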
---
### 💡 Pro-Tips for "V2"
* **Data Scarcity:** With only 109 rows, Gradient Boosting might overfit. Consider **Simple Linear Regression** or **ElasticNet** as benchmarks to see if the complexity is truly paying off.
* **Feature Scaling:** Ensure your `StandardScaler` (if used) is applied inside a `Pipeline` object to prevent data leakage during cross-validation.
* **Categorical Handling:** Since you have few brands, try **One-Hot Encoding** instead of Ordinal Encoding to see if it captures brand-specific "hype" better without implying a mathematical order.
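The last two tips combine naturally: put the `StandardScaler` and a `OneHotEncoder` for `Brand` inside a `ColumnTransformer`, wrap that with an `ElasticNet` benchmark in a `Pipeline`, and cross-validate the whole thing. A minimal sketch on toy stand-in data (column names and values are made up; the real cleaned dataset is not reproduced here):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in data; real columns and values come from the cleaned dataset
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "RAM": rng.choice([4, 6, 8, 12], size=60),
    "Storage": rng.choice([64, 128, 256], size=60),
    "Brand": rng.choice(["Apple", "Samsung", "Xiaomi"], size=60),
})
y = 5_000 * X["RAM"] + 200 * X["Storage"] + rng.normal(0, 10_000, size=60)

# Scale numerics and one-hot encode Brand *inside* the pipeline, so each CV
# fold fits its own statistics and never leaks information from held-out data
pre = ColumnTransformer([
    ("num", StandardScaler(), ["RAM", "Storage"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Brand"]),
])
bench = Pipeline([("pre", pre), ("reg", ElasticNet(alpha=1.0))])

scores = cross_val_score(bench, X, y, cv=5, scoring="r2")
print(f"ElasticNet benchmark R^2: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the Gradient Boosting model cannot clearly beat this linear benchmark under cross-validation, the extra complexity is probably not paying off on 109 rows.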
- `mobile_price_model_2.pkl` (+3 -0)
- `requirements.txt.txt` (+22 -0)
**`mobile_price_model_2.pkl`** (new file; the binary is stored via Git LFS, so the diff shows only the pointer):

```
version https://git-lfs.github.com/spec/v1
oid sha256:2fdb09b24d45e64c1bbdd0950db5a1c483882aa1e89e73b56fbb674fc472cef8
size 136595
```
**`requirements.txt.txt`** (new file, +22 lines):

```text
# Requires Python >= 3.8 (Python itself is not pip-installable)
numpy>=1.24
pandas>=2.0
scikit-learn>=1.2
joblib>=1.2
matplotlib>=3.7
seaborn>=0.12
scipy>=1.10

# Optional / useful for extended workflows
# For faster gradient-boosted models (if you later switch from RandomForest)
xgboost>=1.7  # optional
lightgbm>=3.3  # optional

# If you use the scraper / JS rendering scripts from earlier messages
requests>=2.31
beautifulsoup4>=4.12
lxml>=4.9
playwright>=1.36  # optional, only if you use Playwright; run `playwright install` after installing

# Jupyter / notebook support (optional)
jupyterlab>=4.0
```