Sanoj111 committed · b5cd72d · verified · 1 Parent(s): ef7672f

Upload 2 files

## 📱 Mobile Price Prediction (NPR) | Quick Summary

**Objective:** Predict smartphone prices in Nepal using a regression pipeline built on messy scraped data.

### 🛠️ The Tech Stack

* **Data:** 109 cleaned records (from 127 raw) of mobile specs in Nepal.
* **Features:** RAM, Storage, 5G, "Ultra/Pro" status, Foldable status, and a custom **Premium Score**.
* **Top Model:** **Gradient Boosting Regressor** ($R^2 = 0.7487$, MAE $\approx$ NPR 31k).

### 🔑 Key Findings

1. **Brand is King:** Apple and Samsung (Ultra/Fold) drive the highest price premiums.
2. **Interaction Matters:** The `RAM x Storage` interaction term is a stronger predictor than either feature alone.
3. **Tiered Logic:** The model successfully categorizes phones from Budget (<20k) to Flagship (≥120k).
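The interaction and log-storage features above can be sketched as simple derived columns. The column names (`ram_gb`, `storage_gb`) and the use of `log1p` are assumptions for illustration, not the repo's exact code:

```python
import numpy as np
import pandas as pd

# Hypothetical schema; the project's real column names may differ
df = pd.DataFrame({
    "ram_gb": [4, 8, 12],
    "storage_gb": [64, 128, 256],
})

# RAM x Storage interaction: high RAM *and* high storage together
# signal a flagship, beyond each spec's individual effect
df["interaction"] = df["ram_gb"] * df["storage_gb"]

# Log-transform storage to tame its right-skewed distribution
df["log_storage"] = np.log1p(df["storage_gb"])
```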

### 🏃 Use It in 3 Lines

```python
import joblib
bundle = joblib.load('mobile_price_model.pkl')
# Input: [RAM, Storage, 5G, Ultra, Pro, Foldable, Interaction, Log_Store, Premium, Brand]
price = bundle['model'].predict(sample_df[bundle['feature_cols']])
```
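Building `sample_df` looks roughly like this. The column names below mirror the feature order in the comment, but they are assumptions; the authoritative list lives in `bundle['feature_cols']`:

```python
import pandas as pd

# Hypothetical single-row input; real column names come from
# bundle['feature_cols'], so check those before predicting
sample_df = pd.DataFrame([{
    "RAM": 8,              # GB
    "Storage": 128,        # GB
    "5G": 1,               # binary flag
    "Ultra": 0,
    "Pro": 1,
    "Foldable": 0,
    "Interaction": 8 * 128,
    "Log_Store": 4.86,     # ~log1p(128)
    "Premium": 0.4,        # custom Premium Score
    "Brand": 3,            # encoded brand id
}])
# price = bundle['model'].predict(sample_df[bundle['feature_cols']])
```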

---

### 💡 Pro-Tips for "V2"

* **Data Scarcity:** With only 109 rows, Gradient Boosting might overfit. Consider **Simple Linear Regression** or **ElasticNet** as benchmarks to see if the complexity is truly paying off.
* **Feature Scaling:** Ensure your `StandardScaler` (if used) is applied inside a `Pipeline` object to prevent data leakage during cross-validation.
* **Categorical Handling:** Since you have few brands, try **One-Hot Encoding** instead of Ordinal Encoding to see if it captures brand-specific "hype" better without implying a mathematical order.
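The three tips above fit naturally into one scikit-learn `Pipeline`: scaling happens inside cross-validation (no leakage), `Brand` is one-hot encoded, and `ElasticNet` serves as the simple benchmark. The column names are placeholders, not the project's actual schema:

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_cols = ["RAM", "Storage", "Interaction"]   # assumed numeric features
cat_cols = ["Brand"]                           # assumed categorical feature

pre = ColumnTransformer([
    # Scaler lives inside the pipeline, so each CV fold fits it
    # only on its own training split -> no data leakage
    ("num", StandardScaler(), num_cols),
    # One-hot avoids implying an ordering between brands
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

bench = Pipeline([("pre", pre), ("reg", ElasticNet(alpha=0.1))])
# from sklearn.model_selection import cross_val_score
# scores = cross_val_score(bench, X, y, cv=5, scoring="r2")
```

If the benchmark's cross-validated R² is close to the Gradient Boosting score, the extra model complexity is not paying for itself on 109 rows.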

Files changed (2):
  1. mobile_price_model_2.pkl +3 -0
  2. requirements.txt.txt +22 -0
mobile_price_model_2.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fdb09b24d45e64c1bbdd0950db5a1c483882aa1e89e73b56fbb674fc472cef8
+ size 136595
requirements.txt.txt ADDED
@@ -0,0 +1,22 @@
+ python>=3.8
+ numpy>=1.24
+ pandas>=2.0
+ scikit-learn>=1.2
+ joblib>=1.2
+ matplotlib>=3.7
+ seaborn>=0.12
+ scipy>=1.10
+
+ # Optional / useful for extended workflows
+ # For faster gradient-boosted models (if you later switch from RandomForest)
+ xgboost>=1.7 # optional
+ lightgbm>=3.3 # optional
+
+ # If you use the scraper / JS rendering scripts from earlier messages
+ requests>=2.31
+ beautifulsoup4>=4.12
+ lxml>=4.9
+ playwright>=1.36 # optional, only if you use Playwright; run `playwright install` after installing
+
+ # Jupyter / notebook support (optional)
+ jupyterlab>=4.0