---
license: mit
---
# 🏡 House Price Predictor (Kaggle + Hugging Face)

This project is a complete machine learning pipeline for predicting house prices in Ames, Iowa, using structured data and transformer-based text embeddings. It was developed as part of the [Kaggle House Prices - Advanced Regression Techniques](https://www.kaggle.com/c/house-prices-advanced-regression-techniques) competition.

The model is published on the Hugging Face Hub:
👉 https://huggingface.co/DanteChapterMaster/house-price-predictor

---

## 📦 Project Highlights

- ✅ Exploratory Data Analysis (EDA)
- ✅ Feature Engineering from domain knowledge
- ✅ Model training: Ridge, Lasso, Random Forest, XGBoost, and Stacking
- ✅ NLP augmentation: BERT embeddings from generated property descriptions
- ✅ Full model pipeline with preprocessing (ColumnTransformer)
- ✅ Deployment-ready model saved with `joblib`
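
The preprocessing and persistence steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the sample rows, the target values, the file name `house_price_pipeline.joblib`, and the choice of `Ridge` as the regressor (one of the listed models, used here to keep the example light) are all assumptions.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up sample; the real project trains on the Kaggle training data.
df = pd.DataFrame({
    "GrLivArea": [1710, 1262, 1786, 1717, 2198, 1362],
    "TotalBsmtSF": [856, 1262, 920, 756, 1145, 796],
    "GarageCars": [2, 2, 2, 3, 3, 2],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor", "NoRidge", "Mitchel"],
    "HouseStyle": ["2Story", "1Story", "2Story", "2Story", "2Story", "1.5Fin"],
})
# Illustrative log-scale targets, not real sale prices.
y = pd.Series([12.25, 12.11, 12.32, 11.85, 12.43, 11.87])

numeric_cols = ["GrLivArea", "TotalBsmtSF", "GarageCars"]
categorical_cols = ["Neighborhood", "HouseStyle"]

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Bundling preprocessing with the regressor means the saved artifact
# accepts raw DataFrame rows at prediction time.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", Ridge(alpha=1.0)),  # stand-in for the XGBoost regressor
])
model.fit(df, y)

# Round-trip through joblib, using a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "house_price_pipeline.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

predictions = restored.predict(df)
```

Because the `ColumnTransformer` lives inside the `Pipeline`, the same scaling and encoding are applied at training and inference time, and `handle_unknown="ignore"` keeps unseen category values from raising an error.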

---

## 📊 Features

**Numerical Features:**
- `GrLivArea`, `TotalBsmtSF`, `GarageCars`, etc.

**Categorical Features:**
- `Neighborhood`, `HouseStyle`, etc. (one-hot encoded)

**Generated Features:**
- Log-transformed target
- Interaction terms
- Transformer-based embeddings from property descriptions
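
As a rough sketch of the first two generated features, the snippet below adds a log-transformed target and one interaction term to each row. The README does not say which log variant or which column pairs the project uses, so `log1p`, the `GrLivArea` × `GarageCars` product, the output key names, and the sample values are all assumptions made for illustration.

```python
import math

# Hypothetical rows using column names from the feature lists above.
rows = [
    {"GrLivArea": 1710, "GarageCars": 2, "SalePrice": 208500},
    {"GrLivArea": 1262, "GarageCars": 2, "SalePrice": 181500},
    {"GrLivArea": 2198, "GarageCars": 3, "SalePrice": 250000},
]


def add_generated_features(row):
    """Return a copy of the row with a log target and an interaction term."""
    out = dict(row)
    # log1p compresses the right-skewed price distribution (assumed variant).
    out["LogSalePrice"] = math.log1p(row["SalePrice"])
    # Interaction term: living area scaled by garage capacity (assumed pair).
    out["GrLivArea_x_GarageCars"] = row["GrLivArea"] * row["GarageCars"]
    return out


features = [add_generated_features(r) for r in rows]

# expm1 inverts log1p, so predictions can be mapped back to dollars.
recovered_price = math.expm1(features[0]["LogSalePrice"])
```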

---

## 🤖 Model Card

- **Type:** Regressor
- **Algorithm:** XGBoost in Scikit-learn `Pipeline`
- **Target:** `SalePrice` (log-transformed)
- **Evaluation:** Root Mean Squared Error (RMSE)
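
Since the target is log-transformed, RMSE can be reported either on the log scale or after converting predictions back to dollars. The sketch below shows both, assuming a `log1p`/`expm1` transform pair and using made-up prices; the README does not state which scale its RMSE refers to.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up actual prices and model outputs on the log scale.
actual_prices = [200000.0, 150000.0, 320000.0]
predicted_log = [math.log1p(210000.0), math.log1p(150000.0), math.log1p(300000.0)]

# RMSE on the log scale penalizes relative rather than absolute errors.
actual_log = [math.log1p(p) for p in actual_prices]
log_rmse = rmse(actual_log, predicted_log)

# Invert the transform to report the error in dollars instead.
predicted_prices = [math.expm1(v) for v in predicted_log]
dollar_rmse = rmse(actual_prices, predicted_prices)
```

The two numbers are not interchangeable: the log-scale value treats a 5% miss on a cheap house the same as a 5% miss on an expensive one, while the dollar-scale value is dominated by the most expensive homes.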