---
license: mit
---

# 🏡 House Price Predictor (Kaggle + Hugging Face)

This project is a complete machine learning pipeline for predicting house prices in Ames, Iowa, using structured data and transformer-based text embeddings. It was developed for the [Kaggle House Prices - Advanced Regression Techniques](https://www.kaggle.com/c/house-prices-advanced-regression-techniques) competition.

The model is published on the Hugging Face Hub:

👉 https://huggingface.co/DanteChapterMaster/house-price-predictor
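
Fetching the published artifact could look roughly like the sketch below. It assumes the repository stores a `joblib`-serialized pipeline, and because this README does not state the file name, `filename` is left as a required parameter. `load_predictor` is a hypothetical helper name, not part of the project.

```python
def load_predictor(repo_id: str, filename: str):
    """Download one file from a Hub repository and load it with joblib.

    Assumption: the file is a joblib-serialized scikit-learn pipeline.
    """
    # Imported inside the function so the optional dependencies are only
    # required when a download is actually attempted.
    from huggingface_hub import hf_hub_download
    import joblib

    # hf_hub_download caches the file locally and returns its path.
    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return joblib.load(local_path)


# Example call (the file name below is a placeholder, not a confirmed path):
# model = load_predictor("DanteChapterMaster/house-price-predictor", "model.joblib")
```

Check the repository's file listing for the actual artifact name before calling the helper.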

---

## 📦 Project Highlights

- ✅ Exploratory Data Analysis (EDA)
- ✅ Feature Engineering from domain knowledge
- ✅ Model training: Ridge, Lasso, Random Forest, XGBoost, and Stacking
- ✅ NLP augmentation: BERT embeddings from generated property descriptions
- ✅ Full model pipeline with preprocessing (ColumnTransformer)
- ✅ Deployment-ready model saved with `joblib`
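
The preprocessing and persistence steps listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data frame holds toy values, `StandardScaler` is an assumed choice for the numeric columns, `Ridge` stands in for the final regressor to keep the example light, and the file name is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows for illustration only; the column names follow the Ames dataset.
df = pd.DataFrame({
    "GrLivArea": [1500, 2000, 1200, 1800, 2400, 1000],
    "TotalBsmtSF": [800, 1000, 600, 900, 1200, 500],
    "GarageCars": [2, 3, 1, 2, 3, 1],
    "Neighborhood": ["CollgCr", "NoRidge", "CollgCr", "Crawfor", "NoRidge", "Mitchel"],
    "HouseStyle": ["1Story", "2Story", "1Story", "2Story", "2Story", "1Story"],
})
prices = np.array([180000, 260000, 140000, 210000, 320000, 120000], dtype=float)
y_log = np.log1p(prices)  # train on the log-transformed target

numeric_cols = ["GrLivArea", "TotalBsmtSF", "GarageCars"]
categorical_cols = ["Neighborhood", "HouseStyle"]

# Route each column group through its own transformer.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),  # assumed scaling step
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Ridge is a stand-in here; any scikit-learn compatible regressor fits this slot.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", Ridge(alpha=1.0)),
])
model.fit(df, y_log)

# Persist the fitted pipeline and load it back, as a deployment would.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "house_price_pipeline.joblib")  # hypothetical name
    joblib.dump(model, path)
    reloaded = joblib.load(path)

preds_log = reloaded.predict(df)
```

Bundling preprocessing and the regressor in one `Pipeline` means the saved file carries the fitted encoders along with the model, so raw feature frames can be passed straight to `predict`.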

---

## 📊 Features

**Numerical Features:**
- `GrLivArea`, `TotalBsmtSF`, `GarageCars`, etc.

**Categorical Features:**
- `Neighborhood`, `HouseStyle`, etc. (one-hot encoded)

**Generated Features:**
- Log-transformed target
- Interaction terms
- Transformer-based embeddings from property descriptions
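
The generated features above might be built along these lines. Everything specific here is an assumption made for illustration: the use of `np.log1p` for the target, the particular interaction term, and the wording of the description template (the text that would later be fed to an embedding model). The rows are toy values.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "GrLivArea": [1500, 2000],
    "TotalBsmtSF": [800, 1000],
    "GarageCars": [2, 3],
    "Neighborhood": ["CollgCr", "NoRidge"],
    "HouseStyle": ["1Story", "2Story"],
    "SalePrice": [180000.0, 260000.0],
})

# Log-transformed target; np.expm1 inverts it after prediction.
df["LogSalePrice"] = np.log1p(df["SalePrice"])

# Hypothetical interaction term: living area times garage capacity.
df["GrLivArea_x_GarageCars"] = df["GrLivArea"] * df["GarageCars"]


def describe(row: pd.Series) -> str:
    """Render one row as a short text description (hypothetical template)."""
    return (
        f"A {row['HouseStyle']} home in {row['Neighborhood']} with "
        f"{int(row['GrLivArea'])} sq ft of living area and a "
        f"{int(row['GarageCars'])}-car garage."
    )


# One description per property; these strings are what a text encoder would embed.
df["Description"] = df.apply(describe, axis=1)
```

The embedding step itself is omitted because the exact BERT checkpoint and pooling strategy are not specified here.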

---

## 🤖 Model Card

- **Type:** Regressor
- **Algorithm:** XGBoost in Scikit-learn `Pipeline`
- **Target:** `SalePrice` (log-transformed)
- **Evaluation:** Root Mean Squared Error (RMSE)
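
Because the target is log-transformed, predictions come back on the log scale and need to be inverted before they read as prices. The sketch below shows RMSE computed on the log scale, which matches how the Kaggle competition scores submissions, followed by the inverse transform. It assumes a `log1p`/`expm1` pair, and the numbers are toy values rather than project results.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy values standing in for true prices and a model's log-scale output.
true_prices = np.array([200000.0, 150000.0, 310000.0])
predicted_log = np.log1p(np.array([210000.0, 140000.0, 300000.0]))

# Score on the log scale, so relative errors count equally across price ranges.
score_log = rmse(np.log1p(true_prices), predicted_log)

# Convert predictions back to the original price scale.
predicted_prices = np.expm1(predicted_log)
```

Scoring in log space penalizes a 10% miss on an inexpensive house about as much as a 10% miss on an expensive one.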