title: House Price Prediction
emoji: 🏠
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false

🏠 House Price Prediction

An end-to-end machine learning project that predicts house prices in Bengaluru from square footage, BHK (number of bedrooms), bathroom count, and a locality-level average price.

Built using:

Python
Pandas
Scikit-learn
XGBoost
Flask
Streamlit

📌 Project Overview

This project uses the Bengaluru House Price dataset to build a real estate price prediction system.

The workflow includes:

Data cleaning
Feature engineering
Outlier removal
Log transformation
Model training
Hyperparameter tuning
Feature importance analysis
Flask API development
Streamlit frontend deployment

🚀 Features

✅ Cleaned messy real-estate data
✅ Converted sqft ranges into numeric values
✅ Engineered a locality-level average price feature
✅ Removed outliers using IQR method
✅ Applied log transformation to target variable
✅ Compared multiple ML models
✅ Tuned XGBoost hyperparameters
✅ Built Flask prediction API
✅ Created interactive frontend UI
✅ Ready for deployment on Hugging Face / Render

📂 Dataset

Dataset used:

Bengaluru House Price Dataset

Main features:

location
total_sqft
bath
balcony
size (later converted to BHK)
price

🧹 Data Preprocessing

1. Handling Missing Values

Removed null values and inconsistent rows.

data = data.dropna()

2. Converted `size` Column to BHK

Example:

2 BHK → 2

Code:

data['bhk'] = data['size'].apply(
    lambda x: int(str(x).split()[0])
)

3. Cleaned `total_sqft`

Handled:

ranges (replaced by their midpoint)
values with units or other non-numeric text (set to missing)

Examples:

2100 - 2850 → 2475

Code:

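def convert_sqft(x):
    """Convert a total_sqft entry to a float; return None if it cannot be parsed."""
    x = str(x)

    try:
        # A range such as "2100 - 2850" becomes its midpoint.
        if '-' in x:
            a, b = x.split('-')
            return (float(a) + float(b)) / 2
        return float(x)

    except ValueError:
        # Entries with units or other text are treated as missing.
        return None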

Applied:

data['total_sqft'] = data['total_sqft'].apply(convert_sqft)
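
The converter above turns any entry that carries a unit suffix into a missing value. If those rows are worth keeping, a unit-aware variant is one option. The following is a minimal sketch, assuming the unit labels look like `Sq. Meter` and `Sq. Yards`; the labels and the helper name `convert_sqft_with_units` are assumptions, not taken from the project.

```python
import re

# Conversion factors to square feet. The unit labels are assumed, not
# confirmed against the dataset.
UNIT_TO_SQFT = {
    "sq. meter": 10.7639,
    "sq. yards": 9.0,
}

def convert_sqft_with_units(value):
    """Return the area in square feet, or None if it cannot be parsed."""
    text = str(value).strip().lower()

    # Range such as "2100 - 2850": use the midpoint.
    parts = text.split("-")
    if len(parts) == 2:
        try:
            return (float(parts[0]) + float(parts[1])) / 2
        except ValueError:
            return None

    # Number followed by a known unit label, e.g. "34.46Sq. Meter".
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*(.+)", text)
    if match and match.group(2) in UNIT_TO_SQFT:
        return float(match.group(1)) * UNIT_TO_SQFT[match.group(2)]

    try:
        return float(text)
    except ValueError:
        return None

print(convert_sqft_with_units("2100 - 2850"))   # 2475.0
print(convert_sqft_with_units("100Sq. Yards"))  # 900.0
print(convert_sqft_with_units("unknown"))       # None
```

Entries that still come back as `None` would need to be dropped or imputed before training.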

⚙️ Feature Engineering

1. Price Per Sqft

Created a normalized pricing feature (price per square foot):

data['price_per_sqft'] = (
    data['price'] * 100000
) / data['total_sqft']

Used for:

outlier detection
normalization
better model learning
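
The multiplication by 100000 implies that `price` is recorded in lakhs of rupees (1 lakh = 100,000). A quick worked check with made-up numbers:

```python
# Hypothetical listing: 78.5 lakh rupees for 1200 sq ft.
price_lakhs = 78.5
total_sqft = 1200

price_per_sqft = price_lakhs * 100000 / total_sqft
print(round(price_per_sqft, 2))  # 6541.67
```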

2. Geospatial Locality Feature

Calculated average locality price using:

location_price = data.groupby(
    'location'
)['price'].mean()

Mapped back to dataset:

data['location_avg_price'] = data[
    'location'
].map(location_price)

This feature helps the model learn:

expensive locations
cheaper localities
pricing trends by area
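
One caveat: averaging `price` over the full dataset lets each row's own target, including rows later used for testing, feed into its feature. A common way to limit this leakage is to compute the averages on the training rows only and map them onto the test rows. The following is a minimal sketch on toy data; the helper name `add_location_avg_price` and the fallback to the overall training mean are assumptions, not the project's code.

```python
import pandas as pd

def add_location_avg_price(train, test):
    """Add location_avg_price using averages learned from train only."""
    location_price = train.groupby("location")["price"].mean()
    global_mean = train["price"].mean()

    train = train.copy()
    test = test.copy()
    train["location_avg_price"] = train["location"].map(location_price)
    # Locations unseen in training fall back to the overall training mean.
    test["location_avg_price"] = (
        test["location"].map(location_price).fillna(global_mean)
    )
    return train, test

train = pd.DataFrame({
    "location": ["A", "A", "B"],
    "price": [50.0, 70.0, 100.0],
})
test = pd.DataFrame({"location": ["A", "C"], "price": [65.0, 80.0]})

train, test = add_location_avg_price(train, test)
print(test["location_avg_price"].tolist())
```

Here location `A` receives its training average, while the unseen location `C` receives the overall training mean.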

📊 Outlier Removal

Used the IQR (interquartile range) method on `price_per_sqft`.

Formula:

IQR = Q3 - Q1

Values outside this range are treated as outliers:

[Q1 - 1.5(IQR), Q3 + 1.5(IQR)]

Code:

Q1 = data['price_per_sqft'].quantile(0.25)

Q3 = data['price_per_sqft'].quantile(0.75)

IQR = Q3 - Q1

lower_limit = Q1 - 1.5 * IQR

upper_limit = Q3 + 1.5 * IQR

data = data[
    (data['price_per_sqft'] >= lower_limit) &
    (data['price_per_sqft'] <= upper_limit)
]
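
The same filter can be wrapped in a reusable function. A small sketch on toy data (the function name `remove_iqr_outliers` is hypothetical):

```python
import pandas as pd

def remove_iqr_outliers(df, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]

toy = pd.DataFrame(
    {"price_per_sqft": [4000, 4500, 5000, 5500, 6000, 50000]}
)
cleaned = remove_iqr_outliers(toy, "price_per_sqft")
print(cleaned["price_per_sqft"].tolist())  # [4000, 4500, 5000, 5500, 6000]
```

For this toy column Q1 = 4625 and Q3 = 5875, so the accepted range is [2750, 7750] and only the 50000 row is dropped.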

📈 Log Transformation

Applied a logarithmic transformation to the target variable:

import numpy as np

data['log_price'] = np.log(data['price'])

Benefits:

reduced skewness
stabilized variance
improved regression performance
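
Because the model is trained on `log_price`, its predictions are on the log scale and have to be passed through `np.exp` to get back to the original price units. A short illustration with made-up values:

```python
import numpy as np

prices = np.array([40.0, 78.5, 250.0])  # made-up prices

log_prices = np.log(prices)     # what the model is trained on
recovered = np.exp(log_prices)  # back to the original scale

print(np.allclose(recovered, prices))  # True
```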

🤖 Machine Learning Models

Compared:

Linear Regression
Ridge Regression
XGBoost Regressor

📌 Feature & Target Selection

X = data[
    [
        'total_sqft',
        'bath',
        'bhk',
        'location_avg_price'
    ]
]

y = data['log_price']

✂️ Train Test Split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)

📊 Cross Validation Evaluation

Used:

cross_val_score()

Scoring Metric:

R² Score
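
A minimal sketch of how such a comparison might be run. It uses synthetic data and only the two scikit-learn linear models; the data, the `cv=5` setting, and the variable names are assumptions, and an `XGBRegressor` could be added to the `models` dict in the same way.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real features and log-price target.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 4))
y = X @ np.array([2.0, 0.5, 0.3, 1.5]) + rng.normal(0, 0.1, size=200)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
}

scores = {}
for name, model in models.items():
    # Mean R² across 5 folds.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: {scores[name]:.3f}")
```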

📈 Model Results

| Model | R² Score |
| --- | --- |
| Linear Regression | 0.559 |
| Ridge Regression | 0.559 |
| XGBoost | 0.827 |

🏆 Best Model

XGBoost Regressor

Reason:

captures non-linear relationships
handles feature interactions
performs well on tabular datasets

🔧 Hyperparameter Tuning

Used:

GridSearchCV

Parameter Grid:

params = {
    'n_estimators': [100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2]
}

Best Parameters:

{
 'learning_rate': 0.1,
 'max_depth': 7,
 'n_estimators': 100
}

Best Tuned Score:

0.823
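
A sketch of how this grid could be wired into `GridSearchCV`. To stay dependency-light it uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost (both accept these three parameter names) and synthetic data, so the selected parameters and score will not match the ones reported above. The `cv=3` setting is also an assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data with a non-linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(150, 4))
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0, 0.05, size=150)

params = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.2],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid=params,
    cv=3,
    scoring="r2",
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

`best_score_` is the mean cross-validated R² of the best parameter combination, which is the quantity to compare against the untuned model.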

📌 Feature Importance

Visualized feature importance using XGBoost.

Top contributing features:

Location Average Price
Total Square Feet
BHK
Bathrooms

Code:

import matplotlib.pyplot as plt

# `xgb` is the fitted XGBoost regressor, `X` the feature DataFrame
importance = xgb.feature_importances_
features = X.columns

plt.figure(figsize=(8, 5))
plt.bar(features, importance)
plt.xlabel("Features")
plt.ylabel("Importance")
plt.title("Feature Importance")
plt.show()

🌐 Flask API

Created a Flask API for predictions.

POST Endpoint

/predict

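Example Request

Here `location` is the numeric locality pricing value rather than a place name, and `area` is the total square footage.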

{
  "location": 85,
  "BHK": 2,
  "area": 1200,
  "bath": 2
}

Example Response

{
  "predicted_price": 78.5
}
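
A minimal sketch of what such an endpoint could look like. The request keys come from the example above; everything else is an assumption: the `create_app` factory, the `DummyModel` stand-in (the real app would presumably load `house_price_model.pkl`), the feature order, and the `np.exp` step, which applies only if the model predicts the log price.

```python
import numpy as np
from flask import Flask, jsonify, request

class DummyModel:
    """Stand-in for the trained model; returns a fixed log-price."""
    def predict(self, features):
        return np.full(len(features), np.log(78.5))

def create_app(model):
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        try:
            # Assumed feature order: total_sqft, bath, bhk, location_avg_price.
            features = [[
                float(payload["area"]),
                float(payload["bath"]),
                float(payload["BHK"]),
                float(payload["location"]),
            ]]
        except (KeyError, TypeError, ValueError):
            return jsonify({"error": "invalid or missing input"}), 400

        log_price = model.predict(features)[0]
        # Undo the log transform before returning the price.
        return jsonify({"predicted_price": round(float(np.exp(log_price)), 2)})

    return app

app = create_app(DummyModel())

# Exercise the endpoint without starting a server.
client = app.test_client()
response = client.post(
    "/predict", json={"location": 85, "BHK": 2, "area": 1200, "bath": 2}
)
print(response.get_json())
```

Passing the model into a factory keeps the route testable with Flask's built-in test client, and a request with missing fields gets a 400 response instead of an unhandled exception.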

🖥️ Streamlit Frontend

Built an interactive UI using Streamlit.

Features:

Area input
Bathroom input
BHK input
Location pricing input
Instant prediction display

📦 Installation

Clone repository:

git clone https://github.com/your-username/house-price-predictor.git

Move into project directory:

cd house-price-predictor

Install dependencies:

pip install -r requirements.txt

▶️ Run Flask App

python app.py

Open:

http://127.0.0.1:5000

▶️ Run Streamlit App

streamlit run streamlit_app.py

📁 Project Structure

house-price-predictor/
│
├── app.py
├── streamlit_app.py
├── house_price_model.pkl
├── requirements.txt
├── runtime.txt
└── README.md

🛠️ Tech Stack

| Tool | Purpose |
| --- | --- |
| Python | Programming |
| Pandas | Data preprocessing |
| NumPy | Numerical operations |
| Matplotlib | Visualization |
| Scikit-learn | ML utilities |
| XGBoost | Regression model |
| Flask | API backend |
| Streamlit | Frontend UI |

📚 Key Learnings

Real-world data preprocessing
Feature engineering
Outlier handling using IQR
Log transformation
Model comparison using cross-validation
Hyperparameter tuning
Flask API creation
Streamlit UI development
ML deployment workflow

🔮 Future Improvements

Use actual location names
Add location dropdown
Add map-based visualization
Improve frontend UI
Add cloud deployment pipeline
Add model monitoring

👨‍💻 Author

Mohd Faizanullah

Aspiring ML Engineer focused on:

Machine Learning
Deep Learning
AI Applications
Full ML Deployment Pipelines
