title: Exercise1
emoji: 🏃
colorFrom: gray
colorTo: gray
sdk: gradio
app_file: app.py
pinned: false

Model Iterations Documentation

Task: Apartment Price Prediction (Regression)

Application Link

Public URL (Hugging Face Space):

https://huggingface.co/spaces/nbacchi/exercise1


Summary of Iterative Process

| Iteration | Objective | Key Changes | Models Used | CV Mean R² | CV Std Dev | Change in Performance | Fit Diagnosis |
|---|---|---|---|---|---|---|---|
| 1 | Build baseline model | - Drop missing values<br>- Remove duplicates<br>- Price filter (750–8000 CHF)<br>- Valid rooms/area filter<br>- 5-fold CV | Linear Regression<br>Random Forest (n_estimators=300) | 0.5446 (LR)<br>0.5178 (RF) | 0.1071 (LR)<br>0.1195 (RF) | Baseline | ☑ Overfitting ☐ Underfitting ☐ Good Fit |
| 2 | Improve generalization | - Feature engineering<br>- municipality_area_proxy = pop/pop_dens<br>- emp_per_resident = emp/pop<br>- foreigner_count_est = pop×(frg_pct/100)<br>- Hyperparameter tuning<br>- 5-fold CV | Ridge (alpha=1.0)<br>Tuned Random Forest (n_estimators=500, max_depth=12, min_samples_split=5, min_samples_leaf=2) | 0.5297 (Ridge)<br>0.5509 (RF) | 0.0947 (Ridge)<br>0.1060 (RF) | +0.0331 (RF) | ☐ Overfitting ☐ Underfitting ☑ Good Fit |

Detailed Metrics Comparison

Iteration 1 – Baseline

| Model | CV Mean R² | CV Std R² | CV Mean RMSE (CHF) | CV Mean MAE (CHF) |
|---|---|---|---|---|
| Linear Regression | 0.5446 | 0.1071 | 673.00 | 468.07 |
| Random Forest | 0.5178 | 0.1195 | 698.51 | 500.13 |

Iteration 2 – Feature Engineering

| Model | CV Mean R² | CV Std R² | CV Mean RMSE (CHF) | CV Mean MAE (CHF) |
|---|---|---|---|---|
| Ridge | 0.5297 | 0.0947 | 682.01 | 481.08 |
| Tuned Random Forest | 0.5509 | 0.1060 | 674.54 | 473.98 |

Created Features

Iteration 2 Feature Engineering:

  • municipality_area_proxy = population / population density
  • emp_per_resident = employees / population
  • foreigner_count_est = population × (foreigner_pct / 100)

All three engineered features are derived solely from municipality-level variables, so the web application can compute them in real time when a prediction is requested.
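As an illustration of how the app could derive these values for a single request, here is a minimal sketch. The helper name `engineer_features` and its keyword arguments are assumptions made for this example and are not taken from `app.py`; only the formulas and the feature names come from this README.

```python
def engineer_features(pop: float, pop_dens: float, emp: float, frg_pct: float) -> dict:
    """Derive the three iteration-2 features from municipality-level values.

    Hypothetical helper: assumes pop and pop_dens are strictly positive.
    """
    if pop <= 0 or pop_dens <= 0:
        raise ValueError("pop and pop_dens must be positive")
    return {
        # population divided by population density
        "municipality_area_proxy": pop / pop_dens,
        # employees per resident
        "emp_per_resident": emp / pop,
        # frg_pct is a percentage, so convert it to a fraction first
        "foreigner_count_est": pop * (frg_pct / 100),
    }
```

With made-up inputs such as `engineer_features(pop=10000, pop_dens=500, emp=4000, frg_pct=25)`, the three values come out as 20.0, 0.4 and 2500.0.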

Labels displayed in the app (German):

  • municipality_area_proxy → Gemeindegröße (municipality size)
  • emp_per_resident → Arbeitsplatzquote (jobs-per-resident ratio)
  • foreigner_count_est → Ausländerpopulation (foreign population)

Final Selected Features

Feature Set for Final Model:

  • rooms – number of apartment rooms
  • area – living area in m²
  • pop – municipality population
  • pop_dens – population density (per km²)
  • frg_pct – percentage of foreign residents
  • emp – number of employees in municipality
  • tax_income – taxable income per capita
  • municipality_area_proxy – proxy for geographic size
  • emp_per_resident – economic activity indicator
  • foreigner_count_est – estimated foreigner count

Reason for Selection

Final model: RandomForestRegressor (tuned from iteration 2)
Justification:

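  • Highest cross-validated $R^2$ across all iterations (0.5509)
  • More stable across folds than the baseline Random Forest (CV Std = 0.1060 vs 0.1195); Ridge has a lower CV Std (0.0947) but also a lower mean $R^2$ (0.5297)
  • Feature engineering and tuning together improve the Random Forest by +0.0331 in $R^2$ (0.5178 → 0.5509), or +0.0063 over the best baseline model (Linear Regression, 0.5446)
  • Tuned hyperparameters constrain tree complexity to limit overfitting (max_depth=12, min_samples_split=5, min_samples_leaf=2)
  • RMSE of CHF 674.54 is on par with the baseline Linear Regression (CHF 673.00) and acceptable for the 750–8000 CHF price range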

Preprocessing Steps (Iteration 1 → 2)

Data Cleaning

  1. Load original dataset (apartments in canton Zurich)
  2. Remove rows with missing values (dropna())
  3. Remove duplicate rows (drop_duplicates())
  4. Filter unrealistic prices: keep 750 ≤ price ≤ 8000 CHF
  5. Filter invalid structures: keep rooms > 0 and area > 0

Feature Engineering (Iteration 2)

  1. Compute municipality_area_proxy from pop and pop_dens
  2. Compute emp_per_resident from emp and pop
  3. Compute foreigner_count_est from pop and frg_pct
  4. Combine with baseline features for final training
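For the training data, the same steps could be applied to a whole DataFrame at once. The sketch below assumes the cleaned data uses the column names from the feature list in this README; `build_feature_matrix` and the two constants are hypothetical names, and the column order is a choice made for this example.

```python
import pandas as pd

BASE_FEATURES = ["rooms", "area", "pop", "pop_dens", "frg_pct", "emp", "tax_income"]
ENGINEERED_FEATURES = [
    "municipality_area_proxy",
    "emp_per_resident",
    "foreigner_count_est",
]

def build_feature_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Add the engineered columns and return only the model input features."""
    df = df.assign(
        municipality_area_proxy=df["pop"] / df["pop_dens"],
        emp_per_resident=df["emp"] / df["pop"],
        foreigner_count_est=df["pop"] * (df["frg_pct"] / 100),
    )
    # Any other columns (for example the target) are left out of the result
    return df[BASE_FEATURES + ENGINEERED_FEATURES]
```

`assign` returns a new DataFrame, so the input frame is left unchanged.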

Evaluation Method

  • 5-fold cross-validation
  • Metrics: $R^2$, RMSE, MAE
  • No separate validation set (full data used with CV)
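A minimal sketch of this evaluation with scikit-learn is shown below. The helper name `evaluate_with_cv`, the shuffling, and the random seed are assumptions for illustration; the README does not state how the folds were drawn. The hyperparameters are the ones listed for the tuned Random Forest in iteration 2.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

def evaluate_with_cv(model, X, y, n_splits=5, seed=42):
    """Return mean/std of R² and mean RMSE/MAE over K folds (hypothetical helper)."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_validate(
        model, X, y, cv=cv,
        scoring={
            "r2": "r2",
            "rmse": "neg_root_mean_squared_error",
            "mae": "neg_mean_absolute_error",
        },
    )
    return {
        "r2_mean": scores["test_r2"].mean(),
        "r2_std": scores["test_r2"].std(),
        # scikit-learn reports errors as negated scores, so flip the sign
        "rmse_mean": -scores["test_rmse"].mean(),
        "mae_mean": -scores["test_mae"].mean(),
    }

# Tuned Random Forest with the iteration-2 hyperparameters
tuned_rf = RandomForestRegressor(
    n_estimators=500, max_depth=12, min_samples_split=5, min_samples_leaf=2,
    random_state=42,  # assumed seed, for reproducibility only
)
```

Calling `evaluate_with_cv(tuned_rf, X, y)` on the feature matrix and price column would then give the four summary numbers reported per model.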

Metric Definition

$R^2$ (Coefficient of Determination):
Proportion of variance in price explained by the features. At most 1; it can be negative on held-out folds when the model predicts worse than the mean price. Higher is better.

RMSE (Root Mean Squared Error):
Square root of average squared prediction error. Units: CHF. Lower is better.

MAE (Mean Absolute Error):
Average absolute prediction error. Units: CHF. Lower is better.
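
For reference, the three metrics can be written out as follows. The notation is introduced here and is not part of the original documentation: $y_i$ is the observed price of apartment $i$, $\hat{y}_i$ the predicted price, $\bar{y}$ the mean observed price, and $n$ the number of apartments in the evaluated fold.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

Because RMSE squares the errors before averaging, it weights large mispredictions more heavily than MAE, which is why RMSE is never smaller than MAE on the same predictions.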


Application & Deployment

How to Run Locally

cd Projekt1
uv run python app.py

Submission Checklist (Mandatory)

  • Trained regression model available (models/apartment_price_model.pkl)
  • New feature(s) added (iteration 2 feature engineering)
  • Working web application (app.py)
  • Documented iterative modeling process (2 iterations, tables + metrics)
  • Completed README
  • README uploaded to Hugging Face repository
  • Public application link inserted above

Notes

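  • The baseline Linear Regression ($R^2$ = 0.5446) is a strong reference point: the tuned Random Forest exceeds it by only 0.0063 in $R^2$
  • Feature engineering combined with hyperparameter tuning yields a modest +0.0331 improvement in $R^2$ for the Random Forest (0.5178 → 0.5509)
  • The drop in CV standard deviation for the Random Forest (0.1195 → 0.1060) indicates more consistent scores across folds
  • Model saved and ready for deployment on Hugging Face Spaces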