title: Exercise1
emoji: 🏃
colorFrom: gray
colorTo: gray
sdk: gradio
app_file: app.py
pinned: false
Model Iterations Documentation
Task: Apartment Price Prediction (Regression)
Application Link
Public URL (Hugging Face Space):
https://huggingface.co/spaces/nbacchi/exercise1
Summary of Iterative Process
| Iteration | Objective | Key Changes | Models Used | CV Mean R² | CV Std Dev | Change in Performance | Fit Diagnosis |
|---|---|---|---|---|---|---|---|
| 1 | Build baseline model | Drop missing values; remove duplicates; price filter (750–8000 CHF); valid rooms/area filter; 5-fold CV | Linear Regression; Random Forest (n_estimators=300) | 0.5446 (LR); 0.5178 (RF) | 0.1071 (LR); 0.1195 (RF) | Baseline | ☑ Overfitting ☐ Underfitting ☐ Good Fit |
| 2 | Improve generalization | Feature engineering: municipality_area_proxy = pop/pop_dens, emp_per_resident = emp/pop, foreigner_count_est = pop×(frg_pct/100); hyperparameter tuning; 5-fold CV | Ridge (alpha=1.0); Tuned Random Forest (n_estimators=500, max_depth=12, min_samples_split=5, min_samples_leaf=2) | 0.5297 (Ridge); 0.5509 (RF) | 0.0947 (Ridge); 0.1060 (RF) | +0.0331 (RF) | ☐ Overfitting ☐ Underfitting ☑ Good Fit |
Detailed Metrics Comparison
Iteration 1 – Baseline
| Model | CV Mean R² | CV Std R² | CV Mean RMSE | CV Mean MAE |
|---|---|---|---|---|
| Linear Regression | 0.5446 | 0.1071 | 673.00 | 468.07 |
| Random Forest | 0.5178 | 0.1195 | 698.51 | 500.13 |
Iteration 2 – Feature Engineering
| Model | CV Mean R² | CV Std R² | CV Mean RMSE | CV Mean MAE |
|---|---|---|---|---|
| Ridge | 0.5297 | 0.0947 | 682.01 | 481.08 |
| Tuned Random Forest | 0.5509 | 0.1060 | 674.54 | 473.98 |
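The change in performance between iterations can be re-derived from the two tables. The snippet below is a small arithmetic check, not part of the training code; the dictionary layout and the helper name `r2_change` are illustrative assumptions.

```python
# CV mean R² values copied from the two tables above.
cv_r2 = {
    "iteration_1": {"Linear Regression": 0.5446, "Random Forest": 0.5178},
    "iteration_2": {"Ridge": 0.5297, "Tuned Random Forest": 0.5509},
}


def r2_change(before: float, after: float) -> float:
    """Return the change in CV mean R², rounded to four decimals."""
    return round(after - before, 4)


# Random Forest before and after feature engineering and tuning.
rf_gain = r2_change(
    cv_r2["iteration_1"]["Random Forest"],
    cv_r2["iteration_2"]["Tuned Random Forest"],
)

# Best model of iteration 1 versus best model of iteration 2.
best_gain = r2_change(
    max(cv_r2["iteration_1"].values()),
    max(cv_r2["iteration_2"].values()),
)

print(f"Random Forest, iteration 1 -> 2: {rf_gain:+.4f}")
print(f"Best model, iteration 1 -> 2: {best_gain:+.4f}")
```

The first figure is the one reported in the summary table; the second compares the best model of each iteration and is noticeably smaller.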
Created Features
Iteration 2 Feature Engineering:
- `municipality_area_proxy` = population / population density
- `emp_per_resident` = employees / population
- `foreigner_count_est` = population × (foreigner_pct / 100)
All features are reproducible from municipality-level variables and can be computed in real-time in the web application.
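As a minimal sketch of how these three features could be computed for a single request, the function below works on a plain dictionary. The function name, the dictionary input, the guard against non-positive values, and the example numbers are assumptions for illustration; the actual `app.py` may do this differently.

```python
def add_engineered_features(row: dict) -> dict:
    """Return a copy of `row` with the three iteration-2 features added.

    Expects the municipality-level keys pop, pop_dens, emp and frg_pct.
    """
    pop, pop_dens = row["pop"], row["pop_dens"]
    # Assumption: reject non-positive inputs instead of dividing by zero.
    if pop <= 0 or pop_dens <= 0:
        raise ValueError("pop and pop_dens must be positive")

    out = dict(row)
    out["municipality_area_proxy"] = pop / pop_dens  # roughly the area in km²
    out["emp_per_resident"] = row["emp"] / pop
    out["foreigner_count_est"] = pop * (row["frg_pct"] / 100)
    return out


# Made-up municipality values, used only to show the call.
example = add_engineered_features(
    {"pop": 20000, "pop_dens": 1000.0, "emp": 8000, "frg_pct": 25.0}
)
print(example)
```

Because the inputs are all municipality-level values, the same function can serve both training-time preprocessing and the web form.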
Labels displayed in the app (German):
- `municipality_area_proxy` → Gemeindegröße (municipality size)
- `emp_per_resident` → Arbeitsplatzquote (jobs-per-resident ratio)
- `foreigner_count_est` → Ausländerpopulation (foreign population)
Final Selected Features
Feature Set for Final Model:
- `rooms` – number of apartment rooms
- `area` – living area in m²
- `pop` – municipality population
- `pop_dens` – population density (per km²)
- `frg_pct` – percentage of foreign residents
- `emp` – number of employees in municipality
- `tax_income` – taxable income per capita
- `municipality_area_proxy` – proxy for geographic size
- `emp_per_resident` – economic activity indicator
- `foreigner_count_est` – estimated foreigner count
Reason for Selection
Final model: RandomForestRegressor (tuned from iteration 2)
Justification:
- Highest cross-validated $R^2$ across all iterations (0.5509)
- Lower fold-to-fold variability than the baseline Random Forest (CV Std = 0.1060 vs. 0.1195)
- Feature engineering and tuning raise the Random Forest's CV $R^2$ by +0.0331 (0.5178 → 0.5509)
- Tuned hyperparameters reduce overfitting (`max_depth=12`, `min_samples_split=5`)
- RMSE of CHF 674.54 is acceptable for the 750–8000 CHF price range
Preprocessing Steps (Iteration 1 → 2)
Data Cleaning
- Load original dataset (apartments in canton Zurich)
- Remove rows with missing values (`dropna()`)
- Remove duplicate rows (`drop_duplicates()`)
- Filter unrealistic prices: keep `750 ≤ price ≤ 8000` CHF
- Filter invalid structures: keep `rooms > 0` and `area > 0`
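The cleaning rules above can be sketched in plain Python as follows. This is a hedged illustration of the logic only: the project itself uses the pandas calls named in the list, and the function name, the list-of-dicts input, and the sample rows are assumptions.

```python
def _is_missing(value) -> bool:
    """True for None and for float NaN (NaN is the only value unequal to itself)."""
    return value is None or value != value


def clean_listings(rows: list[dict]) -> list[dict]:
    """Apply the cleaning steps in the order listed above; returns a new list."""
    seen = set()
    cleaned = []
    for row in rows:
        # 1. Drop rows with missing values.
        if any(_is_missing(v) for v in row.values()):
            continue
        # 2. Drop exact duplicates, keeping the first occurrence.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # 3. Keep realistic prices (bounds inclusive, in CHF).
        if not 750 <= row["price"] <= 8000:
            continue
        # 4. Keep structurally valid apartments.
        if row["rooms"] <= 0 or row["area"] <= 0:
            continue
        cleaned.append(row)
    return cleaned


# Made-up listings, one per rule.
sample = [
    {"rooms": 3.5, "area": 80, "price": 2200},
    {"rooms": 3.5, "area": 80, "price": 2200},   # duplicate
    {"rooms": 2.0, "area": None, "price": 1500}, # missing value
    {"rooms": 1.0, "area": 30, "price": 500},    # price below 750
    {"rooms": 0.0, "area": 45, "price": 1200},   # invalid rooms
    {"rooms": 4.5, "area": 120, "price": 8000},  # upper bound, kept
]
cleaned = clean_listings(sample)
print(len(cleaned), "of", len(sample), "rows kept")
```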
Feature Engineering (Iteration 2)
- Compute `municipality_area_proxy` from `pop` and `pop_dens`
- Compute `emp_per_resident` from `emp` and `pop`
- Compute `foreigner_count_est` from `pop` and `frg_pct`
- Combine with baseline features for final training
Evaluation Method
- 5-fold cross-validation
- Metrics: $R^2$, RMSE, MAE
- No separate validation set (full data used with CV)
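To make the evaluation scheme concrete, here is a minimal, dependency-free sketch of how 5-fold cross-validation partitions the data and how per-fold scores are summarized. It assumes contiguous, unshuffled folds and a population standard deviation; the README does not state whether the original runs shuffled the data or which standard deviation was reported, and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_idx, test_idx) index lists for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def summarize(fold_scores):
    """Return (mean, std) of the per-fold scores, using the population std."""
    return mean(fold_scores), pstdev(fold_scores)


folds = list(k_fold_indices(12, k=5))
for train_idx, test_idx in folds:
    print(f"train on {len(train_idx)} rows, test on rows {test_idx}")
```

Each row is used for testing exactly once, which is why the full dataset can be used without a separate validation set.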
Metric Definition
$R^2$ (Coefficient of Determination):
Proportion of variance in price explained by the features. At most 1; it can be negative on held-out folds when the model predicts worse than the mean price. Higher is better.
RMSE (Root Mean Squared Error):
Square root of average squared prediction error. Units: CHF. Lower is better.
MAE (Mean Absolute Error):
Average absolute prediction error. Units: CHF. Lower is better.
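The three definitions above translate directly into code. The following is a small sketch in plain Python with made-up prices; the function name and the dictionary return type are assumptions, and a library implementation would normally be used in practice.

```python
from math import sqrt


def regression_metrics(y_true, y_pred) -> dict:
    """Compute R², RMSE and MAE for two equally long sequences of prices."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")

    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),               # in CHF
        "mae": sum(abs(e) for e in errors) / n,  # in CHF
    }


# Made-up monthly rents in CHF.
metrics = regression_metrics([1000, 2000, 3000], [1100, 1900, 3200])
print({name: round(value, 2) for name, value in metrics.items()})
```

RMSE squares the errors before averaging, so it penalizes large misses more heavily than MAE does; that is why RMSE is the larger of the two in both iterations.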
Application & Deployment
- App Framework: Gradio
- App File: app.py
- Saved Model: models/apartment_price_model.pkl
- Deployment Platform: Hugging Face Spaces (public URL listed under Application Link)
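At prediction time the app has to hand the model one row with the features in the same order used during training. The sketch below shows one way this could look. The column order, the function name, and the stub model are assumptions; in the real app the model would presumably be loaded from `models/apartment_price_model.pkl` and the function wired to the Gradio interface.

```python
# Assumed column order; it must match the order used when the model was trained.
FEATURE_ORDER = [
    "rooms", "area", "pop", "pop_dens", "frg_pct", "emp", "tax_income",
    "municipality_area_proxy", "emp_per_resident", "foreigner_count_est",
]


def predict_price(model, features: dict) -> float:
    """Build a single-row input in FEATURE_ORDER and return the predicted price."""
    missing = [name for name in FEATURE_ORDER if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    row = [[float(features[name]) for name in FEATURE_ORDER]]
    # Any object with a scikit-learn style predict(rows) method works here.
    return float(model.predict(row)[0])


class StubModel:
    """Hypothetical stand-in for the saved model, used only in this demo."""

    def predict(self, rows):
        # Made-up rule: 1000 CHF plus 20 CHF per m² (area is column 1).
        return [1000.0 + 20.0 * r[1] for r in rows]


demo_features = {
    "rooms": 3.5, "area": 80, "pop": 20000, "pop_dens": 1000.0,
    "frg_pct": 25.0, "emp": 8000, "tax_income": 70000,
    "municipality_area_proxy": 20.0, "emp_per_resident": 0.4,
    "foreigner_count_est": 5000.0,
}
price = predict_price(StubModel(), demo_features)
print(f"Predicted price: CHF {price:.2f}")
```

Keeping the feature order in one constant avoids silent column mix-ups between training and the web form.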
How to Run Locally
cd Projekt1
uv run python app.py
Submission Checklist (Mandatory)
- Trained regression model available (models/apartment_price_model.pkl)
- New feature(s) added (iteration 2 feature engineering)
- Working web application (app.py)
- Documented iterative modeling process (2 iterations, tables + metrics)
- Completed README
- README uploaded to Hugging Face repository
- Public application link inserted above
Notes
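- Baseline $R^2$ (0.5446, Linear Regression) is competitive for real estate price prediction
- Feature engineering and tuning provide a modest +0.0331 improvement in $R^2$ for the Random Forest; relative to the best baseline model (Linear Regression, 0.5446), the gain is +0.0063
- The drop in the Random Forest's CV standard deviation (0.1195 → 0.1060) indicates more consistent performance across folds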
- Model saved and ready for deployment on Hugging Face Spaces