A newer version of the Gradio SDK is available: 6.16.0
Documentation
Block 3: Apartment Predictor
1. Project Summary
The app takes a German free-text apartment request (e.g. number of rooms, area, municipality) and returns an estimated monthly rent in CHF. GPT-4o-mini (via OpenAI API) extracts the structured parameters from the text and converts them into a JSON object. A pre-trained Random Forest regression model then calculates the rent estimate. Finally, the LLM generates a clear German explanation of the result including an uncertainty note.
2. Files Used
| File | Purpose |
|---|---|
ai_applications_exercise2.ipynb |
Notebook for developing and testing the functions |
app_student.py |
Starting file with TODOs (not deployed) |
app.py |
Final deployable app for Hugging Face Spaces |
random_forest_regression.pkl |
Pre-trained scikit-learn regression model |
bfs_municipality_and_tax_data.csv |
BFS municipality data (population, taxes, etc.) |
requirements.txt |
Python dependencies |
documentation.md |
This documentation |
README.md |
Hugging Face Spaces configuration |
3. Numeric Prediction Part
3.1 Reused Model
Which saved model did you use?random_forest_regression.pkl – a pre-trained scikit-learn RandomForestRegressor model.
What does the model predict?
The model estimates the monthly gross rent (CHF) of a Swiss apartment based on apartment characteristics and municipality statistics.
Which input features are used for prediction?
rooms– Number of rooms (e.g. 3.5)area_m2– Living area in square metrespop– Population of the municipalitypop_dens– Population density (inhabitants/km²)frg_pct– Percentage of foreign residentsemp– Number of employees in the municipalitytax_income– Taxable income (median income of the municipality)
3.2 Prediction Logic
The municipality name is matched in lowercase against the town_to_row dictionary. The five municipality features (pop, pop_dens, frg_pct, emp, tax_income) are retrieved and passed together with rooms and area_m2 as a NumPy array [[...]] to model.predict(). The result is rounded to two decimal places.
4. LLM Extraction Part
4.1 Goal
The LLM should extract the three fields rooms, area_m2 and town from a German free-text apartment request and return them as a pure JSON object.
4.2 Prompt Design
System Prompt:
The model is instructed to return exclusively a JSON object (no Markdown, no explanations) with exactly the three keys rooms (number), area_m2 (number) and town (string in German). An example JSON is included in the system prompt.
User Prompt:
The user's free text is passed directly with the instruction to extract the three values.
- ✅ System instruction used
- ✅ Strict JSON required
- ✅ Keys explicitly named:
rooms,area_m2,town - ✅ German input expected (municipality names match BFS dataset)
4.3 Expected Output Format
{"rooms": 3.5, "area_m2": 85, "town": "Winterthur"}
4.4 Validation
After the LLM call, parse_json_response() is used, which:
- Removes Markdown fences (if present)
- Applies
json.loads()to the cleaned string - Checks whether all three keys are present
Then match_town() is called to validate the extracted municipality name against the BFS dataset. If no match is found, a clear ValueError is raised.
5. LLM Explanation Part
5.1 Goal
The second LLM step should explain the already calculated model result in plain German. The LLM does not calculate its own price – it receives the prediction value and explains it.
5.2 Prompt Design
System Prompt: The model is instructed to return exclusively JSON with the key "answer". The explanation should be 2–4 sentences in German and mention a concrete uncertainty factor (e.g. fixtures, micro-location, year of construction).
User Prompt: Contains number of rooms, area, municipality and the model-calculated rent price.
- ✅ Structured preferences included
- ✅ Prediction value explicitly passed
- ✅ German output required
- ✅ Uncertainty note required
- ✅ JSON output with key
answerrequired
5.3 Expected Output Format
{"answer": "The predicted monthly rent for a 3.5-room apartment with 85.0 m² in Winterthur is 2117 CHF. This estimate may vary, as factors such as the apartment's fixtures, the exact micro-location or the year of construction can have a significant influence on the actual rent."}
6. End-to-End Pipeline
- User input: The user enters a German apartment request (Gradio text field).
- Extraction:
extract_preferences()calls the LLM and receivesrooms,area_m2,townas JSON. - Validation: Python validates the fields and matches the municipality name using
match_town(). - Prediction:
predict_apartment_price()loads the municipality data and callsmodel.predict(). - Explanation:
generate_explanation()passes preferences + prediction to the LLM and receives a German explanation. - Output: Gradio displays the JSON extraction, estimated price and explanation text.
7. Test Cases
| Test Input | Extracted correctly? | Prediction received? | Explanation received? | Notes |
|---|---|---|---|---|
Ich suche eine 3.5-Zimmer-Wohnung mit etwa 85 m² in Winterthur. |
Yes | Yes | Yes | Prediction: 2117 CHF |
4-Zimmer-Wohnung, 110 m², Zürich |
Yes | Yes | Yes | Prediction: 4029 CHF |
Ich brauche 2 Zimmer und etwa 55 m2 in Kloten. |
Yes | Yes | Yes | Smaller apartment near airport |
5 Zimmer, 150 m2 in Zug |
Yes | Yes | Yes | Affluent municipality, high rent |
8. Errors and Problems
Problem 1: scikit-learn version conflict
- Cause: The
.pklmodel was trained with a different scikit-learn version than installed on Hugging Face. - Fix: Pin
scikit-learn==1.6.1inrequirements.txt.
Problem 2: LLM returns Markdown instead of pure JSON
- Cause: The LLM sometimes wraps the response in
```jsonfences. - Fix:
parse_json_response()strips Markdown fences before JSON parsing.
Problem 3: Municipality names do not match
- Cause: User writes e.g. "Zürich" while the dataset entry is "Zürich (Kreis 1)", or typos occur.
- Fix:
match_town()first uses exact match, then substring match.
9. Deployment Notes
9.1 Files included
app.pyrequirements.txtREADME.mddocumentation.mdrandom_forest_regression.pklbfs_municipality_and_tax_data.csv
9.2 Secrets / Environment Variables
OPENAI_API_KEY– OpenAI API key (mandatory)
9.3 Deployment Result
The Space runs successfully on Hugging Face. The Gradio UI is publicly accessible. All three output fields (JSON, price, explanation) are populated correctly.
9.4 Screenshots
Screenshot 1: Input "Ich suche eine 3.5-Zimmer-Wohnung mit etwa 85 m² in Winterthur."
Extracted JSON: rooms: 3.5, area_m2: 85, town: Winterthur – Prediction: 2117.32 CHF – The explanation mentions fixtures, micro-location and year of construction as uncertainty factors.
Screenshot 2: Input "4-Zimmer-Wohnung, 110 m², Zürich"
Extracted JSON: rooms: 4, area_m2: 110, town: Zürich – Prediction: 4028.79 CHF – The explanation highlights general market conditions in Zurich and names micro-location and year of construction as uncertainty factors.
10. Reflection
The combination of a numeric regression model and an LLM works well: the model delivers fast, consistent estimates, while the LLM makes the communication with the user friendly and natural. The biggest weakness is the dependency on correct municipality names – typos or unknown municipalities lead to errors. German input is important because the BFS municipality data is in German and the LLM should adopt the municipality names directly. Potential improvements include more robust fuzzy matching for municipality names and the ability to ask clarifying questions when information is missing.
11. Responsible Use Note
The app provides estimates only, based on aggregated municipality and apartment data – not reliable rent figures for specific properties. The model does not account for important factors such as building condition, floor level, fixtures and micro-location. LLM-based extraction may produce incorrect values for ambiguous inputs. The results should be understood as a rough guide and do not replace professional real estate advice.

