Spaces:
Sleeping
A newer version of the Gradio SDK is available: 6.15.2
Documentation
Week 12: Apartment Predictor (Saved Regression Model + LLM Workflow)
Use this file to document what you built, tested, and learned in this exercise.
Do not rename this file to README.md, because README.md is needed by Hugging Face Spaces.
This file is part of the submission. Complete it after you have tested and deployed your app.
1. Project Summary
Short description of your app:
This app allows users to describe their apartment search in natural German language (e.g. number of rooms, size, and location). The system extracts structured input from the text, predicts a monthly rent using a regression model, and generates a short explanation.
The regression model predicts the estimated monthly rent in CHF. The LLM is used for two tasks: extracting structured data from free text and generating a human-readable explanation of the prediction.
2. Files Used
List the main files you worked with.
| File | Purpose |
|---|---|
ai_applications_exercise2.ipynb |
Notebook for initial understanding and testing |
app_student.py |
Implementation of the full pipeline & deployable app for Hugging Face |
random_forest_regression.pkl |
Pre-trained regression model |
bfs_municipality_and_tax_data.csv |
Municipality features used for prediction |
requirements.txt |
Python dependencies |
documentation.md |
This documentation: Written documentation for the submission |
3. Numeric Prediction Part
3.1 Reused Model
Which saved model did you use?random_forest_regression.pkl
What does the model predict?
The model predicts the estimated monthly rent (in CHF) for an apartment based on apartment characteristics and municipality data.
Which input features are used for prediction?
roomsarea_m2poppop_densfrg_pctemptax_incomedistance_to_zurich_center(not available in dataset, approximated as 0)
3.2 Prediction Logic
The user input is first converted into structured values (rooms, area_m2, town).
The town is matched to the dataset, and municipality features are retrieved.
Since the trained model expects 8 features, but the dataset does not include distance_to_zurich_center, this value is approximated as 0. The full feature vector is then passed to the model for prediction.
4. LLM Extraction Part
4.1 Goal
The LLM extracts structured information from the user’s German text:
- number of rooms
- apartment size (m²)
- town name
4.2 Prompt Design
The LLM is instructed with:
- a system prompt defining the task (extraction)
- a requirement to return strict JSON only
- explicit keys: rooms, area_m2, town
- numeric values for rooms and area
- no additional text or Markdown
4.3 Expected Output Format
Document the ideal extraction output.
{
"rooms":4,
"area_m2":110,
"town":"Zürich"
}
4.4 Validation
The JSON response is validated in Python:
- empty responses are rejected
- JSON parsing is enforced
- required keys are checked
- Markdown code blocks are removed if present
This ensures robustness against LLM formatting errors.
5. LLM Explanation Part
5.1 Goal
The second LLM step generates a short explanation of the predicted rent. It does not calculate a new prediction.
5.2 Prompt Design
The prompt includes:
- structured preferences (rooms, area, town)
- the predicted rent value
- instruction to respond in German
- requirement to include one uncertainty note
- strict JSON output with key answer
5.3 Expected Output Format
Example:
{"answer": "Die geschätzte Monatsmiete für eine 4-Zimmer-Wohnung mit 110 m² in Zürich beträgt etwa 4.310 CHF. Bitte beachten Sie, dass es sich um eine Schätzung handelt und der tatsächliche Mietpreis je nach Lage und Ausstattung variieren kann."}
6. End-to-End Pipeline
- User enters a German apartment request.
- LLM extracts
rooms,area_m2, andtown. - Python validates the extracted values.
- The regression model predicts the rent.
- The LLM generates a short explanation.
- The app returns JSON, prediction, and explanation
7. Test Cases
Document at least 3 test inputs.
| Test Input | Extracted Output Correct? | Prediction Returned? | Explanation Returned? | Notes |
|---|---|---|---|---|
3.5 Zimmer, 80m2, Adlikon |
Yes | Yes | Yes | Short input works |
Ich suche eine 3.5 Zimmer Wohnung mit 80m2 in Adlikon. |
Yes | Yes | Yes | Correct extraction and prediction |
Ich suche eine 4 Zimmer Wohnung mit 110m2 in Zürich. |
Yes | Yes | Yes | Full pipeline works |
8. Errors and Problems
Problem1: Missing OpenAI API key
Cause: Environment variable not set correctly
Fix: Used OPENAI_API_KEY and environment variables
Problem2: LLM returned invalid JSON
Cause: LLM returned Markdown-formatted JSON (```json)
Fix: Removed Markdown in parser and improved prompt
Problem3: Feature mismatch (7 vs 8 features)
Cause: Model expected 8 features
Fix: Added dummy value (0) for missing feature
9. Deployment Notes
https://huggingface.co/spaces/lst0004/ApartmentPredictor2.0
9.1 Files included
- app_student.py
- random_forest_regression.pkl
- bfs_municipality_and_tax_data.csv
- requirements.txt
- documentation.md
- README.md
- .gitattributes
9.2 Secrets / Environment Variables
OPENAI_API_KEY
9.3 Deployment Result
The app runs successfully on Hugging Face Spaces. The full pipeline works, including LLM extraction, prediction, and explanation.
9.4 Screenshots
Dieses Beispiel zeigt eine Eingabe für Adlikon. Die Werte wurden korrekt extrahiert und eine plausible Mietschätzung erzeugt.
Hier wird eine Anfrage für Zürich verarbeitet. Die Pipeline funktioniert ebenfalls vollständig inklusive Erklärung durch das LLM.
10. Reflection
The combination of a regression model and an LLM worked well. The LLM simplifies the user interface by allowing natural language input.
However, the system is fragile when the LLM returns incorrectly formatted JSON. Additionally, the model is limited because it does not include important factors such as apartment condition or micro-location.
German input is important because the dataset contains Swiss town names. In the future, adding more features or improving extraction robustness would improve the system.
11. Responsible Use Note
The prediction is only an estimate and should not be used as a definitive price. The model uses limited structured features and ignores important real-world factors such as condition, location within a city, and amenities.
The LLM may also extract incorrect values from user input. Users should treat the output as a rough guideline rather than an exact prediction.

