Spaces:

lst0004
/

ApartmentPredictor2.0

Sleeping

This app allows users to describe their apartment search in natural German language (e.g. number of rooms, size, and location). The system extracts structured input from the text, predicts a monthly rent using a regression model, and generates a short explanation.

The regression model predicts the estimated monthly rent in CHF. The LLM is used for two tasks: extracting structured data from free text and generating a human-readable explanation of the prediction.

2. Files Used

List the main files you worked with.

File	Purpose
`ai_applications_exercise2.ipynb`	Notebook for initial understanding and testing
`app_student.py`	Implementation of the full pipeline & deployable app for Hugging Face
`random_forest_regression.pkl`	Pre-trained regression model
`bfs_municipality_and_tax_data.csv`	Municipality features used for prediction
`requirements.txt`	Python dependencies
`documentation.md`	This documentation: Written documentation for the submission

3. Numeric Prediction Part

3.1 Reused Model

Which saved model did you use?
random_forest_regression.pkl

What does the model predict?

The model predicts the estimated monthly rent (in CHF) for an apartment based on apartment characteristics and municipality data.

Which input features are used for prediction?

rooms
area_m2
pop
pop_dens
frg_pct
emp
tax_income
distance_to_zurich_center (not available in dataset, approximated as 0)

3.2 Prediction Logic

The user input is first converted into structured values (rooms, area_m2, town). The town is matched to the dataset, and municipality features are retrieved.

Since the trained model expects 8 features, but the dataset does not include distance_to_zurich_center, this value is approximated as 0. The full feature vector is then passed to the model for prediction.

4. LLM Extraction Part

4.1 Goal

The LLM extracts structured information from the user’s German text:

number of rooms
apartment size (m²)
town name

4.2 Prompt Design

The LLM is instructed with:

a system prompt defining the task (extraction)
a requirement to return strict JSON only
explicit keys: rooms, area_m2, town
numeric values for rooms and area
no additional text or Markdown

4.3 Expected Output Format

Document the ideal extraction output.

{
"rooms":4,
"area_m2":110,
"town":"Zürich"
}

4.4 Validation

The JSON response is validated in Python:

empty responses are rejected
JSON parsing is enforced
required keys are checked
Markdown code blocks are removed if present

This ensures robustness against LLM formatting errors.

5. LLM Explanation Part

5.1 Goal

The second LLM step generates a short explanation of the predicted rent. It does not calculate a new prediction.

5.2 Prompt Design

The prompt includes:

structured preferences (rooms, area, town)
the predicted rent value
instruction to respond in German
requirement to include one uncertainty note
strict JSON output with key answer

5.3 Expected Output Format

Example:

{"answer": "Die geschätzte Monatsmiete für eine 4-Zimmer-Wohnung mit 110 m² in Zürich beträgt etwa 4.310 CHF. Bitte beachten Sie, dass es sich um eine Schätzung handelt und der tatsächliche Mietpreis je nach Lage und Ausstattung variieren kann."}

6. End-to-End Pipeline

User enters a German apartment request.
LLM extracts rooms, area_m2, and town.
Python validates the extracted values.
The regression model predicts the rent.
The LLM generates a short explanation.
The app returns JSON, prediction, and explanation

7. Test Cases

Document at least 3 test inputs.

Test Input	Extracted Output Correct?	Prediction Returned?	Explanation Returned?	Notes
`3.5 Zimmer, 80m2, Adlikon`	Yes	Yes	Yes	Short input works
`Ich suche eine 3.5 Zimmer Wohnung mit 80m2 in Adlikon.`	Yes	Yes	Yes	Correct extraction and prediction
`Ich suche eine 4 Zimmer Wohnung mit 110m2 in Zürich.`	Yes	Yes	Yes	Full pipeline works

8. Errors and Problems

Problem1: Missing OpenAI API key

Cause: Environment variable not set correctly

Fix: Used OPENAI_API_KEY and environment variables

Problem2: LLM returned invalid JSON

Cause: LLM returned Markdown-formatted JSON (```json)

Fix: Removed Markdown in parser and improved prompt

Problem3: Feature mismatch (7 vs 8 features)

Cause: Model expected 8 features

Fix: Added dummy value (0) for missing feature

9. Deployment Notes

https://huggingface.co/spaces/lst0004/ApartmentPredictor2.0

9.1 Files included

app_student.py
random_forest_regression.pkl
bfs_municipality_and_tax_data.csv
requirements.txt
documentation.md
README.md
.gitattributes

9.2 Secrets / Environment Variables

OPENAI_API_KEY

9.3 Deployment Result

The app runs successfully on Hugging Face Spaces. The full pipeline works, including LLM extraction, prediction, and explanation.

9.4 Screenshots

Dieses Beispiel zeigt eine Eingabe für Adlikon. Die Werte wurden korrekt extrahiert und eine plausible Mietschätzung erzeugt.

Hier wird eine Anfrage für Zürich verarbeitet. Die Pipeline funktioniert ebenfalls vollständig inklusive Erklärung durch das LLM.

10. Reflection

The combination of a regression model and an LLM worked well. The LLM simplifies the user interface by allowing natural language input.

However, the system is fragile when the LLM returns incorrectly formatted JSON. Additionally, the model is limited because it does not include important factors such as apartment condition or micro-location.

German input is important because the dataset contains Swiss town names. In the future, adding more features or improving extraction robustness would improve the system.

11. Responsible Use Note

The prediction is only an estimate and should not be used as a definitive price. The model uses limited structured features and ignores important real-world factors such as condition, location within a city, and amenities.

The LLM may also extract incorrect values from user input. Users should treat the output as a rough guideline rather than an exact prediction.