Spaces:

lst0004
/

ApartmentPredictor2.0

Sleeping

App Files Files Community

ApartmentPredictor2.0 / documentation.md

lst0004

Update documentation.md

d3d1c18 verified 25 days ago

preview code

raw

history blame contribute delete

6.98 kB

	# Documentation
	## Week 12: Apartment Predictor (Saved Regression Model + LLM Workflow)

	Use this file to document what you built, tested, and learned in this exercise.

	Do not rename this file to `README.md`, because `README.md` is needed by Hugging Face Spaces.

	This file is part of the submission. Complete it after you have tested and deployed your app.

	---

	## 1. Project Summary

	Short description of your app:

	This app allows users to describe their apartment search in natural German language (e.g. number of rooms, size, and location). The system extracts structured input from the text, predicts a monthly rent using a regression model, and generates a short explanation.

	The regression model predicts the estimated monthly rent in CHF. The LLM is used for two tasks: extracting structured data from free text and generating a human-readable explanation of the prediction.

	---

	## 2. Files Used

	List the main files you worked with.

	\| File \| Purpose \|
	\|------\|---------\|
	\| `ai_applications_exercise2.ipynb` \| Notebook for initial understanding and testing \|
	\| `app_student.py` \| Implementation of the full pipeline & deployable app for Hugging Face \|
	\| `random_forest_regression.pkl` \| Pre-trained regression model \|
	\| `bfs_municipality_and_tax_data.csv` \| Municipality features used for prediction \|
	\| `requirements.txt` \| Python dependencies \|
	\| `documentation.md` \| This documentation: Written documentation for the submission \|


	---

	## 3. Numeric Prediction Part

	### 3.1 Reused Model

	Which saved model did you use?
	`random_forest_regression.pkl`

	What does the model predict?

	The model predicts the estimated monthly rent (in CHF) for an apartment based on apartment characteristics and municipality data.

	Which input features are used for prediction?

	1. `rooms`
	2. `area_m2`
	3. `pop`
	4. `pop_dens`
	5. `frg_pct`
	6. `emp`
	7. `tax_income`
	8. `distance_to_zurich_center` (not available in dataset, approximated as 0)

	### 3.2 Prediction Logic

	The user input is first converted into structured values (`rooms`, `area_m2`, `town`).
	The town is matched to the dataset, and municipality features are retrieved.

	Since the trained model expects 8 features, but the dataset does not include distance_to_zurich_center, this value is approximated as 0. The full feature vector is then passed to the model for prediction.

	---

	## 4. LLM Extraction Part

	### 4.1 Goal

	The LLM extracts structured information from the user’s German text:

	- number of rooms
	- apartment size (m²)
	- town name

	### 4.2 Prompt Design

	The LLM is instructed with:

	- a system prompt defining the task (extraction)
	- a requirement to return strict JSON only
	- explicit keys: rooms, area_m2, town
	- numeric values for rooms and area
	- no additional text or Markdown

	### 4.3 Expected Output Format

	Document the ideal extraction output.

	```json
	{
	"rooms":4,
	"area_m2":110,
	"town":"Zürich"
	}
	```

	### 4.4 Validation

	The JSON response is validated in Python:

	- empty responses are rejected
	- JSON parsing is enforced
	- required keys are checked
	- Markdown code blocks are removed if present

	This ensures robustness against LLM formatting errors.

	---

	## 5. LLM Explanation Part

	### 5.1 Goal

	The second LLM step generates a short explanation of the predicted rent.
	It does not calculate a new prediction.

	### 5.2 Prompt Design

	The prompt includes:

	- structured preferences (rooms, area, town)
	- the predicted rent value
	- instruction to respond in German
	- requirement to include one uncertainty note
	- strict JSON output with key answer

	### 5.3 Expected Output Format

	Example:

	```json
	{"answer": "Die geschätzte Monatsmiete für eine 4-Zimmer-Wohnung mit 110 m² in Zürich beträgt etwa 4.310 CHF. Bitte beachten Sie, dass es sich um eine Schätzung handelt und der tatsächliche Mietpreis je nach Lage und Ausstattung variieren kann."}
	```

	---

	## 6. End-to-End Pipeline

	1. User enters a German apartment request.
	2. LLM extracts `rooms`, `area_m2`, and `town`.
	3. Python validates the extracted values.
	4. The regression model predicts the rent.
	5. The LLM generates a short explanation.
	6. The app returns JSON, prediction, and explanation

	---

	## 7. Test Cases

	Document at least 3 test inputs.

	\| Test Input \| Extracted Output Correct? \| Prediction Returned? \| Explanation Returned? \| Notes \|
	\|------------\|----------------------------\|----------------------\|-----------------------\|-------\|
	\| `3.5 Zimmer, 80m2, Adlikon` \| Yes \| Yes \| Yes \| Short input works \|
	\| `Ich suche eine 3.5 Zimmer Wohnung mit 80m2 in Adlikon.` \| Yes \| Yes \| Yes \| Correct extraction and prediction \|
	\| `Ich suche eine 4 Zimmer Wohnung mit 110m2 in Zürich.`\| Yes \| Yes \| Yes \| Full pipeline works \|

	---

	## 8. Errors and Problems

	Problem1: Missing OpenAI API key

	Cause:
	Environment variable not set correctly

	Fix:
	Used OPENAI_API_KEY and environment variables


	Problem2: LLM returned invalid JSON

	Cause:
	LLM returned Markdown-formatted JSON (```json)

	Fix:
	Removed Markdown in parser and improved prompt


	Problem3: Feature mismatch (7 vs 8 features)

	Cause:
	Model expected 8 features

	Fix:
	Added dummy value (0) for missing feature

	---

	## 9. Deployment Notes

	https://huggingface.co/spaces/lst0004/ApartmentPredictor2.0

	### 9.1 Files included

	- app_student.py
	- random_forest_regression.pkl
	- bfs_municipality_and_tax_data.csv
	- requirements.txt
	- documentation.md
	- README.md
	- .gitattributes

	### 9.2 Secrets / Environment Variables

	- `OPENAI_API_KEY`


	### 9.3 Deployment Result

	The app runs successfully on Hugging Face Spaces.
	The full pipeline works, including LLM extraction, prediction, and explanation.

	### 9.4 Screenshots

	![Example 1](Screenshot_1.png)

	Dieses Beispiel zeigt eine Eingabe für Adlikon. Die Werte wurden korrekt extrahiert und eine plausible Mietschätzung erzeugt.

	![Example 2](Screenshot_2.png)

	Hier wird eine Anfrage für Zürich verarbeitet. Die Pipeline funktioniert ebenfalls vollständig inklusive Erklärung durch das LLM.

	---

	## 10. Reflection

	The combination of a regression model and an LLM worked well. The LLM simplifies the user interface by allowing natural language input.

	However, the system is fragile when the LLM returns incorrectly formatted JSON. Additionally, the model is limited because it does not include important factors such as apartment condition or micro-location.

	German input is important because the dataset contains Swiss town names.
	In the future, adding more features or improving extraction robustness would improve the system.

	---

	## 11. Responsible Use Note

	The prediction is only an estimate and should not be used as a definitive price.
	The model uses limited structured features and ignores important real-world factors such as condition, location within a city, and amenities.

	The LLM may also extract incorrect values from user input.
	Users should treat the output as a rough guideline rather than an exact prediction.