Spaces:

saettsam
/

conversational-agent

Sleeping

App Files Files Community

conversational-agent / documentation.md

samuelsaettler

Align README + documentation.md with reference template

3682b71 12 days ago

preview code

raw

history blame contribute delete

12.6 kB

	# Documentation
	## Apartment Predictor (Saved Regression Model + LLM Workflow)

	This file documents what was built, tested, and learned in this exercise.
	It follows the structure of the reference template from
	`zhaw-iwi/ai-applications-prediction-and-nlp/documentation.md`.

	---

	## 1. Project Summary

	Short description of your app:

	The app turns a free-text German apartment wish (e.g. *"Ich suche eine
	3.5-Zimmer-Wohnung mit etwa 85 m² in Winterthur."*) into an estimated
	monthly rent in CHF for the Canton of Zürich. An OpenAI LLM extracts the
	structured fields, a saved scikit-learn `GradientBoostingRegressor`
	predicts the rent, and a second LLM call returns a short German
	explanation including one uncertainty note. The app is deployed as a
	Gradio Space on Hugging Face.

	---

	## 2. Files Used

	\| File \| Purpose \|
	\|------\|---------\|
	\| `app.py` \| Final deployable Gradio app (LLM → model → LLM pipeline) \|
	\| `model_gbm.pkl` \| Saved scikit-learn `GradientBoostingRegressor` (12 features) \|
	\| `municipality_lookup.csv` \| Zürich municipality features used for prediction \|
	\| `requirements.txt` \| Python dependencies \|
	\| `README.md` \| Hugging Face Space metadata + project overview \|
	\| `documentation.md` \| Written documentation for the submission \|

	---

	## 3. Numeric Prediction Part

	### 3.1 Reused Model

	Which saved model did you use?
	`model_gbm.pkl` – the `GradientBoostingRegressor` trained earlier in
	`ai-applications/end-of-module-block-1/train_model.ipynb`
	(5-fold CV R² ≈ 0.73, RMSE ≈ 559 CHF).

	What does the model predict?
	The monthly gross rent in CHF for an apartment located in a municipality
	of the Canton of Zürich.

	Which input features are used for prediction?

	The model uses 12 features in this exact order:

	1. `rooms`
	2. `area` (m²)
	3. `pop` – municipality population
	4. `pop_dens` – population density
	5. `frg_pct` – percentage of foreign residents
	6. `emp` – number of employees
	7. `tax_income` – average taxable income (CHF)
	8. `room_per_m2` – engineered: `area / rooms`
	9. `luxurious` – binary flag
	10. `furnished` – binary flag
	11. `zurich_city` – 1 if municipality is the City of Zürich
	12. `distance_to_zurich_center` – Haversine distance to Zürich centre (km)

	### 3.2 Prediction Logic

	The LLM returns `rooms`, `area_m2`, `town`, plus the optional flags
	`luxurious` and `furnished`. The Python function
	`predict_apartment_price` looks up the municipality row in
	`municipality_lookup.csv` to pull the BFS socioeconomic features
	(`pop`, `pop_dens`, `frg_pct`, `emp`, `tax_income`) and the precomputed
	`zurich_city` / `distance_to_zurich_center` columns. `room_per_m2` is
	computed on the fly. The 12-column DataFrame is passed to
	`model.predict(...)` and the result is rounded to the nearest CHF.

	---

	## 4. LLM Extraction Part

	### 4.1 Goal

	Convert a free-form German sentence into a strict JSON object containing
	the values the regression model needs (`rooms`, `area_m2`, `town`) plus
	two optional binary flags (`luxurious`, `furnished`).

	### 4.2 Prompt Design

	- System prompt in German, naming the role
	("Du bist ein Extraktionshelfer für eine Schweizer Wohnungs-App.").
	- Strict JSON only is required – no Markdown, no explanation.
	- Required keys are spelled out exactly: `rooms`, `area_m2`, `town`.
	- Optional keys with default `false`: `luxurious`, `furnished`.
	- The user prompt is the raw German wish.
	- The OpenAI call uses `response_format={"type": "json_object"}` and
	`temperature=0` so the output is deterministic and parseable.

	```text
	Du bist ein Extraktionshelfer für eine Schweizer Wohnungs-App.
	Lies den deutschen Text und gib AUSSCHLIESSLICH ein JSON-Objekt zurück.

	Pflichtfelder:
	- "rooms" (Zahl, z.B. 3.5)
	- "area_m2" (Zahl in Quadratmetern, z.B. 85)
	- "town" (Schweizer Gemeindename im Kanton Zürich, z.B. "Winterthur")

	Optionale Felder (sonst false):
	- "luxurious"
	- "furnished"
	```

	### 4.3 Expected Output Format

	```json
	{"rooms": 3.5, "area_m2": 85, "town": "Winterthur", "luxurious": false, "furnished": false}
	```

	### 4.4 Validation

	`parse_json_response` enforces three checks before any value is used:

	1. The response is non-empty.
	2. `json.loads` succeeds (otherwise the raw text is shown in the error).
	3. All required keys are present.

	`extract_preferences` then verifies that `rooms` and `area_m2` are not
	`None`, that `town` is non-empty, and that `match_town` resolves the
	town to a canonical `bfs_name` (case-insensitive exact match first,
	then a substring match against the BFS list). Any failure raises a
	`ValueError` that surfaces in the German error message in the UI – there
	is no silent regex fallback.

	---

	## 5. LLM Explanation Part

	### 5.1 Goal

	Produce a short, plain German explanation of the model's prediction.
	The LLM must not recompute the price – it only describes the result
	the regression model already produced.

	### 5.2 Prompt Design

	- System prompt tells the model it is explaining a rent estimate
	from a machine-learning model, in German.
	- The user message contains a JSON payload with the structured
	preferences and the predicted rent in CHF.
	- The model is instructed to return JSON with one key, `answer`,
	containing 2–4 short German sentences.
	- The answer must reference the user's rooms, area, and town and must
	include exactly one uncertainty / limitation note (condition,
	micro location, floor, year of renovation, …).
	- No Markdown formatting.

	### 5.3 Expected Output Format

	```json
	{"answer": "Für eine 3.5-Zimmer-Wohnung mit 85 m² in Winterthur schätzt das Modell rund 2100 CHF pro Monat. Die Schätzung basiert auf Wohnfläche und Ortsmerkmalen wie Steuerkraft und Distanz zur Stadt Zürich. Eine Unsicherheit ist, dass Zustand und Stockwerk nicht im Modell enthalten sind."}
	```

	The `answer` string is the text shown in the Erklärung (LLM) textbox.

	---

	## 6. End-to-End Pipeline

	1. The user enters a German apartment description in the textbox.
	2. `extract_preferences` calls the LLM and returns a validated dict
	`{rooms, area_m2, town, luxurious, furnished}`.
	3. Python validates the values with `parse_json_response` and
	`match_town` – any failure raises a clear German error.
	4. `predict_apartment_price` joins the BFS lookup, builds the
	12-feature row, and calls `model.predict(...)`.
	5. `generate_explanation` calls the LLM again with the preferences and
	the prediction; the JSON `answer` field is extracted.
	6. The Gradio app returns the structured preferences (JSON), the
	rounded CHF prediction, and the explanation text.

	If any step fails, the error message is shown in the Erklärung field
	and the prediction is left empty – nothing is silently filled in.

	---

	## 7. Test Cases

	\| # \| Test Input \| Extracted Output Correct? \| Prediction Returned? \| Explanation Returned? \| Notes \|
	\|---\|------------\|---------------------------\|----------------------\|-----------------------\|-------\|
	\| 1 \| `Ich suche eine 3.5-Zimmer-Wohnung mit 85 m2 in Winterthur.` \| Yes \| Yes (~CHF 2,100) \| Yes \| Baseline case from the assignment \|
	\| 2 \| `Ich brauche eine möblierte 2-Zimmer-Wohnung mit 55 m2 in Kloten.` \| Yes (`furnished=true`) \| Yes \| Yes \| Tests optional flag detection \|
	\| 3 \| `Ich hätte gerne eine luxuriöse 4.5-Zimmer-Wohnung mit 140 m2 in Küsnacht (ZH).` \| Yes (`luxurious=true`) \| Yes (~CHF 4,500) \| Yes \| Tests luxury flag and a town with parentheses \|
	\| 4 \| `Eine 5-Zimmer-Wohnung mit 130 m2 in Zürich wäre ideal.` \| Yes \| Yes \| Yes \| Tests `zurich_city=1` path \|
	\| 5 \| `Ich suche etwas in Bern.` \| Pipeline raises a German error \| No \| Error message shown \| Out-of-canton town → friendly failure, no silent fallback \|

	Local sanity check (calling `predict_apartment_price` directly, no LLM):

	```text
	3.5 rooms / 85 m² / Winterthur → CHF 2,103
	4.5 rooms / 140 m² / Küsnacht (ZH) luxury → CHF 4,462
	```

	---

	## 8. Errors and Problems

	Problem: First test runs returned a 132-byte `model_gbm.pkl` and
	`pickle.load` failed.
	Cause: The copy of the file in `apartment-price-prediction/` was a
	Git LFS pointer, not the real model.
	Fix: Use the actual 1.4 MB model from
	`ai-applications/end-of-module-block-1/model_gbm.pkl`.

	Problem: First push to Hugging Face was rejected with
	"contains binary files. Please use Xet to store binary files."
	Cause: `model_gbm.pkl` was committed as a regular blob and the HF
	pre-receive hook enforces Xet/LFS for `.pkl` files.
	Fix: Reset the commit, upload the model with
	`hf upload --repo-type space saettsam/conversational-agent
	model_gbm.pkl model_gbm.pkl` (uses Xet), pull the new commit, then push
	the rest of the files normally.

	Problem: Town names like `Küsnacht (ZH)` or `Zürich` (umlaut) did
	not match user input.
	Cause: Strict, case-sensitive equality on the BFS list.
	Fix: `match_town` lower-cases both sides and falls back to a
	substring match against the canonical `bfs_name` list.

	Problem: Missing `OPENAI_API_KEY` on the Space crashed the app on
	the first user interaction with an opaque traceback.
	Cause: The OpenAI client was being created at import time.
	Fix: Lazy `get_openai_client()` raises a clear German error message
	that surfaces directly in the UI textbox.

	---

	## 9. Deployment Notes

	### 9.1 Files included

	- `app.py`
	- `model_gbm.pkl` (uploaded via Xet)
	- `municipality_lookup.csv`
	- `requirements.txt`
	- `README.md`
	- `documentation.md`
	- `.gitattributes`

	### 9.2 Secrets / Environment Variables

	Configured in Settings → Variables and secrets of the Space:

	- `OPENAI_API_KEY` (required)
	- `OPENAI_MODEL` (optional, defaults to `gpt-4o-mini`)

	### 9.3 Deployment Result

	The Space builds with the standard Gradio template. The model file
	(~1.4 MB) lives in Xet storage and loads on cold start. After the
	secret is set, end-to-end latency is roughly 0.5 s for extraction,
	negligible for the local model prediction, and ~1 s for the
	explanation – about 1.5–2 s per German request in total.

	### 9.4 Screenshots

	Two screenshots from the running app, each showing a different German
	input, the Extrahierte Eingaben (LLM) JSON, the *Geschätzte
	Monatsmiete (CHF)* number, and the Erklärung (LLM) text:

	![Beispiel 1](Beispiel1.png)

	Beispiel 1: A first German apartment wish is entered. The LLM
	extracts the structured JSON (`rooms`, `area_m2`, `town`, plus the
	optional `luxurious` / `furnished` flags), the GradientBoostingRegressor
	returns a CHF rent estimate, and the second LLM call produces the short
	German explanation visible in the Erklärung (LLM) textbox – including
	one uncertainty note about features not contained in the model.

	![Beispiel 2](Beispiel2.png)

	Beispiel 2: A second German apartment wish with different rooms,
	area, and town is entered. Again the extracted JSON, the predicted
	monthly rent, and the German explanation are all visible at the same
	time, demonstrating that the end-to-end pipeline (LLM extraction →
	model prediction → LLM explanation) works for multiple inputs.

	---

	## 10. Reflection

	Combining a regression model with an LLM gives a friendly natural-language
	front end without giving up the deterministic numerics – the model still
	owns the price. The system is most fragile when the user names a town
	outside the canton or omits a required value; strict JSON mode plus an
	explicit `match_town` check keeps those failures visible instead of
	producing a confidently wrong prediction. German input matters because
	the BFS dataset uses Swiss spellings (`Zürich`, `Küsnacht (ZH)`) that an
	English prompt drifts away from. The biggest missing inputs are
	condition, year of renovation, floor / elevator, and balcony – features
	that humans weigh heavily but the training data did not capture. Next
	iteration: add confidence intervals from a quantile regressor and an
	optional clarifying question when the LLM returns `null` for `area_m2`
	or `rooms`.

	---

	## 11. Responsible Use Note

	The prediction is a rough indication, not a market quote. The model was
	trained on a snapshot of public listings and only sees twelve structured
	features – condition, micro location, balcony, floor, elevator and many
	other rent drivers are not represented. The LLM may also misread the
	user text (e.g. confuse "etwa 85 m²" with another number); that is why
	every prediction is shown together with the extracted JSON, so the user
	can verify what the model actually saw. The app is intended for
	educational and exploratory use only and must not be used as the sole
	basis for any rental decision.