conversational-agent / documentation.md
samuelsaettler
Align README + documentation.md with reference template
3682b71
# Documentation
## Apartment Predictor (Saved Regression Model + LLM Workflow)
This file documents what was built, tested, and learned in this exercise.
It follows the structure of the reference template from
`zhaw-iwi/ai-applications-prediction-and-nlp/documentation.md`.
---
## 1. Project Summary
**Short description of your app:**
The app turns a free-text German apartment wish (e.g. *"Ich suche eine
3.5-Zimmer-Wohnung mit etwa 85 m² in Winterthur."*) into an estimated
monthly rent in CHF for the Canton of Zürich. An OpenAI LLM extracts the
structured fields, a saved scikit-learn `GradientBoostingRegressor`
predicts the rent, and a second LLM call returns a short German
explanation including one uncertainty note. The app is deployed as a
Gradio Space on Hugging Face.
---
## 2. Files Used
| File | Purpose |
|------|---------|
| `app.py` | Final deployable Gradio app (LLM → model → LLM pipeline) |
| `model_gbm.pkl` | Saved scikit-learn `GradientBoostingRegressor` (12 features) |
| `municipality_lookup.csv` | Zürich municipality features used for prediction |
| `requirements.txt` | Python dependencies |
| `README.md` | Hugging Face Space metadata + project overview |
| `documentation.md` | Written documentation for the submission |
---
## 3. Numeric Prediction Part
### 3.1 Reused Model
**Which saved model did you use?**
`model_gbm.pkl` – the `GradientBoostingRegressor` trained earlier in
`ai-applications/end-of-module-block-1/train_model.ipynb`
(5-fold CV R² ≈ 0.73, RMSE ≈ 559 CHF).
**What does the model predict?**
The monthly gross rent in CHF for an apartment located in a municipality
of the Canton of Zürich.
**Which input features are used for prediction?**
The model uses **12 features** in this exact order:
1. `rooms`
2. `area` (m²)
3. `pop` – municipality population
4. `pop_dens` – population density
5. `frg_pct` – percentage of foreign residents
6. `emp` – number of employees
7. `tax_income` – average taxable income (CHF)
8. `room_per_m2` – engineered: `area / rooms`
9. `luxurious` – binary flag
10. `furnished` – binary flag
11. `zurich_city` – 1 if municipality is the City of Zürich
12. `distance_to_zurich_center` – Haversine distance to Zürich centre (km)
### 3.2 Prediction Logic
The LLM returns `rooms`, `area_m2`, `town`, plus the optional flags
`luxurious` and `furnished`. The Python function
`predict_apartment_price` looks up the municipality row in
`municipality_lookup.csv` to pull the BFS socioeconomic features
(`pop`, `pop_dens`, `frg_pct`, `emp`, `tax_income`) and the precomputed
`zurich_city` / `distance_to_zurich_center` columns. `room_per_m2` is
computed on the fly. The 12-column DataFrame is passed to
`model.predict(...)` and the result is rounded to the nearest CHF.
---
## 4. LLM Extraction Part
### 4.1 Goal
Convert a free-form German sentence into a strict JSON object containing
the values the regression model needs (`rooms`, `area_m2`, `town`) plus
two optional binary flags (`luxurious`, `furnished`).
### 4.2 Prompt Design
- **System prompt** in German, naming the role
(*"Du bist ein Extraktionshelfer für eine Schweizer Wohnungs-App."*).
- **Strict JSON only** is required – no Markdown, no explanation.
- **Required keys** are spelled out exactly: `rooms`, `area_m2`, `town`.
- **Optional keys** with default `false`: `luxurious`, `furnished`.
- The user prompt is the raw German wish.
- The OpenAI call uses `response_format={"type": "json_object"}` and
`temperature=0` so the output is deterministic and parseable.
```text
Du bist ein Extraktionshelfer für eine Schweizer Wohnungs-App.
Lies den deutschen Text und gib AUSSCHLIESSLICH ein JSON-Objekt zurück.
Pflichtfelder:
- "rooms" (Zahl, z.B. 3.5)
- "area_m2" (Zahl in Quadratmetern, z.B. 85)
- "town" (Schweizer Gemeindename im Kanton Zürich, z.B. "Winterthur")
Optionale Felder (sonst false):
- "luxurious"
- "furnished"
```
### 4.3 Expected Output Format
```json
{"rooms": 3.5, "area_m2": 85, "town": "Winterthur", "luxurious": false, "furnished": false}
```
### 4.4 Validation
`parse_json_response` enforces three checks before any value is used:
1. The response is non-empty.
2. `json.loads` succeeds (otherwise the raw text is shown in the error).
3. All required keys are present.
`extract_preferences` then verifies that `rooms` and `area_m2` are not
`None`, that `town` is non-empty, and that `match_town` resolves the
town to a canonical `bfs_name` (case-insensitive exact match first,
then a substring match against the BFS list). Any failure raises a
`ValueError` that surfaces in the German error message in the UI – there
is no silent regex fallback.
---
## 5. LLM Explanation Part
### 5.1 Goal
Produce a short, plain German explanation of the model's prediction.
The LLM **must not recompute** the price – it only describes the result
the regression model already produced.
### 5.2 Prompt Design
- **System prompt** tells the model it is explaining a rent estimate
from a machine-learning model, in German.
- The user message contains a JSON payload with the structured
preferences and the predicted rent in CHF.
- The model is instructed to return JSON with one key, `answer`,
containing 2–4 short German sentences.
- The answer must reference the user's rooms, area, and town and must
include exactly **one** uncertainty / limitation note (condition,
micro location, floor, year of renovation, …).
- No Markdown formatting.
### 5.3 Expected Output Format
```json
{"answer": "Für eine 3.5-Zimmer-Wohnung mit 85 m² in Winterthur schätzt das Modell rund 2100 CHF pro Monat. Die Schätzung basiert auf Wohnfläche und Ortsmerkmalen wie Steuerkraft und Distanz zur Stadt Zürich. Eine Unsicherheit ist, dass Zustand und Stockwerk nicht im Modell enthalten sind."}
```
The `answer` string is the text shown in the *Erklärung (LLM)* textbox.
---
## 6. End-to-End Pipeline
1. The user enters a German apartment description in the textbox.
2. `extract_preferences` calls the LLM and returns a validated dict
`{rooms, area_m2, town, luxurious, furnished}`.
3. Python validates the values with `parse_json_response` and
`match_town` – any failure raises a clear German error.
4. `predict_apartment_price` joins the BFS lookup, builds the
12-feature row, and calls `model.predict(...)`.
5. `generate_explanation` calls the LLM again with the preferences and
the prediction; the JSON `answer` field is extracted.
6. The Gradio app returns the structured preferences (JSON), the
rounded CHF prediction, and the explanation text.
If any step fails, the error message is shown in the *Erklärung* field
and the prediction is left empty – nothing is silently filled in.
---
## 7. Test Cases
| # | Test Input | Extracted Output Correct? | Prediction Returned? | Explanation Returned? | Notes |
|---|------------|---------------------------|----------------------|-----------------------|-------|
| 1 | `Ich suche eine 3.5-Zimmer-Wohnung mit 85 m2 in Winterthur.` | Yes | Yes (~CHF 2,100) | Yes | Baseline case from the assignment |
| 2 | `Ich brauche eine möblierte 2-Zimmer-Wohnung mit 55 m2 in Kloten.` | Yes (`furnished=true`) | Yes | Yes | Tests optional flag detection |
| 3 | `Ich hätte gerne eine luxuriöse 4.5-Zimmer-Wohnung mit 140 m2 in Küsnacht (ZH).` | Yes (`luxurious=true`) | Yes (~CHF 4,500) | Yes | Tests luxury flag and a town with parentheses |
| 4 | `Eine 5-Zimmer-Wohnung mit 130 m2 in Zürich wäre ideal.` | Yes | Yes | Yes | Tests `zurich_city=1` path |
| 5 | `Ich suche etwas in Bern.` | Pipeline raises a German error | No | Error message shown | Out-of-canton town → friendly failure, no silent fallback |
Local sanity check (calling `predict_apartment_price` directly, no LLM):
```text
3.5 rooms / 85 m² / Winterthur → CHF 2,103
4.5 rooms / 140 m² / Küsnacht (ZH) luxury → CHF 4,462
```
---
## 8. Errors and Problems
**Problem:** First test runs returned a 132-byte `model_gbm.pkl` and
`pickle.load` failed.
**Cause:** The copy of the file in `apartment-price-prediction/` was a
Git LFS pointer, not the real model.
**Fix:** Use the actual 1.4 MB model from
`ai-applications/end-of-module-block-1/model_gbm.pkl`.
**Problem:** First push to Hugging Face was rejected with
*"contains binary files. Please use Xet to store binary files."*
**Cause:** `model_gbm.pkl` was committed as a regular blob and the HF
pre-receive hook enforces Xet/LFS for `.pkl` files.
**Fix:** Reset the commit, upload the model with
`hf upload --repo-type space saettsam/conversational-agent
model_gbm.pkl model_gbm.pkl` (uses Xet), pull the new commit, then push
the rest of the files normally.
**Problem:** Town names like `Küsnacht (ZH)` or `Zürich` (umlaut) did
not match user input.
**Cause:** Strict, case-sensitive equality on the BFS list.
**Fix:** `match_town` lower-cases both sides and falls back to a
substring match against the canonical `bfs_name` list.
**Problem:** Missing `OPENAI_API_KEY` on the Space crashed the app on
the first user interaction with an opaque traceback.
**Cause:** The OpenAI client was being created at import time.
**Fix:** Lazy `get_openai_client()` raises a clear German error message
that surfaces directly in the UI textbox.
---
## 9. Deployment Notes
### 9.1 Files included
- `app.py`
- `model_gbm.pkl` (uploaded via Xet)
- `municipality_lookup.csv`
- `requirements.txt`
- `README.md`
- `documentation.md`
- `.gitattributes`
### 9.2 Secrets / Environment Variables
Configured in **Settings → Variables and secrets** of the Space:
- `OPENAI_API_KEY` (required)
- `OPENAI_MODEL` (optional, defaults to `gpt-4o-mini`)
### 9.3 Deployment Result
The Space builds with the standard Gradio template. The model file
(~1.4 MB) lives in Xet storage and loads on cold start. After the
secret is set, end-to-end latency is roughly 0.5 s for extraction,
negligible for the local model prediction, and ~1 s for the
explanation – about 1.5–2 s per German request in total.
### 9.4 Screenshots
Two screenshots from the running app, each showing a different German
input, the *Extrahierte Eingaben (LLM)* JSON, the *Geschätzte
Monatsmiete (CHF)* number, and the *Erklärung (LLM)* text:
![Beispiel 1](Beispiel1.png)
**Beispiel 1:** A first German apartment wish is entered. The LLM
extracts the structured JSON (`rooms`, `area_m2`, `town`, plus the
optional `luxurious` / `furnished` flags), the GradientBoostingRegressor
returns a CHF rent estimate, and the second LLM call produces the short
German explanation visible in the *Erklärung (LLM)* textbox – including
one uncertainty note about features not contained in the model.
![Beispiel 2](Beispiel2.png)
**Beispiel 2:** A second German apartment wish with different rooms,
area, and town is entered. Again the extracted JSON, the predicted
monthly rent, and the German explanation are all visible at the same
time, demonstrating that the end-to-end pipeline (LLM extraction →
model prediction → LLM explanation) works for multiple inputs.
---
## 10. Reflection
Combining a regression model with an LLM gives a friendly natural-language
front end without giving up the deterministic numerics – the model still
owns the price. The system is most fragile when the user names a town
outside the canton or omits a required value; strict JSON mode plus an
explicit `match_town` check keeps those failures visible instead of
producing a confidently wrong prediction. German input matters because
the BFS dataset uses Swiss spellings (`Zürich`, `Küsnacht (ZH)`) that an
English prompt drifts away from. The biggest missing inputs are
condition, year of renovation, floor / elevator, and balcony – features
that humans weigh heavily but the training data did not capture. Next
iteration: add confidence intervals from a quantile regressor and an
optional clarifying question when the LLM returns `null` for `area_m2`
or `rooms`.
---
## 11. Responsible Use Note
The prediction is a rough indication, not a market quote. The model was
trained on a snapshot of public listings and only sees twelve structured
features – condition, micro location, balcony, floor, elevator and many
other rent drivers are not represented. The LLM may also misread the
user text (e.g. confuse "etwa 85 m²" with another number); that is why
every prediction is shown together with the extracted JSON, so the user
can verify what the model actually saw. The app is intended for
educational and exploratory use only and must not be used as the sole
basis for any rental decision.