Spaces:
Sleeping
Sleeping
| # Documentation | |
| ## Week 12: Apartment Predictor (Saved Regression Model + LLM Workflow) | |
| Use this file to document what you built, tested, and learned in this exercise. | |
| Do not rename this file to `README.md`, because `README.md` is needed by Hugging Face Spaces. | |
| This file is part of the submission. Complete it after you have tested and deployed your app. | |
| --- | |
| ## 1. Project Summary | |
| **Short description of your app:** | |
| This app allows users to describe their apartment search in natural German language (e.g. number of rooms, size, and location). The system extracts structured input from the text, predicts a monthly rent using a regression model, and generates a short explanation. | |
| The regression model predicts the estimated monthly rent in CHF. The LLM is used for two tasks: extracting structured data from free text and generating a human-readable explanation of the prediction. | |
| --- | |
| ## 2. Files Used | |
| List the main files you worked with. | |
| | File | Purpose | | |
| |------|---------| | |
| | `ai_applications_exercise2.ipynb` | Notebook for initial understanding and testing | | |
| | `app_student.py` | Implementation of the full pipeline & deployable app for Hugging Face | | |
| | `random_forest_regression.pkl` | Pre-trained regression model | | |
| | `bfs_municipality_and_tax_data.csv` | Municipality features used for prediction | | |
| | `requirements.txt` | Python dependencies | | |
| | `documentation.md` | This documentation: Written documentation for the submission | | |
| --- | |
| ## 3. Numeric Prediction Part | |
| ### 3.1 Reused Model | |
| **Which saved model did you use?** | |
| `random_forest_regression.pkl` | |
| **What does the model predict?** | |
| The model predicts the estimated monthly rent (in CHF) for an apartment based on apartment characteristics and municipality data. | |
| **Which input features are used for prediction?** | |
| 1. `rooms` | |
| 2. `area_m2` | |
| 3. `pop` | |
| 4. `pop_dens` | |
| 5. `frg_pct` | |
| 6. `emp` | |
| 7. `tax_income` | |
| 8. `distance_to_zurich_center` (not available in dataset, approximated as 0) | |
| ### 3.2 Prediction Logic | |
| The user input is first converted into structured values (`rooms`, `area_m2`, `town`). | |
| The town is matched to the dataset, and municipality features are retrieved. | |
| Since the trained model expects 8 features, but the dataset does not include distance_to_zurich_center, this value is approximated as 0. The full feature vector is then passed to the model for prediction. | |
| --- | |
| ## 4. LLM Extraction Part | |
| ### 4.1 Goal | |
| The LLM extracts structured information from the user’s German text: | |
| - number of rooms | |
| - apartment size (m²) | |
| - town name | |
| ### 4.2 Prompt Design | |
| The LLM is instructed with: | |
| - a system prompt defining the task (extraction) | |
| - a requirement to return strict JSON only | |
| - explicit keys: rooms, area_m2, town | |
| - numeric values for rooms and area | |
| - no additional text or Markdown | |
| ### 4.3 Expected Output Format | |
| Document the ideal extraction output. | |
| ```json | |
| { | |
| "rooms":4, | |
| "area_m2":110, | |
| "town":"Zürich" | |
| } | |
| ``` | |
| ### 4.4 Validation | |
| The JSON response is validated in Python: | |
| - empty responses are rejected | |
| - JSON parsing is enforced | |
| - required keys are checked | |
| - Markdown code blocks are removed if present | |
| This ensures robustness against LLM formatting errors. | |
| --- | |
| ## 5. LLM Explanation Part | |
| ### 5.1 Goal | |
| The second LLM step generates a short explanation of the predicted rent. | |
| It does not calculate a new prediction. | |
| ### 5.2 Prompt Design | |
| The prompt includes: | |
| - structured preferences (rooms, area, town) | |
| - the predicted rent value | |
| - instruction to respond in German | |
| - requirement to include one uncertainty note | |
| - strict JSON output with key answer | |
| ### 5.3 Expected Output Format | |
| Example: | |
| ```json | |
| {"answer": "Die geschätzte Monatsmiete für eine 4-Zimmer-Wohnung mit 110 m² in Zürich beträgt etwa 4.310 CHF. Bitte beachten Sie, dass es sich um eine Schätzung handelt und der tatsächliche Mietpreis je nach Lage und Ausstattung variieren kann."} | |
| ``` | |
| --- | |
| ## 6. End-to-End Pipeline | |
| 1. User enters a German apartment request. | |
| 2. LLM extracts `rooms`, `area_m2`, and `town`. | |
| 3. Python validates the extracted values. | |
| 4. The regression model predicts the rent. | |
| 5. The LLM generates a short explanation. | |
| 6. The app returns JSON, prediction, and explanation | |
| --- | |
| ## 7. Test Cases | |
| Document at least 3 test inputs. | |
| | Test Input | Extracted Output Correct? | Prediction Returned? | Explanation Returned? | Notes | | |
| |------------|----------------------------|----------------------|-----------------------|-------| | |
| | `3.5 Zimmer, 80m2, Adlikon` | Yes | Yes | Yes | Short input works | | |
| | `Ich suche eine 3.5 Zimmer Wohnung mit 80m2 in Adlikon.` | Yes | Yes | Yes | Correct extraction and prediction | | |
| | `Ich suche eine 4 Zimmer Wohnung mit 110m2 in Zürich.`| Yes | Yes | Yes | Full pipeline works | | |
| --- | |
| ## 8. Errors and Problems | |
| **Problem1: Missing OpenAI API key** | |
| **Cause:** | |
| Environment variable not set correctly | |
| **Fix:** | |
| Used OPENAI_API_KEY and environment variables | |
| **Problem2: LLM returned invalid JSON** | |
| **Cause:** | |
| LLM returned Markdown-formatted JSON (```json) | |
| **Fix:** | |
| Removed Markdown in parser and improved prompt | |
| **Problem3: Feature mismatch (7 vs 8 features)** | |
| **Cause:** | |
| Model expected 8 features | |
| **Fix:** | |
| Added dummy value (0) for missing feature | |
| --- | |
| ## 9. Deployment Notes | |
| https://huggingface.co/spaces/lst0004/ApartmentPredictor2.0 | |
| ### 9.1 Files included | |
| - app_student.py | |
| - random_forest_regression.pkl | |
| - bfs_municipality_and_tax_data.csv | |
| - requirements.txt | |
| - documentation.md | |
| - README.md | |
| - .gitattributes | |
| ### 9.2 Secrets / Environment Variables | |
| - `OPENAI_API_KEY` | |
| ### 9.3 Deployment Result | |
| The app runs successfully on Hugging Face Spaces. | |
| The full pipeline works, including LLM extraction, prediction, and explanation. | |
| ### 9.4 Screenshots | |
|  | |
| Dieses Beispiel zeigt eine Eingabe für Adlikon. Die Werte wurden korrekt extrahiert und eine plausible Mietschätzung erzeugt. | |
|  | |
| Hier wird eine Anfrage für Zürich verarbeitet. Die Pipeline funktioniert ebenfalls vollständig inklusive Erklärung durch das LLM. | |
| --- | |
| ## 10. Reflection | |
| The combination of a regression model and an LLM worked well. The LLM simplifies the user interface by allowing natural language input. | |
| However, the system is fragile when the LLM returns incorrectly formatted JSON. Additionally, the model is limited because it does not include important factors such as apartment condition or micro-location. | |
| German input is important because the dataset contains Swiss town names. | |
| In the future, adding more features or improving extraction robustness would improve the system. | |
| --- | |
| ## 11. Responsible Use Note | |
| The prediction is only an estimate and should not be used as a definitive price. | |
| The model uses limited structured features and ignores important real-world factors such as condition, location within a city, and amenities. | |
| The LLM may also extract incorrect values from user input. | |
| Users should treat the output as a rough guideline rather than an exact prediction. | |