Merge branch 'dev'
notebooks/01_data_exploration.ipynb
CHANGED
|
@@ -650,7 +650,9 @@
|
|
| 650 |
{
|
| 651 |
"cell_type": "markdown",
|
| 652 |
"metadata": {},
|
| 653 |
-
"source":
|
|
|
|
|
|
|
| 654 |
},
|
| 655 |
{
|
| 656 |
"cell_type": "code",
|
|
@@ -1503,14 +1505,14 @@
|
|
| 1503 |
{
|
| 1504 |
"cell_type": "markdown",
|
| 1505 |
"source": [
|
| 1506 |
-
"[β Back to Table of Contents](#Table-of-Contents)\n\n## 9. Challenge Requirements: Questions & Answers\n\nThe MLE challenge specifies evaluation criteria. Below we address each one with evidence from our pipeline.\n\n**Challenge context:** *\"You are analysing residential property prices in France [...] to estimate the current market price as price per squared meter (β¬/mΒ²)\"*\n\n**
|
| 1507 |
],
|
| 1508 |
"metadata": {}
|
| 1509 |
},
|
| 1510 |
{
|
| 1511 |
"cell_type": "markdown",
|
| 1512 |
"source": [
|
| 1513 |
-
"[β Back to Table of Contents](#Table-of-Contents)\n\n## 10. Top 10 Cities: Price by Property Type (Challenge Deliverable)\n\nThe challenge asks: *\"Produce a list of market price per square meter by property type for the top 10 biggest cities.\"*\n\nDVF has 4 `type_local` values. We keep only the 2 residential types
|
| 1514 |
],
|
| 1515 |
"metadata": {}
|
| 1516 |
},
|
|
|
|
| 650 |
{
|
| 651 |
"cell_type": "markdown",
|
| 652 |
"metadata": {},
|
| 653 |
+
"source": [
|
| 654 |
+
"[β Back to Table of Contents](#Table-of-Contents)\n\n## 4. Temporal Weighting Analysis\n\n### Why temporal weighting?\n\nThe challenge requires us to *\"make the best estimation of the market price taking into account transaction price volatility, transaction volume, data freshness and consistency.\"*\n\nThe French property market has changed significantly over our data period (2020-2025):\n- **2020-2021**: COVID-era boom (+6-7% per year nationally)\n- **2022**: Rising interest rates begin slowing the market\n- **2023-2024**: Price correction (-1.8% to -5.2% YoY in some areas)\n- **2025**: Stabilization\n\nA simple average would mix boom prices with correction prices equally. Temporal weighting solves this by making recent transactions count more.\n\n### Our approach: Exponential decay\n\nEach transaction receives a weight based on how recent it is:\n\n```\nweight = Ξ»^(months_since_reference_date)\n```\n\nWhere `Ξ» = 0.97` and `reference_date = 2025-01-01`. This gives a **half-life of ~23 months** β a transaction from 2 years ago contributes half as much as a transaction from today. See Section 11.1 for detailed justification of why Ξ» = 0.97.\n\nThe plots below show:\n1. **Distribution of weights** across all transactions\n2. **Distribution of transaction ages** (months since reference)\n3. **The exponential decay curve** β how weight decreases with age"
|
| 655 |
+
]
|
| 656 |
},
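The exponential decay described in the Section 4 markdown cell can be sketched in a few lines of Python. This is only an illustration, not the notebook's actual code: the helper names `months_before_reference` and `decay_weight` are hypothetical, and counting age in whole calendar months is an assumption, since the diff does not show how the pipeline measures it.

```python
import math
from datetime import date

LAMBDA = 0.97                      # monthly decay factor quoted in the cell
REFERENCE_DATE = date(2025, 1, 1)  # reference date quoted in the cell


def months_before_reference(sale_date, reference=REFERENCE_DATE):
    """Whole calendar months from sale_date up to reference (assumed granularity)."""
    return (reference.year - sale_date.year) * 12 + (reference.month - sale_date.month)


def decay_weight(sale_date, lam=LAMBDA, reference=REFERENCE_DATE):
    """weight = lam ** age_in_months; no clamping is applied (an assumption)."""
    return lam ** months_before_reference(sale_date, reference)


# Half-life: the age n (in months) at which lam ** n == 0.5.
half_life_months = math.log(0.5) / math.log(LAMBDA)

print(f"half-life: {half_life_months:.1f} months")
print(f"weight of a sale dated 2023-01-01: {decay_weight(date(2023, 1, 1)):.3f}")
```

With λ = 0.97 the half-life comes out just under 23 months, so a sale 24 months before the reference date gets a weight slightly below 0.5, which matches the "~23 months" figure in the cell.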
|
| 657 |
{
|
| 658 |
"cell_type": "code",
|
|
|
|
| 1505 |
{
|
| 1506 |
"cell_type": "markdown",
|
| 1507 |
"source": [
|
| 1508 |
+
"[β Back to Table of Contents](#Table-of-Contents)\n\n## 9. Challenge Requirements: Questions & Answers\n\nThe MLE challenge specifies evaluation criteria. Below we address each one with evidence from our pipeline.\n\n**Challenge context:** *\"You are analysing residential property prices in France [...] to estimate the current market price as price per squared meter (β¬/mΒ²)\"*\n\n**Key design decisions based on the challenge requirements:**\n- **Time frame**: The challenge specifies using available transaction data with a method that accounts for data freshness β We use exponential temporal decay (Ξ»=0.97/month)\n- **Property types**: The challenge focuses on residential properties β Appartement + Maison only (no commercial, no dΓ©pendance)\n- **Price calculation**: The challenge asks for a robust estimation method β Time-weighted trimmed mean with confidence scoring\n- **Missing data**: Alsace-Moselle (57, 67, 68) + Mayotte (976) have no DVF data (different land registry system) β excluded\n- **Top 10 cities**: By population β 9 cities shown (Strasbourg excluded β no DVF data, it's in Alsace)\n\n| # | Evaluation Criterion | Status | Evidence |\n|---|---------------------|--------|----------|\n| 1 | Is the colored map loading? | β
| MapLibre GL JS choropleth with color scale |\n| 2 | Is the map usable and not laggy? | β
| Pre-computed static JSON, vector tiles from gov servers, lazy-loaded sections |\n| 3 | Is the map refreshing aggregation level on zoom? | β
| 6 zoom thresholds auto-switch layers |\n| 4 | Are all 6 aggregation levels present? | β
| Country (1) β Region (17) β Department (97) β Commune (33,244) β Postcode (5,861) β Section (260,219) |\n| 5 | Are the price estimates plausible? | β
| Validated below against DVF official + RealAdvisor |\n| 6 | Is the data complete or was it subset? | β
| Full 2020-2025 dataset (4.65M transactions), only Alsace-Moselle excluded (no DVF data) |\n| 7 | Processing code: clean, clear, reusable | β
| Modular Python: config β downloader β cleaner β aggregator β top_cities β pipeline |\n| 8 | Architecture: robust and logical | β
| Polars lazy eval, streaming downloads, per-department section splitting |\n| 9 | App is hosted and functional | β
| Hugging Face Spaces (Docker + FastAPI) |\n| 10 | Top 10 cities by property type | β
| See Section 10 below |"
|
| 1509 |
],
|
| 1510 |
"metadata": {}
|
| 1511 |
},
|
| 1512 |
{
|
| 1513 |
"cell_type": "markdown",
|
| 1514 |
"source": [
|
| 1515 |
+
"[β Back to Table of Contents](#Table-of-Contents)\n\n## 10. Top 10 Cities: Price by Property Type (Challenge Deliverable)\n\nThe challenge asks: *\"Produce a list of market price per square meter by property type for the top 10 biggest cities.\"*\n\nDVF has 4 `type_local` values. We keep only the 2 residential types as the challenge specifies residential properties:\n\n| type_local | Description | Included? | Reason |\n|---|---|---|---|\n| **Appartement** | Apartments / flats | Yes | Residential |\n| **Maison** | Houses | Yes | Residential |\n| Dependance | Garages, cellars, parking | No | Not a dwelling |\n| Local industriel, commercial ou assimile | Shops, offices, warehouses | No | Commercial property |\n\n**\"tous\"** = combined WTM across both types, weighted by actual transaction volume (not a simple average of apt + maison). This matters because the mix varies enormously: Paris is ~99.5% apartments, while smaller cities have 70-80% houses."
|
| 1516 |
],
|
| 1517 |
"metadata": {}
|
| 1518 |
},
|
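The Section 10 cell contrasts a volume-weighted "tous" figure with a simple average of the two property types. The sketch below shows why the two differ. It is a simplification, not the pipeline's code: the function name `combined_price` is hypothetical, the prices and counts are made up for illustration (not DVF results), and the real pipeline may well compute its weighted trimmed mean on pooled transactions instead of combining per-type values.

```python
def combined_price(per_type):
    """Volume-weighted mean of per-type prices.

    per_type maps type_local -> (price_per_m2, n_transactions).
    """
    total_n = sum(n for _, n in per_type.values())
    if total_n == 0:
        raise ValueError("no transactions to combine")
    return sum(price * n for price, n in per_type.values()) / total_n


# Made-up numbers for an apartment-dominated city (illustration only).
example = {
    "Appartement": (10000.0, 995),
    "Maison": (12000.0, 5),
}

volume_weighted = combined_price(example)
simple_average = sum(price for price, _ in example.values()) / len(example)

print(f"volume-weighted: {volume_weighted:.0f} per m2")
print(f"simple average:  {simple_average:.0f} per m2")
```

When one type accounts for almost all sales, the volume-weighted value stays close to that type's price, while the simple average is pulled halfway toward the rare type.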