dcrey7 committed on
Commit
d796323
·
2 Parent(s): 492cf04 0f6a834

Merge branch 'dev'

notebooks/01_data_exploration.ipynb CHANGED
@@ -650,7 +650,9 @@
650
  {
651
  "cell_type": "markdown",
652
  "metadata": {},
653
- "source": "[↑ Back to Table of Contents](#Table-of-Contents)\n\n## 4. Temporal Weighting Analysis\n\n### Why temporal weighting?\n\nCarlos's feedback: *\"Use the entire dataset and develop a method to make historical data relevant to current market prices.\"*\n\nThe French property market has changed significantly over our data period (2020-2025):\n- **2020-2021**: COVID-era boom (+6-7% per year nationally)\n- **2022**: Rising interest rates begin slowing the market\n- **2023-2024**: Price correction (-1.8% to -5.2% YoY in some areas)\n- **2025**: Stabilization\n\nA simple average would mix boom prices with correction prices equally. Temporal weighting solves this by making recent transactions count more.\n\n### Our approach: Exponential decay\n\nEach transaction receives a weight based on how recent it is:\n\n```\nweight = λ^(months_since_reference_date)\n```\n\nWhere `λ = 0.97` and `reference_date = 2025-01-01`. This gives a **half-life of ~23 months** - a transaction from 2 years ago contributes half as much as a transaction from today. See Section 11.1 for detailed justification of why λ = 0.97.\n\nThe plots below show:\n1. **Distribution of weights** across all transactions\n2. **Distribution of transaction ages** (months since reference)\n3. **The exponential decay curve** - how weight decreases with age"
 
 
654
  },
655
  {
656
  "cell_type": "code",
@@ -1503,14 +1505,14 @@
1503
  {
1504
  "cell_type": "markdown",
1505
  "source": [
1506
- "[↑ Back to Table of Contents](#Table-of-Contents)\n\n## 9. Challenge Requirements: Questions & Answers\n\nThe MLE challenge specifies evaluation criteria. Below we address each one with evidence from our pipeline.\n\n**Challenge context:** *\"You are analysing residential property prices in France [...] to estimate the current market price as price per squared meter (€/m²)\"*\n\n**Carlos's key feedback:**\n- **Time frame**: *\"Use the entire dataset and develop a method to make historical data relevant to current market prices\"* → We use exponential temporal decay (λ=0.97/month)\n- **Property types**: *\"Specifically focused on residential properties\"* → Appartement + Maison only (no commercial, no dépendance)\n- **Price calculation**: *\"Propose a robust method\"* → Time-weighted trimmed mean with confidence scoring\n- **Missing data**: *\"Fine to exclude\"* → Alsace-Moselle (57, 67, 68) + Mayotte (976) excluded\n- **Top 10 cities**: *\"By population\"* → 9 cities shown (Strasbourg excluded - no DVF data, it's in Alsace)\n\n| # | Evaluation Criterion | Status | Evidence |\n|---|---------------------|--------|----------|\n| 1 | Is the colored map loading? | ✅ | MapLibre GL JS choropleth with color scale |\n| 2 | Is the map usable and not laggy? | ✅ | Pre-computed static JSON, vector tiles from gov servers, lazy-loaded sections |\n| 3 | Is the map refreshing aggregation level on zoom? | ✅ | 6 zoom thresholds auto-switch layers |\n| 4 | Are all 6 aggregation levels present? | ✅ | Country (1) → Region (17) → Department (97) → Commune (33,244) → Postcode (5,861) → Section (260,219) |\n| 5 | Are the price estimates plausible? | ✅ | Validated below against DVF official + RealAdvisor |\n| 6 | Is the data complete or was it subset? | ✅ | Full 2020-2025 dataset (4.65M transactions), only Alsace-Moselle excluded (no DVF data) |\n| 7 | Processing code: clean, clear, reusable | ✅ | Modular Python: config → downloader → cleaner → aggregator → top_cities → pipeline |\n| 8 | Architecture: robust and logical | ✅ | Polars lazy eval, streaming downloads, per-department section splitting |\n| 9 | App is hosted and functional | ✅ | Hugging Face Spaces (Docker + FastAPI) |\n| 10 | Top 10 cities by property type | ✅ | See Section 10 below |"
1507
  ],
1508
  "metadata": {}
1509
  },
1510
  {
1511
  "cell_type": "markdown",
1512
  "source": [
1513
- "[↑ Back to Table of Contents](#Table-of-Contents)\n\n## 10. Top 10 Cities: Price by Property Type (Challenge Deliverable)\n\nThe challenge asks: *\"Produce a list of market price per square meter by property type for the top 10 biggest cities.\"*\n\nDVF has 4 `type_local` values. We keep only the 2 residential types per Carlos's feedback:\n\n| type_local | Description | Included? | Reason |\n|---|---|---|---|\n| **Appartement** | Apartments / flats | Yes | Residential |\n| **Maison** | Houses | Yes | Residential |\n| Dependance | Garages, cellars, parking | No | Not a dwelling |\n| Local industriel, commercial ou assimile | Shops, offices, warehouses | No | Commercial property |\n\n**\"tous\"** = combined WTM across both types, weighted by actual transaction volume (not a simple average of apt + maison). This matters because the mix varies enormously: Paris is ~99.5% apartments, while smaller cities have 70-80% houses."
1514
  ],
1515
  "metadata": {}
1516
  },
 
650
  {
651
  "cell_type": "markdown",
652
  "metadata": {},
653
+ "source": [
654
+ "[↑ Back to Table of Contents](#Table-of-Contents)\n\n## 4. Temporal Weighting Analysis\n\n### Why temporal weighting?\n\nThe challenge requires us to *\"make the best estimation of the market price taking into account transaction price volatility, transaction volume, data freshness and consistency.\"*\n\nThe French property market has changed significantly over our data period (2020-2025):\n- **2020-2021**: COVID-era boom (+6-7% per year nationally)\n- **2022**: Rising interest rates begin slowing the market\n- **2023-2024**: Price correction (-1.8% to -5.2% YoY in some areas)\n- **2025**: Stabilization\n\nA simple average would mix boom prices with correction prices equally. Temporal weighting solves this by making recent transactions count more.\n\n### Our approach: Exponential decay\n\nEach transaction receives a weight based on how recent it is:\n\n```\nweight = λ^(months_since_reference_date)\n```\n\nWhere `λ = 0.97` and `reference_date = 2025-01-01`. This gives a **half-life of ~23 months** - a transaction from 2 years ago contributes half as much as a transaction from today. See Section 11.1 for detailed justification of why λ = 0.97.\n\nThe plots below show:\n1. **Distribution of weights** across all transactions\n2. **Distribution of transaction ages** (months since reference)\n3. **The exponential decay curve** - how weight decreases with age"
655
+ ]
656
  },
657
  {
658
  "cell_type": "code",
 
1505
  {
1506
  "cell_type": "markdown",
1507
  "source": [
1508
+ "[↑ Back to Table of Contents](#Table-of-Contents)\n\n## 9. Challenge Requirements: Questions & Answers\n\nThe MLE challenge specifies evaluation criteria. Below we address each one with evidence from our pipeline.\n\n**Challenge context:** *\"You are analysing residential property prices in France [...] to estimate the current market price as price per squared meter (€/m²)\"*\n\n**Key design decisions based on the challenge requirements:**\n- **Time frame**: The challenge specifies using available transaction data with a method that accounts for data freshness → We use exponential temporal decay (λ=0.97/month)\n- **Property types**: The challenge focuses on residential properties → Appartement + Maison only (no commercial, no dépendance)\n- **Price calculation**: The challenge asks for a robust estimation method → Time-weighted trimmed mean with confidence scoring\n- **Missing data**: Alsace-Moselle (57, 67, 68) + Mayotte (976) have no DVF data (different land registry system) → excluded\n- **Top 10 cities**: By population → 9 cities shown (Strasbourg excluded - no DVF data, it's in Alsace)\n\n| # | Evaluation Criterion | Status | Evidence |\n|---|---------------------|--------|----------|\n| 1 | Is the colored map loading? | ✅ | MapLibre GL JS choropleth with color scale |\n| 2 | Is the map usable and not laggy? | ✅ | Pre-computed static JSON, vector tiles from gov servers, lazy-loaded sections |\n| 3 | Is the map refreshing aggregation level on zoom? | ✅ | 6 zoom thresholds auto-switch layers |\n| 4 | Are all 6 aggregation levels present? | ✅ | Country (1) → Region (17) → Department (97) → Commune (33,244) → Postcode (5,861) → Section (260,219) |\n| 5 | Are the price estimates plausible? | ✅ | Validated below against DVF official + RealAdvisor |\n| 6 | Is the data complete or was it subset? | ✅ | Full 2020-2025 dataset (4.65M transactions), only Alsace-Moselle excluded (no DVF data) |\n| 7 | Processing code: clean, clear, reusable | ✅ | Modular Python: config → downloader → cleaner → aggregator → top_cities → pipeline |\n| 8 | Architecture: robust and logical | ✅ | Polars lazy eval, streaming downloads, per-department section splitting |\n| 9 | App is hosted and functional | ✅ | Hugging Face Spaces (Docker + FastAPI) |\n| 10 | Top 10 cities by property type | ✅ | See Section 10 below |"
1509
  ],
1510
  "metadata": {}
1511
  },
1512
  {
1513
  "cell_type": "markdown",
1514
  "source": [
1515
+ "[↑ Back to Table of Contents](#Table-of-Contents)\n\n## 10. Top 10 Cities: Price by Property Type (Challenge Deliverable)\n\nThe challenge asks: *\"Produce a list of market price per square meter by property type for the top 10 biggest cities.\"*\n\nDVF has 4 `type_local` values. We keep only the 2 residential types as the challenge specifies residential properties:\n\n| type_local | Description | Included? | Reason |\n|---|---|---|---|\n| **Appartement** | Apartments / flats | Yes | Residential |\n| **Maison** | Houses | Yes | Residential |\n| Dependance | Garages, cellars, parking | No | Not a dwelling |\n| Local industriel, commercial ou assimile | Shops, offices, warehouses | No | Commercial property |\n\n**\"tous\"** = combined WTM across both types, weighted by actual transaction volume (not a simple average of apt + maison). This matters because the mix varies enormously: Paris is ~99.5% apartments, while smaller cities have 70-80% houses."
1516
  ],
1517
  "metadata": {}
1518
  },
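
The exponential decay rule quoted in the Section 4 cell (`weight = λ^(months_since_reference_date)` with `λ = 0.97` and `reference_date = 2025-01-01`) can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's actual code: the function names, the whole-calendar-month definition of age, and the toy prices are all assumptions made for the example.

```python
import math
from datetime import date

LAMBDA = 0.97                      # monthly decay factor stated in the notebook
REFERENCE_DATE = date(2025, 1, 1)  # reference date stated in the notebook


def months_between(sale_date: date, reference: date = REFERENCE_DATE) -> int:
    """Whole calendar months from sale_date to reference (assumed age definition).

    Dates after the reference give a negative age and a weight above 1;
    the diff does not show how the real pipeline treats that case.
    """
    return (reference.year - sale_date.year) * 12 + (reference.month - sale_date.month)


def decay_weight(sale_date: date, lam: float = LAMBDA) -> float:
    """weight = lam ** months_since_reference_date."""
    return lam ** months_between(sale_date)


def half_life_months(lam: float = LAMBDA) -> float:
    """Age in months at which the weight falls to 0.5: ln(0.5) / ln(lam)."""
    return math.log(0.5) / math.log(lam)


def weighted_mean(prices, weights):
    """Plain weighted average; the notebook's trimming step is not reproduced here."""
    return sum(p * w for p, w in zip(prices, weights)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical transactions: (sale date, price in EUR per square meter).
    sales = [(date(2021, 1, 1), 4000.0), (date(2025, 1, 1), 5000.0)]
    weights = [decay_weight(d) for d, _ in sales]
    prices = [p for _, p in sales]

    print(f"half-life: {half_life_months():.2f} months")
    print(f"weights: {[round(w, 4) for w in weights]}")
    print(f"time-weighted mean: {weighted_mean(prices, weights):.1f}")
```

With `λ = 0.97` the half-life works out to just under 23 months, which matches the "~23 months" figure in the cell. The weighted mean of the two toy sales lands closer to the recent price than a simple average would, which is the effect the section describes.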