# DWD Clean URL Architecture & SEO System

This document describes the path-based URL system implemented for the DWD section of Climate Explorer. It serves as a **reference template** for implementing clean URLs on other sections of the site.

## URL Structure

```
/dwd/{resolution}/{state?}/{station?}/?{view}&{start}&{end}
```

### Path Segments

| Segment | Required | Example | Description |
|---------|----------|---------|-------------|
| `resolution` | Yes (defaults to `daily`) | `hourly` | Time resolution slug |
| `state` | No | `bayern` | German state (Bundesland) slug |
| `station` | No | `muenchen-flughafen` | Station name slug |

### Query Parameters (UI state only, not indexed)

| Param | Default | Example | Description |
|-------|---------|---------|-------------|
| `view` | `map` | `dashboard-plots` | Active tab |
| `start` | Resolution default | `2020-01-01` | Date range start |
| `end` | Resolution default | `2026-04-26` | Date range end |

### URL Examples

```
# Base landing page (defaults to Daily)
/dwd/

# Resolution pages
/dwd/daily/
/dwd/hourly/
/dwd/10-minutes/
/dwd/monthly/
/dwd/annual/

# State pages
/dwd/daily/bayern/
/dwd/hourly/sachsen/
/dwd/10-minutes/nordrhein-westfalen/

# Station pages
/dwd/daily/bayern/muenchen-flughafen/
/dwd/hourly/sachsen/leipzig-holzhausen/

# With UI state (query params)
/dwd/daily/bayern/muenchen-flughafen/?view=dashboard-plots&start=2020-01-01&end=2026-04-26
```

## Resolution Slugs

| UI Label | URL Slug | Shiny Internal Value |
|----------|----------|---------------------|
| 10 Minutes | `10-minutes` | `10_minutes` |
| Hourly | `hourly` | `hourly` |
| Daily | `daily` | `daily` |
| Monthly | `monthly` | `monthly` |
| Annual | `annual` | `annual` |

## Slugify Algorithm

State and station names are slugified using the same algorithm across all three layers (R, JS, Edge Function):

```
1. Replace German umlauts: ü→ue, ö→oe, ä→ae, Ü→ue, Ö→oe, Ä→ae, ß→ss
2. Lowercase
3. Strip diacritics (R uses iconv ASCII//TRANSLIT; JS/TS use NFD + regex)
4. Replace non-alphanumeric chars with hyphens
5. Trim leading/trailing hyphens
```

Examples:

- `München-Flughafen` → `muenchen-flughafen`
- `Nordrhein-Westfalen` → `nordrhein-westfalen`
- `Thüringen` → `thueringen`
- `Baden-Württemberg` → `baden-wuerttemberg`

> **Critical**: The slugify function must produce identical output in R (`scripts/export_seo_metadata.R`), JavaScript (`dwd-page.js`), and TypeScript (`rewrite-meta.ts`). Any mismatch causes 404s or broken links. Note that R uses `iconv(..., to = "ASCII//TRANSLIT")` while JS/TS use `NFD normalize + strip combining marks`; both produce the same result for German text.

## System Architecture

The URL system spans four layers:

```
┌─────────────────────────────────────────────────────┐
│  1. SEO Metadata (Build Time)                       │
│     R script → dwd-seo-metadata.json                │
│     Generates slug→metadata mappings for all        │
│     stations, states, and resolutions               │
├─────────────────────────────────────────────────────┤
│  2. Edge Function (Request Time)                    │
│     rewrite-meta.ts                                 │
│     Parses URL → injects HTML body content,         │
│     meta tags, JSON-LD, canonical URL               │
├─────────────────────────────────────────────────────┤
│  3. Parent Page JS (Client Side)                    │
│     dwd-page.js                                     │
│     Parses URL → configures iframe,                 │
│     listens to Shiny broadcasts → updates URL,      │
│     title, and dynamic context block                │
├─────────────────────────────────────────────────────┤
│  4. Shiny App (Iframe)                              │
│     server.R                                        │
│     Receives URL params → broadcasts state          │
│     changes via postMessage to parent page          │
└─────────────────────────────────────────────────────┘
```

### 1. SEO Metadata Generation (Build Time)

**Script**: `scripts/export_seo_metadata.R` (in the DWD project)

**Output**: `dwd-seo-metadata.json` (in `climateexplorer/netlify/edge-functions/`)

This R script reads all 5 resolution RDS cache files and generates a JSON file containing:

- **Stations**: `{resolution}/{state-slug}/{station-slug}` → `{id, name, state, stateSlug, elevation, lat, lon, resolution, resolutionLabel, resolutionSlug, overallStart, overallEnd, availableParams}`
- **States**: `{resolution}/{state-slug}` → `{state, stateSlug, resolution, resolutionLabel, resolutionSlug, stationCount, activeStationCount}`
- **Resolutions**: `{resolution-slug}` → `{key, label, slug, stationCount, activeStationCount}`
- **Slug map**: display name → slug (for legacy URL redirect lookups)

To regenerate the metadata:

```bash
# Run from the DWD app root directory (clima/2025/dwd/)
Rscript scripts/export_seo_metadata.R

# Copy the output to the climateexplorer project (clima/2024/climateexplorer/)
cp dwd-seo-metadata.json ../../2024/climateexplorer/netlify/edge-functions/
```

### 2. Edge Function (Request Time)

**File**: `climateexplorer/netlify/edge-functions/rewrite-meta.ts`

When a request hits `/dwd/{resolution}/{state?}/{station?}/`:

1. `parseDwdPath()` extracts path segments
2. Looks up metadata from `dwd-seo-metadata.json`
3. Injects into the HTML response:
   - **`<title>`**: e.g., `"München-Flughafen, Bayern – Daily Climate Data | DWD Explorer"`
   - **`<meta name="description">`**: station-specific description
   - **`<link rel="canonical">`**: canonical URL
   - **OG/Twitter meta tags**
   - **JSON-LD breadcrumb**: structured data for Google
   - **Body content** (`<div id="dynamic-context">`): rich HTML with station details, state lists, or country overview
   - **`window.__DWD_RESOLVED__`**: resolved metadata for the JS layer

This is **server-side rendered**, so Google sees full content without executing JavaScript.

### 3. Parent Page JavaScript (Client Side)

**File**: `climateexplorer/dwd/dwd-page.js`

On page load:

1. `parsePathParams()` extracts resolution/state/station from the URL
2. If `/dwd/` (no resolution), defaults to "Daily"
3. Builds iframe URL with Shiny query params
4. Uses `__DWD_RESOLVED__` metadata (from edge function) to pass real station IDs/names to iframe

On Shiny state changes (via `postMessage`):

1. `handleIframeMessage()` receives broadcast from iframe
2. `updateBrowserUrl()` updates the browser URL (using `history.replaceState`)
3. `updatePageTitle()` updates the browser tab title
4. `updateDynamicContext()` updates the context block HTML

### 4. Shiny App Broadcasts (Iframe)

**File**: `server.R` (in the DWD project)

The `broadcast_state()` function sends a `postMessage` to the parent page with:

```r
list(
  station = station_id,
  stationName = station_name,
  landname = state_name,       # German state
  resolution = resolution,     # UI label (e.g., "Daily")
  view = active_view,
  start = start_date,
  end = end_date,
  countryStationCount = ...,   # Total stations for this resolution
  countryActiveCount = ...,    # Active in current date range
  countryStateList = ...,      # State breakdown for context block
  ...
)
```

**Broadcast triggers** (observers in server.R):

1. Tab/view changes
2. Station selection changes
3. Station deselection
4. Resolution changes
5. Date range changes
6. State filter changes (`ignoreNULL = FALSE`, so it also fires on clear)

## Sitemap Integration

### indexed-pages.json

**File**: `climateexplorer/netlify/edge-functions/indexed-pages.json`

Defines the curated URLs to include in `sitemap.xml`:

```json
{
  "/dwd": {
    "stations": [
      { "path": "daily/bayern/muenchen-flughafen" },
      { "path": "daily/sachsen/leipzig-holzhausen" }
    ],
    "regions": [
      { "path": "daily/bayern" },
      { "path": "daily/berlin" }
    ],
    "resolutions": [
      { "path": "daily" },
      { "path": "hourly" },
      { "path": "10-minutes" }
    ]
  }
}
```

### Sitemap Normalization

**Script**: `climateexplorer/scripts/normalize-sitemap.mjs`

Runs after `quarto render` to inject curated URLs into `sitemap.xml`. The validator (`scripts/lib/indexed-pages-validator.mjs`) auto-approves path-based entries (entries with a `path` field).

### Google Discovery Chain

The sitemap contains ~36 DWD seed URLs. Google discovers all other pages through internal links:

```
Sitemap: 5 resolution pages ──→ Each links to 16 states
                                        │
                   ┌────────────────────┘
                   ▼
         16 state pages ──→ Each links to all stations in that state
                                        │
                   ┌────────────────────┘
                   ▼
         ~1400 station pages (per resolution)
```

Total discoverable pages: **~5,000+** across all resolutions.

## Legacy URL Redirect

Old query-parameter URLs are automatically redirected to clean paths:

```
/dwd/?resolution=Daily&landname=Bayern&station=München-Flughafen
  → 301 redirect →
/dwd/daily/bayern/muenchen-flughafen/
```

Handled by the edge function's "DWD Legacy Query-Param Redirect" block.

## Applying to Other Sections

To implement this pattern for another section (e.g., `/meteofrance/`, `/jma/`):

### 1. Define the URL hierarchy

```
/{section}/{resolution}/{region?}/{station?}/
```

Choose meaningful slugs for resolutions, regions (departments, prefectures, countries), and stations.

### 2. Create SEO metadata

Write an R script to generate `{section}-seo-metadata.json` with:

- Station metadata (name, region, coordinates, data range, parameters)
- Region metadata (station counts)
- Resolution metadata (station counts)
- Slug map (display name → URL slug)

### 3. Update the edge function

Add a `parse{Section}Path()` function and inject body content + meta tags.

### 4. Create the page JavaScript

Write a `{section}-page.js` that:

- Parses path segments on load
- Configures the iframe with Shiny query params
- Listens for postMessage broadcasts and updates URL/title/context

### 5. Update Shiny's broadcast_state()

Ensure the Shiny app sends state/region/station names in its broadcasts so the JS can construct correct URLs.

### 6. Update indexed-pages.json

Add curated seed URLs for the section's resolutions, regions, and sample stations.

### 7. Verify

```bash
# Run sitemap normalization
QUARTO_PROJECT_OUTPUT_DIR=_site node scripts/normalize-sitemap.mjs

# Run sitemap checks
QUARTO_PROJECT_OUTPUT_DIR=_site node scripts/check-sitemap.mjs --fetch

# Test edge function locally
netlify dev
```

## Key Files Reference

| File | Location | Purpose |
|------|----------|---------|
| `export_seo_metadata.R` | `dwd/scripts/` | Generate SEO metadata JSON |
| `dwd-seo-metadata.json` | `climateexplorer/netlify/edge-functions/` | Station/state/resolution metadata |
| `rewrite-meta.ts` | `climateexplorer/netlify/edge-functions/` | Edge function (SSR injection) |
| `dwd-page.js` | `climateexplorer/dwd/` | Client-side URL sync |
| `server.R` | `dwd/` | Shiny broadcast_state() |
| `indexed-pages.json` | `climateexplorer/netlify/edge-functions/` | Sitemap seed URLs |
| `normalize-sitemap.mjs` | `climateexplorer/scripts/` | Sitemap URL injection |
| `indexed-pages-validator.mjs` | `climateexplorer/scripts/lib/` | Validates curated URLs |
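
## Example: Slugify and Path Parsing Sketch

As a companion to the Slugify Algorithm and URL Structure sections, the following is a minimal TypeScript sketch of the five slugify steps and of splitting a `/dwd/...` path into its segments. It is not the code from `rewrite-meta.ts` or `dwd-page.js`: the names `slugify`, `parseDwdPathSketch`, `DwdPathSketch`, and `RESOLUTION_SLUGS` are hypothetical, and two behaviours are assumptions rather than documented facts (runs of non-alphanumeric characters collapse into a single hyphen, and unknown resolution slugs yield `null`).

```typescript
// Illustrative sketch only; names and edge-case behaviour are assumptions.

// Step 1 lookup: German umlauts and ß, as listed in the Slugify Algorithm section.
const UMLAUTS: Record<string, string> = {
  "ü": "ue", "ö": "oe", "ä": "ae",
  "Ü": "ue", "Ö": "oe", "Ä": "ae",
  "ß": "ss",
};

function slugify(name: string): string {
  return name
    .replace(/[üöäÜÖÄß]/g, (ch) => UMLAUTS[ch] ?? ch) // 1. replace German umlauts
    .toLowerCase()                                     // 2. lowercase
    .normalize("NFD")                                  // 3. decompose accented chars...
    .replace(/[\u0300-\u036f]/g, "")                   //    ...then drop combining marks
    .replace(/[^a-z0-9]+/g, "-")                       // 4. non-alphanumerics to hyphens
                                                       //    (assumption: runs collapse to one)
    .replace(/^-+|-+$/g, "");                          // 5. trim leading/trailing hyphens
}

// Resolution slugs from the Resolution Slugs table.
const RESOLUTION_SLUGS = ["10-minutes", "hourly", "daily", "monthly", "annual"];

interface DwdPathSketch {
  resolution: string;
  state: string | undefined;
  station: string | undefined;
}

// Splits /dwd/{resolution}/{state?}/{station?}/ into segments.
// A bare /dwd/ falls back to "daily", matching the documented default.
function parseDwdPathSketch(pathname: string): DwdPathSketch | null {
  const parts = pathname.split("/").filter((p) => p.length > 0);
  if (parts[0] !== "dwd" || parts.length > 4) return null;
  const [, resolution = "daily", state, station] = parts;
  if (!RESOLUTION_SLUGS.includes(resolution)) return null; // assumption: reject unknown slugs
  return { resolution, state, station };
}

console.log(slugify("München-Flughafen"));
console.log(parseDwdPathSketch("/dwd/hourly/sachsen/leipzig-holzhausen/"));
```

One caveat of this sketch: step 1 only matches precomposed umlauts, so input that arrives already decomposed (for example `u` followed by a combining diaeresis) would become `u` rather than `ue`. Whether the real implementations guard against that is not stated in this document.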