| # DWD Clean URL Architecture & SEO System |
|
|
| This document describes the path-based URL system implemented for the DWD section of Climate Explorer. It serves as a **reference template** for implementing clean URLs on other sections of the site. |
|
|
| ## URL Structure |
|
|
| ``` |
| /dwd/{resolution}/{state?}/{station?}/?{view}&{start}&{end} |
| ``` |
|
|
| ### Path Segments |
|
|
| | Segment | Required | Example | Description | |
| |---------|----------|---------|-------------| |
| | `resolution` | Yes, except on `/dwd/` (defaults to `daily`) | `hourly` | Time resolution slug | |
| | `state` | No | `bayern` | German state (Bundesland) slug | |
| | `station` | No | `muenchen-flughafen` | Station name slug | |
|
|
| ### Query Parameters (UI state only, not indexed) |
|
|
| | Param | Default | Example | Description | |
| |-------|---------|---------|-------------| |
| | `view` | `map` | `dashboard-plots` | Active tab | |
| | `start` | Resolution default | `2020-01-01` | Date range start | |
| | `end` | Resolution default | `2026-04-26` | Date range end | |
|
|
| ### URL Examples |
|
|
| ``` |
| # Base landing page (defaults to Daily) |
| /dwd/ |
| |
| # Resolution pages |
| /dwd/daily/ |
| /dwd/hourly/ |
| /dwd/10-minutes/ |
| /dwd/monthly/ |
| /dwd/annual/ |
| |
| # State pages |
| /dwd/daily/bayern/ |
| /dwd/hourly/sachsen/ |
| /dwd/10-minutes/nordrhein-westfalen/ |
| |
| # Station pages |
| /dwd/daily/bayern/muenchen-flughafen/ |
| /dwd/hourly/sachsen/leipzig-holzhausen/ |
| |
| # With UI state (query params) |
| /dwd/daily/bayern/muenchen-flughafen/?view=dashboard-plots&start=2020-01-01&end=2026-04-26 |
| ``` |
|
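The URL structure above can be illustrated with a small parser. This is a minimal sketch, not the project's code: the function name `parseDwdUrl` and the returned field names are hypothetical, and the defaults are taken from the two tables above.

```javascript
// Hypothetical helper: split a DWD URL into path segments and UI-state params.
// Field names are illustrative; defaults follow the tables above.
function parseDwdUrl(href) {
  // A dummy origin lets URL() parse site-relative paths such as "/dwd/daily/".
  const url = new URL(href, "https://example.invalid");
  const segments = url.pathname.split("/").filter(Boolean); // drop empty parts

  if (segments[0] !== "dwd") return null; // not a DWD URL

  // Missing segments fall back to "daily" (resolution) or null (state, station).
  const [, resolution = "daily", state = null, station = null] = segments;
  return {
    resolution,
    state,
    station,
    view: url.searchParams.get("view") || "map",
    // null here means "use the resolution's default date range"
    start: url.searchParams.get("start"),
    end: url.searchParams.get("end"),
  };
}

console.log(parseDwdUrl("/dwd/hourly/sachsen/leipzig-holzhausen/?view=dashboard-plots"));
console.log(parseDwdUrl("/dwd/"));
```

Keeping the path segments and the query parameters in one result object mirrors the split in the tables: the path identifies the indexed page, while the query string only carries UI state.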
|
| ## Resolution Slugs |
|
|
| | UI Label | URL Slug | Shiny Internal Value | |
| |----------|----------|---------------------| |
| | 10 Minutes | `10-minutes` | `10_minutes` | |
| | Hourly | `hourly` | `hourly` | |
| | Daily | `daily` | `daily` | |
| | Monthly | `monthly` | `monthly` | |
| | Annual | `annual` | `annual` | |
|
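Because each resolution has three spellings, a single lookup table helps keep them in sync. The sketch below copies the rows from the table above; the `RESOLUTIONS` structure and the `findResolution` helper are illustrative names, not taken from the codebase.

```javascript
// Rows copied from the table above; structure and helper name are illustrative.
const RESOLUTIONS = [
  { label: "10 Minutes", slug: "10-minutes", shiny: "10_minutes" },
  { label: "Hourly", slug: "hourly", shiny: "hourly" },
  { label: "Daily", slug: "daily", shiny: "daily" },
  { label: "Monthly", slug: "monthly", shiny: "monthly" },
  { label: "Annual", slug: "annual", shiny: "annual" },
];

// Find a row by any one of its three fields; returns undefined if unknown.
function findResolution(field, value) {
  return RESOLUTIONS.find((row) => row[field] === value);
}

console.log(findResolution("slug", "10-minutes").shiny); // "10_minutes"
console.log(findResolution("label", "Daily").slug); // "daily"
```

Only `10 Minutes` differs between the URL slug and the Shiny value, which makes it the case most worth covering in tests.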
|
| ## Slugify Algorithm |
|
|
| State and station names are slugified using the same algorithm across all three layers (R, JS, Edge Function): |
|
|
| ``` |
| 1. Replace German umlauts: ü→ue, ö→oe, ä→ae, Ü→ue, Ö→oe, Ä→ae, ß→ss |
| 2. Lowercase |
| 3. Strip diacritics (R uses iconv ASCII//TRANSLIT; JS/TS use NFD + regex) |
| 4. Replace non-alphanumeric chars with hyphens |
| 5. Trim leading/trailing hyphens |
| ``` |
|
|
| Examples: |
| - `München-Flughafen` → `muenchen-flughafen` |
| - `Nordrhein-Westfalen` → `nordrhein-westfalen` |
| - `Thüringen` → `thueringen` |
| - `Baden-Württemberg` → `baden-wuerttemberg` |
|
|
| > **Critical**: The slugify function must produce identical output in R (`scripts/export_seo_metadata.R`), JavaScript (`dwd-page.js`), and TypeScript (`rewrite-meta.ts`). Any mismatch causes 404s or broken links. R uses `iconv(..., to = "ASCII//TRANSLIT")`, while JS/TS apply NFD normalization and strip combining marks; both produce the same result for German text. Because `iconv` transliteration is platform-dependent, re-check the outputs whenever the build environment changes. |
|
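The five steps can be sketched in JavaScript as follows. This is an illustration written from the step list, not a copy of `dwd-page.js` or `rewrite-meta.ts`. Two details are assumptions: the initial NFC normalization (so that decomposed umlauts are caught by step 1) and the collapsing of a run of non-alphanumeric characters into a single hyphen.

```javascript
// Sketch of the slugify steps above; not copied from the project sources.
function slugify(name) {
  return name
    // Assumption: compose first so "u" + combining diaeresis is treated as "ü".
    .normalize("NFC")
    // 1. German umlauts and eszett (escapes: ä Ä, ö Ö, ü Ü, ß)
    .replace(/[\u00e4\u00c4]/g, "ae")
    .replace(/[\u00f6\u00d6]/g, "oe")
    .replace(/[\u00fc\u00dc]/g, "ue")
    .replace(/\u00df/g, "ss")
    // 2. Lowercase
    .toLowerCase()
    // 3. Decompose remaining accented letters and drop the combining marks
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    // 4. Assumption: a run of non-alphanumerics becomes one hyphen
    .replace(/[^a-z0-9]+/g, "-")
    // 5. Trim leading/trailing hyphens
    .replace(/^-+|-+$/g, "");
}

console.log(slugify("München-Flughafen")); // "muenchen-flughafen"
console.log(slugify("Baden-Württemberg")); // "baden-wuerttemberg"
```

The umlaut replacement has to run before the diacritic stripping. Otherwise `ü` would be reduced to `u` and `München` would become `munchen` instead of `muenchen`.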
|
| ## System Architecture |
|
|
| The URL system spans four layers: |
|
|
| ``` |
| ┌─────────────────────────────────────────────────────┐ |
| │  1. SEO Metadata (Build Time)                       │ |
| │     R script → dwd-seo-metadata.json                │ |
| │     Generates slug→metadata mappings for all        │ |
| │     stations, states, and resolutions               │ |
| ├─────────────────────────────────────────────────────┤ |
| │  2. Edge Function (Request Time)                    │ |
| │     rewrite-meta.ts                                 │ |
| │     Parses URL → injects HTML body content,         │ |
| │     meta tags, JSON-LD, canonical URL               │ |
| ├─────────────────────────────────────────────────────┤ |
| │  3. Parent Page JS (Client Side)                    │ |
| │     dwd-page.js                                     │ |
| │     Parses URL → configures iframe,                 │ |
| │     listens to Shiny broadcasts → updates URL,      │ |
| │     title, and dynamic context block                │ |
| ├─────────────────────────────────────────────────────┤ |
| │  4. Shiny App (Iframe)                              │ |
| │     server.R                                        │ |
| │     Receives URL params → broadcasts state          │ |
| │     changes via postMessage to parent page          │ |
| └─────────────────────────────────────────────────────┘ |
| ``` |
|
|
| ### 1. SEO Metadata Generation (Build Time) |
|
|
| **Script**: `scripts/export_seo_metadata.R` (in the DWD project) |
| **Output**: `dwd-seo-metadata.json` (in `climateexplorer/netlify/edge-functions/`) |
|
|
| This R script reads all 5 resolution RDS cache files and generates a JSON file containing: |
| - **Stations**: `{resolution}/{state-slug}/{station-slug}` → `{id, name, state, stateSlug, elevation, lat, lon, resolution, resolutionLabel, resolutionSlug, overallStart, overallEnd, availableParams}` |
| - **States**: `{resolution}/{state-slug}` → `{state, stateSlug, resolution, resolutionLabel, resolutionSlug, stationCount, activeStationCount}` |
| - **Resolutions**: `{resolution-slug}` → `{key, label, slug, stationCount, activeStationCount}` |
| - **Slug map**: display name → slug (for legacy URL redirect lookups) |
|
|
| To regenerate the metadata: |
| ```bash |
| # Run from the DWD app root directory (clima/2025/dwd/) |
| Rscript scripts/export_seo_metadata.R |
| |
| # Copy the output to the climateexplorer project (clima/2024/climateexplorer/) |
| cp dwd-seo-metadata.json ../../2024/climateexplorer/netlify/edge-functions/ |
| ``` |
|
|
| ### 2. Edge Function (Request Time) |
|
|
| **File**: `climateexplorer/netlify/edge-functions/rewrite-meta.ts` |
|
|
| When a request hits `/dwd/{resolution}/{state?}/{station?}/`: |
|
|
| 1. `parseDwdPath()` extracts path segments |
| 2. Looks up metadata from `dwd-seo-metadata.json` |
| 3. Injects into the HTML response: |
| - **`<title>`**: e.g., `"München-Flughafen, Bayern – Daily Climate Data | DWD Explorer"` |
| - **`<meta name="description">`**: station-specific description |
| - **`<link rel="canonical">`**: canonical URL |
| - **OG/Twitter meta tags** |
| - **JSON-LD breadcrumb**: structured data for Google |
| - **Body content** (`<div id="dynamic-context">`): rich HTML with station details, state lists, or country overview |
| - **`window.__DWD_RESOLVED__`**: resolved metadata for the JS layer |
| |
| This is **server-side rendered**, so Google sees the full content without executing JavaScript. |
| |
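The lookup-then-inject flow can be sketched for the station case. Everything here is an assumption made for illustration: the `stations` top-level key, the sample record, the `https://example.org` origin, and the function name `resolveStationHead`. Only the title pattern is taken from the example above, and the actual HTML injection is left out.

```javascript
// Hypothetical excerpt shaped like dwd-seo-metadata.json (values are made up).
const metadata = {
  stations: {
    "daily/bayern/muenchen-flughafen": {
      name: "München-Flughafen",
      state: "Bayern",
      resolutionLabel: "Daily",
    },
  },
};

// Assumed origin; the real site origin is not given in this document.
const ORIGIN = "https://example.org";

// Resolve the <title> and canonical URL for a station page, or null.
function resolveStationHead(pathname) {
  const [section, resolution, state, station] = pathname.split("/").filter(Boolean);
  if (section !== "dwd" || !station) return null; // station pages only in this sketch

  const key = `${resolution}/${state}/${station}`;
  const meta = metadata.stations[key];
  if (!meta) return null; // unknown slug: the caller decides how to respond

  return {
    title: `${meta.name}, ${meta.state} – ${meta.resolutionLabel} Climate Data | DWD Explorer`,
    canonical: `${ORIGIN}/dwd/${key}/`,
  };
}

console.log(resolveStationHead("/dwd/daily/bayern/muenchen-flughafen/"));
```

Returning `null` for unknown slugs keeps the decision about the response (pass through, 404, or redirect) outside the lookup function.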
| ### 3. Parent Page JavaScript (Client Side) |
| |
| **File**: `climateexplorer/dwd/dwd-page.js` |
| |
| On page load: |
| 1. `parsePathParams()` extracts resolution/state/station from the URL |
| 2. If `/dwd/` (no resolution), defaults to "Daily" |
| 3. Builds iframe URL with Shiny query params |
| 4. Uses `__DWD_RESOLVED__` metadata (from edge function) to pass real station IDs/names to iframe |
| |
| On Shiny state changes (via `postMessage`): |
| 1. `handleIframeMessage()` receives broadcast from iframe |
| 2. `updateBrowserUrl()` updates the browser URL (using `history.replaceState`) |
| 3. `updatePageTitle()` updates the browser tab title |
| 4. `updateDynamicContext()` updates the context block HTML |
| |
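The URL update in step 2 can be sketched as a pure function that turns a broadcast payload into a clean URL. The payload field names follow the `broadcast_state()` list in this document; the function name `buildCleanUrl`, the injected `helpers` object, and the choice to omit the default `view=map` are assumptions.

```javascript
// Hypothetical pure helper: turn a broadcast payload into the clean URL.
// `slugify` and `labelToSlug` are passed in so the page can reuse its one
// shared implementation instead of duplicating it here.
function buildCleanUrl(state, helpers) {
  const parts = ["dwd", helpers.labelToSlug(state.resolution)];
  if (state.landname) {
    parts.push(helpers.slugify(state.landname));
    // A station is only addressable below its state segment.
    if (state.stationName) parts.push(helpers.slugify(state.stationName));
  }

  const query = new URLSearchParams();
  if (state.view && state.view !== "map") query.set("view", state.view); // omit default
  if (state.start) query.set("start", state.start);
  if (state.end) query.set("end", state.end);

  const qs = query.toString();
  return `/${parts.join("/")}/` + (qs ? `?${qs}` : "");
}

// Stand-in helpers for this example only; the real ones handle umlauts etc.
const demoHelpers = {
  slugify: (s) => s.toLowerCase().replace(/[^a-z0-9]+/g, "-"),
  labelToSlug: (label) => label.toLowerCase().replace(/ /g, "-"),
};

console.log(
  buildCleanUrl({ resolution: "Daily", landname: "Bayern", stationName: "Augsburg", view: "map" }, demoHelpers)
); // "/dwd/daily/bayern/augsburg/"
```

In the browser, the result would be passed to `history.replaceState(null, "", url)`, which changes the address bar without adding a history entry.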
| ### 4. Shiny App Broadcasts (Iframe) |
| |
| **File**: `server.R` (in the DWD project) |
| |
| The `broadcast_state()` function sends a `postMessage` to the parent page with: |
| ```r |
| list( |
| station = station_id, |
| stationName = station_name, |
| landname = state_name, # German state |
| resolution = resolution, # UI label (e.g., "Daily") |
| view = active_view, |
| start = start_date, |
| end = end_date, |
| countryStationCount = ..., # Total stations for this resolution |
| countryActiveCount = ..., # Active in current date range |
| countryStateList = ..., # State breakdown for context block |
| ... |
| ) |
| ``` |
| |
| **Broadcast triggers** (observers in server.R): |
| 1. Tab/view changes |
| 2. Station selection changes |
| 3. Station deselection |
| 4. Resolution changes |
| 5. Date range changes |
| 6. State filter changes (`ignoreNULL = FALSE`, so the observer also fires when the filter is cleared) |
|
|
| ## Sitemap Integration |
|
|
| ### indexed-pages.json |
|
|
| **File**: `climateexplorer/netlify/edge-functions/indexed-pages.json` |
|
|
| Defines the curated URLs to include in `sitemap.xml`: |
|
|
| ```json |
| { |
| "/dwd": { |
| "stations": [ |
| { "path": "daily/bayern/muenchen-flughafen" }, |
| { "path": "daily/sachsen/leipzig-holzhausen" } |
| ], |
| "regions": [ |
| { "path": "daily/bayern" }, |
| { "path": "daily/berlin" } |
| ], |
| "resolutions": [ |
| { "path": "daily" }, |
| { "path": "hourly" }, |
| { "path": "10-minutes" } |
| ] |
| } |
| } |
| ``` |
|
|
| ### Sitemap Normalization |
|
|
| **Script**: `climateexplorer/scripts/normalize-sitemap.mjs` |
|
|
| Runs after `quarto render` to inject curated URLs into `sitemap.xml`. The validator (`scripts/lib/indexed-pages-validator.mjs`) auto-approves path-based entries (entries with a `path` field). |
|
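As an illustration of the injection step, the curated entries can be flattened into absolute URLs. This is a sketch under assumptions, not the logic of `normalize-sitemap.mjs`: the function name `expandIndexedPages`, the `https://example.org` origin, and the trailing-slash convention are all hypothetical.

```javascript
// Hypothetical: flatten an indexed-pages.json object into absolute sitemap URLs.
function expandIndexedPages(indexedPages, origin) {
  const urls = [];
  for (const [section, groups] of Object.entries(indexedPages)) {
    for (const entries of Object.values(groups)) {
      for (const entry of entries) {
        if (typeof entry.path !== "string") continue; // path-based entries only
        urls.push(`${origin}${section}/${entry.path}/`);
      }
    }
  }
  return urls;
}

// Shortened sample in the shape shown above.
const sample = {
  "/dwd": {
    stations: [{ path: "daily/bayern/muenchen-flughafen" }],
    regions: [{ path: "daily/bayern" }],
    resolutions: [{ path: "daily" }, { path: "10-minutes" }],
  },
};

console.log(expandIndexedPages(sample, "https://example.org"));
```

Entries without a `path` field are skipped here, which loosely mirrors the validator's distinction between path-based entries and other entry types.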
|
| ### Google Discovery Chain |
|
|
| The sitemap contains ~36 DWD seed URLs. Google discovers all other pages through internal links: |
|
|
| ``` |
| Sitemap: 5 resolution pages ──→ Each links to 16 states |
|                                       │ |
|                  ┌────────────────────┘ |
|                  ▼ |
|           16 state pages ──→ Each links to all stations in that state |
|                                       │ |
|                  ┌────────────────────┘ |
|                  ▼ |
|           ~1400 station pages (per resolution) |
| ``` |
|
|
| Total discoverable pages: **~5,000+** across all resolutions. |
|
|
| ## Legacy URL Redirect |
|
|
| Old query-parameter URLs are automatically redirected to clean paths: |
|
|
| ``` |
| /dwd/?resolution=Daily&landname=Bayern&station=MΓΌnchen-Flughafen |
| ↓ 301 redirect ↓ |
| /dwd/daily/bayern/muenchen-flughafen/ |
| ``` |
|
|
| Handled by the edge function's "DWD Legacy Query-Param Redirect" block. |
|
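The redirect mapping can be sketched as a lookup against the slug map (display name → slug). The function name `legacyToCleanPath` is hypothetical, and the assumption that the slug map also contains the resolution labels is not confirmed by this document.

```javascript
// Hypothetical: map a legacy query-param URL to its clean path, or null.
// `slugMap` is a plain object of display name -> slug.
function legacyToCleanPath(href, slugMap) {
  const url = new URL(href, "https://example.invalid"); // dummy origin for relative URLs
  if (url.pathname.replace(/\/+$/, "") !== "/dwd") return null; // not the legacy entry point

  const names = ["resolution", "landname", "station"].map((key) => url.searchParams.get(key));
  const parts = ["dwd"];
  for (const name of names) {
    if (!name) break; // deeper segments require their parent segment
    if (!Object.prototype.hasOwnProperty.call(slugMap, name)) return null; // unknown name
    parts.push(slugMap[name]);
  }
  // Without a resolution there is nothing to redirect to.
  return parts.length > 1 ? `/${parts.join("/")}/` : null;
}

const demoSlugMap = {
  Daily: "daily",
  Bayern: "bayern",
  "München-Flughafen": "muenchen-flughafen",
};

console.log(
  legacyToCleanPath("/dwd/?resolution=Daily&landname=Bayern&station=München-Flughafen", demoSlugMap)
);
```

A non-null result would become the `Location` header of the 301 response. Returning `null` for unknown names avoids redirecting to a path that has no metadata entry.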
|
| ## Applying to Other Sections |
|
|
| To implement this pattern for another section (e.g., `/meteofrance/`, `/jma/`): |
|
|
| ### 1. Define the URL hierarchy |
|
|
| ``` |
| /{section}/{resolution}/{region?}/{station?}/ |
| ``` |
|
|
| Choose meaningful slugs for resolutions, regions (departments, prefectures, countries), and stations. |
|
|
| ### 2. Create SEO metadata |
|
|
| Write an R script to generate `{section}-seo-metadata.json` with: |
| - Station metadata (name, region, coordinates, data range, parameters) |
| - Region metadata (station counts) |
| - Resolution metadata (station counts) |
| - Slug map (display name → URL slug) |
|
|
| ### 3. Update the edge function |
|
|
| Add a `parse{Section}Path()` function and inject body content + meta tags. |
|
|
| ### 4. Create the page JavaScript |
|
|
| Write a `{section}-page.js` that: |
| - Parses path segments on load |
| - Configures the iframe with Shiny query params |
| - Listens for postMessage broadcasts and updates URL/title/context |
|
|
| ### 5. Update Shiny's broadcast_state() |
| |
| Ensure the Shiny app sends state/region/station names in its broadcasts so the JS can construct correct URLs. |
| |
| ### 6. Update indexed-pages.json |
| |
| Add curated seed URLs for the section's resolutions, regions, and sample stations. |
| |
| ### 7. Verify |
| |
| ```bash |
| # Run sitemap normalization |
| QUARTO_PROJECT_OUTPUT_DIR=_site node scripts/normalize-sitemap.mjs |
| |
| # Run sitemap checks |
| QUARTO_PROJECT_OUTPUT_DIR=_site node scripts/check-sitemap.mjs --fetch |
| |
| # Test edge function locally |
| netlify dev |
| ``` |
| |
| ## Key Files Reference |
| |
| | File | Location | Purpose | |
| |------|----------|---------| |
| | `export_seo_metadata.R` | `dwd/scripts/` | Generate SEO metadata JSON | |
| | `dwd-seo-metadata.json` | `climateexplorer/netlify/edge-functions/` | Station/state/resolution metadata | |
| | `rewrite-meta.ts` | `climateexplorer/netlify/edge-functions/` | Edge function (SSR injection) | |
| | `dwd-page.js` | `climateexplorer/dwd/` | Client-side URL sync | |
| | `server.R` | `dwd/` | Shiny broadcast_state() | |
| | `indexed-pages.json` | `climateexplorer/netlify/edge-functions/` | Sitemap seed URLs | |
| | `normalize-sitemap.mjs` | `climateexplorer/scripts/` | Sitemap URL injection | |
| | `indexed-pages-validator.mjs` | `climateexplorer/scripts/lib/` | Validates curated URLs | |
|
|