# DWD Clean URL Architecture & SEO System
This document describes the path-based URL system implemented for the DWD section of Climate Explorer. It serves as a **reference template** for implementing clean URLs on other sections of the site.
## URL Structure
```
/dwd/{resolution}/{state?}/{station?}/?view={view}&start={start}&end={end}
```
### Path Segments
| Segment | Required | Example | Description |
|---------|----------|---------|-------------|
| `resolution` | No (defaults to `daily` when omitted, as in `/dwd/`) | `hourly` | Time resolution slug |
| `state` | No | `bayern` | German state (Bundesland) slug |
| `station` | No | `muenchen-flughafen` | Station name slug |
### Query Parameters (UI state only — not indexed)
| Param | Default | Example | Description |
|-------|---------|---------|-------------|
| `view` | `map` | `dashboard-plots` | Active tab |
| `start` | Resolution default | `2020-01-01` | Date range start |
| `end` | Resolution default | `2026-04-26` | Date range end |
### URL Examples
```
# Base landing page (defaults to Daily)
/dwd/
# Resolution pages
/dwd/daily/
/dwd/hourly/
/dwd/10-minutes/
/dwd/monthly/
/dwd/annual/
# State pages
/dwd/daily/bayern/
/dwd/hourly/sachsen/
/dwd/10-minutes/nordrhein-westfalen/
# Station pages
/dwd/daily/bayern/muenchen-flughafen/
/dwd/hourly/sachsen/leipzig-holzhausen/
# With UI state (query params)
/dwd/daily/bayern/muenchen-flughafen/?view=dashboard-plots&start=2020-01-01&end=2026-04-26
```
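As a minimal sketch, a path of this shape can be split into its segments as shown here. The names `parseDwdPathname` and `DwdPath` are illustrative and not taken from the project; the sketch assumes segments are positional and that a missing resolution falls back to `daily`. It is not the actual `parseDwdPath()` implementation.

```typescript
// Hypothetical sketch: split a /dwd/ path into resolution, state, and station slugs.
// Function and type names are assumptions made for illustration.

const RESOLUTION_SLUGS = ["10-minutes", "hourly", "daily", "monthly", "annual"] as const;
type ResolutionSlug = (typeof RESOLUTION_SLUGS)[number];

interface DwdPath {
  resolution: ResolutionSlug;
  state: string | null;
  station: string | null;
}

function parseDwdPathname(pathname: string): DwdPath | null {
  // "/dwd/daily/bayern/" -> ["dwd", "daily", "bayern"]
  const segments = pathname.split("/").filter((s) => s.length > 0);
  if (segments[0] !== "dwd" || segments.length > 4) return null;

  // Missing segments fall back to "daily" (resolution) or null (state, station).
  const [, resolution = "daily", state = null, station = null] = segments;
  if (!(RESOLUTION_SLUGS as readonly string[]).includes(resolution)) return null;

  return { resolution: resolution as ResolutionSlug, state, station };
}
```

Returning `null` for an unknown resolution lets the caller decide whether to serve a 404 or fall through to another handler.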
## Resolution Slugs
| UI Label | URL Slug | Shiny Internal Value |
|----------|----------|---------------------|
| 10 Minutes | `10-minutes` | `10_minutes` |
| Hourly | `hourly` | `hourly` |
| Daily | `daily` | `daily` |
| Monthly | `monthly` | `monthly` |
| Annual | `annual` | `annual` |
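One way to keep the three representations in sync is to drive them from a single table. The sketch below mirrors the rows above; the helper names `slugToShinyValue` and `labelToSlug` are hypothetical.

```typescript
// Sketch: a single lookup table for the label, URL slug, and Shiny value of each resolution.
// Helper names are assumptions, not identifiers from the project.

interface ResolutionEntry {
  label: string;      // UI label
  slug: string;       // URL slug
  shinyValue: string; // value used inside the Shiny app
}

const RESOLUTIONS: readonly ResolutionEntry[] = [
  { label: "10 Minutes", slug: "10-minutes", shinyValue: "10_minutes" },
  { label: "Hourly", slug: "hourly", shinyValue: "hourly" },
  { label: "Daily", slug: "daily", shinyValue: "daily" },
  { label: "Monthly", slug: "monthly", shinyValue: "monthly" },
  { label: "Annual", slug: "annual", shinyValue: "annual" },
];

// URL slug -> Shiny internal value (undefined for unknown slugs).
function slugToShinyValue(slug: string): string | undefined {
  return RESOLUTIONS.find((r) => r.slug === slug)?.shinyValue;
}

// UI label -> URL slug, matched case-insensitively.
function labelToSlug(label: string): string | undefined {
  const wanted = label.toLowerCase();
  return RESOLUTIONS.find((r) => r.label.toLowerCase() === wanted)?.slug;
}
```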
## Slugify Algorithm
State and station names are slugified using the same algorithm in all three implementations (R, JavaScript, and the TypeScript edge function):
```
1. Replace German umlauts: ü→ue, ö→oe, ä→ae, Ü→ue, Ö→oe, Ä→ae, ß→ss
2. Lowercase
3. Strip diacritics (R uses iconv ASCII//TRANSLIT; JS/TS use NFD + regex)
4. Replace non-alphanumeric chars with hyphens
5. Trim leading/trailing hyphens
```
Examples:
- `München-Flughafen` → `muenchen-flughafen`
- `Nordrhein-Westfalen` → `nordrhein-westfalen`
- `Thüringen` → `thueringen`
- `Baden-Württemberg` → `baden-wuerttemberg`
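> **Critical**: The slugify function must produce identical output in R (`scripts/export_seo_metadata.R`), JavaScript (`dwd-page.js`), and TypeScript (`rewrite-meta.ts`). Any mismatch causes 404s or broken links. R uses `iconv(..., to = "ASCII//TRANSLIT")` while JS/TS use NFD normalization followed by stripping combining marks. Both produce the same result for German text because step 1 has already replaced the umlauts and ß; `iconv` transliteration is platform-dependent, so names containing other accented characters should be checked in all three implementations.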
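A minimal TypeScript sketch of the five steps follows. It is not the code from `rewrite-meta.ts`. One assumption goes beyond the description: a run of consecutive non-alphanumeric characters is collapsed into a single hyphen.

```typescript
// Sketch of the slugify steps described above (not the project's actual implementation).

const UMLAUT_MAP: Record<string, string> = {
  "ü": "ue", "ö": "oe", "ä": "ae",
  "Ü": "ue", "Ö": "oe", "Ä": "ae",
  "ß": "ss",
};

function slugify(name: string): string {
  return name
    // 1. Replace German umlauts and ß
    .replace(/[üöäÜÖÄß]/g, (ch) => UMLAUT_MAP[ch] ?? ch)
    // 2. Lowercase
    .toLowerCase()
    // 3. Strip remaining diacritics: decompose, then drop combining marks
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    // 4. Replace non-alphanumeric characters with hyphens
    //    (assumption: consecutive characters collapse into one hyphen)
    .replace(/[^a-z0-9]+/g, "-")
    // 5. Trim leading and trailing hyphens
    .replace(/^-+|-+$/g, "");
}
```

The same input strings listed under "Examples" make a convenient shared test fixture for the R and JavaScript implementations.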
## System Architecture
The URL system spans four layers:
```
┌─────────────────────────────────────────────────────┐
│  1. SEO Metadata (Build Time)                       │
│     R script → dwd-seo-metadata.json                │
│     Generates slug→metadata mappings for all        │
│     stations, states, and resolutions               │
├─────────────────────────────────────────────────────┤
│  2. Edge Function (Request Time)                    │
│     rewrite-meta.ts                                 │
│     Parses URL → injects HTML body content,         │
│     meta tags, JSON-LD, canonical URL               │
├─────────────────────────────────────────────────────┤
│  3. Parent Page JS (Client Side)                    │
│     dwd-page.js                                     │
│     Parses URL → configures iframe,                 │
│     listens to Shiny broadcasts → updates URL,      │
│     title, and dynamic context block                │
├─────────────────────────────────────────────────────┤
│  4. Shiny App (Iframe)                              │
│     server.R                                        │
│     Receives URL params → broadcasts state          │
│     changes via postMessage to parent page          │
└─────────────────────────────────────────────────────┘
```
### 1. SEO Metadata Generation (Build Time)
**Script**: `scripts/export_seo_metadata.R` (in the DWD project)
**Output**: `dwd-seo-metadata.json` (in `climateexplorer/netlify/edge-functions/`)
This R script reads the RDS cache files for all five resolutions and generates a JSON file containing:
- **Stations**: `{resolution}/{state-slug}/{station-slug}` → `{id, name, state, stateSlug, elevation, lat, lon, resolution, resolutionLabel, resolutionSlug, overallStart, overallEnd, availableParams}`
- **States**: `{resolution}/{state-slug}` → `{state, stateSlug, resolution, resolutionLabel, resolutionSlug, stationCount, activeStationCount}`
- **Resolutions**: `{resolution-slug}` → `{key, label, slug, stationCount, activeStationCount}`
- **Slug map**: display name → slug (for legacy URL redirect lookups)
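The shape of the generated file can be sketched with TypeScript types. The field names come from the list above, but several things are assumptions: the field types, the top-level key names (`stations`, `states`, `resolutions`, `slugMap`), the use of the resolution slug in the keys, and the helper `lookupMetadata`.

```typescript
// Sketch of the metadata shape. Field types and top-level key names are assumptions.

interface StationMeta {
  id: string; name: string; state: string; stateSlug: string;
  elevation: number; lat: number; lon: number;
  resolution: string; resolutionLabel: string; resolutionSlug: string;
  overallStart: string; overallEnd: string; availableParams: string[];
}

interface StateMeta {
  state: string; stateSlug: string;
  resolution: string; resolutionLabel: string; resolutionSlug: string;
  stationCount: number; activeStationCount: number;
}

interface ResolutionMeta {
  key: string; label: string; slug: string;
  stationCount: number; activeStationCount: number;
}

interface SeoMetadata {
  stations: Record<string, StationMeta>;       // "{resolution}/{state-slug}/{station-slug}"
  states: Record<string, StateMeta>;           // "{resolution}/{state-slug}"
  resolutions: Record<string, ResolutionMeta>; // "{resolution-slug}"
  slugMap: Record<string, string>;             // display name -> slug
}

// Return the most specific entry for the given path segments (hypothetical helper).
function lookupMetadata(
  meta: SeoMetadata,
  resolution: string,
  state?: string,
  station?: string,
): StationMeta | StateMeta | ResolutionMeta | undefined {
  if (state && station) return meta.stations[`${resolution}/${state}/${station}`];
  if (state) return meta.states[`${resolution}/${state}`];
  return meta.resolutions[resolution];
}
```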
To regenerate the metadata:
```bash
# Run from the DWD app root directory (clima/2025/dwd/)
Rscript scripts/export_seo_metadata.R
# Copy the output to the climateexplorer project (clima/2024/climateexplorer/)
cp dwd-seo-metadata.json ../../2024/climateexplorer/netlify/edge-functions/
```
### 2. Edge Function (Request Time)
**File**: `climateexplorer/netlify/edge-functions/rewrite-meta.ts`
When a request hits `/dwd/{resolution}/{state?}/{station?}/`:
1. `parseDwdPath()` extracts path segments
2. Looks up metadata from `dwd-seo-metadata.json`
3. Injects into the HTML response:
   - **`<title>`**: e.g., `"München-Flughafen, Bayern – Daily Climate Data | DWD Explorer"`
   - **`<meta name="description">`**: station-specific description
   - **`<link rel="canonical">`**: canonical URL
   - **OG/Twitter meta tags**
   - **JSON-LD breadcrumb**: structured data for Google
   - **Body content**: rich HTML with station details, state lists, or country overview
   - **`window.__DWD_RESOLVED__`**: resolved metadata for the JS layer
This is **server-side rendered** — Google sees full content without executing JavaScript.
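To illustrate the head injection, the sketch below builds the title, description, and canonical tags for a station page. The title pattern follows the example above. The helper names, the `origin` parameter, and the idea of passing the description in as a ready-made string are assumptions; the actual edge function may assemble these differently.

```typescript
// Sketch: build head tags for a station page. All names here are hypothetical.

interface StationHeadInput {
  name: string;
  state: string;
  stateSlug: string;
  stationSlug: string;
  resolutionLabel: string;
  resolutionSlug: string;
  description: string; // assumed to be prepared elsewhere
}

// Escape characters that are unsafe inside HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildStationHead(origin: string, s: StationHeadInput): string {
  const title = `${s.name}, ${s.state} – ${s.resolutionLabel} Climate Data | DWD Explorer`;
  const canonical = `${origin}/dwd/${s.resolutionSlug}/${s.stateSlug}/${s.stationSlug}/`;
  return [
    `<title>${escapeHtml(title)}</title>`,
    `<meta name="description" content="${escapeHtml(s.description)}">`,
    `<link rel="canonical" href="${escapeHtml(canonical)}">`,
  ].join("\n");
}
```

Escaping matters here because station names come from data files and end up inside HTML attributes.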
### 3. Parent Page JavaScript (Client Side)
**File**: `climateexplorer/dwd/dwd-page.js`
On page load:
1. `parsePathParams()` extracts resolution/state/station from the URL
2. If `/dwd/` (no resolution), defaults to "Daily"
3. Builds iframe URL with Shiny query params
4. Uses `__DWD_RESOLVED__` metadata (from edge function) to pass real station IDs/names to iframe
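Step 3 can be sketched as follows. The parameter names (`resolution`, `landname`, `station`, `view`, `start`, `end`) are assumed to match the legacy query parameters and the broadcast fields described elsewhere in this document; the base URL and the function name `buildIframeUrl` are hypothetical.

```typescript
// Sketch: build the Shiny iframe URL from resolved page state.
// Parameter names and the base URL are assumptions.

interface IframeState {
  resolution: string; // UI label, e.g. "Daily"
  landname?: string;  // German state display name
  station?: string;   // station identifier or name
  view?: string;
  start?: string;
  end?: string;
}

function buildIframeUrl(baseUrl: string, state: IframeState): string {
  const params = new URLSearchParams();
  params.set("resolution", state.resolution);

  // Only pass optional values that are actually set.
  const optionalKeys = ["landname", "station", "view", "start", "end"] as const;
  for (const key of optionalKeys) {
    const value = state[key];
    if (value) params.set(key, value);
  }
  return `${baseUrl}?${params.toString()}`;
}
```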
On Shiny state changes (via `postMessage`):
1. `handleIframeMessage()` receives broadcast from iframe
2. `updateBrowserUrl()` updates the browser URL (using `history.replaceState`)
3. `updatePageTitle()` updates the browser tab title
4. `updateDynamicContext()` updates the context block HTML
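The URL update in step 2 amounts to turning a broadcast into a clean path plus query string. The sketch below keeps that logic pure so it can be tested outside the browser. The name `buildCleanUrl` is hypothetical, the slugify function is passed in rather than redefined, and slugifying the resolution label (so that "10 Minutes" becomes `10-minutes`) is an assumption.

```typescript
// Sketch: derive the clean URL from a Shiny broadcast. Names are assumptions.

interface BroadcastState {
  resolution: string;          // UI label, e.g. "Daily"
  landname?: string | null;    // state display name
  stationName?: string | null; // station display name
  view?: string | null;
  start?: string | null;
  end?: string | null;
}

function buildCleanUrl(state: BroadcastState, slugify: (s: string) => string): string {
  const segments = ["dwd", slugify(state.resolution)];
  if (state.landname) {
    segments.push(slugify(state.landname));
    // A station segment only makes sense below a state segment.
    if (state.stationName) segments.push(slugify(state.stationName));
  }

  const params = new URLSearchParams();
  if (state.view) params.set("view", state.view);
  if (state.start) params.set("start", state.start);
  if (state.end) params.set("end", state.end);

  const query = params.toString();
  return `/${segments.join("/")}/` + (query ? `?${query}` : "");
}

// In the browser this would be applied without adding a history entry, e.g.:
//   history.replaceState(null, "", buildCleanUrl(message, slugify));
```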
### 4. Shiny App Broadcasts (Iframe)
**File**: `server.R` (in the DWD project)
The `broadcast_state()` function sends a `postMessage` to the parent page with:
```r
list(
  station = station_id,
  stationName = station_name,
  landname = state_name,       # German state
  resolution = resolution,     # UI label (e.g., "Daily")
  view = active_view,
  start = start_date,
  end = end_date,
  countryStationCount = ...,   # Total stations for this resolution
  countryActiveCount = ...,    # Active in current date range
  countryStateList = ...,      # State breakdown for context block
  ...
)
```
**Broadcast triggers** (observers in server.R):
1. Tab/view changes
2. Station selection changes
3. Station deselection
4. Resolution changes
5. Date range changes
6. State filter changes (`ignoreNULL = FALSE` — fires on clear)
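On the receiving side, it can help to validate a message before acting on it, since any window can post messages to the parent page. The type guard below is a sketch only: it covers the string fields from the payload shown above, and it assumes that cleared values arrive as `null` or are omitted. The names `DwdBroadcast` and `isDwdBroadcast` are hypothetical.

```typescript
// Sketch: runtime check for the broadcast payload. Nullability rules are assumptions.

interface DwdBroadcast {
  resolution: string;
  station?: string | null;
  stationName?: string | null;
  landname?: string | null;
  view?: string | null;
  start?: string | null;
  end?: string | null;
}

function isOptionalString(value: unknown): value is string | null | undefined {
  return value === undefined || value === null || typeof value === "string";
}

function isDwdBroadcast(data: unknown): data is DwdBroadcast {
  if (typeof data !== "object" || data === null) return false;
  const record = data as Record<string, unknown>;
  const optionalKeys = ["station", "stationName", "landname", "view", "start", "end"];
  return (
    typeof record["resolution"] === "string" &&
    optionalKeys.every((key) => isOptionalString(record[key]))
  );
}
```

A handler would typically also compare `event.origin` against the expected iframe origin before calling the guard.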
## Sitemap Integration
### indexed-pages.json
**File**: `climateexplorer/netlify/edge-functions/indexed-pages.json`
Defines the curated URLs to include in `sitemap.xml`:
```json
{
  "/dwd": {
    "stations": [
      { "path": "daily/bayern/muenchen-flughafen" },
      { "path": "daily/sachsen/leipzig-holzhausen" }
    ],
    "regions": [
      { "path": "daily/bayern" },
      { "path": "daily/berlin" }
    ],
    "resolutions": [
      { "path": "daily" },
      { "path": "hourly" },
      { "path": "10-minutes" }
    ]
  }
}
```
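To show how such a file maps to sitemap entries, the sketch below expands it into absolute URLs. This is not the logic of `normalize-sitemap.mjs`; the function name, the `origin` argument, and the trailing slash (chosen to match the URL examples in this document) are assumptions.

```typescript
// Sketch: expand an indexed-pages structure into absolute URLs. Names are assumptions.

interface PathEntry {
  path: string;
}

// section prefix (e.g. "/dwd") -> group name (e.g. "stations") -> entries
type IndexedPages = Record<string, Record<string, PathEntry[]>>;

function expandIndexedPages(origin: string, pages: IndexedPages): string[] {
  const urls: string[] = [];
  for (const [section, groups] of Object.entries(pages)) {
    for (const entries of Object.values(groups)) {
      for (const { path } of entries) {
        urls.push(`${origin}${section}/${path}/`);
      }
    }
  }
  return urls;
}
```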
### Sitemap Normalization
**Script**: `climateexplorer/scripts/normalize-sitemap.mjs`
Runs after `quarto render` to inject curated URLs into `sitemap.xml`. The validator (`scripts/lib/indexed-pages-validator.mjs`) auto-approves path-based entries (entries with a `path` field).
### Google Discovery Chain
The sitemap contains ~36 DWD seed URLs. Google discovers all other pages through internal links:
```
Sitemap: 5 resolution pages ──→ Each links to 16 states
                                        │
                   ┌────────────────────┘
                   ▼
            16 state pages ──→ Each links to all stations in that state
                                        │
                   ┌────────────────────┘
                   ▼
            ~1400 station pages (per resolution)
```
Total discoverable pages: **more than 5,000** across all resolutions.
## Legacy URL Redirect
Old query-parameter URLs are automatically redirected to clean paths:
```
/dwd/?resolution=Daily&landname=Bayern&station=München-Flughafen
→ 301 redirect →
/dwd/daily/bayern/muenchen-flughafen/
```
Handled by the edge function's "DWD Legacy Query-Param Redirect" block.
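A minimal sketch of the redirect target computation follows. It assumes the slug map from the SEO metadata contains every display name involved, including resolution labels, and that no redirect is issued when a name is missing. The function name is hypothetical, and whether the real block preserves other query parameters is not covered here.

```typescript
// Sketch: compute the clean path for a legacy query-parameter URL.
// Assumes slugMap holds display name -> slug for resolutions, states, and stations.

function legacyRedirectTarget(url: URL, slugMap: Record<string, string>): string | null {
  if (url.pathname !== "/dwd/" && url.pathname !== "/dwd") return null;

  const resolution = url.searchParams.get("resolution");
  if (!resolution) return null; // not a legacy URL

  const segments: string[] = [];
  const names = [resolution, url.searchParams.get("landname"), url.searchParams.get("station")];
  for (const name of names) {
    if (!name) break; // deeper segments need their parent
    const slug = slugMap[name];
    if (!slug) return null; // unknown name: do not redirect
    segments.push(slug);
  }
  return `/dwd/${segments.join("/")}/`;
}

// An edge function could then answer with a permanent redirect, e.g.:
//   new Response(null, { status: 301, headers: { Location: target } })
```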
## Applying to Other Sections
To implement this pattern for another section (e.g., `/meteofrance/`, `/jma/`):
### 1. Define the URL hierarchy
```
/{section}/{resolution}/{region?}/{station?}/
```
Choose meaningful slugs for resolutions, regions (departments, prefectures, countries), and stations.
### 2. Create SEO metadata
Write an R script to generate `{section}-seo-metadata.json` with:
- Station metadata (name, region, coordinates, data range, parameters)
- Region metadata (station counts)
- Resolution metadata (station counts)
- Slug map (display name → URL slug)
### 3. Update the edge function
Add a `parse{Section}Path()` function and inject body content + meta tags.
### 4. Create the page JavaScript
Write a `{section}-page.js` that:
- Parses path segments on load
- Configures the iframe with Shiny query params
- Listens for postMessage broadcasts and updates URL/title/context
### 5. Update Shiny's broadcast_state()
Ensure the Shiny app sends state/region/station names in its broadcasts so the JS can construct correct URLs.
### 6. Update indexed-pages.json
Add curated seed URLs for the section's resolutions, regions, and sample stations.
### 7. Verify
```bash
# Run sitemap normalization
QUARTO_PROJECT_OUTPUT_DIR=_site node scripts/normalize-sitemap.mjs
# Run sitemap checks
QUARTO_PROJECT_OUTPUT_DIR=_site node scripts/check-sitemap.mjs --fetch
# Test edge function locally
netlify dev
```
## Key Files Reference
| File | Location | Purpose |
|------|----------|---------|
| `export_seo_metadata.R` | `dwd/scripts/` | Generate SEO metadata JSON |
| `dwd-seo-metadata.json` | `climateexplorer/netlify/edge-functions/` | Station/state/resolution metadata |
| `rewrite-meta.ts` | `climateexplorer/netlify/edge-functions/` | Edge function (SSR injection) |
| `dwd-page.js` | `climateexplorer/dwd/` | Client-side URL sync |
| `server.R` | `dwd/` | Shiny broadcast_state() |
| `indexed-pages.json` | `climateexplorer/netlify/edge-functions/` | Sitemap seed URLs |
| `normalize-sitemap.mjs` | `climateexplorer/scripts/` | Sitemap URL injection |
| `indexed-pages-validator.mjs` | `climateexplorer/scripts/lib/` | Validates curated URLs |