# DWD Clean URL Architecture & SEO System

This document describes the path-based URL system implemented for the DWD section of Climate Explorer. It serves as a **reference template** for implementing clean URLs on other sections of the site.

## URL Structure

```
/dwd/{resolution}/{state?}/{station?}/?view={view}&start={start}&end={end}
```

### Path Segments

| Segment | Required | Example | Description |
|---------|----------|---------|-------------|
| `resolution` | Yes, except on the bare `/dwd/` landing page (defaults to `daily`) | `hourly` | Time resolution slug |
| `state` | No | `bayern` | German state (Bundesland) slug |
| `station` | No | `muenchen-flughafen` | Station name slug |

### Query Parameters (UI state only, not indexed)

| Param | Default | Example | Description |
|-------|---------|---------|-------------|
| `view` | `map` | `dashboard-plots` | Active tab |
| `start` | Resolution default | `2020-01-01` | Date range start |
| `end` | Resolution default | `2026-04-26` | Date range end |

### URL Examples

```
# Base landing page (defaults to Daily)
/dwd/

# Resolution pages
/dwd/daily/
/dwd/hourly/
/dwd/10-minutes/
/dwd/monthly/
/dwd/annual/

# State pages
/dwd/daily/bayern/
/dwd/hourly/sachsen/
/dwd/10-minutes/nordrhein-westfalen/

# Station pages
/dwd/daily/bayern/muenchen-flughafen/
/dwd/hourly/sachsen/leipzig-holzhausen/

# With UI state (query params)
/dwd/daily/bayern/muenchen-flughafen/?view=dashboard-plots&start=2020-01-01&end=2026-04-26
```
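
The URL structure above can be sketched as a small parser. This is only an illustration, not the code in `dwd-page.js` or `rewrite-meta.ts`: the function name `parseDwdUrl` and the dummy base origin are assumptions, and the defaults are taken from the tables above.

```javascript
// Sketch: split a DWD URL into path segments and UI-state query params.
// parseDwdUrl is a hypothetical name; defaults follow the tables above.
function parseDwdUrl(href) {
  // A dummy base lets the URL constructor accept site-relative paths.
  const url = new URL(href, "https://example.invalid");
  const segments = url.pathname.split("/").filter(Boolean);
  if (segments[0] !== "dwd") return null; // not a DWD page

  // Missing segments fall back to the documented defaults.
  const [, resolution = "daily", state = null, station = null] = segments;
  return {
    resolution,
    state,
    station,
    view: url.searchParams.get("view") ?? "map",
    // null here means "use the resolution's default date range".
    start: url.searchParams.get("start"),
    end: url.searchParams.get("end"),
  };
}

console.log(parseDwdUrl("/dwd/hourly/sachsen/leipzig-holzhausen/?view=dashboard-plots"));
```

Keeping the parser free of DOM access means the same logic could, in principle, be shared between the edge function and the page script.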

## Resolution Slugs

| UI Label | URL Slug | Shiny Internal Value |
|----------|----------|---------------------|
| 10 Minutes | `10-minutes` | `10_minutes` |
| Hourly | `hourly` | `hourly` |
| Daily | `daily` | `daily` |
| Monthly | `monthly` | `monthly` |
| Annual | `annual` | `annual` |
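
Because each resolution has three spellings, a single lookup table helps keep them in sync. The sketch below mirrors the table rows; the constant and helper names (`RESOLUTIONS`, `findResolution`) are hypothetical and not taken from the codebase.

```javascript
// Rows mirror the Resolution Slugs table; names here are hypothetical.
const RESOLUTIONS = [
  { label: "10 Minutes", slug: "10-minutes", shiny: "10_minutes" },
  { label: "Hourly", slug: "hourly", shiny: "hourly" },
  { label: "Daily", slug: "daily", shiny: "daily" },
  { label: "Monthly", slug: "monthly", shiny: "monthly" },
  { label: "Annual", slug: "annual", shiny: "annual" },
];

// Find a row by any of its three representations; undefined if unknown.
function findResolution(value) {
  return RESOLUTIONS.find(
    (r) => r.label === value || r.slug === value || r.shiny === value
  );
}

console.log(findResolution("10-minutes").shiny); // 10_minutes
console.log(findResolution("Daily").slug); // daily
```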

## Slugify Algorithm

State and station names are slugified with the same algorithm in all three implementations (R, JavaScript, and the TypeScript edge function):

```
1. Replace German umlauts:  ü→ue, ö→oe, ä→ae, Ü→ue, Ö→oe, Ä→ae, ß→ss
2. Lowercase
3. Strip diacritics (R uses iconv ASCII//TRANSLIT; JS/TS use NFD + regex)
4. Replace non-alphanumeric chars with hyphens
5. Trim leading/trailing hyphens
```

Examples:
- `München-Flughafen` → `muenchen-flughafen`
- `Nordrhein-Westfalen` → `nordrhein-westfalen`
- `Thüringen` → `thueringen`
- `Baden-Württemberg` → `baden-wuerttemberg`

> **Critical**: The slugify function must produce identical output in R (`scripts/export_seo_metadata.R`), JavaScript (`dwd-page.js`), and TypeScript (`rewrite-meta.ts`). Any mismatch causes 404s or broken links. Note that R uses `iconv(..., to = "ASCII//TRANSLIT")` while JS/TS use `NFD normalize + strip combining marks`; both produce the same result for German text.
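
A minimal JavaScript sketch of the five steps follows. It is an illustration rather than the code in `dwd-page.js`. Two details are assumptions not stated above: the input is first normalized to NFC so that decomposed umlauts are still caught by step 1, and runs of non-alphanumeric characters collapse into a single hyphen.

```javascript
// Step 1 lookup table, written with Unicode escapes to stay encoding-safe.
const UMLAUTS = {
  "\u00fc": "ue", // ü
  "\u00f6": "oe", // ö
  "\u00e4": "ae", // ä
  "\u00dc": "ue", // Ü
  "\u00d6": "oe", // Ö
  "\u00c4": "ae", // Ä
  "\u00df": "ss", // ß
};

function slugify(name) {
  return name
    .normalize("NFC") // assumption: compose "u" + combining diaeresis first
    .replace(/[\u00fc\u00f6\u00e4\u00dc\u00d6\u00c4\u00df]/g, (ch) => UMLAUTS[ch]) // 1. umlauts
    .toLowerCase() // 2. lowercase
    .normalize("NFD") // 3. decompose remaining accented letters...
    .replace(/[\u0300-\u036f]/g, "") //    ...and strip the combining marks
    .replace(/[^a-z0-9]+/g, "-") // 4. non-alphanumerics to hyphens (runs collapse: assumption)
    .replace(/^-+|-+$/g, ""); // 5. trim leading/trailing hyphens
}

console.log(slugify("M\u00fcnchen-Flughafen")); // muenchen-flughafen
console.log(slugify("Baden-W\u00fcrttemberg")); // baden-wuerttemberg
```

Running the same list of names through the R, JavaScript, and TypeScript versions and comparing the outputs is one way to catch a mismatch before it causes broken links.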

## System Architecture

The URL system spans four layers:

```
┌─────────────────────────────────────────────────────┐
│  1. SEO Metadata (Build Time)                       │
│     R script → dwd-seo-metadata.json                │
│     Generates slug→metadata mappings for all        │
│     stations, states, and resolutions               │
├─────────────────────────────────────────────────────┤
│  2. Edge Function (Request Time)                    │
│     rewrite-meta.ts                                 │
│     Parses URL → injects HTML body content,         │
│     meta tags, JSON-LD, canonical URL               │
├─────────────────────────────────────────────────────┤
│  3. Parent Page JS (Client Side)                    │
│     dwd-page.js                                     │
│     Parses URL → configures iframe,                 │
│     listens to Shiny broadcasts → updates URL,      │
│     title, and dynamic context block                │
├─────────────────────────────────────────────────────┤
│  4. Shiny App (Iframe)                              │
│     server.R                                        │
│     Receives URL params → broadcasts state          │
│     changes via postMessage to parent page          │
└─────────────────────────────────────────────────────┘
```

### 1. SEO Metadata Generation (Build Time)

**Script**: `scripts/export_seo_metadata.R` (in the DWD project)
**Output**: `dwd-seo-metadata.json` (in `climateexplorer/netlify/edge-functions/`)

This R script reads all 5 resolution RDS cache files and generates a JSON file containing:
- **Stations**: `{resolution}/{state-slug}/{station-slug}` → `{id, name, state, stateSlug, elevation, lat, lon, resolution, resolutionLabel, resolutionSlug, overallStart, overallEnd, availableParams}`
- **States**: `{resolution}/{state-slug}` → `{state, stateSlug, resolution, resolutionLabel, resolutionSlug, stationCount, activeStationCount}`
- **Resolutions**: `{resolution-slug}` → `{key, label, slug, stationCount, activeStationCount}`
- **Slug map**: display name → slug (for legacy URL redirect lookups)
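
The key scheme above lends itself to a most-specific-first lookup. The sketch below assumes the JSON has top-level `stations`, `states`, and `resolutions` objects; those names, the `lookupMetadata` helper, and all sample values are placeholders, since the actual file layout is not shown here.

```javascript
// Placeholder data. Top-level key names and field values are assumptions.
const metadata = {
  stations: {
    "hourly/sachsen/leipzig-holzhausen": { name: "Leipzig-Holzhausen", state: "Sachsen" },
  },
  states: { "hourly/sachsen": { state: "Sachsen", stateSlug: "sachsen" } },
  resolutions: { hourly: { key: "hourly", label: "Hourly", slug: "hourly" } },
};

// Resolve the most specific entry for a parsed path, or null if the key is unknown.
function lookupMetadata(meta, { resolution, state, station }) {
  if (station) {
    const entry = meta.stations[`${resolution}/${state}/${station}`];
    return entry ? { level: "station", entry } : null;
  }
  if (state) {
    const entry = meta.states[`${resolution}/${state}`];
    return entry ? { level: "state", entry } : null;
  }
  const entry = meta.resolutions[resolution];
  return entry ? { level: "resolution", entry } : null;
}

console.log(lookupMetadata(metadata, { resolution: "hourly", state: "sachsen", station: null }));
```

Returning `null` for an unknown key gives the caller a clear signal to respond with a 404 instead of rendering an empty page.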

To regenerate the metadata:
```bash
# Run from the DWD app root directory (clima/2025/dwd/)
Rscript scripts/export_seo_metadata.R

# Copy the output to the climateexplorer project (clima/2024/climateexplorer/)
cp dwd-seo-metadata.json ../../2024/climateexplorer/netlify/edge-functions/
```

### 2. Edge Function (Request Time)

**File**: `climateexplorer/netlify/edge-functions/rewrite-meta.ts`

When a request hits `/dwd/{resolution}/{state?}/{station?}/`:

1. `parseDwdPath()` extracts path segments
2. Looks up metadata from `dwd-seo-metadata.json`
3. Injects into the HTML response:
   - **`<title>`**: e.g., `"München-Flughafen, Bayern – Daily Climate Data | DWD Explorer"`
   - **`<meta name="description">`**: station-specific description
   - **`<link rel="canonical">`**: canonical URL
   - **OG/Twitter meta tags**
   - **JSON-LD breadcrumb**: structured data for Google
   - **Body content** (`<div id="dynamic-context">`): rich HTML with station details, state lists, or country overview
   - **`window.__DWD_RESOLVED__`**: resolved metadata for the JS layer

This is **server-side rendered**: Google sees full content without executing JavaScript.
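
As a rough illustration of the injection step, the sketch below builds the title, canonical link, and a JSON-LD breadcrumb for a station page. The title pattern follows the example above. The function names, the `origin` placeholder, and the four-level breadcrumb layout are assumptions, not the contents of `rewrite-meta.ts`.

```javascript
// Minimal HTML escaping for text placed in tags and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build <head> fragments for a station page (hypothetical helper).
// `station` uses field names from the metadata description; `origin` is a placeholder.
function buildStationHead(station, pathname, origin) {
  const canonical = origin + pathname; // query params are left out on purpose
  const title = `${station.name}, ${station.state} \u2013 ${station.resolutionLabel} Climate Data | DWD Explorer`;
  const crumbs = [
    ["DWD", "/dwd/"],
    [station.resolutionLabel, `/dwd/${station.resolutionSlug}/`],
    [station.state, `/dwd/${station.resolutionSlug}/${station.stateSlug}/`],
    [station.name, pathname],
  ];
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map(([name, path], i) => ({
      "@type": "ListItem",
      position: i + 1,
      name,
      item: origin + path,
    })),
  };
  // Escape "<" so the JSON cannot close the script element early.
  const json = JSON.stringify(jsonLd).replace(/</g, "\\u003c");
  return [
    `<title>${escapeHtml(title)}</title>`,
    `<link rel="canonical" href="${escapeHtml(canonical)}">`,
    `<script type="application/ld+json">${json}</script>`,
  ].join("\n");
}

console.log(
  buildStationHead(
    { name: "Leipzig-Holzhausen", state: "Sachsen", stateSlug: "sachsen", resolutionLabel: "Hourly", resolutionSlug: "hourly" },
    "/dwd/hourly/sachsen/leipzig-holzhausen/",
    "https://example.org"
  )
);
```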

### 3. Parent Page JavaScript (Client Side)

**File**: `climateexplorer/dwd/dwd-page.js`

On page load:
1. `parsePathParams()` extracts resolution/state/station from the URL
2. If `/dwd/` (no resolution), defaults to "Daily"
3. Builds iframe URL with Shiny query params
4. Uses `__DWD_RESOLVED__` metadata (from edge function) to pass real station IDs/names to iframe

On Shiny state changes (via `postMessage`):
1. `handleIframeMessage()` receives broadcast from iframe
2. `updateBrowserUrl()` updates the browser URL (using `history.replaceState`)
3. `updatePageTitle()` updates the browser tab title
4. `updateDynamicContext()` updates the context block HTML
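
The URL update step can be sketched as a pure function that turns a broadcast message into a clean path. This is not the actual `updateBrowserUrl()` code: `buildCleanUrl` is a hypothetical helper, the slugify function is passed in (here an ASCII-only stand-in for the shared algorithm), and omitting `view` when it equals the default `map` is an assumption.

```javascript
// UI label to URL slug, per the Resolution Slugs table.
const RESOLUTION_SLUGS = {
  "10 Minutes": "10-minutes",
  Hourly: "hourly",
  Daily: "daily",
  Monthly: "monthly",
  Annual: "annual",
};

// Build the clean URL for a broadcast message (hypothetical helper).
// `msg` uses the field names sent by broadcast_state().
function buildCleanUrl(msg, slugify) {
  const parts = ["dwd", RESOLUTION_SLUGS[msg.resolution] ?? "daily"];
  if (msg.landname) {
    parts.push(slugify(msg.landname));
    // A station segment only makes sense below a state segment.
    if (msg.stationName) parts.push(slugify(msg.stationName));
  }
  const query = new URLSearchParams();
  if (msg.view && msg.view !== "map") query.set("view", msg.view); // assumption: skip the default
  if (msg.start) query.set("start", msg.start);
  if (msg.end) query.set("end", msg.end);
  const qs = query.toString();
  return `/${parts.join("/")}/${qs ? `?${qs}` : ""}`;
}

// ASCII-only stand-in; the real page would use the shared slugify.
const asciiSlugify = (s) =>
  s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

const url = buildCleanUrl(
  { resolution: "Hourly", landname: "Sachsen", stationName: "Leipzig-Holzhausen", view: "dashboard-plots" },
  asciiSlugify
);
console.log(url); // /dwd/hourly/sachsen/leipzig-holzhausen/?view=dashboard-plots

// In the browser, the result would be applied without adding a history entry:
// history.replaceState(null, "", url);
```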

### 4. Shiny App Broadcasts (Iframe)

**File**: `server.R` (in the DWD project)

The `broadcast_state()` function sends a `postMessage` to the parent page with:
```r
list(
    station = station_id,
    stationName = station_name,
    landname = state_name,        # German state
    resolution = resolution,       # UI label (e.g., "Daily")
    view = active_view,
    start = start_date,
    end = end_date,
    countryStationCount = ...,     # Total stations for this resolution
    countryActiveCount = ...,      # Active in current date range
    countryStateList = ...,        # State breakdown for context block
    ...
)
```

**Broadcast triggers** (observers in server.R):
1. Tab/view changes
2. Station selection changes
3. Station deselection
4. Resolution changes
5. Date range changes
6. State filter changes (`ignoreNULL = FALSE`, so the observer also fires when the filter is cleared)

## Sitemap Integration

### indexed-pages.json

**File**: `climateexplorer/netlify/edge-functions/indexed-pages.json`

Defines the curated URLs to include in `sitemap.xml`:

```json
{
  "/dwd": {
    "stations": [
      { "path": "daily/bayern/muenchen-flughafen" },
      { "path": "daily/sachsen/leipzig-holzhausen" }
    ],
    "regions": [
      { "path": "daily/bayern" },
      { "path": "daily/berlin" }
    ],
    "resolutions": [
      { "path": "daily" },
      { "path": "hourly" },
      { "path": "10-minutes" }
    ]
  }
}
```

### Sitemap Normalization

**Script**: `climateexplorer/scripts/normalize-sitemap.mjs`

Runs after `quarto render` to inject curated URLs into `sitemap.xml`. The validator (`scripts/lib/indexed-pages-validator.mjs`) auto-approves path-based entries (entries with a `path` field).
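
The expansion from `indexed-pages.json` to sitemap entries might look roughly like the sketch below. It is not the code in `normalize-sitemap.mjs`: the function names and the `https://example.org` origin are placeholders, and entries without a `path` field are simply skipped here.

```javascript
// Expand an indexed-pages.json object into absolute, trailing-slash URLs.
function expandIndexedPages(indexed, origin) {
  const urls = [];
  for (const [section, groups] of Object.entries(indexed)) {
    for (const entries of Object.values(groups)) {
      for (const entry of entries) {
        // Only path-based entries are handled in this sketch.
        if (typeof entry.path === "string") {
          urls.push(`${origin}${section}/${entry.path}/`);
        }
      }
    }
  }
  return urls;
}

// Render <url> elements for sitemap.xml, escaping "&" for XML.
function toSitemapEntries(urls) {
  return urls
    .map((u) => `  <url><loc>${u.replace(/&/g, "&amp;")}</loc></url>`)
    .join("\n");
}

const indexed = {
  "/dwd": {
    stations: [{ path: "daily/bayern/muenchen-flughafen" }],
    regions: [{ path: "daily/bayern" }],
    resolutions: [{ path: "daily" }],
  },
};
console.log(toSitemapEntries(expandIndexedPages(indexed, "https://example.org")));
```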

### Google Discovery Chain

The sitemap contains ~36 DWD seed URLs. Google discovers all other pages through internal links:

```
Sitemap: 5 resolution pages ──→ Each links to 16 states
                                         │
                    ┌────────────────────┘
                    ▼
         16 state pages ──→ Each links to all stations in that state
                                         │
                    ┌────────────────────┘
                    ▼
         ~1400 station pages (per resolution)
```

Total discoverable pages: **~5,000+** across all resolutions.

## Legacy URL Redirect

Old query-parameter URLs are automatically redirected to clean paths:

```
/dwd/?resolution=Daily&landname=Bayern&station=München-Flughafen
  → 301 redirect →
/dwd/daily/bayern/muenchen-flughafen/
```

Handled by the edge function's "DWD Legacy Query-Param Redirect" block.
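
A hedged sketch of the redirect mapping is shown below. It resolves display names through the slug map described in the metadata section. The function name, the flat shape of `slugMap`, and the choice to return `null` when a name cannot be resolved are assumptions, not the behavior of the actual edge function block.

```javascript
// UI label to URL slug, per the Resolution Slugs table.
const RESOLUTION_SLUGS = {
  "10 Minutes": "10-minutes",
  Hourly: "hourly",
  Daily: "daily",
  Monthly: "monthly",
  Annual: "annual",
};

// Return the clean path to redirect to, or null if the URL is not a
// legacy query-param URL or a display name is missing from the slug map.
function legacyRedirectTarget(href, slugMap) {
  const url = new URL(href, "https://example.invalid");
  if (url.pathname !== "/dwd/" && url.pathname !== "/dwd") return null;

  const resolution = url.searchParams.get("resolution");
  const landname = url.searchParams.get("landname");
  const station = url.searchParams.get("station");
  if (!resolution && !landname && !station) return null; // already clean

  const parts = ["dwd", RESOLUTION_SLUGS[resolution] ?? "daily"];
  if (landname) {
    const stateSlug = slugMap[landname];
    if (!stateSlug) return null;
    parts.push(stateSlug);
    if (station) {
      const stationSlug = slugMap[station];
      if (!stationSlug) return null;
      parts.push(stationSlug);
    }
  }
  // A station without a state is dropped, since the path needs both segments.
  return `/${parts.join("/")}/`;
}

const slugMap = { Bayern: "bayern", "M\u00fcnchen-Flughafen": "muenchen-flughafen" };
console.log(
  legacyRedirectTarget("/dwd/?resolution=Daily&landname=Bayern&station=M%C3%BCnchen-Flughafen", slugMap)
); // /dwd/daily/bayern/muenchen-flughafen/
```

An edge function would then answer with a 301 whose `Location` header is the returned path.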

## Applying to Other Sections

To implement this pattern for another section (e.g., `/meteofrance/`, `/jma/`):

### 1. Define the URL hierarchy

```
/{section}/{resolution}/{region?}/{station?}/
```

Choose meaningful slugs for resolutions, regions (departments, prefectures, countries), and stations.

### 2. Create SEO metadata

Write an R script to generate `{section}-seo-metadata.json` with:
- Station metadata (name, region, coordinates, data range, parameters)
- Region metadata (station counts)
- Resolution metadata (station counts)
- Slug map (display name → URL slug)

### 3. Update the edge function

Add a `parse{Section}Path()` function and inject body content + meta tags.

### 4. Create the page JavaScript

Write a `{section}-page.js` that:
- Parses path segments on load
- Configures the iframe with Shiny query params
- Listens for postMessage broadcasts and updates URL/title/context

### 5. Update Shiny's broadcast_state()

Ensure the Shiny app sends state/region/station names in its broadcasts so the JS can construct correct URLs.

### 6. Update indexed-pages.json

Add curated seed URLs for the section's resolutions, regions, and sample stations.

### 7. Verify

```bash
# Run sitemap normalization
QUARTO_PROJECT_OUTPUT_DIR=_site node scripts/normalize-sitemap.mjs

# Run sitemap checks
QUARTO_PROJECT_OUTPUT_DIR=_site node scripts/check-sitemap.mjs --fetch

# Test edge function locally
netlify dev
```

## Key Files Reference

| File | Location | Purpose |
|------|----------|---------|
| `export_seo_metadata.R` | `dwd/scripts/` | Generate SEO metadata JSON |
| `dwd-seo-metadata.json` | `climateexplorer/netlify/edge-functions/` | Station/state/resolution metadata |
| `rewrite-meta.ts` | `climateexplorer/netlify/edge-functions/` | Edge function (SSR injection) |
| `dwd-page.js` | `climateexplorer/dwd/` | Client-side URL sync |
| `server.R` | `dwd/` | Shiny broadcast_state() |
| `indexed-pages.json` | `climateexplorer/netlify/edge-functions/` | Sitemap seed URLs |
| `normalize-sitemap.mjs` | `climateexplorer/scripts/` | Sitemap URL injection |
| `indexed-pages-validator.mjs` | `climateexplorer/scripts/lib/` | Validates curated URLs |