
DWD Clean URL Architecture & SEO System

This document describes the path-based URL system implemented for the DWD section of Climate Explorer. It serves as a reference template for implementing clean URLs on other sections of the site.

URL Structure

/dwd/{resolution}/{state?}/{station?}/?{view}&{start}&{end}

Path Segments

| Segment | Required | Example | Description |
| --- | --- | --- | --- |
| resolution | Yes (defaults to daily) | hourly | Time resolution slug |
| state | No | bayern | German state (Bundesland) slug |
| station | No | muenchen-flughafen | Station name slug |

Query Parameters (UI state only, not indexed)

| Param | Default | Example | Description |
| --- | --- | --- | --- |
| view | map | dashboard-plots | Active tab |
| start | Resolution default | 2020-01-01 | Date range start |
| end | Resolution default | 2026-04-26 | Date range end |
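
The query parameters above can be read with the standard `URLSearchParams` API. The following is a minimal sketch, not the code in dwd-page.js: the `UiState` type and `parseUiState` name are hypothetical, and the ISO date check is an added assumption. Missing or malformed dates are left undefined so the caller can apply the resolution default.

```typescript
// UI state carried in the query string (hypothetical shape for illustration).
interface UiState {
  view: string;
  start?: string; // undefined means "use the resolution default"
  end?: string;
}

// Assumption: dates are always written as YYYY-MM-DD, as in the examples.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function parseUiState(search: string): UiState {
  // URLSearchParams ignores a leading "?" in its input.
  const params = new URLSearchParams(search);
  const pickDate = (key: string): string | undefined => {
    const value = params.get(key);
    return value !== null && ISO_DATE.test(value) ? value : undefined;
  };
  return {
    view: params.get("view") || "map", // "map" is the documented default
    start: pickDate("start"),
    end: pickDate("end"),
  };
}
```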

URL Examples

# Base landing page (defaults to Daily)
/dwd/

# Resolution pages
/dwd/daily/
/dwd/hourly/
/dwd/10-minutes/
/dwd/monthly/
/dwd/annual/

# State pages
/dwd/daily/bayern/
/dwd/hourly/sachsen/
/dwd/10-minutes/nordrhein-westfalen/

# Station pages
/dwd/daily/bayern/muenchen-flughafen/
/dwd/hourly/sachsen/leipzig-holzhausen/

# With UI state (query params)
/dwd/daily/bayern/muenchen-flughafen/?view=dashboard-plots&start=2020-01-01&end=2026-04-26

Resolution Slugs

| UI Label | URL Slug | Shiny Internal Value |
| --- | --- | --- |
| 10 Minutes | 10-minutes | 10_minutes |
| Hourly | hourly | hourly |
| Daily | daily | daily |
| Monthly | monthly | monthly |
| Annual | annual | annual |
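
One way to keep the three representations in sync is a single lookup table. This is an illustrative sketch only; the `Resolution` type and the helper names are hypothetical and do not come from the project code.

```typescript
// One row per resolution, mirroring the table above.
interface Resolution {
  label: string;      // UI label shown to users
  slug: string;       // URL path segment
  shinyValue: string; // value used inside the Shiny app
}

const RESOLUTIONS: Resolution[] = [
  { label: "10 Minutes", slug: "10-minutes", shinyValue: "10_minutes" },
  { label: "Hourly", slug: "hourly", shinyValue: "hourly" },
  { label: "Daily", slug: "daily", shinyValue: "daily" },
  { label: "Monthly", slug: "monthly", shinyValue: "monthly" },
  { label: "Annual", slug: "annual", shinyValue: "annual" },
];

// Resolve a URL slug to its row; undefined for unknown slugs.
function resolutionFromSlug(slug: string): Resolution | undefined {
  return RESOLUTIONS.find((r) => r.slug === slug);
}

// Resolve a UI label (as broadcast by Shiny) to its row.
function resolutionFromLabel(label: string): Resolution | undefined {
  return RESOLUTIONS.find((r) => r.label === label);
}
```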

Slugify Algorithm

State and station names are slugified using the same algorithm across all three layers (R, JS, Edge Function):

1. Replace German umlauts: ü→ue, ö→oe, ä→ae, Ü→ue, Ö→oe, Ä→ae, ß→ss
2. Lowercase
3. Strip diacritics (R uses iconv ASCII//TRANSLIT; JS/TS use NFD + regex)
4. Replace non-alphanumeric chars with hyphens
5. Trim leading/trailing hyphens

Examples:

  • München-Flughafen → muenchen-flughafen
  • Nordrhein-Westfalen → nordrhein-westfalen
  • Thüringen → thueringen
  • Baden-Württemberg → baden-wuerttemberg

Critical: The slugify function must produce identical output in R (scripts/export_seo_metadata.R), JavaScript (dwd-page.js), and TypeScript (rewrite-meta.ts). Any mismatch causes 404s or broken links. R uses iconv(..., to = "ASCII//TRANSLIT"), while JS/TS use NFD normalization and then strip combining marks; both produce the same result for German text.
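
The five steps can be sketched in TypeScript as follows. This is a minimal illustration of the algorithm as described, not a copy of the function in rewrite-meta.ts. One assumption is labeled in the code: runs of consecutive non-alphanumeric characters collapse into a single hyphen, which the description does not specify.

```typescript
// Step 1 mapping: German umlauts and sharp s (uppercase maps to lowercase output).
const UMLAUTS: Record<string, string> = {
  "ü": "ue", "ö": "oe", "ä": "ae",
  "Ü": "ue", "Ö": "oe", "Ä": "ae",
  "ß": "ss",
};

function slugify(name: string): string {
  return name
    .replace(/[üöäÜÖÄß]/g, (ch) => UMLAUTS[ch]) // 1. replace umlauts
    .toLowerCase()                              // 2. lowercase
    .normalize("NFD")                           // 3. decompose accented letters...
    .replace(/[\u0300-\u036f]/g, "")            //    ...and drop the combining marks
    .replace(/[^a-z0-9]+/g, "-")                // 4. non-alphanumerics to hyphen (assumption: runs collapse)
    .replace(/^-+|-+$/g, "");                   // 5. trim leading/trailing hyphens
}

console.log(slugify("München-Flughafen")); // muenchen-flughafen
console.log(slugify("Baden-Württemberg")); // baden-wuerttemberg
```

The umlaut replacement has to run before the NFD step; otherwise ü would be decomposed and reduced to a plain u instead of ue.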

System Architecture

The URL system spans four layers:

┌─────────────────────────────────────────────────────┐
│  1. SEO Metadata (Build Time)                       │
│     R script → dwd-seo-metadata.json                │
│     Generates slug→metadata mappings for all        │
│     stations, states, and resolutions               │
├─────────────────────────────────────────────────────┤
│  2. Edge Function (Request Time)                    │
│     rewrite-meta.ts                                 │
│     Parses URL → injects HTML body content,         │
│     meta tags, JSON-LD, canonical URL               │
├─────────────────────────────────────────────────────┤
│  3. Parent Page JS (Client Side)                    │
│     dwd-page.js                                     │
│     Parses URL → configures iframe,                 │
│     listens to Shiny broadcasts → updates URL,      │
│     title, and dynamic context block                │
├─────────────────────────────────────────────────────┤
│  4. Shiny App (Iframe)                              │
│     server.R                                        │
│     Receives URL params → broadcasts state          │
│     changes via postMessage to parent page          │
└─────────────────────────────────────────────────────┘

1. SEO Metadata Generation (Build Time)

Script: scripts/export_seo_metadata.R (in the DWD project)

Output: dwd-seo-metadata.json (in climateexplorer/netlify/edge-functions/)

This R script reads all 5 resolution RDS cache files and generates a JSON file containing:

  • Stations: {resolution}/{state-slug}/{station-slug} → {id, name, state, stateSlug, elevation, lat, lon, resolution, resolutionLabel, resolutionSlug, overallStart, overallEnd, availableParams}
  • States: {resolution}/{state-slug} → {state, stateSlug, resolution, resolutionLabel, resolutionSlug, stationCount, activeStationCount}
  • Resolutions: {resolution-slug} → {key, label, slug, stationCount, activeStationCount}
  • Slug map: display name → slug (for legacy URL redirect lookups)
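
The key scheme above lends itself to a simple lookup by joined path segments. The sketch below rests on several assumptions: the top-level property names (`stations`, `states`, `resolutions`), the trimmed-down entry shapes, and the function names are all hypothetical, and the first key segment is passed through as given because the list does not say whether it is the URL slug or the internal value.

```typescript
// Hypothetical top-level layout; the real JSON may name or nest these tables differently.
interface SeoMetadata {
  stations: Record<string, { name: string; state: string }>;
  states: Record<string, { state: string; stationCount: number }>;
  resolutions: Record<string, { label: string; stationCount: number }>;
}

type Level = "resolution" | "state" | "station";

// Build the lookup key for the deepest level present in the path.
function metadataKey(
  resolution: string,
  stateSlug?: string,
  stationSlug?: string,
): { level: Level; key: string } {
  if (stateSlug && stationSlug) {
    return { level: "station", key: `${resolution}/${stateSlug}/${stationSlug}` };
  }
  if (stateSlug) {
    return { level: "state", key: `${resolution}/${stateSlug}` };
  }
  return { level: "resolution", key: resolution };
}

// Return the matching entry, or undefined when the slug combination is unknown.
function lookupMetadata(
  meta: SeoMetadata,
  resolution: string,
  stateSlug?: string,
  stationSlug?: string,
) {
  const { level, key } = metadataKey(resolution, stateSlug, stationSlug);
  const table =
    level === "station" ? meta.stations : level === "state" ? meta.states : meta.resolutions;
  const entry = table[key];
  return entry === undefined ? undefined : { level, key, entry };
}
```

An undefined result is the natural point for the caller to fall back to a 404 or to the parent level.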

To regenerate the metadata:

# Run from the DWD app root directory (clima/2025/dwd/)
Rscript scripts/export_seo_metadata.R

# Copy the output to the climateexplorer project (clima/2024/climateexplorer/)
cp dwd-seo-metadata.json ../../2024/climateexplorer/netlify/edge-functions/

2. Edge Function (Request Time)

File: climateexplorer/netlify/edge-functions/rewrite-meta.ts

When a request hits /dwd/{resolution}/{state?}/{station?}/:

  1. parseDwdPath() extracts path segments
  2. Looks up metadata from dwd-seo-metadata.json
  3. Injects into the HTML response:
    • <title>: e.g., "München-Flughafen, Bayern – Daily Climate Data | DWD Explorer"
    • <meta name="description">: station-specific description
    • <link rel="canonical">: canonical URL
    • OG/Twitter meta tags
    • JSON-LD breadcrumb: structured data for Google
    • Body content (<div id="dynamic-context">): rich HTML with station details, state lists, or country overview
    • window.__DWD_RESOLVED__: resolved metadata for the JS layer

This content is rendered server-side, so Google sees the full page content without executing JavaScript.
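
The path-parsing step might look roughly like the sketch below. It is not the actual parseDwdPath() implementation: the function name, the `DwdPath` type, and the choice to return null for anything that is not a valid DWD path are assumptions made for illustration.

```typescript
// Parsed path segments (hypothetical shape).
interface DwdPath {
  resolution: string;
  state?: string;
  station?: string;
}

// Resolution slugs taken from the Resolution Slugs table.
const RESOLUTION_SLUGS = ["10-minutes", "hourly", "daily", "monthly", "annual"];

function parseDwdPathSketch(pathname: string): DwdPath | null {
  // Split on "/" and drop empty pieces so trailing slashes do not matter.
  const segments = pathname.split("/").filter((s) => s.length > 0);
  if (segments[0] !== "dwd" || segments.length > 4) return null;

  // /dwd/ with no resolution falls back to the documented default.
  const [, resolution = "daily", state, station] = segments;
  if (!RESOLUTION_SLUGS.includes(resolution)) return null;

  return { resolution, state, station };
}

console.log(parseDwdPathSketch("/dwd/daily/bayern/muenchen-flughafen/"));
```

Returning null for unknown resolutions lets the caller pass the request through untouched instead of injecting metadata for a page that does not exist.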

3. Parent Page JavaScript (Client Side)

File: climateexplorer/dwd/dwd-page.js

On page load:

  1. parsePathParams() extracts resolution/state/station from the URL
  2. If /dwd/ (no resolution), defaults to "Daily"
  3. Builds iframe URL with Shiny query params
  4. Uses __DWD_RESOLVED__ metadata (from edge function) to pass real station IDs/names to iframe

On Shiny state changes (via postMessage):

  1. handleIframeMessage() receives broadcast from iframe
  2. updateBrowserUrl() updates the browser URL (using history.replaceState)
  3. updatePageTitle() updates the browser tab title
  4. updateDynamicContext() updates the context block HTML
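
The URL update in step 2 amounts to turning a broadcast message into a clean path plus query string. The sketch below shows one possible shape, under stated assumptions: `BroadcastState` lists only the fields needed here, `buildCleanUrl` is a hypothetical name, the slugify function is passed in so the same implementation can be shared, and omitting `view` when it equals the default `map` is a design choice of this sketch rather than documented behavior.

```typescript
// Subset of the broadcast payload used for URL construction (hypothetical type).
interface BroadcastState {
  resolution: string;          // UI label, e.g. "Daily"
  landname?: string | null;    // German state display name
  stationName?: string | null; // station display name
  view?: string;
  start?: string;
  end?: string;
}

function buildCleanUrl(state: BroadcastState, slugify: (name: string) => string): string {
  const segments = ["dwd", slugify(state.resolution)];
  if (state.landname) {
    segments.push(slugify(state.landname));
    // A station segment only makes sense below a state segment.
    if (state.stationName) segments.push(slugify(state.stationName));
  }

  const query = new URLSearchParams();
  if (state.view && state.view !== "map") query.set("view", state.view); // assumption: skip default
  if (state.start) query.set("start", state.start);
  if (state.end) query.set("end", state.end);

  const qs = query.toString();
  return "/" + segments.join("/") + "/" + (qs ? "?" + qs : "");
}

// In the browser, the result would be applied without adding a history entry:
//   history.replaceState(null, "", buildCleanUrl(message, slugify));
```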

4. Shiny App Broadcasts (Iframe)

File: server.R (in the DWD project)

The broadcast_state() function sends a postMessage to the parent page with:

list(
    station = station_id,
    stationName = station_name,
    landname = state_name,        # German state
    resolution = resolution,       # UI label (e.g., "Daily")
    view = active_view,
    start = start_date,
    end = end_date,
    countryStationCount = ...,     # Total stations for this resolution
    countryActiveCount = ...,      # Active in current date range
    countryStateList = ...,        # State breakdown for context block
    ...
)

Broadcast triggers (observers in server.R):

  1. Tab/view changes
  2. Station selection changes
  3. Station deselection
  4. Resolution changes
  5. Date range changes
  6. State filter changes (ignoreNULL = FALSE, so it also fires on clear)

Sitemap Integration

indexed-pages.json

File: climateexplorer/netlify/edge-functions/indexed-pages.json

Defines the curated URLs to include in sitemap.xml:

{
  "/dwd": {
    "stations": [
      { "path": "daily/bayern/muenchen-flughafen" },
      { "path": "daily/sachsen/leipzig-holzhausen" }
    ],
    "regions": [
      { "path": "daily/bayern" },
      { "path": "daily/berlin" }
    ],
    "resolutions": [
      { "path": "daily" },
      { "path": "hourly" },
      { "path": "10-minutes" }
    ]
  }
}
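
Expanding these entries into absolute sitemap URLs could look like the following sketch. It is not the logic in normalize-sitemap.mjs: the function name is hypothetical, the origin is a placeholder argument, and the trailing slash is an assumption based on the URL examples earlier in this document.

```typescript
type SeedEntry = { path: string };
// Section prefix (e.g. "/dwd") → group name (e.g. "stations") → entries.
type IndexedPages = Record<string, Record<string, SeedEntry[]>>;

function expandSeedUrls(pages: IndexedPages, origin: string): string[] {
  const urls: string[] = [];
  for (const [section, groups] of Object.entries(pages)) {
    for (const entries of Object.values(groups)) {
      for (const { path } of entries) {
        // Assumption: canonical URLs end with a trailing slash.
        urls.push(`${origin}${section}/${path}/`);
      }
    }
  }
  return urls;
}

const seeds: IndexedPages = {
  "/dwd": {
    stations: [{ path: "daily/bayern/muenchen-flughafen" }],
    regions: [{ path: "daily/bayern" }],
    resolutions: [{ path: "daily" }],
  },
};
// "https://example.org" is a placeholder origin.
console.log(expandSeedUrls(seeds, "https://example.org"));
```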

Sitemap Normalization

Script: climateexplorer/scripts/normalize-sitemap.mjs

Runs after quarto render to inject curated URLs into sitemap.xml. The validator (scripts/lib/indexed-pages-validator.mjs) auto-approves path-based entries (entries with a path field).

Google Discovery Chain

The sitemap contains ~36 DWD seed URLs. Google discovers all other pages through internal links:

Sitemap: 5 resolution pages ──→ Each links to 16 states
                                         │
                    ┌────────────────────┘
                    ▼
         16 state pages ──→ Each links to all stations in that state
                                         │
                    ┌────────────────────┘
                    ▼
         ~1400 station pages (per resolution)

Total discoverable pages: more than 5,000 across all resolutions.

Legacy URL Redirect

Old query-parameter URLs are automatically redirected to clean paths:

/dwd/?resolution=Daily&landname=Bayern&station=München-Flughafen
  → 301 redirect →
/dwd/daily/bayern/muenchen-flughafen/

Handled by the edge function's "DWD Legacy Query-Param Redirect" block.
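
A rough sketch of how such a redirect target could be computed is shown below. It is not the code in rewrite-meta.ts. Assumptions: the function name is hypothetical, the slug map is a plain display-name-to-slug object covering states and stations, the resolution label is converted by lowercasing and hyphenating (which matches the Resolution Slugs table), and unknown names yield null so no redirect is issued.

```typescript
function legacyRedirectTarget(
  requestUrl: string,
  slugMap: Record<string, string>, // display name → slug
): string | null {
  const url = new URL(requestUrl);
  if (url.pathname !== "/dwd/" && url.pathname !== "/dwd") return null;

  const resolution = url.searchParams.get("resolution");
  if (!resolution) return null; // not a legacy URL

  // "Daily" → "daily", "10 Minutes" → "10-minutes"
  const segments = [resolution.toLowerCase().replace(/[ _]+/g, "-")];

  // State first, then station; stop at the first missing level.
  for (const key of ["landname", "station"]) {
    const name = url.searchParams.get(key);
    if (!name) break;
    const slug = slugMap[name];
    if (!slug) return null; // unknown name: leave the request alone
    segments.push(slug);
  }
  return `/dwd/${segments.join("/")}/`;
}
```

A caller would answer with a 301 response pointing at the returned path whenever the result is not null.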

Applying to Other Sections

To implement this pattern for another section (e.g., /meteofrance/, /jma/):

1. Define the URL hierarchy

/{section}/{resolution}/{region?}/{station?}/

Choose meaningful slugs for resolutions, regions (departments, prefectures, countries), and stations.

2. Create SEO metadata

Write an R script to generate {section}-seo-metadata.json with:

  • Station metadata (name, region, coordinates, data range, parameters)
  • Region metadata (station counts)
  • Resolution metadata (station counts)
  • Slug map (display name → URL slug)

3. Update the edge function

Add a parse{Section}Path() function and inject body content + meta tags.

4. Create the page JavaScript

Write a {section}-page.js that:

  • Parses path segments on load
  • Configures the iframe with Shiny query params
  • Listens for postMessage broadcasts and updates URL/title/context

5. Update Shiny's broadcast_state()

Ensure the Shiny app sends state/region/station names in its broadcasts so the JS can construct correct URLs.

6. Update indexed-pages.json

Add curated seed URLs for the section's resolutions, regions, and sample stations.

7. Verify

# Run sitemap normalization
QUARTO_PROJECT_OUTPUT_DIR=_site node scripts/normalize-sitemap.mjs

# Run sitemap checks
QUARTO_PROJECT_OUTPUT_DIR=_site node scripts/check-sitemap.mjs --fetch

# Test edge function locally
netlify dev

Key Files Reference

| File | Location | Purpose |
| --- | --- | --- |
| export_seo_metadata.R | dwd/scripts/ | Generate SEO metadata JSON |
| dwd-seo-metadata.json | climateexplorer/netlify/edge-functions/ | Station/state/resolution metadata |
| rewrite-meta.ts | climateexplorer/netlify/edge-functions/ | Edge function (SSR injection) |
| dwd-page.js | climateexplorer/dwd/ | Client-side URL sync |
| server.R | dwd/ | Shiny broadcast_state() |
| indexed-pages.json | climateexplorer/netlify/edge-functions/ | Sitemap seed URLs |
| normalize-sitemap.mjs | climateexplorer/scripts/ | Sitemap URL injection |
| indexed-pages-validator.mjs | climateexplorer/scripts/lib/ | Validates curated URLs |