| """ |
| Analysis Guide Tool |
| ==================== |
| Provides methodological guidance for climate data analysis using python_repl. |
| |
| This tool returns TEXT INSTRUCTIONS (not executable code!) for: |
| - What approach to take |
| - How to structure the analysis |
| - Quality checks and pitfalls |
| - Best practices for visualization |
| |
| The agent uses python_repl to execute the actual analysis. |
| """ |
|
|
| from typing import Literal |
| from pydantic import BaseModel, Field |
| from langchain_core.tools import StructuredTool |
|
|
|
|
| |
| |
| |
|
|
| ANALYSIS_GUIDES = { |
| |
| |
| |
| "load_data": """ |
| ## Loading ERA5 Data |
| |
| ### When to use |
| - Initializing any analysis |
| - Loading downloaded Zarr data |
| |
| ### Workflow |
| 1. **Load data** — Use `xr.open_dataset('path', engine='zarr')` or `xr.open_zarr('path')`. |
| 2. **Inspect dataset** — Check coordinates and available variables. |
| 3. **Convert units** before any analysis: |
| - Temp (`t2`, `d2`, `skt`, `sst`, `stl1`): subtract 273.15 → °C |
| - Precip (`tp`, `cp`, `lsp`): multiply by 1000 → mm |
| - Pressure (`sp`, `mslp`): divide by 100 → hPa |
| |
| ### Quality Checklist |
| - [ ] Data loaded lazily (avoid `.load()` on large datasets) |
| - [ ] Units converted before aggregations |
| - [ ] Coordinate names verified (latitude vs lat, etc.) |
| |
| ### Common Pitfalls |
| - ⚠️ Loading multi-year global data into memory causes OOM. Keep operations lazy until subsetted. |
| - ⚠️ Some Zarr stores have `valid_time` instead of `time` — check with `.coords`. |
| - ⚠️ CRITICAL — LONGITUDE WRAPPING: ERA5 natively uses 0-360° longitudes. If your region is in the Western Hemisphere (Americas, Atlantic) or crosses the Prime Meridian, you MUST convert longitudes to -180/+180 BEFORE slicing or plotting. Use `ds.assign_coords(longitude=(((ds.longitude + 180) % 360) - 180)).sortby('longitude')`. Failing to do this causes maps to be 95% blank space with data crushed into a tiny sliver. |
| - ⚠️ UNIT SAFETY: When computing temperature DIFFERENCES or ANOMALIES, do NOT subtract 273.15 from the result. A temperature difference in Kelvin is numerically identical to a difference in °C. Subtracting 273.15 from an anomaly produces absurd ±200°C values. |
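### Example Sketch
The longitude wrapping above can be illustrated on a bare coordinate array. This is a minimal NumPy sketch, not the xarray call itself; the helper name `wrap_longitudes` and the sample longitudes are assumptions made for illustration.

```python
import numpy as np

def wrap_longitudes(lon):
    # Map 0..360 longitudes onto -180..180 and return them sorted,
    # together with the index order needed to reorder the data axis.
    wrapped = ((np.asarray(lon, dtype=float) + 180.0) % 360.0) - 180.0
    order = np.argsort(wrapped)
    return wrapped[order], order

lon_0_360 = np.array([0.0, 90.0, 180.0, 270.0, 359.75])
lon_wrapped, order = wrap_longitudes(lon_0_360)

# Reorder any data array along its longitude axis with the same index
field = np.arange(lon_0_360.size)
field_wrapped = field[order]
```

Note that 180° maps to -180° under this formula, so the wrapped axis starts at the date line.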
| """, |
|
|
| "spatial_subset": """ |
| ## Spatial Subsetting |
| |
| ### When to use |
| - Focusing on a specific region, country, or routing bounding box |
| - Reducing data size before heavy analysis |
| |
| ### Workflow |
| 1. **Determine bounds** — Find min/max latitude and longitude. |
| 2. **Check coordinate orientation** — ERA5 latitude is often descending (90 to -90). |
| 3. **Slice data** — `.sel(latitude=slice(north, south), longitude=slice(west, east))`. |
| |
| ### Quality Checklist |
| - [ ] Latitude sliced from North to South (max to min) for descending coords |
| - [ ] Longitudes match dataset format (convert -180/180 ↔ 0/360 if needed) |
| - [ ] Result is not empty — verify with `.shape` |
| |
| ### Common Pitfalls |
| - ⚠️ Slicing `slice(south, north)` on descending coords → empty result. |
| - ⚠️ Crossing the prime meridian in 0-360 coords requires concatenating two slices. |
| - ⚠️ Use `.sel(method='nearest')` for point extraction, not exact matching. |
| - ⚠️ If requested bounds use negative longitudes (e.g., -120 to -80 for US West Coast), ensure you have applied the longitude wrapping from `load_data` FIRST. Otherwise slicing negative values on a 0-360 dataset returns empty data. |
| - ⚠️ Always check latitude orientation: use `slice(north, south)` if descending, `slice(south, north)` if ascending. Verify with `ds.latitude[0] > ds.latitude[-1]`. |
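### Example Sketch
One way to avoid the empty-slice pitfall is to build the latitude slice from the coordinate direction. The helper `lat_slice` below is a hypothetical name, and the 0.25° grid is only an assumed ERA5-like example.

```python
import numpy as np

def lat_slice(lat, south, north):
    # Return slice bounds in the order that matches the coordinate direction.
    lat = np.asarray(lat)
    descending = lat[0] > lat[-1]
    return slice(north, south) if descending else slice(south, north)

era5_like = np.arange(90.0, -90.25, -0.25)   # descending, 90 to -90
ascending = era5_like[::-1]

sel_desc = lat_slice(era5_like, 35.0, 60.0)
sel_asc = lat_slice(ascending, 35.0, 60.0)
```

With xarray this could be used as `ds.sel(latitude=lat_slice(ds.latitude.values, 35, 60))`.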
| """, |
|
|
| "temporal_subset": """ |
| ## Temporal Subsetting & Aggregation |
| |
| ### When to use |
| - Isolating specific events, months, or seasons |
| - Downsampling hourly data to daily/monthly |
| |
| ### Workflow |
| 1. **Time slice** — `.sel(time=slice('2023-01-01', '2023-12-31'))`. |
| 2. **Filter** — Seasons: `.sel(time=ds.time.dt.season == 'DJF')`. |
| 3. **Resample** — `.resample(time='1D').mean()` for daily means. |
| |
| ### Quality Checklist |
| - [ ] Aggregation matches variable: `.mean()` for T/wind, `.sum()` for precip |
| - [ ] Leap years handled if using day-of-year grouping |
| |
| ### Common Pitfalls |
| - ⚠️ DJF wraps across years — verify start/end boundaries. |
| - ⚠️ `.resample()` (continuous) ≠ `.groupby()` (climatological). Don't mix them up. |
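| - ⚠️ Radiation variables (`ssr`, `ssrd`) are accumulated energy (J/m²), not instantaneous fluxes. Sum them for period totals, or divide by the accumulation period in seconds to get a mean flux in W/m². |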
| - ⚠️ Hourly data is massive. Resample to daily ('1D') or monthly ('MS') IMMEDIATELY after spatial subsetting to avoid Memory/Timeout errors. |
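### Example Sketch
Because DJF wraps across years, each winter month needs a season-year label before grouping. The sketch below assumes the convention of labelling a winter by the year of its January; the function name `djf_season_year` is hypothetical.

```python
import numpy as np

def djf_season_year(years, months):
    # Label each DJF month with the year of its January, so that
    # Dec 2022, Jan 2023 and Feb 2023 all belong to winter 2023.
    years = np.asarray(years)
    months = np.asarray(months)
    return np.where(months == 12, years + 1, years)

years = np.array([2022, 2023, 2023, 2023])
months = np.array([12, 1, 2, 12])
winter = djf_season_year(years, months)
```

Grouping by this label keeps each December with the January and February that follow it.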
| """, |
|
|
| |
| |
| |
| "anomalies": """ |
| ## Anomaly Analysis |
| |
| ### When to use |
| - "How unusual was this period?" |
| - Comparing current conditions to "normal" |
| - Any "above/below average" question |
| |
| ### Workflow |
| 1. **Define baseline** — ≥10 years (30 ideal). E.g. 1991-2020. |
| 2. **Compute climatology** — `clim = ds.groupby('time.month').mean('time')`. |
| 3. **Subtract** — `anomaly = ds.groupby('time.month') - clim`. |
| 4. **Convert units** — Report in °C, mm, m/s (not K, m, Pa). |
| 5. **Assess magnitude** — Compare to σ of the baseline period. |
| |
| ### Data Strategy for Large-Area Anomaly Maps |
| ⚠️ For spatial areas ≥30°×30° (e.g., tropical Pacific), do NOT download 30 years one-by-one! |
| 1. Download target period with 1 `retrieve_era5_data` call |
| 2. Download 3-5 recent years of the same month as baseline (3-5 calls) |
| 3. Average baseline files in `python_repl` → climatology |
| 4. Subtract climatology from target → anomaly map |
| A 5-year baseline is sufficient for spatial anomaly maps. |
| |
| ### Quality Checklist |
| - [ ] Baseline ≥10 years (for temporal/small-area analysis; 3-5 years OK for spatial maps) |
| - [ ] Same calendar grouping for clim and analysis |
| - [ ] Units converted for readability |
| - [ ] Spatial context: is anomaly regional or localized? |
| |
| ### Common Pitfalls |
| - ⚠️ Short baselines amplify noise. |
| - ⚠️ Daily climatologies with <30yr baseline are noisy → use monthly grouping. |
| - ⚠️ Be explicit: spatial anomaly vs temporal anomaly. |
| - ⚠️ CRITICAL MATH BUG: Anomaly in Kelvin EXACTLY EQUALS Anomaly in Celsius. DO NOT subtract 273.15 from a temperature anomaly! If you compute `(SST_K - CLIM_K) - 273.15`, you get absurd ±200°C anomalies. Either convert both to °C BEFORE subtracting, or leave the difference as-is. |
| |
| ### Interpretation |
| - Positive = warmer/wetter/windier than normal. |
| - Within ±1σ = common; beyond ±2σ = unusual (about 5% of values for a normal distribution); beyond ±3σ = extreme (about 0.3%). |
| - Maps: MUST use `RdBu_r` centered at zero via `TwoSlopeNorm`. |
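### Example Sketch
The unit-safety rule can be checked numerically. This is a minimal NumPy sketch on synthetic monthly data (the series and the helper `monthly_anomaly` are assumptions, not ERA5 values): the anomaly comes out the same whether the input is in Kelvin or °C.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 10
months = np.tile(np.arange(1, 13), n_years)           # month label per time step
seasonal = 10.0 * np.sin(2 * np.pi * (months - 1) / 12)
temp_k = 288.15 + seasonal + rng.normal(0.0, 1.0, months.size)

def monthly_anomaly(values, months):
    # Subtract each calendar month's own mean (the climatology).
    clim = np.array([values[months == m].mean() for m in range(1, 13)])
    return values - clim[months - 1]

anom_from_k = monthly_anomaly(temp_k, months)             # Kelvin in
anom_from_c = monthly_anomaly(temp_k - 273.15, months)    # Celsius in
```

The two results agree to floating-point precision, so no further 273.15 offset belongs on an anomaly.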
| """, |
|
|
| "zscore": """ |
| ## Z-Score Analysis (Standardized Anomalies) |
| |
| ### When to use |
| - Comparing extremity across different variables |
| - Standardizing for regions with different variability |
| - Identifying statistically significant departures |
| |
| ### Workflow |
| 1. **Compute baseline mean** — Grouped by month for seasonality. |
| 2. **Compute baseline std** — Same period, same grouping. |
| 3. **Standardize** — `z = (value - mean) / std`. |
| |
| ### Quality Checklist |
| - [ ] Standard deviation is non-zero everywhere |
| - [ ] Baseline period matches for mean and std |
| |
| ### Common Pitfalls |
| - ⚠️ Precipitation is NOT normally distributed — use SPI or percentiles instead of raw Z-scores. |
| - ⚠️ Z-scores near coastlines can be extreme due to mixed land/ocean std. |
| |
| ### Interpretation |
| - Z = 0: average. Within ±1: normal (about 68% of values). Beyond ±2: unusual (about 5%). Beyond ±3: extreme (about 0.3%). |
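### Example Sketch
A small NumPy sketch of the standardization step with a guard for zero standard deviation. The function name `zscore` and the two-cell baseline are illustrative assumptions; returning NaN for zero-spread cells is one possible choice, not the only one.

```python
import numpy as np

def zscore(values, base_mean, base_std):
    # Standardise against the baseline; cells with zero spread become NaN
    # instead of producing inf or a division warning.
    values = np.asarray(values, dtype=float)
    std = np.asarray(base_std, dtype=float)
    safe_std = np.where(std > 0, std, np.nan)
    return (values - base_mean) / safe_std

baseline = np.array([[14.0, 5.0], [16.0, 5.0], [15.0, 5.0]])   # time x cell
z = zscore(np.array([18.0, 5.0]),
           baseline.mean(axis=0),
           baseline.std(axis=0, ddof=1))
```

The first cell is three baseline standard deviations above its mean; the second has no variability and is masked.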
| """, |
|
|
| "trend_analysis": """ |
| ## Linear Trend Analysis |
| |
| ### When to use |
| - "Is it getting warmer/wetter over time?" |
| - Detecting long-term climate change signals |
| |
| ### Workflow |
| 1. **Downsample** — Convert to annual/seasonal means first. |
| 2. **Regress** — `scipy.stats.linregress(x, y)` or `np.polyfit(x, y, deg=1)`. |
| 3. **Significance** — Extract p-value for the slope. |
| 4. **Scale** — Multiply annual slope by 10 → "per decade". |
| |
| ### Quality Checklist |
| - [ ] Period ≥20-30 years for meaningful trends |
| - [ ] Seasonal cycle removed before fitting |
| - [ ] Significance tested (p < 0.05) |
| - [ ] Report trend as units/decade |
| |
| ### Common Pitfalls |
| - ⚠️ Trend on daily data without removing seasonality → dominated by summer/winter swings. |
| - ⚠️ Short series have uncertain trends — report confidence intervals. |
| - ⚠️ Autocorrelation can inflate significance — consider a Mann-Kendall test with pre-whitening, or correct for the reduced effective sample size. |
| - ⚠️ If p > 0.05, you MUST explicitly state the trend is NOT statistically significant. Do not present insignificant trends as real signals. |
| |
| ### Interpretation |
| - Report as °C/decade. Use stippling on maps for significant areas. |
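### Example Sketch
The regress-and-scale steps can be sketched with `np.polyfit` on annual means. The series below is synthetic with a known slope (an assumption for illustration); `np.polyfit` does not return a p-value, so significance still needs `scipy.stats.linregress` or a similar test.

```python
import numpy as np

years = np.arange(1991, 2021)                       # 30 annual means
annual_mean = 14.0 + 0.02 * (years - years[0])      # synthetic: +0.02 °C per year

# Degree-1 fit returns (slope, intercept), highest power first
slope_per_year, intercept = np.polyfit(years, annual_mean, deg=1)
slope_per_decade = slope_per_year * 10.0
```

Here the recovered trend is about 0.2 °C/decade, matching the slope built into the synthetic series.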
| """, |
|
|
| "eof_analysis": """ |
| ## EOF/PCA Analysis |
| |
| ### When to use |
| - Finding dominant spatial patterns (ENSO, NAO, PDO) |
| - Dimensionality reduction of spatiotemporal data |
| |
| ### Workflow |
| 1. **Deseasonalize** — Compute anomalies to remove the seasonal cycle. |
| 2. **Latitude weighting** — Multiply by `np.sqrt(np.cos(np.deg2rad(lat)))`. |
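| 3. **Decompose** — PCA (or SVD) on the time × space matrix with the spatial dimensions flattened. |
| 4. **Reconstruct** — Reshape the spatial eigenvectors back onto the lat/lon grid (EOFs); the PCs are the matching time series. |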
| |
| ### Quality Checklist |
| - [ ] Seasonal cycle removed |
| - [ ] Latitude weighting applied |
| - [ ] Variance explained (%) calculated per mode |
| - [ ] Physical interpretation attempted for leading modes |
| - [ ] Maps of EOF patterns MUST include coastlines (use Cartopy) for geographic context |
| - [ ] Variance explained (%) MUST be explicitly displayed in each plot title |
| |
| ### Common Pitfalls |
| - ⚠️ Unweighted EOFs inflate polar regions artificially. |
| - ⚠️ EOFs are mathematical constructs — not guaranteed to correspond to physical modes. |
| |
| ### Interpretation |
| - EOF1: dominant spatial pattern. PC1: its temporal evolution. |
| - If EOF1 explains >20% variance, it's highly dominant. |
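### Example Sketch
The workflow can be sketched with a plain SVD. This is a minimal NumPy illustration on random stand-in anomalies (grid size, seed and variable names are assumptions); a dedicated EOF or PCA library may be preferable in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
lat = np.linspace(60.0, -60.0, 7)
n_time, n_lat, n_lon = 48, lat.size, 10
anom = rng.normal(size=(n_time, n_lat, n_lon))        # stand-in for anomalies

# sqrt(cos(lat)) weights so that variance is area-weighted
w = np.sqrt(np.cos(np.deg2rad(lat)))[None, :, None]
X = (anom * w).reshape(n_time, n_lat * n_lon)
X = X - X.mean(axis=0)                                # centre each grid cell in time

# SVD: rows of vt are spatial patterns, u * s are the PC time series
u, s, vt = np.linalg.svd(X, full_matrices=False)
var_frac = s**2 / np.sum(s**2)                        # variance explained per mode
eofs = vt.reshape(-1, n_lat, n_lon)                   # mode x lat x lon
pcs = u * s                                           # time x mode
```

`var_frac[0] * 100` is the percentage to show in the EOF1 plot title.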
| """, |
|
|
| "correlation_analysis": """ |
| ## Correlation Analysis |
| |
| ### When to use |
| - Spatial/temporal correlation mapping |
| - Lead-lag analysis (e.g., SST vs downstream precipitation) |
| - Teleconnection exploration |
| |
| ### Workflow |
| 1. **Deseasonalize both variables** — Remove seasonal cycle from both. |
| 2. **Align time coordinates** — Ensure identical time axes. |
| 3. **Correlate** — `xr.corr(var1, var2, dim='time')`. |
| 4. **Lead-lag** — Use `.shift(time=N)` month offsets to test delayed responses. |
| 5. **Significance** — Compute p-values, mask insignificant areas. |
| |
| ### Quality Checklist |
| - [ ] Both variables deseasonalized |
| - [ ] p-values computed (p < 0.05 for significance) |
| - [ ] Sample size adequate (≥30 time points) |
| |
| ### Common Pitfalls |
| - ⚠️ Correlating raw data captures the seasonal cycle — everything correlates with summer. |
| - ⚠️ Spatial autocorrelation inflates field significance — apply Bonferroni or FDR correction. |
| |
| ### Interpretation |
| - R² gives variance explained. Lead-lag peak indicates response time. |
| - Plot spatial R maps with `RdBu_r`, stipple significant areas. |
| """, |
|
|
| "composite_analysis": """ |
| ## Composite Analysis |
| |
| ### When to use |
| - Average conditions during El Niño vs La Niña years |
| - Spatial fingerprint of specific extreme events |
| - "What does the atmosphere look like when X happens?" |
| |
| ### Workflow |
| 1. **Define events** — Boolean mask of times exceeding a threshold (e.g., Niño3.4 > 0.5°C). |
| 2. **Subset data** — `.where(mask, drop=True)`. |
| 3. **Average** — Time mean of the subset = composite. |
| 4. **Compare** — Subtract climatological mean → composite anomaly. |
| |
| ### Quality Checklist |
| - [ ] Sample size ≥10 events for robustness |
| - [ ] Baseline climatology matches the season of the events |
| - [ ] Significance tested via bootstrap or t-test |
| |
| ### Common Pitfalls |
| - ⚠️ Compositing n=2 events → noise, not a physical signal. |
| - ⚠️ Mixing seasons in composite (El Niño in DJF vs JJA) obscures the signal. |
| |
| ### Interpretation |
| - Shows the typical anomaly expected when event occurs. |
| - Plot with `RdBu_r` diverging colormap. Stipple significant areas. |
| """, |
|
|
| "diurnal_cycle": """ |
| ## Diurnal Cycle Analysis |
| |
| ### When to use |
| - Hourly variability within days (afternoon convection, nighttime cooling) |
| - Solar radiation patterns |
| |
| ### Workflow |
| 1. **Group by hour** — `ds.groupby('time.hour').mean('time')`. |
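| 2. **Convert to local time** — ERA5 is UTC. `local_hour = (utc_hour + longitude / 15) % 24`, with longitude in degrees east. |
| 3. **Calculate amplitude** — `diurnal_range = hourly.max('hour') - hourly.min('hour')`, where `hourly` is the result of step 1. |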
| |
| ### Quality Checklist |
| - [ ] Input data is hourly (not daily/monthly) |
| - [ ] UTC → local time conversion applied before labeling "afternoon"/"morning" |
| |
| ### Common Pitfalls |
| - ⚠️ Averaging global data by UTC hour mixes day and night across longitudes. |
| - ⚠️ Cloud cover (`tcc`) and radiation (`ssrd`) have strong diurnal signals — always check. |
| |
| ### Interpretation |
| - `blh` and `t2` peak mid-afternoon. Convective precip (`cp`) peaks late afternoon over land, early morning over oceans. |
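### Example Sketch
The UTC to local-time conversion can be sketched as below. This gives approximate local solar time from longitude alone, not civil time zones; the function name `local_solar_hour` and the sample longitudes are assumptions.

```python
import numpy as np

def local_solar_hour(utc_hour, lon_deg_east):
    # 15 degrees of longitude correspond to one hour; wrap into 0..24.
    utc = np.asarray(utc_hour, dtype=float)
    return (utc + np.asarray(lon_deg_east) / 15.0) % 24.0

# 12 UTC at 0E, about 77.2E and 105W
hours = local_solar_hour(12, np.array([0.0, 77.2, -105.0]))

# The modulo makes the formula work for 0..360 longitudes as well
hours_0_360 = local_solar_hour(12, np.array([255.0]))
```

105°W and 255°E are the same meridian, so both give 05:00 local solar time at 12 UTC.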
| """, |
|
|
| "seasonal_decomposition": """ |
| ## Seasonal Decomposition |
| |
| ### When to use |
| - Separating the seasonal cycle from interannual variability |
| - Visualizing how a specific year deviates from the normal curve |
| |
| ### Workflow |
| 1. **Compute climatology** — `.groupby('time.month').mean('time')`. |
| 2. **Extract anomalies** — Subtract climatology from raw data. |
| 3. **Smooth trend** — Apply 12-month rolling mean to extract multi-year trends. |
| |
| ### Quality Checklist |
| - [ ] Baseline robust (≥10 years) |
| - [ ] Residual = raw - seasonal - trend (should be ~white noise) |
| |
| ### Common Pitfalls |
| - ⚠️ Day-of-year climatologies over short baselines are noisy — smooth with 15-day window. |
| |
| ### Interpretation |
| - Separates variance into: seasonal (predictable), trend (long-term), residual (weather noise). |
| """, |
|
|
| "spectral_analysis": """ |
| ## Spectral Analysis |
| |
| ### When to use |
| - Periodicity detection (ENSO 3-7yr, MJO 30-60d, annual/semi-annual) |
| - Confirming suspected oscillatory behavior |
| |
| ### Workflow |
| 1. **Prepare 1D series** — Spatial average or single point. |
| 2. **Detrend** — Remove linear trend AND seasonal cycle. |
| 3. **Compute spectrum** — `scipy.signal.welch` or `periodogram`. |
| 4. **Plot as Period** — X-axis = 1/frequency (years or days), not raw frequency. |
| |
| ### Quality Checklist |
| - [ ] No NaNs in time series (interpolate or drop) |
| - [ ] Time coordinate evenly spaced |
| - [ ] Seasonal cycle removed |
| |
| ### Common Pitfalls |
| - ⚠️ Seasonal cycle dominates spectrum if not removed — drowns everything else. |
| - ⚠️ Short records can't resolve low-frequency oscillations (need ≥3× the period). |
| |
| ### Interpretation |
| - Peaks = dominant cycles. ENSO: 3-7yr. QBO: ~28mo. MJO: 30-60d. Annual: 12mo. |
| """, |
|
|
| "spatial_statistics": """ |
| ## Spatial Statistics & Area Averaging |
| |
| ### When to use |
| - Computing a single time series for a geographic region |
| - Area-weighted means for reporting |
| - Field significance testing |
| |
| ### Workflow |
| 1. **Latitude weights** — `weights = np.cos(np.deg2rad(ds.latitude))`. |
| 2. **Apply** — `ds.weighted(weights).mean(dim=['latitude', 'longitude'])`. |
| 3. **Land/sea mask** — Apply if needed (e.g., ocean-only SST average). |
| |
| ### Quality Checklist |
| - [ ] Latitude weighting applied BEFORE spatial averaging |
| - [ ] Land-sea mask applied where relevant |
| - [ ] Units preserved correctly |
| |
| ### Common Pitfalls |
| - ⚠️ Unweighted averages bias toward poles (smaller grid cells over-counted). |
| - ⚠️ Global mean SST must exclude land points. |
| |
| ### Interpretation |
| - Produces physically accurate area-averaged time series. |
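### Example Sketch
A minimal NumPy sketch of the cos(latitude) weighting, assuming a regular lat/lon grid with no missing values. The helper `area_weighted_mean` and the synthetic field are illustrative; `ds.weighted(weights).mean(...)` from the workflow is the xarray route.

```python
import numpy as np

def area_weighted_mean(field, lat):
    # field has shape (lat, lon); weights are cos(latitude), as in the workflow.
    w = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))
    zonal_mean = np.asarray(field, dtype=float).mean(axis=1)
    return np.sum(zonal_mean * w) / np.sum(w)

lat = np.array([80.0, 40.0, 0.0, -40.0, -80.0])
# Synthetic field: warm at the equator, cold towards the poles
field = np.repeat((30.0 * np.cos(np.deg2rad(lat)))[:, None], 8, axis=1)

weighted = area_weighted_mean(field, lat)
unweighted = field.mean()
```

The unweighted mean is lower here because the cold polar rows count as much as the equatorial row.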
| """, |
|
|
| "multi_variable": """ |
| ## Multi-Variable Derived Quantities |
| |
| ### When to use |
| - Combining ERA5 variables for derived metrics |
| |
| ### Common Derivations |
| 1. **Wind speed** — `wspd = np.sqrt(u10**2 + v10**2)` (or u100/v100 for hub-height). |
| 2. **Wind direction** — `wdir = (270 - np.degrees(np.arctan2(v10, u10))) % 360`. |
| 3. **Relative humidity** — From `t2` and `d2` using Magnus formula. |
| 4. **Heat index** — Combine `t2` and `d2` (Steadman formula). |
| 5. **Vapour transport** — `IVT ≈ tcwv * wspd` (surface proxy). |
| 6. **Total precip check** — `tp ≈ cp + lsp`. |
| |
| ### Quality Checklist |
| - [ ] Variables share identical grids (time, lat, lon) |
| - [ ] Units matched before combining (both in °C, both in m/s, etc.) |
| |
| ### Common Pitfalls |
| - ⚠️ `mean(speed) ≠ speed_of_means` — always compute speed FIRST, then average. |
| - ⚠️ Wind direction requires proper 4-quadrant atan2, not naive arctan. |
| |
| ### Interpretation |
| - Derived metrics often better represent human/environmental impact than raw fields. |
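### Example Sketch
The wind derivations and the averaging pitfall can be illustrated with four synthetic wind vectors (the values are assumptions chosen so that the vectors cancel).

```python
import numpy as np

u10 = np.array([5.0, -5.0, 0.0, 0.0])
v10 = np.array([0.0, 0.0, 5.0, -5.0])

wspd = np.sqrt(u10**2 + v10**2)
# Meteorological convention: direction the wind blows FROM,
# in degrees clockwise from north
wdir = (270.0 - np.degrees(np.arctan2(v10, u10))) % 360.0

mean_of_speeds = wspd.mean()                               # 5.0
speed_of_means = np.sqrt(u10.mean()**2 + v10.mean()**2)    # 0.0, the vectors cancel
```

The four directions come out as westerly (270°), easterly (90°), southerly (180°) and northerly (0°), up to floating-point rounding.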
| """, |
|
|
| "climatology_normals": """ |
| ## Climatology Normals (WMO Standard) |
| |
| ### When to use |
| - Computing 30-year normals |
| - Calculating "departure from normal" |
| |
| ### Workflow |
| 1. **Select base period** — Standard WMO epoch: 1991-2020 (or 1981-2010). |
| 2. **Compute monthly averages** — `normals = baseline.groupby('time.month').mean('time')`. |
| 3. **Departure** — `departure = current.groupby('time.month') - normals`. |
| |
| ### Quality Checklist |
| - [ ] Exactly 30 years used |
| - [ ] Same months compared (don't mix Feb normals with March data) |
| |
| ### Common Pitfalls |
| - ⚠️ Moving baselines make comparisons with WMO climate reports inconsistent. |
| |
| ### Interpretation |
| - "Normal" = statistical baseline. Departures express how much current conditions deviate. |
| """, |
|
|
| |
| |
| |
| "climate_indices": """ |
| ## Climate Indices |
| |
| ### When to use |
| - Assessing ENSO, NAO, PDO, AMO teleconnections |
| - Correlating local weather with large-scale modes |
| |
| ### Key Indices |
| - **ENSO (Niño 3.4)**: `sst` anomaly, 5°S-5°N, 170°W-120°W. El Niño > +0.5°C, La Niña < -0.5°C. |
| - **NAO**: `mslp` difference, Azores High minus Icelandic Low. Positive → mild European winters. |
| - **PDO**: Leading EOF of North Pacific `sst` (north of 20°N). 20-30yr phases. |
| - **AMO**: Detrended North Atlantic `sst` average. ~60-70yr cycle. |
| |
| ### Workflow |
| 1. **Extract region** — Use standard geographic bounds. |
| 2. **Compute anomaly** — Area-averaged, against 30yr baseline. |
| 3. **Smooth** — 3-to-5 month rolling mean. |
| |
| ### Quality Checklist |
| - [ ] Standard geographic bounds strictly followed |
| - [ ] Rolling mean applied to filter weather noise |
| - [ ] Latitude-weighted area average |
| |
| ### Common Pitfalls |
| - ⚠️ Without rolling mean, the index is too noisy for classification. |
| - ⚠️ Using incorrect region bounds produces a different (invalid) index. |
| - ⚠️ **MJO PROXY FAILURE:** Do NOT use Skin Temperature (`skt`) or SST to track the MJO over the ocean. The signal is effectively zero (~0.1°C variance). Always use Precipitation (`tp`), Total Column Water Vapour (`tcwv`), or Total Cloud Cover (`tcc`). |
| |
| ### Additional Indices |
| - **IOD (Indian Ocean Dipole)**: `sst` anomaly diff between Western (50-70°E, 10°S-10°N) and Eastern (90-110°E, 10°S-0°) poles. |
| """, |
|
|
| "extremes": """ |
| ## Extreme Event Analysis |
| |
| ### When to use |
| - Heat/cold extremes, heavy precipitation, tail-risk assessment |
| - Threshold exceedance frequency |
| |
| ### Workflow |
| 1. **Define threshold** — Absolute (e.g., T > 35°C) or percentile-based (> 95th pctl of baseline). |
| 2. **Create mask** — Boolean where condition is met. |
| 3. **Count** — Sum over time for extreme days per year/month. |
| 4. **Trend** — Check if frequency is increasing over time. |
| |
| ### Quality Checklist |
| - [ ] Percentiles from robust baseline (≥30 years) |
| - [ ] Use daily data, not monthly averages |
| - [ ] Units converted before applying thresholds |
| |
| ### Common Pitfalls |
| - ⚠️ 99th percentile on monthly averages misses true daily extremes entirely. |
| - ⚠️ Absolute thresholds (e.g., 35°C) are region-dependent — 35°C is normal in Sahara, extreme in London. |
| |
| ### Interpretation |
| - Increasing frequency of extremes = non-linear climate change impact. |
| - Report as "N days/year exceeding threshold" or "return period shortened from X to Y years". |
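### Example Sketch
A percentile threshold and the yearly exceedance count can be sketched as follows. The daily values are synthetic random numbers (an assumption, not ERA5 data), with the recent period drawn 1 °C warmer than the baseline.

```python
import numpy as np

rng = np.random.default_rng(2)
baseline_daily = rng.normal(25.0, 4.0, size=30 * 365)    # synthetic 30-year baseline, °C
threshold = np.percentile(baseline_daily, 95)

recent = rng.normal(26.0, 4.0, size=(5, 365))            # 5 synthetic years x 365 days
days_per_year = (recent > threshold).sum(axis=1)         # exceedance days per year
```

`days_per_year` is the quantity to report as N days/year exceeding the threshold.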
| """, |
|
|
| "drought_analysis": """ |
| ## Drought Analysis |
| |
| ### When to use |
| - Prolonged precipitation deficits |
| - Agricultural/hydrological impact assessment |
| - SPI (Standardized Precipitation Index) proxy |
| |
| ### Workflow |
| 1. **Extract precip** — Use `tp` in mm (×1000 from meters). |
| 2. **Accumulate** — Rolling sums: `tp.rolling(time=3).sum()` for 3-month SPI. |
| 3. **Standardize** — `(accumulated - mean) / std` → SPI proxy. |
| 4. **Cross-check** — Verify with `swvl1` (soil moisture) for ground-truth. |
| |
| ### Quality Checklist |
| - [ ] Monthly data used (not hourly) |
| - [ ] Baseline ≥30 years for stable statistics |
| - [ ] Multiple accumulation periods tested (1, 3, 6, 12 months) |
| |
| ### Common Pitfalls |
| - ⚠️ Absolute precipitation deficits are meaningless in deserts — always standardize. |
| - ⚠️ Gamma distribution fit (proper SPI) is better than raw Z-score for precip. |
| - ⚠️ CRITICAL BASELINE LENGTH: You MUST use a minimum 30-year baseline (e.g., 1991-2020) to compute the mean and std for SPI standardization. Computing z-scores on a 5-year period (e.g., using 2020-2024 as both study and reference period) is statistically invalid and creates artificial extreme spikes. |
| |
| ### Interpretation |
| - SPI < -1.0: Moderate drought. < -1.5: Severe. < -2.0: Extreme. |
| """, |
|
|
| "heatwave_detection": """ |
| ## Heatwave Detection |
| |
| ### When to use |
| - Identifying heatwave events using standard definitions |
| - Assessing heat-related risk periods |
| |
| ### Workflow |
| 1. **Daily data** — Must be daily resolution (resample hourly if needed). |
| 2. **Threshold** — 90th percentile of `t2` per calendar day from baseline. |
| 3. **Exceedance mask** — `is_hot = t2_daily > threshold_90`. |
| 4. **Streak detection** — Find ≥3 consecutive hot days, e.g. a 3-day rolling sum of `is_hot` equal to 3. |
| |
| ### Quality Checklist |
| - [ ] Daily data (not monthly!) |
| - [ ] `t2` converted to °C |
| - [ ] Threshold is per-calendar-day (not a single annual value) |
| - [ ] Duration criterion applied (≥3 days) |
| |
| ### Common Pitfalls |
| - ⚠️ Monthly data — physically impossible to detect heatwaves. |
| - ⚠️ A single hot day is not a heatwave — duration matters. |
| - ⚠️ Nighttime temperatures also matter for health impact. Use the daily minimum of `t2`; fixed UTC hours such as 00/06 are only night-time at some longitudes. |
| - ⚠️ Using a flat seasonal anomaly threshold (e.g., "Summer Mean > +2°C") is NOT a heatwave detection method. This produces unphysical spatial artifacts. Heatwaves are discrete DAILY extreme events requiring per-calendar-day thresholds. |
| |
| ### Marine Heatwave Extension |
| - For ocean/SST heatwaves, use daily mean SST (not daily max). |
| - Marine heatwaves require ≥5 consecutive days above the 90th percentile threshold. |
| - Use a long baseline (e.g., 1991-2020) with a ±5-day smoothed calendar-day threshold. |
| |
| ### Interpretation |
| - Heatwaves require BOTH intensity (high T) AND duration (consecutive days). |
| - Report: number of events per year, mean duration, max intensity. |
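### Example Sketch
The duration criterion can be applied with a run-length pass over the exceedance mask. This is one possible implementation for a single grid point; the helper `heatwave_days` and the sample mask are assumptions.

```python
import numpy as np

def heatwave_days(is_hot, min_len=3):
    # Flag every day that belongs to a run of at least min_len consecutive hot days.
    is_hot = np.asarray(is_hot, dtype=bool)
    padded = np.concatenate(([False], is_hot, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)      # first index of each run
    ends = np.flatnonzero(edges == -1)       # one past the last index of each run
    flagged = np.zeros(is_hot.size, dtype=bool)
    for start, end in zip(starts, ends):
        if end - start >= min_len:
            flagged[start:end] = True
    return flagged

is_hot = np.array([1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1], dtype=bool)
in_heatwave = heatwave_days(is_hot)

# Count events as the number of False-to-True transitions
steps = np.diff(np.concatenate(([0], in_heatwave.astype(int))))
n_events = int(np.sum(steps == 1))
```

The 2-day and 1-day runs are discarded; the 4-day and 3-day runs count as two events. For marine heatwaves, `min_len=5` matches the extension above.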
| """, |
|
|
| "atmospheric_rivers": """ |
| ## Atmospheric Rivers Detection |
| |
| ### When to use |
| - Detecting AR events from integrated vapour transport proxy |
| - Extreme precipitation risk at landfall |
| |
| ### Workflow |
| 1. **Extract** — `tcwv` + `u10`, `v10`. |
| 2. **Compute IVT proxy** — `ivt = tcwv * np.sqrt(u10**2 + v10**2)`. |
| 3. **Threshold** — IVT proxy > 250 kg/m/s (approximate). |
| 4. **Shape check** — Feature should be elongated (>2000km long, <1000km wide). |
| |
| ### Quality Checklist |
| - [ ] Acknowledge this is surface-wind proxy (true IVT needs pressure-level data) |
| - [ ] Cross-validate with heavy `tp` at landfall |
| - [ ] Check for persistent (≥24h) plume features |
| |
| ### Common Pitfalls |
| - ⚠️ Tropical moisture pools are NOT ARs — wind-speed multiplier is essential to distinguish. |
| - ⚠️ This surface proxy underestimates true IVT — use conservative thresholds. |
| |
| ### Interpretation |
| - High `tcwv` + strong directed wind at coast = extreme flood risk. |
| - Map with `YlGnBu` for moisture intensity. |
| """, |
|
|
| "blocking_events": """ |
| ## Atmospheric Blocking Detection |
| |
| ### When to use |
| - Identifying persistent high-pressure blocks from MSLP |
| - Explaining prolonged heatwaves, droughts, or cold spells |
| |
| ### Workflow |
| 1. **Extract** — `mslp` in hPa (÷100 from Pa). |
| 2. **Compute anomalies** — Daily anomalies from climatology. |
| 3. **Detect** — Find positive anomalies > 1.5σ persisting ≥5 days. |
| 4. **Location** — Focus on mid-to-high latitudes (40-70°N typically). |
| |
| ### Quality Checklist |
| - [ ] 3-5 day rolling mean applied to filter transient ridges |
| - [ ] Persistence criterion enforced (≥5 days) |
| - [ ] Mid-latitude focus |
| |
| ### Common Pitfalls |
| - ⚠️ Fast-moving ridges are NOT blocks — persistence is key. |
| - ⚠️ Blocks in the Southern Hemisphere are rarer and weaker. |
| |
| ### Interpretation |
| - Blocks force storms to detour, causing prolonged rain on flanks and drought/heat underneath. |
| """, |
|
|
| "energy_budget": """ |
| ## Surface Energy Budget |
| |
| ### When to use |
| - Analyzing radiation balance and surface heating |
| - Solar energy potential assessment |
| |
| ### Workflow |
| 1. **Extract radiation** — `ssrd` (incoming solar), `ssr` (net solar after reflection). |
| 2. **Convert units** — J/m² to W/m² by dividing by accumulation period (3600s for hourly). |
| 3. **Compute albedo proxy** — `albedo ≈ 1 - (ssr / ssrd)` where ssrd > 0. |
| 4. **Seasonal patterns** — Group by month to see radiation cycle. |
| |
| ### Quality Checklist |
| - [ ] Accumulation period properly accounted for (hourly vs daily sums) |
| - [ ] Division by zero protected (nighttime ssrd = 0) |
| - [ ] Units clearly stated: W/m² or MJ/m²/day |
| |
| ### Common Pitfalls |
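| - ⚠️ ERA5 radiation is ACCUMULATED, not instantaneous. In hourly ERA5 each value covers the hour ending at the valid time (divide by 3600 s for W/m²); products that accumulate since the start of the forecast, such as ERA5-Land, must be differenced between consecutive steps first. |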
| - ⚠️ `ssr` already accounts for clouds and albedo — don't double-correct. |
| |
| ### Interpretation |
| - Higher `ssrd` → higher solar potential. A low `ssr/ssrd` ratio means a reflective surface (snow/ice), since the ratio equals 1 - albedo; cloudiness shows up as reduced `ssrd` itself. |
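### Example Sketch
Unit conversion and the albedo proxy with division-by-zero protection can be sketched as below. The sketch assumes hourly accumulations (3600 s), as in the workflow, and uses synthetic values.

```python
import numpy as np

SECONDS_PER_HOUR = 3600.0

# Hourly accumulations in J/m2 (night, morning, noon); values are synthetic
ssrd_j = np.array([0.0, 1.8e6, 3.6e6])
ssr_j = np.array([0.0, 1.44e6, 2.88e6])

ssrd_w = ssrd_j / SECONDS_PER_HOUR           # mean flux over the hour, W/m2
ssr_w = ssr_j / SECONDS_PER_HOUR

# Albedo proxy only where there is incoming radiation; NaN at night
ratio = np.divide(ssr_w, ssrd_w,
                  out=np.full_like(ssrd_w, np.nan),
                  where=ssrd_w > 0)
albedo = 1.0 - ratio
```

Night-time steps stay NaN instead of producing inf or a runtime warning.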
| """, |
|
|
| "wind_energy": """ |
| ## Wind Energy Assessment |
| |
| ### When to use |
| - Wind power density analysis |
| - Turbine hub-height wind resource mapping |
| |
| ### Workflow |
| 1. **Use hub-height winds** — `u100`, `v100` (100m, not 10m surface winds). |
| 2. **Compute speed** — `wspd100 = np.sqrt(u100**2 + v100**2)`. |
| 3. **Power density** — `P = 0.5 * rho * wspd100**3` where rho ≈ 1.225 kg/m³. |
| 4. **Operating fraction** — Fraction of time wind speed lies between cut-in (~3 m/s) and cut-out (~25 m/s). This is only a rough proxy for capacity factor, which needs a turbine power curve. |
| 5. **Weibull fit** — Fit shape (k) and scale (A) parameters to the wind speed distribution. |
| |
| ### Quality Checklist |
| - [ ] Using 100m winds, NOT 10m (turbines don't operate at surface) |
| - [ ] Power density in W/m² |
| - [ ] Seasonal variation checked (winter vs summer) |
| |
| ### Common Pitfalls |
| - ⚠️ Using 10m winds severely underestimates wind energy potential. |
| - ⚠️ Mean wind speed misleads — power depends on speed CUBED, so variability matters enormously. |
| |
| ### Interpretation |
| - Power density >400 W/m² = excellent wind resource. |
| - Report Weibull k parameter: k < 2 = gusty/variable, k > 3 = steady flow. |
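### Example Sketch
Because power scales with speed cubed, power density should be averaged after cubing. The sketch uses five synthetic hub-height speeds (an assumption) and the constant air density from the workflow.

```python
import numpy as np

RHO = 1.225  # kg/m3, the constant air density quoted in the workflow

def power_density(wspd):
    # Instantaneous wind power density in W/m2
    return 0.5 * RHO * np.asarray(wspd, dtype=float) ** 3

wspd100 = np.array([2.0, 6.0, 10.0, 14.0, 28.0])    # synthetic speeds, m/s

mean_power = power_density(wspd100).mean()           # average of the cubes
power_of_mean = power_density(wspd100.mean())        # cube of the average: too low
# Fraction of time between cut-in (3 m/s) and cut-out (25 m/s)
operating = np.mean((wspd100 >= 3.0) & (wspd100 <= 25.0))
```

In this sample the power computed from the mean speed is about a third of the true mean power density.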
| """, |
|
|
| "moisture_budget": """ |
| ## Moisture Budget Analysis |
| |
| ### When to use |
| - Understanding precipitation sources |
| - Tracking moisture plumes and convergence zones |
| |
| ### Workflow |
| 1. **Extract** — `tcwv` (precipitable water), `tcw` (total column water incl. liquid/ice). |
| 2. **Temporal evolution** — Track `tcwv` changes to infer moisture convergence. |
| 3. **Relate to precip** — Compare `tcwv` peaks with `tp` to see conversion efficiency. |
| 4. **Spatial patterns** — Map `tcwv` to identify moisture corridors. |
| |
| ### Quality Checklist |
| - [ ] Distinguish `tcwv` (vapour only) from `tcw` (vapour + liquid + ice) |
| - [ ] Units: kg/m² (equivalent to mm of water) |
| |
| ### Common Pitfalls |
| - ⚠️ High `tcwv` doesn't guarantee rain — need a lifting mechanism. |
| - ⚠️ `tcw - tcwv` gives cloud water + ice content (proxy for cloud thickness). |
| |
| ### Interpretation |
| - `tcwv` > 50 kg/m² in tropics = moisture-laden atmosphere primed for heavy precip. |
| """, |
|
|
| "convective_potential": """ |
| ## Convective Potential (Thunderstorm Risk) |
| |
| ### When to use |
| - Thunderstorm forecasting and climatology |
| - Severe weather risk assessment |
| |
| ### Workflow |
| 1. **Extract CAPE** — Already available as `cape` variable (J/kg). |
| 2. **Classify risk** — Low (<300), Moderate (300-1000), High (1000-2500), Extreme (>2500 J/kg). |
| 3. **Combine with moisture** — High CAPE + high `tcwv` → heavy convective storms. |
| 4. **Check trigger** — Fronts, orography, or strong daytime heating (`t2` diurnal cycle). |
| |
| ### Quality Checklist |
| - [ ] CAPE alone is insufficient — need a trigger mechanism |
| - [ ] Check `blh` (boundary layer height) — deep BLH aids convective initiation |
| |
| ### Common Pitfalls |
| - ⚠️ CAPE = potential energy, not a guarantee. High CAPE + strong capping inversion = no storms. |
| - ⚠️ CAPE is most meaningful in afternoon hours — avoid pre-dawn values. |
| |
| ### Interpretation |
| - CAPE > 1000 J/kg with deep BLH (>2km) and high `tcwv` = significant thunderstorm risk. |
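### Example Sketch
The risk classes from the workflow can be applied with `np.digitize`. Assigning values that fall exactly on a boundary to the higher class is an assumption of this sketch, as are the names `cape_risk`, `LABELS` and `EDGES`.

```python
import numpy as np

LABELS = np.array(['low', 'moderate', 'high', 'extreme'])
EDGES = [300.0, 1000.0, 2500.0]       # J/kg, class boundaries from the workflow

def cape_risk(cape):
    # np.digitize returns 0 below the first edge, 1 between the first two, and so on
    return LABELS[np.digitize(np.asarray(cape, dtype=float), EDGES)]

risk = cape_risk([50.0, 300.0, 999.0, 1800.0, 4000.0])
```

The class alone is not a forecast; it still needs the trigger and capping checks listed above.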
| """, |
|
|
| "snow_cover": """ |
| ## Snow Cover & Melt Analysis |
| |
| ### When to use |
| - Tracking snow accumulation and melt timing |
| - Climate change impacts on snowpack |
| |
| ### Workflow |
| 1. **Extract** — `sd` (Snow Depth in m water equivalent). |
| 2. **Seasonal cycle** — Track start/end of snow season per grid point. |
| 3. **Melt timing** — Find the date when `sd` drops below threshold. |
| 4. **Trend** — Check if snow season is shortening over decades. |
| 5. **Compare with `stl1`/`t2`** — Warming soil accelerates melt. |
| |
| ### Quality Checklist |
| - [ ] Units: meters of water equivalent |
| - [ ] Focus on mid/high latitudes and mountain regions |
| - [ ] Inter-annual variability large — use multi-year analysis |
| |
| ### Common Pitfalls |
| - ⚠️ ERA5 snow depth is modeled, not observed — cross-reference with station data. |
| - ⚠️ Rain-on-snow events can cause rapid melt not captured well in reanalysis. |
| |
| ### Interpretation |
| - Earlier melt = less summer water supply. Map with `Blues`, reversed for snowless areas. |
| """, |
|
|
| |
| |
| |
| "visualization_spatial": """ |
| ## Spatial Map Visualization |
| |
| ### When to use |
| - Mapping absolute climate fields (Temp, Wind, Precip, Pressure) |
| |
| ### Workflow |
| 1. **Figure** — `fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(projection=ccrs.PlateCarree()))`. |
| 2. **Meshgrid** — `lons, lats = np.meshgrid(data.longitude, data.latitude)`. |
| 3. **Plot** — `ax.pcolormesh(lons, lats, data, cmap=..., shading='auto')`. |
| 4. **Colorbar** — ALWAYS: `plt.colorbar(mesh, ax=ax, label='Units', shrink=0.8)`. |
| 5. **Cartopy** — Add coastlines and borders on a `cartopy.crs` projection (required; see the checklist below). |
| |
| ### Quality Checklist |
| - [ ] Figure 12×8 for maps |
| - [ ] Colormap matches variable: |
| - Temp: `RdYlBu_r` | Wind: `YlOrRd` | Precip: `YlGnBu` |
| - Pressure: `viridis` | Cloud: `Greys` | Anomalies: `RdBu_r` |
| - [ ] NEVER use `jet` |
| - [ ] Colorbar has label with units |
| - [ ] CARTOPY IS MANDATORY: Always use `cartopy.crs` projections with `ax.coastlines()` and `ax.add_feature(cfeature.BORDERS)`. Maps without coastlines appear as meaningless color blobs. |
| - [ ] Always pass `transform=ccrs.PlateCarree()` to `pcolormesh`/`contourf` when using Cartopy. |
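| - [ ] For Arctic regions (latitude > 60°N), use `ccrs.NorthPolarStereo()` as the axes projection instead of `PlateCarree` to avoid extreme distortion. Keep `transform=ccrs.PlateCarree()` on the plotting call, because the data are still on a lat/lon grid. |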
| - [ ] For US regional maps, add `cfeature.STATES` for state boundaries. |
| - [ ] NEVER use `Greys` colormap for humidity or precipitation. Use `YlGnBu` or `BrBG`. |
| - [ ] Categorical/binary maps (like hotspot masks) should use a categorical legend, not a continuous 0-1 colorbar. |
| |
| ### Common Pitfalls |
| - ⚠️ Diverging cmap on absolute data is misleading — diverging only for anomalies. |
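| - ⚠️ On older Matplotlib versions, omitting `shading='auto'` can trigger a deprecation warning when the coordinate arrays have the same shape as the data. Passing it explicitly is harmless on newer versions. |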
| """, |
|
|
| "visualization_timeseries": """ |
| ## Time Series Visualization |
| |
| ### When to use |
| - Temporal evolution of a variable at a point or region |
| |
| ### Workflow |
| 1. **Area average** — Weight by latitude: `w = np.cos(np.deg2rad(data.latitude))`, then `ts = data.weighted(w).mean(dim=['latitude', 'longitude'])`. |
| 2. **Figure** — `fig, ax = plt.subplots(figsize=(10, 6))`. |
| 3. **Raw line** — `ax.plot(ts.time, ts, linewidth=1.5)`. |
| 4. **Smoothing** — Add rolling mean overlay with contrasting color. |
| 5. **Date formatting** — `fig.autofmt_xdate(rotation=30)`. |
| |
| ### Quality Checklist |
| - [ ] Figure 10×6 |
| - [ ] Y-axis has explicit units |
| - [ ] Legend included if multiple lines |
| - [ ] Trend line if requested: dashed with slope annotation |
| |
| ### Enhancements |
| - **Uncertainty band**: `ax.fill_between(time, mean-std, mean+std, alpha=0.2)` |
| - **Event markers**: `ax.axvline(date, color='red', ls='--')` |
| - **Twin axis**: `ax2 = ax.twinx()` for second variable |
| - **Date formatting**: Always use proper date labels (e.g., `mdates.DateFormatter('%b %d')`), NEVER raw day-of-month integers (1, 2, ... 31). |
| - **Y-axis range**: Do not set y-limits too narrow to artificially exaggerate peaks. Keep ranges physically reasonable. |
| - **Dual axes coloring**: If using `ax2 = ax.twinx()`, color the y-tick labels to match the corresponding line colors. |
| - **Grid lines**: Always add `ax.grid(True, alpha=0.3)` for precise value comparison. |
| |
| ### Common Pitfalls |
| - ⚠️ Hourly data over 10+ years → unreadable block of ink. Resample to daily first. |
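
### Example Sketch
To make the latitude weighting in the area-average step concrete, here is a small NumPy-only sketch. The function name `area_weighted_mean` and the tiny demo array are assumptions for illustration; with xarray the same idea is normally expressed through weighted reductions.

```python
import numpy as np

def area_weighted_mean(field, lats_deg):
    # field: array shaped (time, lat, lon); lats_deg: 1D latitudes in degrees.
    # On a regular lat-lon grid, cell area is proportional to cos(latitude).
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))
    w3d = np.broadcast_to(weights[None, :, None], field.shape)
    valid = ~np.isnan(field)
    # Missing cells contribute neither to the numerator nor to the weights.
    num = np.sum(np.where(valid, field, 0.0) * w3d, axis=(1, 2))
    den = np.sum(np.where(valid, w3d, 0.0), axis=(1, 2))
    return num / den

# One timestep, two latitudes (0 and 60 degrees), two longitudes.
demo = np.array([[[10.0, 10.0], [20.0, 20.0]]])
ts = area_weighted_mean(demo, [0.0, 60.0])
```

In this demo the unweighted mean would be 15, while the weighted mean is lower because the 60° row covers half the area of the equatorial row.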
| """, |
|
|
| "visualization_anomaly_map": """ |
| ## Anomaly Map Visualization |
| |
| ### When to use |
| - Diverging data: departures, trends, z-scores |
| - Any map that has positive AND negative values |
| |
| ### Workflow |
| 1. **Center at zero** — `from matplotlib.colors import TwoSlopeNorm`. |
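| 2. **Norm** — `norm = TwoSlopeNorm(vmin=np.nanpercentile(data, 2), vcenter=0, vmax=np.nanpercentile(data, 98))`. This requires `vmin < 0 < vmax`, so confirm the data straddle zero first. |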
| 3. **Plot** — `pcolormesh(..., cmap='RdBu_r', norm=norm)`. |
| 4. **Stippling** — Overlay significance: `contourf(..., levels=[0, 0.05], hatches=['...'], colors='none')`. |
| |
| ### Quality Checklist |
| - [ ] Zero is EXACTLY white/neutral in the colorbar |
| - [ ] Warm/dry = Red; Cool/wet = Blue |
| - [ ] Precip anomalies: consider `BrBG` instead of `RdBu_r` |
| |
| ### Common Pitfalls |
| - ⚠️ Without `TwoSlopeNorm`, skewed data makes 0 appear colored → reader is misled. |
| - ⚠️ Symmetric vmin/vmax (`vmax = max(abs(data))`) can also work but wastes color range. |
| - ⚠️ CARTOPY IS MANDATORY for anomaly maps — always add `ax.coastlines()` and `ax.add_feature(cfeature.BORDERS)`. |
| - ⚠️ ROBUST COLORBAR LIMITS: NEVER use raw `data.min()` and `data.max()` for anomaly map limits. A single outlier cell can result in ±200°C scale making the map unreadable. Always use percentile-based limits: `vmax = np.nanpercentile(np.abs(data), 98)`. |
| - ⚠️ Always pass `transform=ccrs.PlateCarree()` when plotting with Cartopy. |
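
### Example Sketch
The percentile-based limit from the pitfalls list can be sketched as below. The helper name `robust_symmetric_limit` and the synthetic anomaly field with one extreme outlier are assumptions made for the example.

```python
import numpy as np

def robust_symmetric_limit(anomaly, percentile=98):
    # Symmetric colorbar limits from a percentile of the absolute anomaly,
    # so a single outlier cell cannot stretch the color scale.
    vmax = float(np.nanpercentile(np.abs(anomaly), percentile))
    return -vmax, vmax

# Synthetic anomalies between -2 and +2 plus one corrupt cell at 200.
anomaly = np.append(np.linspace(-2.0, 2.0, 101), 200.0)
vmin, vmax = robust_symmetric_limit(anomaly)
raw_max = float(np.nanmax(anomaly))
```

The resulting `vmin`/`vmax` can then be passed to `TwoSlopeNorm(vmin=vmin, vcenter=0, vmax=vmax)` or directly to `pcolormesh`.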
| """, |
|
|
| "visualization_wind": """ |
| ## Wind & Vector Visualization |
| |
| ### When to use |
| - Circulation patterns, wind fields, quiver/streamline plots |
| |
| ### Workflow |
| 1. **Speed background** — `wspd` with `pcolormesh` + `YlOrRd`. |
| 2. **Subsample vectors** — `skip = (slice(None, None, 5), slice(None, None, 5))` to avoid solid black. |
| 3. **Quiver** — `q = ax.quiver(lons[skip], lats[skip], u[skip], v[skip], color='black')` (keep the handle `q` for the reference arrow). |
| 4. **Alternative** — `ax.streamplot()` for flow visualization (less cluttered). |
| |
| ### Quality Checklist |
| - [ ] Background heatmap shows magnitude |
| - [ ] Vectors sparse enough to be readable |
| - [ ] Wind barbs: `ax.barbs()` for meteorological display |
| |
| ### Common Pitfalls |
| - ⚠️ Full-resolution quiver = completely black, unreadable mess. MUST subsample vectors. |
| - ⚠️ Check arrow scaling — default autoscale can make light winds invisible. |
| - ⚠️ REFERENCE ARROW MANDATORY: Always add `ax.quiverkey(q, 0.9, 1.05, 10, '10 m/s', labelpos='E')`. Without this, arrow magnitudes are uninterpretable. |
| - ⚠️ CARTOPY IS MANDATORY: Add `ax.coastlines()` and `ax.add_feature(cfeature.BORDERS)` to all wind maps. |
| - ⚠️ Always pass `transform=ccrs.PlateCarree()` to quiver/streamplot when using Cartopy. |
| |
| ### Interpretation |
| - Arrows = direction, background color = magnitude. Cyclonic rotation (counterclockwise in the Northern Hemisphere, clockwise in the Southern) marks a low-pressure system. |
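
### Example Sketch
The speed and subsampling steps of the workflow can be sketched with plain NumPy as follows. The grid spacing and the constant `u`/`v` values are synthetic assumptions; only the array handling is shown, not the plotting calls.

```python
import numpy as np

# Synthetic 0.25-degree grid with a uniform wind field (assumed values).
lon = np.arange(0.0, 10.0, 0.25)
lat = np.arange(50.0, 40.0, -0.25)
lons, lats = np.meshgrid(lon, lat)
u = np.full(lons.shape, 3.0)
v = np.full(lons.shape, 4.0)

# Background field: wind speed at full resolution.
wspd = np.sqrt(u**2 + v**2)

# Vectors: keep every 5th point in both dimensions.
skip = (slice(None, None, 5), slice(None, None, 5))
lons_s, lats_s, u_s, v_s = lons[skip], lats[skip], u[skip], v[skip]
```

The full-resolution `wspd` goes to `pcolormesh`, while only the thinned arrays go to `quiver`, which keeps the arrows legible.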
| """, |
|
|
| "visualization_comparison": """ |
| ## Multi-Panel Comparison |
| |
| ### When to use |
| - Before/after, two periods, difference maps |
| - Multi-variable side-by-side |
| |
| ### Workflow |
| 1. **Grid** — `fig, axes = plt.subplots(1, 3, figsize=(18, 6))`. |
| 2. **Panels 1 & 2** — Absolute values with SHARED `vmin`/`vmax`. |
| 3. **Panel 3** — Difference (A-B) with diverging cmap centered at zero. |
| |
| ### Quality Checklist |
| - [ ] Panels 1 & 2 share EXACT same vmin/vmax (otherwise visual comparison is invalid) |
| - [ ] Panel 3 has its own divergent colorbar centered at zero |
| - [ ] Titles clearly label what each panel shows |
| |
| ### Common Pitfalls |
| - ⚠️ Auto-scaled panels = impossible to compare visually. Always lock limits. |
| - ⚠️ Use Cartopy projections for ALL map panels: `subplot_kw={'projection': ccrs.PlateCarree()}`. Add `ax.coastlines()` to each. |
| - ⚠️ Always pass `transform=ccrs.PlateCarree()` to each panel's plotting call. |
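
### Example Sketch
One way to lock the limits for the three panels is sketched below. The helper names `shared_limits` and `symmetric_limit` are hypothetical, and the small arrays stand in for two period-mean fields.

```python
import numpy as np

def shared_limits(a, b):
    # Common vmin/vmax for panels 1 and 2, ignoring NaNs.
    lo = float(min(np.nanmin(a), np.nanmin(b)))
    hi = float(max(np.nanmax(a), np.nanmax(b)))
    return lo, hi

def symmetric_limit(diff):
    # Zero-centered limits for the difference panel.
    m = float(np.nanmax(np.abs(diff)))
    return -m, m

period_a = np.array([[1.0, 2.0], [3.0, 4.0]])
period_b = np.array([[0.0, 5.0], [2.0, np.nan]])
vmin, vmax = shared_limits(period_a, period_b)
dmin, dmax = symmetric_limit(period_a - period_b)
```

For fields with outliers, a percentile-based variant of `symmetric_limit` may be a better fit than the raw maximum used here.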
| """, |
|
|
| "visualization_profile": """ |
| ## Hovmöller Diagrams |
| |
| ### When to use |
| - Lat-time or lon-time cross-sections |
| - Tracking wave propagation, ITCZ migration, monsoon onset |
| |
| ### Workflow |
| 1. **Average out one dimension** — e.g., average across latitudes to get (lon, time). |
| 2. **Transpose** — X=Time, Y=Lon/Lat. |
| 3. **Plot** — `contourf` or `pcolormesh`, figure 12×6. |
| |
| ### Quality Checklist |
| - [ ] X-axis uses date formatting |
| - [ ] Y-axis labels state the averaged geographic slice |
| - [ ] Colormap matches variable type |
| |
| ### Common Pitfalls |
| - ⚠️ Swapping axes makes the diagram unintuitive. Time → X-axis convention. |
| |
| ### Proxy Selection for Hovmöller |
| - ⚠️ For MJO tracking over the ocean, DO NOT use Skin Temperature (`skt`) — the signal is too weak (~0.1°C). Use Convective Precipitation (`cp`), Total Precipitation (`tp`), Total Column Water Vapour (`tcwv`), or Total Cloud Cover (`tcc`). |
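| - ⚠️ Remove the seasonal cycle (e.g., subtract a daily climatology or a long running mean of roughly 120 days) to isolate intraseasonal signals like MJO (30-60 day periods). A 30-day running mean is too short: subtracting it also removes much of the 30-60 day signal. |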
| - ⚠️ Longitude axis must use standard geographic convention (-180 to +180), not 0-360. |
| |
| ### Interpretation |
| - Diagonal banding = propagating waves/systems. Vertical banding = stationary patterns. |
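
### Example Sketch
The averaging and transposing steps can be sketched as follows for a longitude-time diagram. The function name `hovmoller_lon_time` is an assumption, the input is a synthetic `(time, lat, lon)` array, and NaN handling is left out for brevity.

```python
import numpy as np

def hovmoller_lon_time(field, lats_deg, lat_min, lat_max):
    # field: array shaped (time, lat, lon).
    # Averages over the latitude band [lat_min, lat_max] with cos(lat) weights
    # and returns an array shaped (lon, time): Y = longitude, X = time.
    lats = np.asarray(lats_deg, dtype=float)
    band = (lats >= lat_min) & (lats <= lat_max)
    weights = np.cos(np.deg2rad(lats[band]))
    sub = np.asarray(field, dtype=float)[:, band, :]
    mean = np.tensordot(sub, weights, axes=([1], [0])) / weights.sum()
    return mean.T

demo = np.arange(60, dtype=float).reshape(4, 3, 5)
hov = hovmoller_lon_time(demo, [-10.0, 0.0, 10.0], -5.0, 5.0)
```

The returned array can be handed to `pcolormesh(time, lon, hov)` so that time runs along the X-axis.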
| """, |
|
|
| "visualization_distribution": """ |
| ## Distribution Visualization |
| |
| ### When to use |
| - Histograms, PDFs, box plots |
| - Comparing two time periods or regions |
| |
| ### Workflow |
| 1. **Flatten** — `.values.flatten()`, drop NaNs. |
| 2. **Shared bins** — `bins = np.linspace(lo, hi, 51)` (50 bins), with `lo`/`hi` taken across ALL samples being compared. |
| 3. **Plot** — `ax.hist(data, bins=bins, alpha=0.5, density=True, label='Period')`. |
| 4. **Median/mean markers** — Vertical lines with annotation. |
| |
| ### Quality Checklist |
| - [ ] `density=True` for comparing different-sized samples |
| - [ ] `alpha=0.5` for overlapping distributions |
| - [ ] Legend when comparing multiple distributions |
| |
| ### Common Pitfalls |
| - ⚠️ Raw counts (not density) skew comparison between periods with different sample sizes. |
| - ⚠️ Too few bins = lost detail. Too many = noisy. 30-50 bins is usually good. |
| |
| ### Interpretation |
| - Rightward shift = warming. Flatter + wider = more variability = more extremes. |
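
### Example Sketch
A sketch of the flatten, shared-bins, and density steps using `np.histogram` is shown below. The helper name `shared_bins`, the bin count, and the two synthetic samples are assumptions for illustration.

```python
import numpy as np

def shared_bins(a, b, n_bins=40):
    # Flatten both samples, drop NaNs, and build one set of bin edges
    # that spans the combined range.
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a[~np.isnan(a)]
    b = b[~np.isnan(b)]
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    return a, b, np.linspace(lo, hi, n_bins + 1)

rng = np.random.default_rng(0)
period_1 = rng.normal(15.0, 2.0, size=500)    # smaller sample
period_2 = rng.normal(16.0, 2.5, size=2000)   # larger, warmer, wider sample
a, b, bins = shared_bins(period_1, period_2)
dens_1, _ = np.histogram(a, bins=bins, density=True)
dens_2, _ = np.histogram(b, bins=bins, density=True)
```

Because both histograms are normalized to unit area on the same edges, the 500-value and 2000-value samples can be overlaid directly.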
| """, |
|
|
| "visualization_animation": """ |
| ## Animated/Sequential Maps |
| |
| ### When to use |
| - Monthly/seasonal evolution of a field |
| - Event lifecycle (genesis → peak → decay) |
| |
| ### Workflow |
| 1. **Global limits** — Find absolute vmin/vmax across ALL timesteps. |
| 2. **Multi-panel grid** — `fig, axes = plt.subplots(2, 3, figsize=(18, 12))` for 6 timesteps. |
| 3. **Lock colorbars** — Same vmin/vmax on every panel. |
| 4. **Shared colorbar** — Remove per-panel colorbars, add one at the bottom. |
| |
| ### Quality Checklist |
| - [ ] Colorbar limits LOCKED across all panels (no jumping colors) |
| - [ ] Timestamps clearly labeled on each panel |
| - [ ] Static grid preferred over video (headless environment) |
| |
| ### Common Pitfalls |
| - ⚠️ Auto-scaled panels flash/jump between frames — always lock limits. |
| - ⚠️ MP4/GIF generation may fail in headless — use PNG grids instead. |
| - ⚠️ Use Cartopy projections for ALL map panels: `subplot_kw={'projection': ccrs.PlateCarree()}`. Add `ax.coastlines()` to each axis. |
| """, |
|
|
| "visualization_dashboard": """ |
| ## Summary Dashboard |
| |
| ### When to use |
| - Comprehensive overview: map + time series + statistics in one figure |
| - Publication-ready event summaries |
| |
| ### Workflow |
| 1. **Layout** — `fig = plt.figure(figsize=(16, 10))` + `matplotlib.gridspec`. |
| 2. **Top row** — Large spatial map (anomaly or mean field). |
| 3. **Bottom left** — Time series of regional mean. |
| 4. **Bottom right** — Distribution histogram or box plot. |
| |
| ### Quality Checklist |
| - [ ] `plt.tight_layout()` or `constrained_layout=True` to prevent overlap |
| - [ ] Consistent color theme across all panels |
| - [ ] Clear panel labels (a, b, c) |
| |
| ### Common Pitfalls |
| - ⚠️ Cramming too much into small figure → illegible text. Scale figure size up. |
| - ⚠️ Different aspect ratios between map and time series need explicit gridspec ratios. |
| - ⚠️ MIXED PROJECTION DANGER: Cartopy projections must ONLY be applied to MAP axes. If you add `projection=ccrs.PlateCarree()` to a time series or histogram panel, it will break the plot. Use `fig.add_subplot(gs[...], projection=ccrs.PlateCarree())` ONLY for spatial map panels. |
| - ⚠️ For dashboards showing Americas/Atlantic regions (e.g., Hurricane Otis), always wrap longitudes to -180/+180 and use `ax.set_extent([west, east, south, north])`. |
| """, |
|
|
| "visualization_contour": """ |
| ## Contour & Isobar Plots |
| |
| ### When to use |
| - Pressure maps with isobars |
| - Temperature isotherms |
| - Any smoothly varying field where specific levels matter |
| |
| ### Workflow |
| 1. **Define levels** — `levels = np.arange(992, 1041, 4)` (992 to 1040 hPa in 4 hPa steps, so 1000 hPa falls on a level) for MSLP isobars. |
| 2. **Filled contour** — `ax.contourf(lons, lats, data, levels=levels, cmap=...)`. |
| 3. **Contour lines** — `cs = ax.contour(lons, lats, data, levels=levels, colors='black', linewidths=0.5)`. |
| 4. **Labels** — `ax.clabel(cs, inline=True, fontsize=8)`. |
| |
| ### Quality Checklist |
| - [ ] Level spacing is physically meaningful (e.g., 4 hPa for MSLP) |
| - [ ] Contour labels don't overlap |
| - [ ] Filled + line contours combined for best readability |
| |
| ### Common Pitfalls |
| - ⚠️ Too many levels → cluttered, unreadable. 10-15 levels max. |
| - ⚠️ Non-uniform level spacing requires manual colorbar ticks. |
| - ⚠️ CARTOPY IS MANDATORY: Use `subplot_kw={'projection': ccrs.PlateCarree()}` and add `ax.coastlines()`. |
| - ⚠️ Always pass `transform=ccrs.PlateCarree()` to `contour` and `contourf` calls. |
| |
| ### Interpretation |
| - Tightly packed isobars = strong pressure gradient = high winds. |
| """, |
|
|
| "visualization_correlation_map": """ |
| ## Spatial Correlation Maps |
| |
| ### When to use |
| - Showing where a variable correlates with an index (e.g., ENSO vs global precip) |
| - Teleconnection mapping |
| |
| ### Workflow |
| 1. **Compute index** — 1D time series (e.g., Niño3.4 SST anomaly). |
| 2. **Correlate** — `xr.corr(index, spatial_field, dim='time')` → 2D R-map. |
| 3. **Significance** — Compute p-values from sample size `n` and R via the t-statistic `t = R * sqrt((n - 2) / (1 - R**2))` with `n - 2` degrees of freedom. |
| 4. **Plot** — Map R values with `RdBu_r` centered at zero. Stipple p < 0.05. |
| |
| ### Quality Checklist |
| - [ ] Both index and field deseasonalized |
| - [ ] R-map centered at zero (TwoSlopeNorm or symmetric limits) |
| - [ ] Significant areas stippled or hatched |
| - [ ] Sample size ≥30 stated |
| |
| ### Common Pitfalls |
| - ⚠️ Raw data correlations dominated by shared seasonal cycle. |
| - ⚠️ Field significance: many grid points → some will be significant by chance. Apply FDR correction. |
| |
| ### Interpretation |
| - R > 0: in-phase with index. R < 0: out-of-phase. |R| > 0.5 = strong relationship. |
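
### Example Sketch
The FDR correction mentioned in the pitfalls can be sketched with the Benjamini-Hochberg procedure, which is one common choice rather than the only one. The function name `fdr_mask` and the small p-value grid are assumptions for the example.

```python
import numpy as np

def fdr_mask(p_values, alpha=0.05):
    # Benjamini-Hochberg: sort the m finite p-values, find the largest rank k
    # with p_(k) <= alpha * k / m, and flag the k smallest p-values.
    p = np.asarray(p_values, dtype=float)
    flat = p.ravel()
    mask = np.zeros(flat.shape, dtype=bool)
    idx = np.flatnonzero(np.isfinite(flat))
    m = idx.size
    if m == 0:
        return mask.reshape(p.shape)
    order = idx[np.argsort(flat[idx])]
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = np.flatnonzero(flat[order] <= thresholds)
    if passed.size:
        mask[order[: passed[-1] + 1]] = True
    return mask.reshape(p.shape)

p_map = np.array([[0.001, 0.02], [0.04, 0.5]])
significant = fdr_mask(p_map)
naive_count = int(np.sum(p_map < 0.05))
```

In this toy grid the uncorrected test flags three cells, while the corrected mask keeps two. The boolean mask is what would be stippled on the map.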
| """, |
|
|
| |
| |
| |
| "maritime_route": """ |
| ## Maritime Route Risk Analysis |
| |
| ### When to use |
| - Analyzing weather risks along calculated shipping lanes |
| - Voyage planning and hazard assessment |
| |
| ### Workflow |
| 1. **Route** — Call `calculate_maritime_route` → waypoints + bounding box. |
| 2. **Data** — Download `u10`, `v10` for route bbox, target month, last 3 years. |
| 3. **Wind speed** — `wspd = np.sqrt(u10**2 + v10**2)`. |
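| 4. **Extract** — Loop waypoints: `.sel(latitude=lat, longitude=lon, method='nearest')` (use the dataset's actual coordinate names, e.g. `lat`/`lon` if that is what `.coords` shows). |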
| 5. **Risk classify** — Safe (<10), Caution (10-17), Danger (17-24), Extreme (>24 m/s). |
| 6. **Statistics** — P95 wind speed at each waypoint, % time in each risk category. |
| |
| ### Quality Checklist |
| - [ ] Bounding box from route tool used DIRECTLY (don't convert coords) |
| - [ ] 3-year period for climatological context, not just one date |
| - [ ] Risk categories applied at waypoint level |
| |
| ### Common Pitfalls |
| - ⚠️ Global hourly downloads → timeout. Subset tightly to route bbox. |
| - ⚠️ Don't use bounding box mean — extract AT waypoints for route-specific risk. |
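
### Example Sketch
The risk classification and per-waypoint statistics can be sketched as below. The names `RISK_LABELS`, `RISK_EDGES`, `classify_risk`, and `waypoint_summary` are hypothetical, and the choice to put values exactly on a boundary (10, 17, 24 m/s) into the higher category is an assumption, since the thresholds above do not specify it.

```python
import numpy as np

RISK_LABELS = ['Safe', 'Caution', 'Danger', 'Extreme']
RISK_EDGES = [10.0, 17.0, 24.0]  # m/s; boundary values fall in the higher class

def classify_risk(wspd):
    # Returns integer codes 0..3 that index into RISK_LABELS.
    return np.digitize(wspd, RISK_EDGES)

def waypoint_summary(wspd_series):
    # wspd_series: wind speeds (m/s) over time at ONE waypoint.
    w = np.asarray(wspd_series, dtype=float)
    w = w[~np.isnan(w)]
    cats = classify_risk(w)
    p95 = float(np.percentile(w, 95))
    share = {label: float(np.mean(cats == i) * 100.0)
             for i, label in enumerate(RISK_LABELS)}
    return p95, share

codes = classify_risk(np.array([5.0, 12.0, 20.0, 30.0]))
p95, share = waypoint_summary([5.0, 5.0, 12.0, 30.0])
```

Running `waypoint_summary` once per waypoint yields the P95 wind speed and the percentage of time in each category along the route.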
| """, |
|
|
| "maritime_visualization": """ |
| ## Maritime Route Risk Visualization |
| |
| ### When to use |
| - Plotting route risk maps with waypoint-level risk coloring |
| |
| ### Workflow |
| 1. **Background** — Map mean `wspd` with `pcolormesh` + `YlOrRd`. |
| 2. **Route line** — Dashed line connecting waypoints. |
| 3. **Waypoint scatter** — Color by risk: Green (<10), Amber (10-17), Coral (17-24), Red (>24 m/s). |
| 4. **Labels** — "ORIGIN" and "DEST" annotations. |
| 5. **Legend** — Custom 4-category legend (mandatory). |
| |
| ### Quality Checklist |
| - [ ] 4-category risk legend ALWAYS included |
| - [ ] Origin/Destination labeled |
| - [ ] Colormap: `YlOrRd` for wind speed |
| - [ ] Saved to PLOTS_DIR |
| |
| ### Common Pitfalls |
| - ⚠️ No legend → colored dots are meaningless to the user. |
| - ⚠️ Route line + waypoints must be on top (high zorder) to not be hidden by background. |
| - ⚠️ CARTOPY IS MANDATORY: Always add `ax.coastlines()` — without land boundaries it is impossible to see where the route passes relative to coastlines (e.g., Suez Canal, Malacca Strait). |
| - ⚠️ Always pass `transform=ccrs.PlateCarree()` to route scatter/line plotting calls when using Cartopy. |
| """, |
| } |
|
|
|
|
| |
| |
| |
|
|
| class AnalysisGuideArgs(BaseModel): |
| """Arguments for analysis guide retrieval.""" |
|
|
| topic: Literal[ |
| |
| "load_data", |
| "spatial_subset", |
| "temporal_subset", |
| |
| "anomalies", |
| "zscore", |
| "trend_analysis", |
| "eof_analysis", |
| |
| "correlation_analysis", |
| "composite_analysis", |
| "diurnal_cycle", |
| "seasonal_decomposition", |
| "spectral_analysis", |
| "spatial_statistics", |
| "multi_variable", |
| "climatology_normals", |
| |
| "climate_indices", |
| "extremes", |
| "drought_analysis", |
| "heatwave_detection", |
| "atmospheric_rivers", |
| "blocking_events", |
| |
| "energy_budget", |
| "wind_energy", |
| "moisture_budget", |
| "convective_potential", |
| "snow_cover", |
| |
| "visualization_spatial", |
| "visualization_timeseries", |
| "visualization_anomaly_map", |
| "visualization_wind", |
| "visualization_comparison", |
| "visualization_profile", |
| "visualization_distribution", |
| "visualization_animation", |
| "visualization_dashboard", |
| "visualization_contour", |
| "visualization_correlation_map", |
| |
| "maritime_route", |
| "maritime_visualization", |
| ] = Field( |
| description="Analysis topic to get guidance for" |
| ) |
|
|
|
|
| |
| |
| |
|
|
| def get_analysis_guide(topic: str) -> str: |
| """ |
| Get methodological guidance for climate data analysis. |
| |
| Returns text instructions for using python_repl to perform the analysis. |
| """ |
| guide = ANALYSIS_GUIDES.get(topic) |
|
|
| if not guide: |
| available = ", ".join(sorted(ANALYSIS_GUIDES.keys())) |
| return f"Unknown topic: {topic}. Available: {available}" |
|
|
| return f""" |
| # Analysis Guide: {topic.replace('_', ' ').title()} |
| |
| {guide} |
| |
| --- |
| Use python_repl to implement this analysis with your downloaded ERA5 data. |
| """ |
|
|
|
|
| |
| |
| |
|
|
| analysis_guide_tool = StructuredTool.from_function( |
| func=get_analysis_guide, |
| name="get_analysis_guide", |
| description=""" |
| Get methodological guidance for climate data analysis. |
| |
| Returns workflow steps, quality checklists, and pitfall warnings for: |
| - Data: load_data, spatial_subset, temporal_subset |
| - Statistics: anomalies, zscore, trend_analysis, eof_analysis |
| - Advanced: correlation_analysis, composite_analysis, diurnal_cycle, |
| seasonal_decomposition, spectral_analysis, spatial_statistics, |
| multi_variable, climatology_normals |
| - Climate: climate_indices, extremes, drought_analysis, heatwave_detection, |
| atmospheric_rivers, blocking_events |
| - Domain: energy_budget, wind_energy, moisture_budget, convective_potential, snow_cover |
| - Visualization: visualization_spatial, visualization_timeseries, |
| visualization_anomaly_map, visualization_wind, visualization_comparison, |
| visualization_profile, visualization_distribution, visualization_animation, |
| visualization_dashboard, visualization_contour, visualization_correlation_map |
| - Maritime: maritime_route, maritime_visualization |
| |
| Use this BEFORE writing analysis code in python_repl. |
| """, |
| args_schema=AnalysisGuideArgs, |
| ) |
|
|
|
|
| |
| visualization_guide_tool = StructuredTool.from_function( |
| func=get_analysis_guide, |
| name="get_visualization_guide", |
| description=""" |
| Get publication-grade visualization instructions for ERA5 climate data. |
| |
| CALL THIS BEFORE creating any plot to get: |
| - Correct colormap choices |
| - Standard value ranges |
| - Required map elements |
| - Best practices |
| |
| Available visualization topics: |
| - visualization_spatial: Maps with proper projections |
| - visualization_timeseries: Time series plots |
| - visualization_anomaly_map: Diverging anomaly maps |
| - visualization_wind: Quiver/streamline plots |
| - visualization_comparison: Multi-panel comparisons |
| - visualization_profile: Hovmöller diagrams |
| - visualization_distribution: Histograms/PDFs |
| - visualization_animation: Sequential map grids |
| - visualization_dashboard: Multi-panel summaries |
| - visualization_contour: Isobar/isotherm plots |
| - visualization_correlation_map: Spatial correlation maps |
| - maritime_visualization: Route risk maps |
| """, |
| args_schema=AnalysisGuideArgs, |
| ) |
|
|