yx21e committed
Commit d3bbb53 · verified · 1 Parent(s): d3bc17d

Remove manuscript TeX and table source artifacts

Files changed (41)
  1. README.md +6 -7
  2. artifacts/manifests/paper_outputs.sha256 +0 -20
  3. artifacts/manifests/paper_outputs.yml +1 -41
  4. artifacts/results/fireprone_contract_progression_table.generated.tex +0 -69
  5. artifacts/results/selection_regret_full_head_table.generated.tex +0 -2
  6. artifacts/results/selection_regret_main_table.generated.tex +0 -24
  7. artifacts/results/selection_regret_scope_sweep_20260505.generated.tex +0 -24
  8. artifacts/results/selection_regret_tolerance_family_table.generated.tex +0 -2
  9. docs/artifact_map.md +19 -44
  10. docs/huggingface_release_design.md +10 -4
  11. paper/main.tex +0 -141
  12. paper/manuscript_final.pdf +0 -3
  13. paper/references.bib +0 -465
  14. paper/sections/0_abstract.tex +0 -4
  15. paper/sections/1_intro.tex +0 -77
  16. paper/sections/2_backbone.tex +0 -39
  17. paper/sections/3_prelim.tex +0 -84
  18. paper/sections/4_experiments.tex +0 -435
  19. paper/sections/5_conclusion.tex +0 -31
  20. paper/sections/appendix.tex +0 -733
  21. paper_outputs/figures/fig_selection_regret_rq2.tikz +0 -120
  22. paper_outputs/tables/tab_app_analog_rank_depth.tex +0 -24
  23. paper_outputs/tables/tab_app_burned_area_median_acre.tex +0 -24
  24. paper_outputs/tables/tab_app_contract_params_full.tex +0 -22
  25. paper_outputs/tables/tab_app_head_architectures.tex +0 -36
  26. paper_outputs/tables/tab_app_heat_event_pr.tex +0 -24
  27. paper_outputs/tables/tab_app_matching_rule_params.tex +0 -17
  28. paper_outputs/tables/tab_app_occupancy_ppr_scope.tex +0 -27
  29. paper_outputs/tables/tab_app_scope_params.tex +0 -19
  30. paper_outputs/tables/tab_app_seed_robustness.tex +0 -36
  31. paper_outputs/tables/tab_app_smoke_high_event.tex +0 -24
  32. paper_outputs/tables/tab_app_spread_ap_by_scope.tex +0 -24
  33. paper_outputs/tables/tab_appendix_selection_regret_tolerance.tex +0 -2
  34. paper_outputs/tables/tab_fireprone_contract_progression.tex +0 -69
  35. paper_outputs/tables/tab_primary_results.tex +0 -62
  36. paper_outputs/tables/tab_selection_regret_full_head.tex +0 -2
  37. paper_outputs/tables/tab_selection_regret_scope.tex +0 -24
  38. paper_outputs/tables/tab_selection_regret_scope_sweep.tex +0 -24
  39. paper_outputs/tables/tab_supporting_results.tex +0 -120
  40. scripts/audit_release.py +7 -13
  41. scripts/reproduce_paper_outputs.py +4 -6
README.md CHANGED
@@ -17,7 +17,7 @@ pretty_name: WildFIRE-FM
 
 ![WildFIRE-FM summary](assets/wildfire_fm_model_card.svg)
 
-**WildFIRE-FM** is a wildfire-specialized regional reference backbone for 12-hour gridded wildfire occupancy prediction on a 5 km California grid. It is released with five seeded PyTorch checkpoints, model code, final-paper artifacts, and data-source notes. The raw data are **not** redistributed.
+**WildFIRE-FM** is a wildfire-specialized regional reference backbone for 12-hour gridded wildfire occupancy prediction on a 5 km California grid. It is released with five seeded PyTorch checkpoints, model code, final-paper figure previews, numeric summaries, and data-source notes. The raw data are **not** redistributed.
 
 The model is intended as a reproducible reference backbone for fixed-contract wildfire evaluation, not as a general global wildfire forecasting product. It was trained with regional weather, active-fire supervision, static fuel/canopy/exposure layers, and event-level wildfire resources used by supporting tasks in the paper.
 
@@ -29,7 +29,7 @@ The model is intended as a reproducible reference backbone for fixed-contract wi
 
 **Model code.** The compact U-Net definition is provided in `models/wildfire_fm/modeling_unet.py`, with a short loading example below.
 
-**Paper artifacts.** The final manuscript PDF and the final paper figures/tables are included under `paper/` and `paper_outputs/`. Compact CSV/JSON summaries are under `artifacts/results/`.
+**Evaluation artifacts.** Final-paper figure previews and sanitized compact CSV/JSON summaries are included under `assets/`, `paper_outputs/`, and `artifacts/results/`. Manuscript TeX/PDF files are intentionally not included in this model release.
 
 **Data notes.** Data sources and access entry points are documented in `data_sources/DATA_SOURCES.md`; users must obtain source data from the original providers.
 
@@ -80,7 +80,7 @@ The paper evaluates WildFIRE-FM and ten Earth-FM comparators under fixed task co
 - **Smoke PM2.5 RMSE:** `4.4646 ± 0.0060`, where lower is better.
 - **Extreme-heat RMSE-C:** `0.2179 ± 0.0043`, where lower is better.
 
-The full final-paper tables are included as TeX blocks under `paper_outputs/tables/`.
+The public release includes sanitized CSV/JSON summaries used to audit the displayed values. Manuscript table TeX is not included.
 
 ### Fixed-Contract Checks From The Final Paper
 
@@ -111,7 +111,7 @@ See `data_sources/DATA_SOURCES.md` for source roles and access links.
 
 ## Reproducing Released Paper Outputs
 
-The lightweight check verifies the released final-paper artifacts from compact summaries. It does not require raw data or GPUs.
+The lightweight check verifies the released sanitized artifacts from compact summaries. It does not require raw data or GPUs.
 
 ```bash
 python3 scripts/reproduce_paper_outputs.py
@@ -123,9 +123,8 @@ Full raw-data reruns require separately downloaded source data, local feature ca
 
 ```text
 models/wildfire_fm/    model code, manifests, and checkpoint metadata
-paper/                 final manuscript PDF and LaTeX source snapshot
-paper_outputs/         final paper figures and TeX table blocks
-artifacts/results/     compact CSV/JSON summaries for released outputs
+paper_outputs/         final-paper figure PDFs retained for reproducibility
+artifacts/results/     sanitized compact CSV/JSON summaries for released outputs
 experiments/           sanitized raw-rerun references and Slurm template
 data_sources/          source-data roles and access notes
 scripts/               artifact verification and figure/table rebuild helpers
artifacts/manifests/paper_outputs.sha256 CHANGED
@@ -4,31 +4,11 @@ ca11c75c03078a9be26421b527ab5a49f5fc43ce8e5edd7da14af120a247b67c assets/primary
 5552fb6cca6a0a683592e724b4bd562f923cf99c04e2abdb846546b1d67aecc4 assets/selection_regret_final.png
 34807e65ca71365a26a3b74cae70e6b40ae6f2151110e12c53e0efa9f8b726aa assets/supporting_rank_map_final.png
 024505248c8ba2bbb50d36d0b015d7fd7fbf5577b8b34faadda0efc972c6d3e8 assets/wildfire_fm_model_card.svg
-c342978b2f0f25cf6e430b860702895bbb3b512145c8c6e38aa2233b416d835e paper/manuscript_final.pdf
 b369d13e0419fa8272ccdc994b6642f3b141248a879c030218e387c583537eb2 paper_outputs/figures/fig_fireprone_contract_progression_compact.pdf
 e3110c70c3cf8ecb8671163a401a155920266e3f907f9c6baf08e27ec6e6c410 paper_outputs/figures/fig_primary_rank_change_map.pdf
 4e5b791ba4d136f722bd75a61097203836819ce9411def1caac4cc1e6d881275 paper_outputs/figures/fig_rank_heatmap1.pdf
-b2e56403e2774c457dd12c4685e2dc7492e22e32df46fcc5c37b3087110f2439 paper_outputs/figures/fig_selection_regret_rq2.tikz
 fabb8b55aac901199cc03773741a26685becffd074f52568c93bee517c2c42c0 paper_outputs/figures/fig_selection_regret_scatter.pdf
 bc4d35ad9cb4c1f9ba8f31c7c340d9684c9dd2d55f5a2e60604a2b58b90cbe40 paper_outputs/figures/fig_task_contract_tiles.pdf
 c382f5d69f25cc2f5db174601a33d0fd0928b44910a2a4b1c131954bd42113d9 paper_outputs/figures/fig_task_rank_map.pdf
 015ab951b0af5c130e4894092a5dd0bb0fd62e710467163a9df8246d8cf369f4 paper_outputs/figures/matching.pdf
 7dca6ae4a9b179693802f47d24dd66734c0f332b372a7976832a0d429333b755 paper_outputs/figures/overview_wildfire.pdf
-e8abbd2668517f5cae14933ed943fe103e74132886b0ff48ecd1685978549504 paper_outputs/tables/tab_app_analog_rank_depth.tex
-81db28aace3366625f1cfd5935892eb5af672d5ecd8327e6dcba00b7b04e2b3c paper_outputs/tables/tab_app_burned_area_median_acre.tex
-4a93401ef355c02eb0cc6b2e9a1506f9ed9d912301ec6829581247e40991bdfb paper_outputs/tables/tab_app_contract_params_full.tex
-3c5398c28e6243b1784b27d2e9eab1a5c60e6e6d2cfd14a79aa6fd1e0499b871 paper_outputs/tables/tab_app_head_architectures.tex
-f740b8f076490e852efa88fa8180ca08bb6b12901ff3ec3687c7e5c0b236da4e paper_outputs/tables/tab_app_heat_event_pr.tex
-86e97a394ceae8cc6eafd6d1021b44d13a117378ead87bfee662cc90a1e0e54b paper_outputs/tables/tab_app_matching_rule_params.tex
-0b1ad4587dd440fdabf771000b1c971daa9222e946a3404c9beae10dd7ea67c6 paper_outputs/tables/tab_app_occupancy_ppr_scope.tex
-4e79672c28a938cd9ba1bc0e423e7169eca389251a22357aff6fe84d3cbfa889 paper_outputs/tables/tab_app_scope_params.tex
-6850ee131e203f66392c79f17f59214672b362274f42285b252b83ac0ede1eb3 paper_outputs/tables/tab_app_seed_robustness.tex
-1ca91ca451f846e59cb62ea64a616780c698b9dee80918a05467bd6c40df2dd5 paper_outputs/tables/tab_app_smoke_high_event.tex
-cd65372622e8dd388adb1122a3e93b22d2090fba836405b08a078d5159b182de paper_outputs/tables/tab_app_spread_ap_by_scope.tex
-2b168c92af29ae40c324e9660d48177ea0c79e4559a3c2aa571d53043ee83b53 paper_outputs/tables/tab_appendix_selection_regret_tolerance.tex
-c822daa85e29dde4ac92b4be34f4d41040fa04da3a2674bdc4d0494dbaaceb69 paper_outputs/tables/tab_fireprone_contract_progression.tex
-6672c62a150d83a351f4fa23ac04537d9aaae01af6056f689437d9b7d8bcee40 paper_outputs/tables/tab_primary_results.tex
-d11d82273acb389b46c8fc1d15c1e37f1f90332ae9d1fb7b8eb5ff0f8847dc2d paper_outputs/tables/tab_selection_regret_full_head.tex
-11f230e0462ded2821f3d5d45421d8b8278b61695d76799246d2e8bf873e2789 paper_outputs/tables/tab_selection_regret_scope.tex
-3b1277700ececdbb4107667a5d4166a75224a84282810ec5d21bbf2ebc7fa163 paper_outputs/tables/tab_selection_regret_scope_sweep.tex
-717555b2584658c936aa8fc27b63f1068dc5f796a297bcef0576cf020b3ddaf8 paper_outputs/tables/tab_supporting_results.tex
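The manifest above pairs each released file with its SHA-256 digest in the usual `<hash> <path>` layout. As a minimal illustration of how such a manifest can be audited (a sketch, not the repository's own tooling — the commit's actual checks live in `scripts/reproduce_paper_outputs.py` and `scripts/audit_release.py`):

```python
import hashlib
from pathlib import Path


def check_manifest(manifest_path, root="."):
    """Return (path, expected, actual) mismatches for a
    '<sha256> <relative path>' manifest like paper_outputs.sha256."""
    mismatches = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((Path(root) / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            mismatches.append((rel_path, expected, actual))
    return mismatches
```

This is functionally the same check as running `sha256sum -c artifacts/manifests/paper_outputs.sha256` from the repository root.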
artifacts/manifests/paper_outputs.yml CHANGED
@@ -1,4 +1,4 @@
-# Auto-generated output manifest for the Hugging Face release.
+# Auto-generated public-output manifest for the Hugging Face release.
 outputs:
 - path: assets/overview_final.png
   sha256: 6db4f4aff90da8709edff97e4782aee9b4c5e8feefec7c7431a4ec8787cfe57c
@@ -12,16 +12,12 @@ outputs:
   sha256: 34807e65ca71365a26a3b74cae70e6b40ae6f2151110e12c53e0efa9f8b726aa
 - path: assets/wildfire_fm_model_card.svg
   sha256: 024505248c8ba2bbb50d36d0b015d7fd7fbf5577b8b34faadda0efc972c6d3e8
-- path: paper/manuscript_final.pdf
-  sha256: c342978b2f0f25cf6e430b860702895bbb3b512145c8c6e38aa2233b416d835e
 - path: paper_outputs/figures/fig_fireprone_contract_progression_compact.pdf
   sha256: b369d13e0419fa8272ccdc994b6642f3b141248a879c030218e387c583537eb2
 - path: paper_outputs/figures/fig_primary_rank_change_map.pdf
   sha256: e3110c70c3cf8ecb8671163a401a155920266e3f907f9c6baf08e27ec6e6c410
 - path: paper_outputs/figures/fig_rank_heatmap1.pdf
   sha256: 4e5b791ba4d136f722bd75a61097203836819ce9411def1caac4cc1e6d881275
-- path: paper_outputs/figures/fig_selection_regret_rq2.tikz
-  sha256: b2e56403e2774c457dd12c4685e2dc7492e22e32df46fcc5c37b3087110f2439
 - path: paper_outputs/figures/fig_selection_regret_scatter.pdf
   sha256: fabb8b55aac901199cc03773741a26685becffd074f52568c93bee517c2c42c0
 - path: paper_outputs/figures/fig_task_contract_tiles.pdf
@@ -32,39 +28,3 @@ outputs:
   sha256: 015ab951b0af5c130e4894092a5dd0bb0fd62e710467163a9df8246d8cf369f4
 - path: paper_outputs/figures/overview_wildfire.pdf
   sha256: 7dca6ae4a9b179693802f47d24dd66734c0f332b372a7976832a0d429333b755
-- path: paper_outputs/tables/tab_app_analog_rank_depth.tex
-  sha256: e8abbd2668517f5cae14933ed943fe103e74132886b0ff48ecd1685978549504
-- path: paper_outputs/tables/tab_app_burned_area_median_acre.tex
-  sha256: 81db28aace3366625f1cfd5935892eb5af672d5ecd8327e6dcba00b7b04e2b3c
-- path: paper_outputs/tables/tab_app_contract_params_full.tex
-  sha256: 4a93401ef355c02eb0cc6b2e9a1506f9ed9d912301ec6829581247e40991bdfb
-- path: paper_outputs/tables/tab_app_head_architectures.tex
-  sha256: 3c5398c28e6243b1784b27d2e9eab1a5c60e6e6d2cfd14a79aa6fd1e0499b871
-- path: paper_outputs/tables/tab_app_heat_event_pr.tex
-  sha256: f740b8f076490e852efa88fa8180ca08bb6b12901ff3ec3687c7e5c0b236da4e
-- path: paper_outputs/tables/tab_app_matching_rule_params.tex
-  sha256: 86e97a394ceae8cc6eafd6d1021b44d13a117378ead87bfee662cc90a1e0e54b
-- path: paper_outputs/tables/tab_app_occupancy_ppr_scope.tex
-  sha256: 0b1ad4587dd440fdabf771000b1c971daa9222e946a3404c9beae10dd7ea67c6
-- path: paper_outputs/tables/tab_app_scope_params.tex
-  sha256: 4e79672c28a938cd9ba1bc0e423e7169eca389251a22357aff6fe84d3cbfa889
-- path: paper_outputs/tables/tab_app_seed_robustness.tex
-  sha256: 6850ee131e203f66392c79f17f59214672b362274f42285b252b83ac0ede1eb3
-- path: paper_outputs/tables/tab_app_smoke_high_event.tex
-  sha256: 1ca91ca451f846e59cb62ea64a616780c698b9dee80918a05467bd6c40df2dd5
-- path: paper_outputs/tables/tab_app_spread_ap_by_scope.tex
-  sha256: cd65372622e8dd388adb1122a3e93b22d2090fba836405b08a078d5159b182de
-- path: paper_outputs/tables/tab_appendix_selection_regret_tolerance.tex
-  sha256: 2b168c92af29ae40c324e9660d48177ea0c79e4559a3c2aa571d53043ee83b53
-- path: paper_outputs/tables/tab_fireprone_contract_progression.tex
-  sha256: c822daa85e29dde4ac92b4be34f4d41040fa04da3a2674bdc4d0494dbaaceb69
-- path: paper_outputs/tables/tab_primary_results.tex
-  sha256: 6672c62a150d83a351f4fa23ac04537d9aaae01af6056f689437d9b7d8bcee40
-- path: paper_outputs/tables/tab_selection_regret_full_head.tex
-  sha256: d11d82273acb389b46c8fc1d15c1e37f1f90332ae9d1fb7b8eb5ff0f8847dc2d
-- path: paper_outputs/tables/tab_selection_regret_scope.tex
-  sha256: 11f230e0462ded2821f3d5d45421d8b8278b61695d76799246d2e8bf873e2789
-- path: paper_outputs/tables/tab_selection_regret_scope_sweep.tex
-  sha256: 3b1277700ececdbb4107667a5d4166a75224a84282810ec5d21bbf2ebc7fa163
-- path: paper_outputs/tables/tab_supporting_results.tex
-  sha256: 717555b2584658c936aa8fc27b63f1068dc5f796a297bcef0576cf020b3ddaf8
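The YAML manifest above uses a flat `- path:` / `sha256:` layout. As a sketch of how a consumer might read it into a dict without extra dependencies (an assumption-laden mini-parser for exactly this two-key shape; a real consumer would prefer a proper YAML library such as PyYAML):

```python
def parse_outputs_manifest(text):
    """Parse the simple 'outputs:' manifest into {path: sha256}.

    Assumes the flat '- path:' / 'sha256:' layout shown above;
    anything more nested needs a real YAML parser.
    """
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "outputs:":
            continue
        if line.startswith("- path:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("sha256:") and current is not None:
            entries[current] = line.split(":", 1)[1].strip()
            current = None
    return entries
```

The resulting dict can be compared entry-by-entry against freshly computed digests, mirroring what the `.sha256` manifest check does.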
artifacts/results/fireprone_contract_progression_table.generated.tex DELETED
@@ -1,69 +0,0 @@
-\begin{table*}[t]
-\centering
-\scriptsize
-\setlength{\tabcolsep}{4pt}
-\caption{Occupancy scores across global and fire-prone scopes. Global uses the full validation/test domain; top-\(k\) rows use train-defined fire-prone masks from historical fire frequency. Values are \(F_1\) percentages from the same validation-selected strict threshold. Tolerance is spatial-only; union adds temporal and spatial matching. Difference is union minus strict. Rows report five-seed mean with small std. Values use four decimals.}
-\label{tab:fireprone_contract_progression}
-\begin{adjustbox}{max width=\textwidth}
-\begin{tabular}{@{}llcccc@{}}
-\toprule
-Backbone & Scope & Strict \(F_1\uparrow\) & Tolerance \(F_1\uparrow\) & Union \(F_1\uparrow\) & Difference \(\uparrow\) \\
-\midrule
-\textcolor{blue}{FireWx-FM ref.} & global & \ms{0.4550}{0.1410} & \ms{29.7480}{1.2870} & \ms{59.0660}{2.7370} & \ms{58.6110}{2.6950} \\
-& top 5\% & \ms{3.5600}{0.8810} & \ms{39.2620}{1.4010} & \ms{72.8280}{2.5780} & \ms{69.2680}{1.9960} \\
-& top 10\% & \ms{3.5580}{0.8800} & \ms{39.1660}{1.3910} & \ms{72.5200}{2.5670} & \ms{68.9630}{1.9890} \\
-& top 20\% & \ms{3.5300}{0.8700} & \ms{38.2850}{1.2950} & \ms{69.7230}{2.4660} & \ms{66.1930}{1.9270} \\
-\addlinespace[1pt]
-Prithvi-WxC & global & \ms{0.0550}{0.0040} & \ms{7.1600}{0.6600} & \ms{20.1900}{1.8300} & \ms{20.1300}{1.8300} \\
-& top 5\% & \ms{1.4100}{1.1600} & \ms{19.2600}{4.5000} & \ms{42.5800}{4.5500} & \ms{41.1700}{3.4800} \\
-& top 10\% & \ms{1.2400}{1.3200} & \ms{14.8800}{8.4400} & \ms{32.6900}{13.2100} & \ms{31.4500}{11.9100} \\
-& top 20\% & \ms{1.1500}{1.3800} & \ms{13.1500}{9.4600} & \ms{28.1300}{15.2900} & \ms{26.9800}{13.9200} \\
-\addlinespace[1pt]
-Aurora & global & \ms{0.0700}{0.0100} & \ms{8.5000}{1.9600} & \ms{23.1000}{4.9400} & \ms{23.0400}{4.9300} \\
-& top 5\% & \ms{0.9900}{0.9300} & \ms{15.1300}{6.0800} & \ms{35.4800}{11.0200} & \ms{34.5000}{10.3700} \\
-& top 10\% & \ms{0.7800}{1.0500} & \ms{12.7400}{6.5600} & \ms{30.5300}{10.8800} & \ms{29.7500}{9.8700} \\
-& top 20\% & \ms{0.6700}{1.1000} & \ms{10.5300}{7.4300} & \ms{24.9400}{12.5800} & \ms{24.2800}{11.4900} \\
-\addlinespace[1pt]
-ClimaX & global & \ms{0.3500}{0.0800} & \ms{29.7500}{3.6100} & \ms{60.1500}{7.5900} & \ms{59.8000}{7.5500} \\
-& top 5\% & \ms{1.2900}{0.1100} & \ms{34.5800}{2.3800} & \ms{69.2200}{5.7200} & \ms{67.9200}{5.7300} \\
-& top 10\% & \ms{1.2500}{0.1600} & \ms{34.3300}{2.2900} & \ms{68.5700}{5.5400} & \ms{67.3200}{5.5500} \\
-& top 20\% & \ms{1.0300}{0.2700} & \ms{30.2100}{4.2900} & \ms{60.0600}{7.5700} & \ms{59.0400}{7.5900} \\
-\addlinespace[1pt]
-StormCast & global & \ms{0.0560}{0.0110} & \ms{8.2000}{2.1900} & \ms{22.3800}{5.4300} & \ms{22.3200}{5.4200} \\
-& top 5\% & \ms{0.9600}{0.8000} & \ms{15.3200}{5.5300} & \ms{36.1900}{9.7300} & \ms{35.2300}{9.1800} \\
-& top 10\% & \ms{0.7300}{0.9300} & \ms{12.6700}{6.3300} & \ms{30.4700}{10.6500} & \ms{29.7500}{9.7500} \\
-& top 20\% & \ms{0.5800}{0.9100} & \ms{10.4200}{7.3400} & \ms{24.6600}{12.4000} & \ms{24.0800}{11.5000} \\
-\addlinespace[1pt]
-AlphaEarth & global & \ms{2.0600}{0.4400} & \ms{29.4500}{6.0100} & \ms{37.4300}{9.9500} & \ms{35.3700}{10.0300} \\
-& top 5\% & \ms{6.9100}{0.8500} & \ms{42.8800}{4.6100} & \ms{51.7400}{8.7300} & \ms{44.8300}{9.0800} \\
-& top 10\% & \ms{6.6400}{0.9900} & \ms{41.9000}{5.9500} & \ms{50.5700}{10.0100} & \ms{43.9300}{9.9200} \\
-& top 20\% & \ms{6.1900}{1.1300} & \ms{38.8300}{7.5000} & \ms{46.3800}{12.1700} & \ms{40.1900}{11.6800} \\
-\addlinespace[1pt]
-DLWP & global & \ms{0.1700}{0.0400} & \ms{14.9100}{3.2400} & \ms{28.1900}{6.9700} & \ms{28.0200}{6.9300} \\
-& top 5\% & \ms{1.8100}{0.4800} & \ms{31.7200}{3.2900} & \ms{55.4600}{5.2900} & \ms{53.6500}{5.4800} \\
-& top 10\% & \ms{1.6100}{0.6000} & \ms{27.6600}{5.9200} & \ms{47.1300}{8.0100} & \ms{45.5200}{7.7900} \\
-& top 20\% & \ms{1.5200}{0.9000} & \ms{20.9400}{4.8000} & \ms{34.9300}{7.8500} & \ms{33.4100}{7.8800} \\
-\addlinespace[1pt]
-FCN & global & \ms{0.2800}{0.0800} & \ms{19.5100}{3.3400} & \ms{40.0600}{9.3700} & \ms{39.7800}{9.3400} \\
-& top 5\% & \ms{1.6200}{0.5100} & \ms{29.3800}{2.7600} & \ms{54.3000}{7.4100} & \ms{52.6800}{7.4400} \\
-& top 10\% & \ms{1.1800}{0.5100} & \ms{22.4200}{3.9800} & \ms{43.4500}{9.2500} & \ms{42.2700}{9.0300} \\
-& top 20\% & \ms{1.0000}{0.4300} & \ms{16.9800}{3.9400} & \ms{34.0900}{8.2600} & \ms{33.0900}{7.9300} \\
-\addlinespace[1pt]
-FengWu & global & \ms{0.2600}{0.0800} & \ms{12.0000}{6.0200} & \ms{24.1000}{13.6300} & \ms{23.8400}{13.5700} \\
-& top 5\% & \ms{1.5700}{0.3600} & \ms{16.2800}{3.7000} & \ms{30.1100}{5.0100} & \ms{28.5400}{4.7700} \\
-& top 10\% & \ms{1.2400}{0.5300} & \ms{12.9500}{5.6100} & \ms{24.1900}{8.6900} & \ms{22.9400}{8.1900} \\
-& top 20\% & \ms{1.1200}{0.5000} & \ms{11.9500}{5.0700} & \ms{22.7900}{7.9100} & \ms{21.6700}{7.4400} \\
-\addlinespace[1pt]
-FuXi & global & \ms{0.3800}{0.1200} & \ms{21.0300}{4.8200} & \ms{37.2900}{9.4500} & \ms{36.9100}{9.4300} \\
-& top 5\% & \ms{2.0300}{0.6800} & \ms{31.8900}{4.7300} & \ms{53.9300}{8.3800} & \ms{51.9000}{8.6900} \\
-& top 10\% & \ms{1.6500}{0.7300} & \ms{24.0100}{5.7800} & \ms{40.2100}{9.9300} & \ms{38.5600}{9.7700} \\
-& top 20\% & \ms{1.3600}{0.6800} & \ms{21.9500}{5.8600} & \ms{36.7300}{10.0300} & \ms{35.3700}{9.9200} \\
-\addlinespace[1pt]
-Pangu-Weather & global & \ms{0.2800}{0.1100} & \ms{17.0900}{4.0500} & \ms{35.6400}{9.0300} & \ms{35.3600}{9.0800} \\
-& top 5\% & \ms{1.3700}{0.3100} & \ms{22.2200}{6.8600} & \ms{43.4200}{13.2400} & \ms{42.0600}{13.0600} \\
-& top 10\% & \ms{1.0900}{0.3500} & \ms{18.9300}{5.9300} & \ms{38.5300}{11.7200} & \ms{37.4400}{11.5300} \\
-& top 20\% & \ms{0.8800}{0.3600} & \ms{17.0200}{5.4900} & \ms{34.5700}{10.2900} & \ms{33.6800}{10.1300} \\
-\bottomrule
-\end{tabular}
-\end{adjustbox}
-\end{table*}
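The deleted table above typesets each cell with a custom `\ms{mean}{std}` macro that lives in the manuscript preamble, which is not part of this release. A plausible definition consistent with its usage (an assumption, not the manuscript's actual macro) would be:

```latex
% Hypothetical \ms{mean}{std} definition, consistent with how the
% deleted tables use it; the real preamble is not in this release.
\newcommand{\ms}[2]{#1\,{\scriptstyle\pm #2}}
```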
artifacts/results/selection_regret_full_head_table.generated.tex DELETED
@@ -1,2 +0,0 @@
-% Full per-head rows are kept in the supplementary CSV files.
-% The manuscript uses the all-backbone selection-regret summaries instead.
artifacts/results/selection_regret_main_table.generated.tex DELETED
@@ -1,24 +0,0 @@
-\begin{table*}[!t]
-\centering
-\small
-\setlength{\tabcolsep}{4pt}
-\caption{Fixed-feature selection-regret check across evaluation scopes. Values are percentage-point regret \(\delta = D(h_D)-D(h_R)\) under union-\(F_1\), where \(h_R\) is selected by PR-AUC and \(h_D\) by the decision metric. Top-\(k\) columns use train-defined fire-prone scopes. Rows report mean with small std over five seeds; \(0.0000\) means the two selectors give the same decision score for all seeds.}
-\label{tab:selection_regret_diagnostic}
-\begin{tabular}{lcccc}
-\toprule
-\textbf{Feature source} & \textbf{\(\Omega=\)global} & \textbf{\(\Omega=\)top 5\%} & \textbf{\(\Omega=\)top 10\%} & \textbf{\(\Omega=\)top 20\%} \\
-\midrule
-\textcolor{blue}{FireWx-FM ref.} & \ms{7.3831}{7.4536} & \ms{0.3664}{0.6812} & \ms{1.2275}{1.2665} & \ms{2.9385}{2.7513} \\
-Prithvi-WxC & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
-Aurora & \ms{4.9455}{10.6974} & \ms{15.4283}{34.4987} & \ms{13.9934}{31.2903} & \ms{14.3706}{32.1337} \\
-ClimaX & \ms{0.1296}{0.1775} & 0.0000 & 0.0000 & 0.0000 \\
-StormCast & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
-DLWP & 0.0000 & \ms{1.6716}{1.6079} & \ms{2.8465}{2.6938} & \ms{4.4634}{4.3561} \\
-FCN & 0.0000 & \ms{0.4510}{1.0071} & \ms{0.4200}{0.9390} & \ms{1.1680}{1.9872} \\
-FengWu & 0.0000 & \ms{0.8796}{0.5532} & \ms{0.4023}{0.5511} & \ms{0.5222}{0.6239} \\
-FuXi & 0.0000 & \ms{1.3545}{2.0970} & \ms{0.1656}{0.3703} & \ms{0.2833}{0.3681} \\
-Pangu-Weather & 0.0000 & \ms{0.7593}{0.8974} & \ms{0.3048}{0.5054} & \ms{0.1868}{0.3255} \\
-AlphaEarth & \ms{17.2217}{8.8492} & \ms{6.3846}{4.9653} & \ms{6.5738}{6.8970} & \ms{3.8804}{5.9483} \\
-\bottomrule
-\end{tabular}
-\end{table*}
artifacts/results/selection_regret_scope_sweep_20260505.generated.tex DELETED
@@ -1,24 +0,0 @@
-\begin{table*}[!t]
-\centering
-\small
-\setlength{\tabcolsep}{4pt}
-\caption{Fixed-feature selection-regret sweep across evaluation scopes. Values are percentage-point regret \(\delta = D(h_D)-D(h_R)\) under union-\(F_1\). Top-\(k\) scopes are train-defined fire-prone masks. Rows report mean with small std over five seeds.}
-\label{tab:selection_regret_scope_sweep}
-\begin{tabular}{lcccc}
-\toprule
-\textbf{Feature source} & \textbf{\(\Omega=\)global} & \textbf{\(\Omega=\)top 5\%} & \textbf{\(\Omega=\)top 10\%} & \textbf{\(\Omega=\)top 20\%} \\
-\midrule
-\textcolor{blue}{FireWx-FM ref.} & \ms{7.3831}{7.4536} & \ms{0.3664}{0.6812} & \ms{1.2275}{1.2665} & \ms{2.9385}{2.7513} \\
-Prithvi-WxC & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
-Aurora & \ms{4.9455}{10.6974} & \ms{15.4283}{34.4987} & \ms{13.9934}{31.2903} & \ms{14.3706}{32.1337} \\
-ClimaX & \ms{0.1296}{0.1775} & 0.0000 & 0.0000 & 0.0000 \\
-StormCast & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
-DLWP & 0.0000 & \ms{1.6716}{1.6079} & \ms{2.8465}{2.6938} & \ms{4.4634}{4.3561} \\
-FCN & 0.0000 & \ms{0.4510}{1.0071} & \ms{0.4200}{0.9390} & \ms{1.1680}{1.9872} \\
-FengWu & 0.0000 & \ms{0.8796}{0.5532} & \ms{0.4023}{0.5511} & \ms{0.5222}{0.6239} \\
-FuXi & 0.0000 & \ms{1.3545}{2.0970} & \ms{0.1656}{0.3703} & \ms{0.2833}{0.3681} \\
-Pangu-Weather & 0.0000 & \ms{0.7593}{0.8974} & \ms{0.3048}{0.5054} & \ms{0.1868}{0.3255} \\
-AlphaEarth & \ms{17.2217}{8.8492} & \ms{6.3846}{4.9653} & \ms{6.5738}{6.8970} & \ms{3.8804}{5.9483} \\
-\bottomrule
-\end{tabular}
-\end{table*}
artifacts/results/selection_regret_tolerance_family_table.generated.tex DELETED
@@ -1,2 +0,0 @@
-% Replaced by the all-backbone value table in sections/appendix.tex
-% (Table~\ref{tab:appendix_selection_regret_tolerance}).
docs/artifact_map.md CHANGED
@@ -1,56 +1,31 @@
1
- # Paper Artifact Map
2
 
3
- This map links every table and figure label in the current manuscript to the
4
- public release artifact and its provenance. Final output checksums are stored in
5
- `artifacts/manifests/paper_outputs.sha256`.
6
 
7
- ## Figures
8
 
9
- | Paper label | Release file | Provenance |
10
  |---|---|---|
11
- | `fig:toy_occupancy_contract` | `paper_outputs/figures/matching.pdf` | Static vector schematic used by the manuscript. |
12
- | `fig:task_contract_tiles` | `paper_outputs/figures/fig_task_contract_tiles.pdf` | Static contract-map figure used by the manuscript. |
13
- | `fig:selection_regret_diagnostic` | `paper_outputs/figures/fig_selection_regret_rq2.tikz` | Rebuilt by `scripts/build_selection_regret_rq2_figure.py` from `artifacts/results/selection_regret_scope_sweep_20260505.csv`. |
14
- | `fig:fireprone_contract_progression` | `paper_outputs/figures/fig_fireprone_contract_progression_compact.pdf` | Rebuilt by `scripts/build_fireprone_contract_progression_figure.py` from `artifacts/results/fireprone_contract_progression_summary.json`. |
15
- | `fig:task_comparator_normalized_map` | `paper_outputs/figures/fig_task_rank_map.pdf` | Rebuilt by `scripts/build_task_rank_map.py` from `tab_primary_results.tex` and `tab_supporting_results.tex`. |
-
- ## Main Tables
-
- | Paper label | Release file | Provenance |
- |---|---|---|
- | `tab:primary_results` | `paper_outputs/tables/tab_primary_results.tex` | Frozen paper-output TeX extracted from the current manuscript source and verified by checksum. Raw reruns require the task scripts and non-redistributed feature caches. |
- | `tab:supporting_results` | `paper_outputs/tables/tab_supporting_results.tex` | Frozen paper-output TeX extracted from the current manuscript source and verified by checksum. Raw reruns require the task scripts and non-redistributed feature caches. |
-
- ## Appendix Tables
-
- | Paper label | Release file | Provenance |
- |---|---|---|
- | `tab:app_matching_rule_params` | `paper_outputs/tables/tab_app_matching_rule_params.tex` | Contract parameter table from manuscript source, verified by checksum. |
- | `tab:app_contract_params_full` | `paper_outputs/tables/tab_app_contract_params_full.tex` | Contract parameter table from manuscript source, verified by checksum. |
- | `tab:app_scope_params` | `paper_outputs/tables/tab_app_scope_params.tex` | Scope parameter table from manuscript source, verified by checksum. |
- | `tab:fireprone_contract_progression` | `paper_outputs/tables/tab_fireprone_contract_progression.tex` | Values from `artifacts/results/fireprone_contract_progression_summary.json`. |
- | `tab:appendix_selection_regret_tolerance` | `paper_outputs/tables/tab_appendix_selection_regret_tolerance.tex` | Values from selection-regret summary artifacts. |
- | `tab:app_occupancy_ppr_scope` | `paper_outputs/tables/tab_app_occupancy_ppr_scope.tex` | Values from `artifacts/results/fireprone_contract_progression_summary.json`. |
- | `tab:app_spread_ap_by_scope` | `paper_outputs/tables/tab_app_spread_ap_by_scope.tex` | Frozen paper-output TeX extracted from current manuscript source, verified by checksum. |
- | `tab:app_burned_area_median_acre` | `paper_outputs/tables/tab_app_burned_area_median_acre.tex` | Frozen paper-output TeX extracted from current manuscript source, verified by checksum. |
- | `tab:app_analog_rank_depth` | `paper_outputs/tables/tab_app_analog_rank_depth.tex` | Frozen paper-output TeX extracted from current manuscript source, verified by checksum. |
- | `tab:app_smoke_high_event` | `paper_outputs/tables/tab_app_smoke_high_event.tex` | Frozen paper-output TeX extracted from current manuscript source, verified by checksum. |
- | `tab:app_heat_event_pr` | `paper_outputs/tables/tab_app_heat_event_pr.tex` | Frozen paper-output TeX extracted from current manuscript source, verified by checksum. |
- | `tab:app_seed_robustness` | `paper_outputs/tables/tab_app_seed_robustness.tex` | Seed summary table from manuscript source, verified by checksum. |
- | `tab:app_head_architectures` | `paper_outputs/tables/tab_app_head_architectures.tex` | Architecture description table from manuscript source, verified by checksum. |
-
- ## Reproduction Commands
-
  ```bash
  python3 scripts/reproduce_paper_outputs.py
  ```

- This command rebuilds the outputs that depend only on released summary files,
- checks all final paper-output hashes, and runs the release audit.
-
- ## Raw Rerun Boundary
-
- Some tables depend on raw gridded data, event data, or backbone feature caches
- that are not redistributed. For public release, we provide the compact summary
- artifacts used to reproduce the displayed paper values and document the raw data
- sources separately.
 
+ # Public Artifact Map
+
+ This map describes the public Hugging Face release boundary. Manuscript TeX,
+ BibTeX, table TeX, TikZ source, and paper PDFs are intentionally excluded.
+
+ ## Included Public Artifacts
+
+ | Area | Release files | Notes |
  |---|---|---|
+ | Model code | `models/wildfire_fm/modeling_unet.py` | Compact U-Net used by the released checkpoints. |
+ | Checkpoint metadata | `models/wildfire_fm/checkpoint_manifest.json` | Lists five seeded checkpoint paths, SHA-256 hashes, and byte sizes. |
+ | Figure previews | `assets/*.png`, `assets/*.svg` | Hub-page visuals and final-paper figure previews. |
+ | Figure PDFs | `paper_outputs/figures/*.pdf` | Selected final-paper figures retained for visual reproducibility. |
+ | Numeric summaries | `artifacts/results/*.csv`, `artifacts/results/*.json` | Sanitized compact summaries; local machine paths removed. |
+ | Data notes | `data_sources/DATA_SOURCES.md` | Source roles and access entry points; raw data are not redistributed. |
+ | Raw rerun references | `experiments/` | Sanitized scripts/templates requiring user-provided data and paths. |
+
+ ## Excluded Manuscript Artifacts
+
+ The release does not include `paper/`, `paper_outputs/tables/`, generated table
+ TeX, `.tikz`, `.bib`, or manuscript PDF files. The public arXiv paper should be
+ linked separately after finalization.
+
+ ## Verification
+
  ```bash
  python3 scripts/reproduce_paper_outputs.py
  ```

+ This command checks public artifact hashes and audits that manuscript/source
+ artifacts and local paths are absent.
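For readers without the repository, the manifest-based hash check described above can be sketched in a few lines. This is a minimal illustration, not the actual `scripts/reproduce_paper_outputs.py`; it assumes `checkpoint_manifest.json` maps relative file paths to objects with `sha256` and `size` fields, which is an assumed schema based on the table description.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path) -> list:
    """Return (relative_path, reason) pairs for manifest entries that fail verification."""
    entries = json.loads(manifest_path.read_text())
    root = manifest_path.parent
    problems = []
    for rel_path, meta in entries.items():
        target = root / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif target.stat().st_size != meta["size"]:
            problems.append((rel_path, "size mismatch"))
        elif sha256_of(target) != meta["sha256"]:
            problems.append((rel_path, "sha256 mismatch"))
    return problems
```

Checking byte size before rehashing is a cheap first filter; only files that pass it pay the cost of a full SHA-256 pass.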
 
 
 
 
 
 
 
docs/huggingface_release_design.md CHANGED
@@ -12,11 +12,11 @@ reproducibility material rather than being the main organizing principle.
  checkpoint locations, quick loading code, data-source boundaries, limitations,
  and citation text.
  - `assets/` contains lightweight visuals for the Hub page plus PNG previews of
- final-paper figures.
  - `models/wildfire_fm/` contains model code, manifests, and checkpoint metadata.
- - `paper_outputs/` stores final TeX, TikZ, and PDF artifacts used by the
- manuscript.
- - `artifacts/results/` stores compact CSV/JSON summaries that can be public.
  - `data_sources/` documents external data resources without redistributing them.
  - `experiments/` contains raw-rerun reference scripts and Slurm templates.

@@ -25,3 +25,9 @@ reproducibility material rather than being the main organizing principle.
  The repository is a model release with reproducibility artifacts, not a raw-data
  mirror. Full raw-data reruns require separately obtained source data, local
  feature caches, and cluster-specific paths.
 
 
 
 
 
 
 
  checkpoint locations, quick loading code, data-source boundaries, limitations,
  and citation text.
  - `assets/` contains lightweight visuals for the Hub page plus PNG previews of
+ final-paper figures.
  - `models/wildfire_fm/` contains model code, manifests, and checkpoint metadata.
+ - `paper_outputs/` stores selected final-paper figure PDFs only. Manuscript
+ TeX, table TeX, TikZ source, BibTeX, and paper PDF files are not included.
+ - `artifacts/results/` stores sanitized compact CSV/JSON summaries that can be public.
  - `data_sources/` documents external data resources without redistributing them.
  - `experiments/` contains raw-rerun reference scripts and Slurm templates.

  The repository is a model release with reproducibility artifacts, not a raw-data
  mirror. Full raw-data reruns require separately obtained source data, local
  feature caches, and cluster-specific paths.
+
+ ## Manuscript Boundary
+
+ The Hub model release intentionally excludes manuscript TeX, BibTeX, table TeX,
+ TikZ source, and paper PDFs. The paper can be linked separately after the public
+ arXiv version is finalized.
paper/main.tex DELETED
@@ -1,141 +0,0 @@
- % !TeX root = main.tex
- % !TeX program = pdflatex
- \documentclass{article}
- \usepackage[preprint]{neurips_2026}
- \usepackage[utf8]{inputenc} % allow utf-8 input
- \usepackage[T1]{fontenc} % use 8-bit T1 fonts
- \usepackage{hyperref} % hyperlinks
- \usepackage{url} % simple URL typesetting
- \usepackage{booktabs} % professional-quality tables
- \usepackage{amsfonts} % blackboard math symbols
- \usepackage{nicefrac} % compact symbols for 1/2, etc.
- \usepackage{microtype} % microtypography
- \usepackage[table]{xcolor} % colors
- \usepackage{placeins}
- \usepackage[utf8]{inputenc}
- \usepackage[T1]{fontenc}
- \usepackage{hyperref}
- \setcitestyle{numbers,square}
- \definecolor{tocblue}{RGB}{31, 73, 125}
- \hypersetup{
- colorlinks=false,
- citebordercolor=green,
- linkbordercolor=green,
- urlbordercolor=blue,
- pdfauthor={Yangshuang Xu, Yuyang Dai, Liling Chang, Qi Wang, Yushun Dong},
- pdftitle={Does Your Wildfire Prediction Model Actually Work, or Just Score Well?},
- pdfsubject={},
- pdfkeywords={}
- }
-
- \usepackage{url}
- \usepackage{booktabs}
- \usepackage{amsfonts}
- \usepackage{nicefrac}
- \usepackage{microtype}
- \usepackage{amsmath}
- \usepackage{amssymb}
- \usepackage{graphicx}
- \usepackage{tabularx}
- \usepackage{longtable}
- \usepackage{multirow}
- \usepackage{array}
- \usepackage{float}
- \usepackage{adjustbox}
- \usepackage{placeins}
- \usepackage{enumitem}
- \usepackage{siunitx}
- \usepackage{tikz}
- \usepackage{subcaption}
- \usepackage{wrapfig}
- \usepackage[normalem]{ulem}
- \usepackage{pifont}
- \usepackage{hyperref}
- \usepackage{xcolor}
- \usepackage{tabularx}
- \usepackage{xspace}
-
-
- \sisetup{detect-all}
-
- \definecolor{wfblue}{RGB}{42,111,151}
- \definecolor{wforange}{RGB}{231,111,81}
- \definecolor{wfgreen}{RGB}{42,157,143}
- \definecolor{wfgold}{RGB}{233,196,106}
- \definecolor{wfslate}{RGB}{38,70,83}
- \definecolor{wfgray}{RGB}{108,117,125}
- \definecolor{wfpurple}{RGB}{116,81,164}
- \definecolor{wfindigo}{RGB}{77,100,166}
- \definecolor{wfrose}{RGB}{188,80,144}
- \definecolor{wfolive}{RGB}{120,143,64}
- \definecolor{primarybg}{RGB}{219,234,254}
- \definecolor{primaryrule}{RGB}{147,197,253}
- \definecolor{headerbg}{RGB}{30,58,138}
- \definecolor{regbg}{RGB}{240,240,240}
- \definecolor{retrbg}{RGB}{232,232,232}
- \definecolor{refbg}{RGB}{255,243,205} % amber – reference row
- \definecolor{alphabg}{RGB}{220,237,220} % green – AlphaEarth row
- \definecolor{subheadbg}{RGB}{241,245,249} % near-white – column subheader
- \definecolor{bestval}{RGB}{0,100,0} % dark green – best frozen value
- \definecolor{warnval}{RGB}{180,0,0} % dark red – anomalous value
-
- \newcolumntype{L}[1]{>{\raggedright\arraybackslash}p{#1}}
- \newcolumntype{Y}{>{\raggedright\arraybackslash}X}
- \newcommand{\ms}[2]
- {\ensuremath{#1{\mkern1mu}_{\scriptscriptstyle \pm #2}}}
- \newcommand{\msb}[2]{\ensuremath{\mathbf{#1 \pm #2}}}
-
- \newcommand{\best}[1]{\textbf{#1}}
- \newcommand{\ourfm}{\textsc{Wild}{\textbf{FIRE}}\textsc{-FM}\xspace}
- % \title{Ranking Is Not Decision Quality: Evaluation Contracts for Wildfire-Centric Transfer}
- \title{Does Your Wildfire Prediction Model Actually Work,\\ or Just Score Well?}
-
-
- \author{%
- Yangshuang Xu\thanks{Equal contribution.} \\
- Florida State University \\
- \texttt{yx21e@fsu.edu} \\
- \And
- Yuyang Dai\footnotemark[1] \\
- Florida State University \\
- \texttt{yd26@fsu.edu} \\
- \And
- Liling Chang \\
- Florida State University \\
- \texttt{liling.chang@fsu.edu} \\
- \And
- Qi Wang \\
- Northeastern University \\
- \texttt{wangqi@vt.edu} \\
- \And
- Yushun Dong\thanks{Corresponding author.} \\
- Florida State University \\
- \texttt{yd24f@fsu.edu} \\
- }
-
- \begin{document}
-
- \maketitle
-
- \input{sections/0_abstract}
- \input{sections/1_intro}
- \input{sections/2_backbone}
- % \input{sections/background}
- \input{sections/3_prelim}
- % \input{sections/methodology}
- \input{sections/4_experiments}
- \input{sections/5_conclusion}
-
- % \bibliographystyle{abbrvnat}
- \bibliographystyle{plain}
- \bibliography{references}
-
- \newpage
- \input{sections/appendix}
-
- \clearpage
- \input{checklist_filled}
- \clearpage
-
-
- \end{document}

paper/manuscript_final.pdf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:c342978b2f0f25cf6e430b860702895bbb3b512145c8c6e38aa2233b416d835e
- size 297362
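The deleted PDF was tracked with Git LFS, so the repository stored only the small key/value text pointer shown above rather than the PDF bytes. A minimal sketch of parsing that pointer format (illustrative only; real workflows should use the `git lfs` client):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its space-separated key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# The exact pointer content from the deleted file above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:c342978b2f0f25cf6e430b860702895bbb3b512145c8c6e38aa2233b416d835e
size 297362"""

fields = parse_lfs_pointer(pointer)
# The oid field is "<algorithm>:<hex digest>".
algorithm, _, digest = fields["oid"].partition(":")
```

Given the real blob, recomputing `hashlib.sha256(blob).hexdigest()` and comparing it to `digest` confirms the pointer and the 297,362-byte PDF match.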
 
 
 
 
paper/references.bib DELETED
@@ -1,465 +0,0 @@
paper/sections/0_abstract.tex DELETED
@@ -1,4 +0,0 @@
- \begin{abstract}
- Wildfire prediction is important for early warning and resource allocation, yet existing Earth foundation models (Earth FMs) are pretrained for general atmospheric and geophysical objectives rather than wildfire forecasting. To address this gap, we introduce \ourfm, the first foundation model pretrained specifically for wildfire prediction using weather, active-fire observations, topography, vegetation, and static environmental data. However, introducing a domain-specific backbone alone does not solve the evaluation problem: wildfire events are sparse in space and time, making transfer conclusions highly sensitive to matching rules and evaluation settings.
- To address this problem, we introduce a fixed-contract evaluation framework with two controlled checks: a fixed-output check for matching-rule effects and a fixed-feature check for head-selection effects. Under matched contracts, we compare \ourfm\ with ten Earth-FM baselines across occupancy, spread, retrieval, and regression tasks. Our results show that wildfire transfer conclusions depend strongly on evaluation design and task formulation. We hope this framework and \ourfm\ provide a foundation for future wildfire-specific Earth-FM research and benchmarking. Our code is available at https://anonymous.4open.science/r/Wildfire-fm-evaluation-contracts-5AE9/.
- \end{abstract}
 
paper/sections/1_intro.tex DELETED
@@ -1,77 +0,0 @@
- \section{Introduction}
-
- Wildfire prediction is critical for early warning and resource allocation in disaster response~\cite{goldammer1999early, farahmand2020fdeo}. As extreme fire events grow more frequent and severe, accurate forecasting of wildfire occurrence and spread is becoming increasingly important~\cite{pickell2017early, kotroni2020disarm}. Recent Earth foundation models (Earth FMs), pretrained on large-scale atmospheric and geophysical data~\cite{bodnar2025aurora, schmude2024prithviwxc, nguyen2023climax}, provide transferable representations for Earth-system dynamics and have shown strong performance across weather and remote-sensing tasks. However, wildfire dynamics depend on complex interactions among weather, vegetation, topography, fuel conditions, and active-fire behavior, which are not explicitly modeled during pretraining in existing general-purpose Earth FMs. This mismatch raises a natural question: can representations learned for general atmospheric or geophysical objectives transfer reliably to wildfire forecasting, and how should we measure that transfer?
-
- Answering this question requires solving two intertwined problems. The first is that existing Earth FMs are not pretrained specifically for wildfire dynamics, but instead adapted to wildfire tasks after general-purpose pretraining. To address this limitation, we introduce \textbf{\ourfm}, the first foundation model pretrained specifically for wildfire prediction using fire-relevant multimodal data, including regional weather dynamics, active-fire observations, topography, vegetation, and static environmental context. By incorporating wildfire-specific signals directly during pretraining, \ourfm\ learns representations aligned with the physical processes underlying fire behavior rather than relying on transfer from general atmospheric or geophysical objectives.
-
- The second problem is evaluation: even with a domain-specific model, reliably comparing \ourfm\ against transferred general-purpose Earth FMs remains difficult. Wildfire events are sparse in space and time~\cite{ebert2009neighborhood, gilleland2009intercomparison}, making transfer conclusions highly sensitive to three sources of evaluation variability.
- \textit{First,} matching rules determine what counts as a correct prediction. Early warning systems tolerate spatial offsets that post-fire damage assessment cannot, so different matching rules can produce substantially different F1 scores from the same model outputs~\cite{ebert2009neighborhood, gilleland2009intercomparison}.
- \textit{Second,} head-selection metrics determine which lightweight adapter is chosen on top of a frozen representation. Ranking metrics such as PR-AUC and decision metrics such as F1 can favor different heads from the same frozen features~\cite{mcdermott2024aurocauprc, traub2024selectiveclassification}.
- \textit{Third,} wildfire task forms operate over different prediction units and metric families. Occupancy prediction, spread forecasting, burned-area regression, and analog retrieval therefore produce scores that are not directly comparable even under the same backbone~\cite{schaeffer2023mirage, gerard2023wildfirespreadts}.
- Related protocol sensitivity has also been observed in active learning~\cite{luth2023activelearning} and selective classification~\cite{traub2024selectiveclassification}. We show that these effects become particularly severe in wildfire transfer evaluation, where sparse events and heterogeneous task forms amplify evaluation instability~\cite{marsocci2024pangaea}.
-
- \begin{figure*}
- \centering
- \includegraphics[width=0.96\linewidth]{figures/overview_wildfire.pdf}
- \caption{Overview of \textbf{\ourfm} and \textbf{Evaluation Protocol} in this paper.}
- \vspace{-8mm}
- \label{fig:overview}
- \end{figure*}
-
-
- This evaluation instability makes reliable comparison between
- \ourfm\ and existing Earth FMs fundamentally difficult. Standard
- geospatial benchmarks such as GEO-Bench~\cite{lacoste2023geobench},
- WeatherBench2~\cite{rasp2024weatherbench2}, WILDS~\cite{koh2021wilds},
- SustainBench~\cite{yeh2021sustainbench}, and
- TorchGeo~\cite{torchgeo2022} standardize datasets, splits, and
- metrics, but do so for tasks with dense, balanced labels where
- matching-rule sensitivity is not a primary concern. Wildfire
- studies such as FireCast~\cite{radke2019firecast} and Next Day
- Wildfire Spread~\cite{huot2022nextday} apply the same
- report-and-compare paradigm directly to sparse fire events
- without controlling for matching-rule choice, head-selection
- metric, or task-form comparability, the three sources of
- instability identified above. As a result, scores reported under
- different implicit protocol choices are not directly comparable,
- even when the underlying predictions are identical.
-
- Based on the \textit{limitations of prior work}, our contributions are as follows (see Figure~\ref{fig:overview}).
- \begin{itemize}
-
- \item \textbf{Wildfire-specific foundation model.}
- We introduce \ourfm, the first foundation model pretrained
- specifically for wildfire prediction using multimodal wildfire
- data spanning weather, active-fire observations, topography,
- vegetation, and static environmental context. Unlike general
- Earth FMs adapted after pretraining, \ourfm\ learns wildfire
- representations directly from fire-relevant processes during
- pretraining.
-
- \item \textbf{Fixed-contract evaluation framework.}
- We formulate wildfire Earth-FM transfer as a fixed-contract
- evaluation problem, defining a contract
- $\mathcal{C} = (\mathcal{T}, M, \Lambda, \Omega, \mathcal{A})$
- that specifies the task, metric, matching rule, evaluation
- scope, and lightweight-head family before comparison. We
- introduce two controlled checks: a \emph{fixed-output check}
- for matching-rule effects and a \emph{fixed-feature check}
- for head-selection effects, enabling evaluation artifacts to
- be separated from representation quality.
-
- \item \textbf{Systematic wildfire transfer benchmark.}
- Under fixed contracts, we compare \ourfm\ against ten
- general-purpose Earth FMs across six wildfire task forms.
- Our results show that wildfire transfer conclusions are
- highly sensitive to evaluation design and strongly
- task-dependent across occupancy, spread, retrieval, and
- regression settings.
-
- \end{itemize}
-
- % \begin{figure*}
- % \centering
- % \includegraphics[width=\linewidth]{figures/overview_wildfire.pdf}
- % \caption{Overview of \textbf{\ourfm} and \textbf{Evaluation Protocol} in this paper.}
- % \label{fig:overview}
- % \end{figure*}
 
paper/sections/2_backbone.tex DELETED
@@ -1,39 +0,0 @@
- \section{\ourfm\ Reference Backbone}
- \label{Reference_backbone}
-
- \ourfm\ is a wildfire-specialized regional backbone trained on fire-relevant multimodal data for wildfire prediction. Existing general-purpose Earth FMs are pretrained for atmospheric and geophysical objectives~\cite{lam2023graphcast}, or for remote-sensing objectives~\cite{reed2023scalemae}, so wildfire-relevant information enters only indirectly through those objectives. In contrast, \ourfm\ is trained with weather, active-fire observations, topography, vegetation, and static environmental context, so its representation is learned from inputs tied directly to wildfire behavior. This design makes \ourfm\ a strong wildfire-specific backbone whose features are shaped by signals directly relevant to fire occurrence and spread.
- It provides a task-aligned regional model trained directly for wildfire prediction.
- It also serves as an empirical anchor for interpreting how transferred Earth FMs behave under matched evaluation contracts. This section describes the data resources and training strategy used to build \ourfm\ as an in-domain reference backbone. The fixed-contract protocol used to compare it with transferred Earth FMs is defined separately in Section~\ref{sec:eval}.
-
- \subsection{Data Resources}
- We group the resources by their role in the study: dynamic weather inputs, occupancy supervision, static context, and event-level resources for supporting tasks. Source and terms-of-use notes for the external data and model assets used in this study are summarized in Appendix Table~\ref{tab:external_assets_licenses}.
-
-
- \noindent\textbf{Dynamic weather inputs.}
- The weather inputs come from a California regional dataset built from NOAA High-Resolution Rapid Refresh (HRRR) fields~\cite{noaa_hrrr_ncei,noaa_hrrr_emc}. The data are placed on a projected 5 km grid in EPSG:5070. Each time map uses weather fields every 6 hours and predicts wildfire occupancy at a 12-hour lead. The variables include near-surface temperature and dew point, wind, CAPE, surface pressure, boundary-layer height, visibility, precipitation rate, and accumulated precipitation.
-
- \noindent\textbf{Occupancy supervision.}
- Wildfire supervision comes from NASA FIRMS active-fire detections~\cite{nasa_firms}. The detections are mapped to the same grid as the weather fields. \ourfm\ is trained on gridded occupancy labels derived from these detections. This defines the occupancy target used by the reference backbone throughout the primary experiments.
-
- \noindent\textbf{Static context.}
- Static context describes landscape and exposure factors that do not change at the weather time step. These variables are LANDFIRE fire-behavior fuel model~\cite{landfire_fbfm40}, LANDFIRE canopy cover~\cite{landfire_canopy_cover}, Wildfire Risk to Communities housing-unit density~\cite{usfs_wrc_housing_density}, and LandScan population~\cite{ornl_landscan_2024}. Together with validity masks for the weather and static fields, the occupancy input has 16 channels: 10 weather fields, two validity masks, and four static layers for regional fire prediction.
-
- \noindent\textbf{Event-level resources.}
- Event-level resources are used for supporting burned-area and analog tasks, not as occupancy labels for \ourfm. These resources include WFIGS incident and perimeter attributes~\cite{nifc_wfigs_perimeters} and MTBS burned-area and burn-severity records~\cite{mtbs_usgs_2025}. They provide event-scale outcomes and incident metadata for supporting tasks in the experiments and appendix analyses.
-
-
-
- \subsection{Training Strategy}
-
- \noindent\textbf{Model and data split.}
- \ourfm\ uses a compact U-Net~\cite{ronneberger2015unet} that maps gridded weather and static inputs to wildfire predictions.
- Its primary output is fire occupancy on the common spatial grid.
- Data are split by time: June--August 2024 for training, September 2024 for validation, and October 2024 for testing.
- This yields 368 training time maps, 120 validation time maps, and 120 test time maps.
- Temporal splitting keeps later fire outcomes out of earlier training periods.
-
- \noindent\textbf{Fire-aware tile training.}
- Training is performed on 32$\times$32 tiles sampled from the time maps. The tiles include fire-centered regions and non-fire context, so the model sees both sparse fire labels and surrounding background conditions. This sampling reduces the dominance of empty cells without removing non-fire examples from the training distribution. Class-weighted binary cross-entropy is used for the primary occupancy target to further balance sparse positives.
-
- \noindent\textbf{Spatial-support training objective.}
- Wildfire labels can shift by a few grid cells because detections, weather fields, and static layers are aligned on a common grid. To reduce sensitivity to these small displacements during training, the occupancy target is dilated by two grid cells. An auxiliary spatial-support output is trained for the same neighborhood alongside the primary occupancy output. At test time, \ourfm\ is scored under the same task-specific evaluation contracts as the transferred Earth-FM backbones in Section~\ref{sec:eval}, ensuring matched comparison conditions.
 
paper/sections/3_prelim.tex DELETED
@@ -1,84 +0,0 @@
- \section{Evaluation Design}
- \label{sec:eval}
-
- % Section~\ref{Reference_backbone} establishes \ourfm\ as a wildfire-specialized backbone.
- % This section formalizes an evaluation contract and introduces two controlled checks to isolate evaluation effects.
-
- \subsection{Wildfire Output Records and Fire Sets}
-
- \paragraph{Output record.}
- A wildfire prediction model produces scores over spatial units and forecast times, which are compared against observed fire activity to compute performance. We formalize this comparison as a \emph{wildfire output record} $\mathcal{O} = (S, Y)$, where the score field $S = \{s_{i,t}\}$ contains model scores over spatial units $i$ and times $t$, and the label field $Y = \{y_{i,t}\}$ contains the corresponding observations. For occupancy tasks, $y_{i,t} \in \{0,1\}$ indicates whether fire is observed at $(i,t)$.
- \vspace{-0.5em}
- \paragraph{Predicted and observed fire sets.}
- To evaluate $\mathcal{O}$, the score field is thresholded at $\tau$ to produce a predicted fire set $\hat{P}_\tau = \{(i,t) : s_{i,t} \geq \tau\}$, while the observed fire set is $P = \{(i,t) : y_{i,t} = 1\}$. The pair $(\hat{P}_\tau, P)$ is evaluated under a matching rule. Given a matching rule, true positives (TP), false positives (FP), and false negatives (FN) are computed from matched and unmatched elements, and the decision F1 score is $\text{F1} = 2\text{TP}/(2\text{TP} + \text{FP} + \text{FN})$. The same $(\hat{P}_\tau, P)$ can yield different TP, FP, and FN counts under different matching rules without changing model outputs, motivating the fixed-output check in Section~\ref{sec:checks}.
- \vspace{-0.5em}
- \paragraph{Matching rules.}
- A matching rule specifies when a predicted unit-time pair in $\hat{P}_\tau$ is considered a match to an observed pair in $P$~\cite{ebert2009neighborhood, gilleland2009intercomparison}. Because wildfire applications tolerate different levels of spatial and temporal error, we define three matching rules for occupancy outputs. \textit{(1) Exact matching}: requires agreement in both spatial unit and forecast time. \textit{(2) Tolerated matching}: accepts predictions within a fixed spatial or temporal neighborhood defined by the evaluation contract $\mathcal{C}$. \textit{(3) Union matching}: accepts predictions satisfying either exact or tolerated matching.
- % \begin{itemize}
- % \item \textbf{Exact matching}: requires agreement in both spatial unit and forecast time.
- % \item \textbf{Tolerated matching}: accepts predictions within a fixed spatial or temporal neighborhood defined by the evaluation contract $\mathcal{C}$.
- % \item \textbf{Union matching}: accepts predictions satisfying either exact or tolerated matching.
- % \end{itemize}
- % \vspace{-0.5em}
- %
- Figure~\ref{fig:toy_occupancy_contract} illustrates these rules for a fixed output. Because the output record is held constant, any score difference is attributed solely to the matching rule.
-
- \begin{figure}
- \centering
- \vspace{-2mm}
- \includegraphics[width=0.8\linewidth]{figures/matching.pdf}
- \vspace{-2mm}
- \caption{Matching rules for one fixed occupancy output.
- (a) Exact matching counts only same-time, same-cell overlap.
- (b) Tolerated matching accepts bounded spatial or temporal offsets.
- (c) The union reading counts matches accepted by either rule.}
- \vspace{-5mm}
- \label{fig:toy_occupancy_contract}
- \end{figure}
-
- \subsection{Evaluation Contract}
-
- A wildfire transfer score depends not only on the model, but also on the evaluation choices used to compute it~\cite{luth2023activelearning}. Changing the matching rule $\Lambda$, metric $M$, or evaluation scope $\Omega$ changes what the score measures even when model outputs are fixed.
-
- We define an \emph{evaluation contract} as the tuple
- $\mathcal{C} = (\mathcal{T}, M, \Lambda, \Omega, \mathcal{A})$,
- where $\mathcal{T}$ denotes the task, $M$ the metric,
- $\Lambda$ the matching rule, $\Omega$ the evaluation scope,
- and $\mathcal{A}$ the allowed lightweight-head family.
- Two transfer scores are comparable only when all five
- components are identical. The evaluation scope $\Omega$ is particularly important in wildfire settings. A global scope evaluates the full spatial domain, including many fire-inactive regions that can mask differences between models. A fire-prone scope restricts evaluation to regions with higher historical fire activity. We report both scopes separately rather than averaging across them. Fixed matching-rule, task-form, and scope parameters are reported in Appendix Tables~\ref{tab:app_matching_rule_params}, \ref{tab:app_contract_params_full}, and~\ref{tab:app_scope_params}.
-
-
- \subsection{Task-Form Contracts}
- \label{sec:taskforms}
- Contract components depend on task form. We distinguish \emph{primary} and \emph{supporting} tasks based on whether they directly evaluate wildfire decisions. Occupancy and fire spread are primary tasks because they evaluate spatial fire outputs under matching or overlap rules.
- Retrieval, burned-area regression, smoke PM$_{2.5}$, and extreme heat are supporting tasks because they use different prediction units and metric families. Their results provide complementary evidence rather than direct substitutes for occupancy and spread evaluation~\cite{schaeffer2023mirage}.
-
- For primary tasks, multiple metrics are reported for the same output under different contracts. For occupancy, exact F1 requires same-cell same-time agreement, tolerated F1 accepts predictions within a spatial or temporal neighborhood, and union F1 accepts predictions satisfying either rule. For fire spread, exact F1 evaluates raster-cell agreement, spatial F1 evaluates region overlap between $\hat{B}$ and $B$~\cite{gilleland2009intercomparison}, and AP summarizes ranking quality across thresholds. These metrics are reported separately rather than aggregated because they measure different aspects of the same prediction task. Figure~\ref{fig:task_contract_tiles} summarizes the contract map across all six task forms.
-
- \subsection{Controlled Checks}
- \label{sec:checks}
- \begin{wrapfigure}[19]{r}{0.52\textwidth}
- \vspace{-2em}
- \centering
- \includegraphics[width=\linewidth]{figures/fig_task_contract_tiles.pdf}
- \vspace{-1.5em}
- \caption{
- Evaluation contract map for the six fixed-contract tasks.
- Yellow boxes denote \textcolor[RGB]{255,193,7}{\textbf{primary}} decision tasks; purple boxes denote \textcolor[RGB]{148,103,189}{\textbf{supporting}} tasks.
- }
- \label{fig:task_contract_tiles}
- \vspace{-0.8em}
- \end{wrapfigure}
- We isolate the two instability sources with two checks.
- Each check fixes all contract components except one, so any difference is attributed solely to that component.
-
- \paragraph{Fixed-output check.}
- The fixed-output check isolates matching-rule effects by holding the output record $\mathcal{O} = (S, Y)$ and all other contract components fixed while varying only $\Lambda$. For the same occupancy record, we compute F1 under exact, tolerated, and union matching. Any score difference is therefore attributed solely to the matching rule. If matching rules alone shift F1 by tens of percentage points on the same output, then comparing models under different $\Lambda$ conflates model quality with evaluation design.
-
- \paragraph{Fixed-feature check and selection regret.}
- The fixed-feature check isolates head-selection effects by holding the frozen feature source, $\mathcal{T}$, $\Omega$, $\Lambda$, and candidate head family $\mathcal{H} \subseteq \mathcal{A}$ fixed while varying only the selection metric. Let $R(h)$ denote the ranking score of head $h$ and $D(h)$ its decision score. Ranking-based selection chooses $h_R = \arg\max_{h \in \mathcal{H}} R(h)$, while decision-based selection chooses $h_D = \arg\max_{h \in \mathcal{H}} D(h)$. We define \emph{selection regret} as the decision-score gap incurred by using a ranking metric as a proxy for a decision metric during head selection: $\delta = D(h_D) - D(h_R) \geq 0$~\cite{mcdermott2024aurocauprc, traub2024selectiveclassification}. When $\delta > 0$, the ranking metric selects a head with lower decision performance under the same frozen representation, indicating that the observed gap arises from metric misalignment rather than from representation quality. The head family used in fixed-feature comparisons is summarized in Appendix Table~\ref{tab:app_head_architectures}.
-
-
- \paragraph{Fixed-contract transfer comparison.}
- After the controlled checks establish that matching-rule and selection-metric effects are non-trivial, Earth-FM backbones are evaluated under a shared contract $\mathcal{C}$. Entries are compared only when they satisfy the same $(\mathcal{T}, M, \Lambda, \Omega, \mathcal{A})$ tuple. Supporting tasks test whether occupancy and spread patterns generalize across task forms and provide additional evidence when transfer orderings are preserved.
 
paper/sections/4_experiments.tex DELETED
@@ -1,435 +0,0 @@
- \section{Experiments}
- \label{sec:experiments}
-
- We address three research questions under the fixed-contract framework defined in Section~\ref{sec:eval}. \textbf{RQ1:} Under fixed outputs, does the matching rule determine whether a wildfire model appears usable?
- \textbf{RQ2:} Under fixed features, does ranking-based head selection lose decision performance?
- \textbf{RQ3:} Under fixed task contracts, do model comparisons remain consistent across task forms?
- \vspace{-0.5em}
- \subsection{Experimental Setup}
- \paragraph{Task instances.}
- We instantiate the six task-form contracts defined in Section~\ref{sec:taskforms}.
- Occupancy and fire spread serve as primary tasks because they evaluate spatial fire outputs under matching or overlap rules and align with the decision structure of early warning systems~\cite{goldammer1999early, farahmand2020fdeo}.
- The four supporting tasks, \textit{final burned area, analog retrieval, smoke PM$_{2.5}$, and extreme heat}, use different prediction units and metric families; their results bound rather than replace primary decision evidence.
-
- \paragraph{Compared backbones.}
- The frozen Earth-FM comparator set includes Prithvi-WxC~\cite{schmude2024prithviwxc}, Aurora~\cite{bodnar2025aurora}, ClimaX~\cite{nguyen2023climax}, StormCast~\cite{pathak2024stormcast}, DLWP~\cite{weyn2020dlwp}, FCN~\cite{pathak2022fourcastnet}, FengWu~\cite{chen2023fengwu}, FuXi~\cite{chen2023fuxi}, Pangu-Weather~\cite{bi2023panguweather}, and AlphaEarth~\cite{brown2025alphaearth}.
- \ourfm\ serves as the wildfire-specialized reference backbone.
-
- \paragraph{Protocol.}
- For each comparison, the contract $\mathcal{C} = (\mathcal{T}, M, \Lambda, \Omega, \mathcal{A})$ is fixed before reporting test scores.
- Thresholds and morphology parameters are selected on validation data and held fixed at test time.
- Stochastic components are evaluated over five seeds and reported as mean $\pm$ standard deviation; deterministic fixed-output checks have zero seed variance by construction.
- Entries outside a fixed contract are omitted from main tables and documented in the appendix.
- For error metrics lower is better ($\downarrow$); for F1, AP, nDCG, and correlation metrics higher is better ($\uparrow$).
- Appendix Table~\ref{tab:app_seed_robustness} summarizes the seed-level checks behind the reported mean-with-std convention.
-
- \subsection{Matching-Rule Sensitivity Under Fixed Output (RQ1)}
- \label{sec:rq1}
-
- To answer RQ1, we conduct a fixed-output check on occupancy and fire spread tasks, holding the score field $S$, label field $Y$, threshold, and all other operating choices fixed while varying only the matching rule $\Lambda$ across exact, tolerated, and union settings. Occupancy results are reported in Figure~\ref{fig:fireprone_contract_progression} under both global and fire-prone scopes. The same progression is applied to fire spread outputs. Complete occupancy sweeps and predicted-positive rates are reported in Appendix Tables~\ref{tab:fireprone_contract_progression} and~\ref{tab:app_occupancy_ppr_scope}.
-
- \begin{wrapfigure}[21]{r}{0.50\textwidth}
- \centering
- \vspace{-3mm}
- \includegraphics[width=\linewidth]{figures/fig_primary_rank_change_map.pdf}
- \caption{\textbf{Primary-task rank changes (RQ1).}
- Cells show rank before\(\rightarrow\)after. Green/red/gray mark moving up/down/no change; darker green or red marks a larger move. Following Section~\ref{sec:taskforms}, Ex/Tol/Un are occupancy exact, tolerated, and union matching; Sp is spread spatial-overlap $F_1$.}
- \label{fig:primary_ranking}
- \vspace{-0.8em}
- \end{wrapfigure}
- Because both tasks involve spatially sparse targets, fire-active cells for occupancy, burned raster patches for spread, the operational assumptions encoded in $\Lambda$ directly govern what the model is being asked to get right, making matching-rule choice a substantive experimental setting rather than a post hoc evaluation detail.
- The fixed-output results reveal a pattern that goes beyond score differences: matching-rule choice determines whether a model appears viable for wildfire decision tasks at all. Under exact matching, which requires same-cell same-time agreement, the majority of frozen Earth-FM backbones produce F1 scores that are effectively near zero, rendering them indistinguishable from an uninformative baseline and suggesting they have no practical utility for the task. As the matching rule relaxes to tolerated and then union matching, both of which reflect operationally realistic assumptions for early warning systems, where a prediction displaced by a few grid cells still triggers the correct response, the same frozen representations recover substantial decision performance, with several backbones crossing from near-zero to practically meaningful F1 levels. This transition is not a marginal score improvement: it is a qualitative change in whether a model can be considered usable. The same pattern holds for fire spread under region-level matching relaxation, where strict raster-cell agreement again suppresses performance for most backbones while spatial tolerance restores it. The implications for prior wildfire transfer claims are significant: papers that report model performance under a single implicit matching rule, which is common practice given that sparse decision targets almost always require some form of tolerance~\cite{ebert2009neighborhood, gilleland2009intercomparison}, may be drawing viability conclusions that are entirely dependent on an undisclosed protocol choice. A model claimed to perform well under one tolerance assumption may be completely unusable under a stricter one, and vice versa. Matching rule cannot be treated as an evaluation detail; it is an experimental setting that must be fixed, reported, and justified as part of any wildfire transfer claim. Additional spread AP values under fixed scopes are reported in Appendix Table~\ref{tab:app_spread_ap_by_scope}.
-
-
- \begin{table}[t]
- \centering
- \small
- \setlength{\tabcolsep}{4pt}
- \renewcommand{\arraystretch}{1.20}
- \caption{%
- \textbf{Primary fixed-contract transfer results (RQ1).}
- Occupancy metrics: exact, tolerated, union $F_1$ (\%).
- Fire spread metrics: exact $F_1$ and spatial $F_1$ (\%).
- Each block fixes $\mathcal{T}$, $\Lambda$, $\Omega$, and $\mathcal{A}$.
- Upward arrows indicate that larger values are better.
- \textbf{Bold} marks the best value per metric. \textbf{Tol.} = Tolerated.
- }
- \label{tab:primary_results}
- \setlength{\arrayrulewidth}{0.4pt}
- \resizebox{\textwidth}{!}{%
- \begin{tabular}{lccccc}
- \toprule
- & \multicolumn{3}{c}{\textbf{Occupancy}}
- & \multicolumn{2}{c}{\textbf{Fire spread}} \\
- \cmidrule(lr){2-4}\cmidrule(lr){5-6}
- \textbf{Comparator}
- & \textbf{Exact $F_1\uparrow$} & \textbf{Tol.\ $F_1\uparrow$} & \textbf{Union $F_1\uparrow$}
- & \textbf{Exact $F_1\uparrow$} & \textbf{Spatial $F_1\uparrow$} \\
- \midrule
- \ourfm\
- & \ms{0.4546}{0.1412}
- & \ms{29.7484}{1.2868}
- & \ms{59.0656}{2.7372}
- & \ensuremath{\mathbf{37.6700}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.9800}}}
- & \ensuremath{\mathbf{80.9700}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{2.0200}}} \\
- \midrule
- Prithvi-WxC
- & \ms{0.0552}{0.0039} & \ms{7.1649}{0.6557} & \ms{20.1853}{1.8299}
- & \ms{22.3500}{3.4500} & \ms{65.2600}{1.0700} \\
- Aurora
- & \ms{0.0656}{0.0094} & \ms{8.5009}{1.9594} & \ms{23.1037}{4.9418}
- & \ms{30.8757}{0.1343} & \ms{71.7329}{0.0141} \\
- ClimaX
- & \ms{0.3480}{0.0754}
- & \ensuremath{\mathbf{29.7535}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{3.6073}}}
- & \ensuremath{\mathbf{60.1506}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{7.5865}}}
- & \ms{27.9853}{2.0532} & \ms{69.0634}{2.3832} \\
- StormCast
- & \ms{0.0626}{0.0119} & \ms{8.1951}{2.1895} & \ms{22.3817}{5.4294}
- & \ms{14.8387}{7.5791} & \ms{55.7568}{21.3003} \\
- DLWP
- & \ms{0.1693}{0.0419} & \ms{14.9148}{3.2446} & \ms{28.1901}{6.9658}
- & \ms{5.9335}{10.0712} & \ms{22.8587}{22.3750} \\
- FCN
- & \ms{0.2829}{0.0839} & \ms{19.5061}{3.3412} & \ms{40.0604}{9.3701}
- & \ms{3.1798}{2.6598} & \ms{15.6203}{12.4531} \\
- FengWu
- & \ms{0.2613}{0.0757} & \ms{12.0050}{6.0239} & \ms{24.1022}{13.6293}
- & \ms{5.5189}{9.0883} & \ms{18.4774}{22.4703} \\
- FuXi
- & \ms{0.3774}{0.1212} & \ms{21.0323}{4.8211} & \ms{37.2888}{9.4470}
- & \ms{19.9909}{2.1364} & \ms{56.1826}{3.0412} \\
- Pangu-Weather
- & \ms{0.2755}{0.1089} & \ms{17.0909}{4.0477} & \ms{35.6386}{9.0327}
- & \ms{11.2583}{11.0719} & \ms{32.5081}{25.4969} \\
- AlphaEarth
- & \ensuremath{\mathbf{2.0606}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.4404}}}
- & \ms{29.4476}{6.0064} & \ms{37.4286}{9.9458}
- & \ms{11.0995}{3.6088} & \ms{32.8316}{7.4634} \\
- \bottomrule
- \end{tabular}
- }
- \end{table}
-
-
- \begin{wrapfigure}[14]{r}{0.50\textwidth}
- \centering
- \vspace{-1em} \includegraphics[width=0.50\textwidth]{figures/fig_selection_regret_scatter.pdf}
- \caption{\textbf{Head-selection regret under fixed features (RQ2).}
- Each point is one backbone; selection regret \(\delta\) follows Section~\ref{sec:checks} under global-scope union-\(F_1\).}
- \label{fig:selection_regret_diagnostic}
- \vspace{-1.2em}
- \end{wrapfigure}
-
- \subsection{Head-Selection Sensitivity Under Fixed Features (RQ2)}
- \label{sec:rq2}
-
- To answer RQ2, we conduct a fixed-feature check on occupancy and fire spread tasks, holding the frozen feature source, $\mathcal{T}$, $\Omega$, $\Lambda$, and candidate head family $\mathcal{H} \subseteq \mathcal{A}$ fixed while varying only the selection metric between PR-AUC-based and decision-F1-based selection. The resulting selection regret $\delta = D(h_D) - D(h_R)$ measures the decision-score loss induced by metric misalignment. Occupancy results are reported in Figure~\ref{fig:selection_regret_diagnostic} under both global and fire-prone scopes. Full per-seed and per-head details are reported in Appendix~\ref{sec:app_seeded_audits}, and the exact, tolerated, and union regret breakdown is provided in Appendix Table~\ref{tab:appendix_selection_regret_tolerance}.
-
- The fixed-feature results show that head-selection metrics introduce substantial backbone-dependent variation that is not explained by representation quality alone. Some backbones exhibit near-zero regret, indicating agreement between PR-AUC and decision-F1 selection, while others show large regret concentrated in specific scope-matching settings. Regret is generally larger under the global scope, where severe fire imbalance amplifies misalignment between ranking and decision metrics~\cite{mcdermott2024aurocauprc}. Restricting evaluation to fire-prone scopes typically reduces regret by concentrating evaluation on fire-relevant regions. A similar pattern appears for fire spread, where ranking and decision metrics can favor different heads under the same frozen representation. These results show that selection metrics must be aligned with the evaluation objective as part of the evaluation contract~\cite{traub2024selectiveclassification}.
-
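The selection-regret arithmetic in this subsection can be sketched in a few lines. This is an illustrative reconstruction rather than code from the deleted sources: the head names and score values below are hypothetical, and the decision metric stands in for the union-$F_1$ used in the text.

```python
# Hypothetical sketch of the fixed-feature selection-regret check (RQ2).
# Head names and score values are illustrative, not from the paper.

def selection_regret(heads, rank_metric, decision_metric):
    """delta = D(h_D) - D(h_R): the decision score of the head chosen by the
    decision metric minus that of the head chosen by the ranking metric."""
    h_r = max(heads, key=lambda h: rank_metric[h])      # PR-AUC-selected head
    h_d = max(heads, key=lambda h: decision_metric[h])  # decision-F1-selected head
    return decision_metric[h_d] - decision_metric[h_r]

heads = ["linear", "mlp", "conv"]
pr_auc = {"linear": 0.41, "mlp": 0.44, "conv": 0.39}    # ranking metric
union_f1 = {"linear": 58.2, "mlp": 52.7, "conv": 55.0}  # decision metric (%)

delta = selection_regret(heads, pr_auc, union_f1)
print(round(delta, 1))  # 5.5: PR-AUC picks "mlp", costing 5.5 F1 points
```

When the two metrics agree on the best head, the regret is zero; in the paper's protocol, selection happens on validation and regret is scored on test, so it can deviate from this toy single-split case.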
- \subsection{Supporting Task Checks (RQ3)}
- \label{sec:rq3}
-
- To answer RQ3, we evaluate all backbones across the four supporting task contracts, \textit{burned area}, \textit{analog retrieval}, \textit{smoke PM$_{2.5}$}, and \textit{extreme heat}, and examine whether the reference-versus-frozen ordering established under primary tasks generalizes across task forms. A rank overview across all six contracts is provided in Figure~\ref{fig:task_comparator_normalized_map}, which maps backbone-by-task rank positions and makes cross-task ordering shifts directly visible. Native metric values are reported in Table~\ref{tab:supporting_results}. Additional supporting-task diagnostics are reported in Appendix Tables~\ref{tab:app_burned_area_median_acre}, \ref{tab:app_analog_rank_depth}, \ref{tab:app_smoke_high_event}, and~\ref{tab:app_heat_event_pr}.
-
- The supporting task results produce three qualitatively distinct patterns relative to the primary findings. Burned area largely preserves the reference-versus-frozen ordering seen under occupancy and spread: \ourfm\ leads frozen entries on log-RMSE and Spearman $\rho$, suggesting that the representational advantage of wildfire-specific pretraining generalizes to event-scale regression under a different metric family, providing convergent evidence for the primary claim. Analog retrieval and smoke PM$_{2.5}$ show a different pattern, with AlphaEarth matching \ourfm\ closely on both tasks while atmospheric FMs show near-zero correlation on smoke PM$_{2.5}$, indicating that retrieval and air-quality signals are captured comparably by a general remote-sensing backbone, and that the primary occupancy advantage does not extend uniformly to these task forms. Extreme heat exhibits the largest variance across the comparator set, with atmospheric FMs ranging from near-reference performance to near-complete failure depending on backbone pretraining domain, while AlphaEarth again matches \ourfm\ closely. The scale of this variance is itself informative: aggregating scores across task forms without respecting contract boundaries would produce rankings dominated by scale artifacts in the extreme heat block rather than by transfer quality. Taken together, these results establish that supporting tasks bound rather than extend the primary claim: they provide useful evidence about where backbone families generalize and where they do not, but they cannot substitute for primary decision-task evaluation, and their results must be interpreted within their own task-form contracts.
-
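The per-contract ranking that underlies the rank map can be sketched directly. This is a hypothetical illustration, with made-up backbone labels and scores; it assumes higher-is-better metrics, so error metrics such as RMSE-C would be negated before ranking.

```python
# Illustrative per-contract ranking (RQ3 rank map). Scores are hypothetical.

def rank_within_contract(scores):
    """Rank backbones separately inside each contract so that metrics on very
    different scales (e.g. RMSE-C spanning roughly 0.2 to 18) are never
    averaged across contracts. Assumes higher is better."""
    ranks = {}
    for contract, by_backbone in scores.items():
        ordered = sorted(by_backbone, key=by_backbone.get, reverse=True)
        ranks[contract] = {b: r + 1 for r, b in enumerate(ordered)}
    return ranks

scores = {
    "burned_area_spearman": {"A": 0.63, "B": 0.18, "C": -0.34},
    "heat_exceedance_f1":   {"A": 0.954, "B": 0.869, "C": 0.905},
}
ranks = rank_within_contract(scores)
print(ranks["burned_area_spearman"])  # {'A': 1, 'B': 2, 'C': 3}
```

Ranking inside each contract is what keeps the extreme-heat scale artifacts described above from dominating a cross-task aggregate.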
- \begin{figure}[t]
- \centering
- \vspace{-5mm}
-
- \includegraphics[width=\textwidth]{figures/fig_rank_heatmap1.pdf}
- \vspace{-2mm}
- \caption{\textbf{Rank map for supporting task comparison (RQ3).} Each row fixes one task contract $\mathcal{C}$ and ranks the eligible backbones within that contract. The figure shows rank changes across task forms; native metric values are reported in Table~\ref{tab:supporting_results}.}
- \vspace{-6mm}
- \label{fig:task_comparator_normalized_map}
- \end{figure}
-
- \begin{table}[t]
- \centering
- \small
- \setlength{\tabcolsep}{3.5pt}
- \renewcommand{\arraystretch}{1.18}
- \caption{%
- \textbf{Supporting task-metric matrix (RQ3).}
- Top: final burned area and analog retrieval.
- Bottom: smoke PM$_{2.5}$ and extreme heat.
- Each block fixes $\mathcal{T}$, $\Lambda$, and $\Omega$; backbone
- column is shared across paired tasks. \ourfm\ row is
- separated by a rule as the empirical anchor. \textbf{Bold} marks
- the best value per metric. For error metrics
- lower is better ($\downarrow$); for $F_1$, nDCG, and $r$ higher
- is better ($\uparrow$).
- }
- \label{tab:supporting_results}
- \resizebox{\textwidth}{!}{%
- \begin{tabular}{lcccccc}
- \toprule
- & \multicolumn{3}{c}{\textbf{Burned area}}
- & \multicolumn{3}{c}{\textbf{Analog retrieval}} \\
- \cmidrule(lr){2-4}\cmidrule(lr){5-7}
- \textbf{Backbone}
- & \textbf{log-RMSE$\downarrow$} & \textbf{log-MAE$\downarrow$}
- & \textbf{Spearman$\uparrow$}
- & \textbf{nDCG@10$\uparrow$} & \textbf{log-RMSE$\downarrow$}
- & \textbf{log-MAE$\downarrow$} \\
- \midrule
- \ourfm\
- & \ensuremath{\mathbf{1.1657}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0126}}}
- & \ensuremath{\mathbf{1.0423}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0081}}}
- & \ensuremath{\mathbf{0.6298}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0338}}}
- & \ensuremath{\mathbf{0.5099}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0336}}}
- & \ensuremath{\mathbf{1.1977}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.1029}}}
- & \ensuremath{\mathbf{1.0043}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0759}}} \\
- \midrule
- Prithvi-WxC
- & \ms{1.3630}{0.0681} & \ms{1.2435}{0.0668} & \ms{0.1799}{0.3002}
- & \ms{0.3857}{0.0189} & \ms{1.3908}{0.0938} & \ms{1.2585}{0.0865} \\
- Aurora
- & \ms{1.8658}{0.2009} & \ms{1.6717}{0.1245} & \ms{-0.1156}{0.2982}
- & \ms{0.4046}{0.0144} & \ms{1.3659}{0.0792} & \ms{1.2596}{0.0968} \\
- ClimaX
- & \ms{2.0300}{0.2103} & \ms{1.8443}{0.1528} & \ms{-0.2515}{0.2688}
- & \ms{0.4143}{0.0191} & \ms{1.4526}{0.0926} & \ms{1.2441}{0.1446} \\
- StormCast
- & \ms{1.6679}{0.1438} & \ms{1.4745}{0.1134} & \ms{0.1830}{0.1969}
- & \ms{0.4076}{0.0094} & \ms{1.3663}{0.0781} & \ms{1.2371}{0.1078} \\
- DLWP
- & \ms{1.3070}{0.0980} & \ms{1.1769}{0.0834} & \ms{0.4888}{0.1368}
- & \ms{0.3972}{0.0146} & \ms{1.5351}{0.0802} & \ms{1.3196}{0.0781} \\
- FCN
- & \ms{1.3693}{0.0885} & \ms{1.2599}{0.0723} & \ms{0.3484}{0.1662}
- & \ms{0.4316}{0.0134} & \ms{1.4604}{0.1035} & \ms{1.2351}{0.0586} \\
- FengWu
- & \ms{1.3715}{0.1011} & \ms{1.2604}{0.0820} & \ms{0.3221}{0.2004}
- & \ms{0.4246}{0.0237} & \ms{1.4179}{0.0986} & \ms{1.2233}{0.0915} \\
- FuXi
- & \ms{1.4068}{0.1011} & \ms{1.3023}{0.0789} & \ms{0.2663}{0.2561}
- & \ms{0.4279}{0.0212} & \ms{1.4290}{0.0929} & \ms{1.2236}{0.0961} \\
- Pangu-Weather
- & \ms{1.3280}{0.0735} & \ms{1.2081}{0.0607} & \ms{0.4141}{0.1573}
- & \ms{0.4017}{0.0245} & \ms{1.4235}{0.0731} & \ms{1.2225}{0.0847} \\
- AlphaEarth
- & \ms{2.4068}{0.2841} & \ms{2.0822}{0.2371} & \ms{-0.3428}{0.1716}
- & \ms{0.5086}{0.0440} & \ms{1.2158}{0.1310} & \ms{1.0350}{0.1018} \\
- \bottomrule
- \end{tabular}
- }
-
- \vspace{4pt}
-
- \resizebox{\textwidth}{!}{%
- \begin{tabular}{lcccccc}
- \toprule
- & \multicolumn{3}{c}{\textbf{Smoke PM$_{2.5}$}}
- & \multicolumn{3}{c}{\textbf{Extreme heat}} \\
- \cmidrule(lr){2-4}\cmidrule(lr){5-7}
- \textbf{Backbone}
- & \textbf{RMSE$\downarrow$} & \textbf{MAE$\downarrow$}
- & \textbf{Pearson $r\uparrow$}
- & \textbf{RMSE-C$\downarrow$} & \textbf{MAE-C$\downarrow$}
- & \textbf{Exceed.\ $F_1\uparrow$} \\
- \midrule
- \ourfm\
- & \ms{4.4646}{0.0060}
- & \ms{2.4108}{0.0016}
- & \ensuremath{\mathbf{0.6368}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0013}}}
- & \ensuremath{\mathbf{0.2179}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0043}}}
- & \ensuremath{\mathbf{0.1787}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0018}}}
- & \ms{0.9541}{0.0164} \\
- \midrule
- Prithvi-WxC
- & \ms{6.0382}{0.0828} & \ms{3.7301}{0.0055} & \ms{0.0243}{0.0045}
- & \ms{4.6225}{0.0192} & \ms{2.6315}{0.0128} & \ms{0.8693}{0.0023} \\
- Aurora
- & \ms{6.0384}{0.0828} & \ms{3.7265}{0.0055} & \ms{0.0193}{0.0043}
- & \ms{18.0474}{0.0708} & \ms{15.3747}{0.0594} & \ms{0.0951}{0.0038} \\
- ClimaX
- & \ms{6.0402}{0.0828} & \ms{3.7290}{0.0055} & \ms{0.0004}{0.0029}
- & \ms{17.6492}{0.0347} & \ms{14.4938}{0.0319} & \ms{0.7684}{0.0068} \\
- StormCast
- & \ms{6.1230}{0.0830} & \ms{3.8182}{0.0073} & \ms{0.0183}{0.0041}
- & \ms{1.7671}{0.2145} & \ms{1.3507}{0.1576} & \ms{0.9073}{0.0189} \\
- DLWP
- & \ms{5.9289}{0.1031} & \ms{3.7331}{0.0088} & \ms{0.0303}{0.0060}
- & \ms{2.2662}{0.1106} & \ms{1.7153}{0.0748} & \ms{0.9156}{0.0112} \\
- FCN
- & \ms{5.9277}{0.1033} & \ms{3.7345}{0.0088} & \ms{0.0312}{0.0062}
- & \ms{2.1657}{0.1800} & \ms{1.6033}{0.1039} & \ms{0.9257}{0.0096} \\
- FengWu
- & \ms{5.9297}{0.1032} & \ms{3.7395}{0.0088} & \ms{0.0304}{0.0063}
- & \ms{2.1266}{0.1589} & \ms{1.5801}{0.1004} & \ms{0.0481}{0.0459} \\
- FuXi
- & \ms{5.9319}{0.1029} & \ms{3.7398}{0.0088} & \ms{0.0299}{0.0061}
- & \ms{2.1282}{0.0969} & \ms{1.5759}{0.0719} & \ms{0.2268}{0.0623} \\
- Pangu-Weather
- & \ms{5.9270}{0.1036} & \ms{3.7320}{0.0088} & \ms{0.0301}{0.0060}
- & \ms{2.2045}{0.1483} & \ms{1.6307}{0.0889} & \ms{0.0199}{0.0062} \\
- AlphaEarth
- & \ensuremath{\mathbf{4.4403}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0488}}}
- & \ensuremath{\mathbf{2.3992}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0056}}}
- & \ms{0.6347}{0.0066}
- & \ms{0.2194}{0.0039}
- & \ms{0.1800}{0.0014}
- & \ensuremath{\mathbf{0.9542}{\mkern1mu}_{\scriptscriptstyle \boldsymbol{\pm}\mathbf{0.0107}}} \\
- \bottomrule
- \end{tabular}
- }
- \end{table}
-
-
- \paragraph{Pattern 1: primary pattern preserved (burned area).}
- \ourfm\ leads all frozen entries on log-RMSE and Spearman $\rho$. The ordering observed under occupancy and spread is preserved under burned-area regression despite the different prediction unit and metric family.
-
- \paragraph{Pattern 2: primary pattern bounded (analog retrieval and smoke PM$_{2.5}$).}
- For analog retrieval, AlphaEarth matches \ourfm\ (nDCG@10 $= 0.51 \pm 0.04$ vs.\ $0.51 \pm 0.03$). For smoke PM$_{2.5}$, AlphaEarth also matches \ourfm\ on MAE and Pearson $r$, while atmospheric Earth FMs show near-zero correlation. These results show that the occupancy-and-spread ordering does not fully extend to all supporting tasks once AlphaEarth is included.
-
- \paragraph{Pattern 3: primary pattern bounded with large variance (extreme heat).} AlphaEarth matches \ourfm\ on RMSE-C and remains close on exceedance $F_1$, while atmospheric FMs range from RMSE-C $= 1.77$ (StormCast) to $18.05$ (Aurora). This large spread indicates that aggregated scores across task forms would be dominated by scale artifacts rather than transfer quality, reinforcing the need for per-contract reporting established in Section~\ref{sec:eval}.
-
- \textit{Answer to RQ3:} Figure~\ref{fig:task_comparator_normalized_map} and Table~\ref{tab:supporting_results} show that burned area preserves the primary reference-versus-frozen pattern under a different metric family. Analog retrieval, smoke PM$_{2.5}$, and extreme heat bound this pattern: AlphaEarth matches or approaches \ourfm\ on these tasks, indicating that the primary occupancy and spread claims do not extend uniformly across all task forms.
 
paper/sections/5_conclusion.tex DELETED
@@ -1,31 +0,0 @@
- \vspace{-4em}
- \section{Conclusion}
-
- We introduced \ourfm, the first foundation model pretrained
- specifically for wildfire prediction using fire-relevant
- multimodal data. Our results show that wildfire forecasting
- requires representations aligned with wildfire dynamics rather
- than transfer from general atmospheric or geophysical
- pretraining alone.
- At the same time, our study shows that reliable wildfire
- transfer evaluation is substantially more difficult than
- standard benchmark settings suggest. Wildfire transfer
- conclusions depend strongly on matching rules, head-selection
- metrics, and task form, and scores computed under different
- evaluation settings are not directly comparable. These effects
- become particularly pronounced in sparse spatiotemporal
- prediction settings such as wildfire forecasting.
- We therefore introduced a fixed-contract evaluation framework
- for wildfire Earth-FM transfer. By explicitly specifying the
- task, metric, matching rule, evaluation scope, and head family
- before comparison, fixed-contract evaluation enables more
- controlled and interpretable comparison across wildfire tasks
- and models.
- We hope \ourfm\ and the fixed-contract framework provide a
- foundation for future wildfire-specific Earth FMs, transfer
- benchmarks, and decision-oriented evaluation protocols.
- More broadly, our research provides a reliable system to guide real-world intervention and resource allocation at the intersection of AI and environmental decision-making.
-
- \paragraph{Limitations.} The conclusions apply to the task forms, scopes, evaluation rules, and comparator eligibility decisions used in this study.
- The evaluation covers selected wildfire decision tasks and supporting retrieval and regression task forms.
- These results provide task-form evidence rather than a single score across all wildfire-related prediction tasks.
 
 
paper/sections/appendix.tex DELETED
@@ -1,733 +0,0 @@
1
- % ============================================================
2
- % APPENDIX
3
- % ============================================================
4
- \appendix
5
-
6
- % Copy-paste safety: these definitions are no-ops when main.tex already defines them.
7
- \providecommand{\ms}[2]{\ensuremath{#1{\mkern1mu}_{\scriptscriptstyle \pm #2}}}
8
-
9
-
10
- % ────────────────────────────────────────────────────────────
11
- % TABLE OF CONTENTS (appendix only)
12
- % ────────────────────────────────────────────────────────────
13
- \section*{Appendix Contents}
14
- \addcontentsline{toc}{section}{Appendix Contents}
15
-
16
-
17
- \begin{center}
18
- \begin{tabular}{@{}p{0.82\textwidth}r@{}}
19
- \textbf{A\quad Evaluation Contract Specifications} & \pageref{sec:app_contract} \\
20
- \quad A.1\enspace Matching Rule Definitions & \pageref{sec:app_contract_matching} \\
21
- \quad A.2\enspace Task-Form Contract Parameters & \pageref{sec:app_contract_params} \\
22
- \quad A.3\enspace Evaluation Scope Definitions & \pageref{sec:app_contract_scope} \\[4pt]
23
- \textbf{B\quad Controlled Check Details} & \pageref{sec:app_checks} \\
24
- \quad B.1\enspace Fixed-Output Check: Full Sweep & \pageref{sec:app_checks_output} \\
25
- \quad B.2\enspace Fixed-Feature Check: Selection Summary & \pageref{sec:app_checks_feature} \\
26
- \quad B.3\enspace Selection Regret Under Matching Rules & \pageref{sec:app_checks_regret} \\
27
- \quad B.4\enspace Additional Value Tables & \pageref{sec:app_checks_values} \\[4pt]
28
- \textbf{C\quad Comparator Eligibility Notes} & \pageref{sec:comparator_audit} \\[4pt]
29
- \textbf{D\quad Seeded Audits} & \pageref{sec:app_seeded_audits} \\
30
- \quad D.1\enspace Seed Robustness Summary & \pageref{sec:app_seed_robustness} \\[4pt]
31
- \textbf{E\quad Lightweight Head and Adaptation Details} & \pageref{sec:app_heads} \\[4pt]
32
- \textbf{F\quad Limitations} & \pageref{sec:limitations} \\[4pt]
\textbf{G\quad Reproducibility and Evaluation Artifacts} & \pageref{sec:repro_compute_impact} \\
\end{tabular}
\end{center}

\noindent\textit{Retention rule.}
Appendix tables are retained when they add contract parameters, controlled-check arithmetic,
task-specific non-main metrics, seed summaries, eligibility checks, or protocol details.
Full task matrices and reference-summary tables that duplicate the main result tables are omitted here.

\clearpage

% ============================================================
% A EVALUATION CONTRACT SPECIFICATIONS
% ============================================================
\section{Evaluation Contract Specifications}
\label{sec:app_contract}

% ────────────────────────────────────────────────────────────
\subsection{Matching Rule Definitions}
\label{sec:app_contract_matching}

The three matching rules used across occupancy task forms are defined as follows.

\noindent\textbf{Exact matching.}
A predicted unit-time pair $(i,t) \in \widehat{P}_\tau$ is counted as a true positive if and only if the same pair appears in the observed fire set $P = \{(i,t): y_{i,t}=1\}$.
This is the strictest rule and yields the lowest $F_1$ for any fixed output.

\noindent\textbf{Tolerated matching.}
A predicted pair $(i,t)$ is counted as correct if there exists an observed pair $(i',t') \in P$ such that $\|i - i'\|_\infty \le k$ and $|t - t'| \le \Delta t$, where $k$ is the spatial tolerance in grid cells and $\Delta t$ is the temporal tolerance in forecast steps.
Both parameters are fixed as part of the evaluation contract $\mathcal{C}$ before scoring.

\noindent\textbf{Union matching.}
A predicted pair is counted as a true positive if it satisfies either exact or tolerated matching.
The resulting union-$F_1$ provides an upper bound on decision performance under the chosen tolerance.

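The matching rules above can be sketched as follows. This is a minimal illustration rather than the released scoring code: the pair layout (row, col, step) and the helper names are assumptions, and the exact rule is recovered as the zero-tolerance special case.

```python
import numpy as np

def tolerated_tp(pred_pairs, obs_pairs, k=8, dt=3):
    """Count predicted (row, col, step) pairs with an observed pair
    within Chebyshev distance k in space and dt in time; k = dt = 0
    recovers exact matching."""
    obs = np.asarray(obs_pairs, dtype=float)
    tp = 0
    for r, c, t in pred_pairs:
        d_space = np.maximum(np.abs(obs[:, 0] - r), np.abs(obs[:, 1] - c))
        d_time = np.abs(obs[:, 2] - t)
        tp += bool(np.any((d_space <= k) & (d_time <= dt)))
    return tp

def f1_score(tp, n_pred, n_obs):
    """F1 from matched counts; defined as zero when nothing matches."""
    if tp == 0:
        return 0.0
    prec, rec = tp / n_pred, tp / n_obs
    return 2 * prec * rec / (prec + rec)
```

Under this sketch, a prediction five cells away from an observed fire counts under the occupancy tolerance ($k=8$) but not under the exact rule.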
\noindent\textbf{Fixed parameter values.}
For occupancy, the spatial tolerance is $k=8$ grid cells.
The temporal tolerance is $\Delta t=3$ forecast steps for union matching and $\Delta t=0$ for spatial-only tolerance.
The threshold $\tau$ is selected on validation strict-$F_1$ before test scoring.
For fire spread, the spatial tolerance is $k=4$ grid cells, $\Delta t=0$, and the threshold is selected on validation spatial $F_1$.

\noindent Table~\ref{tab:app_matching_rule_params} records the fixed matching-rule parameters.

\begin{table}[h]
\centering
\small
\setlength{\tabcolsep}{10pt}
\renewcommand{\arraystretch}{1.2}
\caption{Matching-rule values used in the evaluation contracts.}
\label{tab:app_matching_rule_params}
\begin{tabular}{lll}
\toprule
\textbf{Parameter} & \textbf{Occupancy} & \textbf{Fire spread} \\
\midrule
\(k\) & 8 cells & 4 cells \\
\(\Delta t\) & 3 for union; 0 spatial-only & 0 \\
\(\tau\) & val.\ strict \(F_1\) & val.\ spatial \(F_1\) \\
\bottomrule
\end{tabular}
\end{table}

% ────────────────────────────────────────────────────────────
\subsection{Task-Form Contract Parameters}
\label{sec:app_contract_params}

Table~\ref{tab:app_contract_params_full} lists fixed scoring values not shown in the main contract map.

\begin{table}[h]
\centering
\scriptsize
\setlength{\tabcolsep}{3.5pt}
\renewcommand{\arraystretch}{1.2}
\caption{Fixed scoring values used by each task-form contract.}
\label{tab:app_contract_params_full}
\begin{adjustbox}{max width=\textwidth}
\begin{tabular}{llll}
\toprule
\textbf{\(\mathcal{T}\)} & \textbf{Scoring} & \textbf{Validation} & \textbf{\(\Omega\)} \\
\midrule
Occupancy & \(k=8,\Delta t=3\); exact/tol./union \(F_1\) & val.\ strict \(F_1\) & global; top-5/10/20\% fire-prone \\
Fire spread & \(k=4,\Delta t=0\); exact/spatial \(F_1\), AP & val.\ spatial \(F_1\) & spread-region cells \\
Final burned area & log-RMSE, log-MAE, Spearman \(\rho\) & val.\ log-RMSE & test events \\
Analog retrieval & nDCG@10; retrieved-event log error & val.\ nDCG@10 & test events \\
Smoke PM\(_{2.5}\) & RMSE, MAE, Pearson \(r\); exceedance 35 & val.\ RMSE & test stations \\
Extreme heat & RMSE-C, MAE-C, exceedance \(F_1\) & val.\ threshold 27/30/33\(^{\circ}\)C & heat-region stations \\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}

% ────────────────────────────────────────────────────────────
\subsection{Evaluation Scope Definitions}
\label{sec:app_contract_scope}

\noindent\textbf{Global scope.}
Evaluation covers all spatial units in the domain, including fire-inactive regions.
This scope can mask model differences on fire-relevant locations because inactive cells inflate true-negative counts.

\noindent\textbf{Fire-prone scope.}
Evaluation is restricted to grid cells in the top-$k$\% of historical fire activity.
We report results for top-5\%, top-10\%, and top-20\% cutoffs.
The cutoff thresholds are derived from the training period and held fixed at test time.

\noindent\textbf{Spread region scope.}
For fire spread tasks, evaluation is restricted to the predicted and observed burned raster patches.
Only cells within the union of $\widehat{B}$ and $B$ contribute to metric computation.

\noindent\textbf{Fixed scope sizes.}
The global scope contains 8,085,000 test cells.
The fire-prone top-5\%, top-10\%, and top-20\% scopes contain 404,280, 808,560, and 1,617,000 test cells, respectively.
The spread-region scope is event-specific and uses the union of $\widehat{B}$ and $B$.
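The train-frozen fire-prone cutoff can be sketched as follows; this is a schematic, with a hypothetical per-cell frequency array standing in for the historical fire record, and ties at the percentile threshold can leave the mask slightly larger than an exact top-$k$\% count.

```python
import numpy as np

def fire_prone_mask(train_fire_freq, top_pct=5.0):
    """Mark cells at or above the (100 - top_pct)th percentile of
    training-period fire frequency.  The threshold is computed once on
    the training period and reused unchanged at test time."""
    freq = np.asarray(train_fire_freq, dtype=float)
    thresh = np.percentile(freq, 100.0 - top_pct)
    return freq >= thresh
```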

\begin{table}[h]
\centering
\small
\setlength{\tabcolsep}{8pt}
\renewcommand{\arraystretch}{1.2}
\caption{Scope values used in the evaluation contracts.}
\label{tab:app_scope_params}
\begin{tabular}{lcc}
\toprule
\textbf{\(\Omega\)} & \textbf{Definition} & \textbf{Size} \\
\midrule
Global & full domain & 8,085,000 test cells \\
Fire-prone top-5\% & top 5\% by training-period fire frequency & 404,280 test cells \\
Fire-prone top-10\% & top 10\% by training-period fire frequency & 808,560 test cells \\
Fire-prone top-20\% & top 20\% by training-period fire frequency & 1,617,000 test cells \\
Spread region & union of \(\widehat{B}\) and \(B\) & event-specific cells \\
\bottomrule
\end{tabular}
\end{table}

\clearpage

% ============================================================
% B CONTROLLED CHECK DETAILS
% ============================================================
\section{Controlled Check Details}
\label{sec:app_checks}

\begin{figure}[t]
\centering
\includegraphics[width=\textwidth]{figures/fig_fireprone_contract_progression_compact.pdf}
\caption{
\textbf{Matching-rule sensitivity in fire-prone occupancy (RQ1).}
Each row holds the score field \(S\), label field \(Y\), threshold, and \(\Omega\) fixed, and changes only \(\Lambda\).
Legend: \textcolor[HTML]{17375E}{$\blacksquare$} strict \(F_1\),
\textcolor[HTML]{4F8DCC}{$\blacksquare$} added \(F_1\) from spatial tolerance,
\textcolor[HTML]{BFD7F0}{$\blacksquare$} added \(F_1\) from union matching;
a red outline marks \ourfm, and the dashed line separates the original weather FMs from the added baselines.
The horizontal axis is \(F_1\) in percent.
}
\label{fig:fireprone_contract_progression}
\end{figure}

% ────────────────────────────────────────────────────────────
\subsection{Fixed-Output Check: Full Sweep}
\label{sec:app_checks_output}

The fixed-output check holds the score field $S$ and label field $Y$ fixed and varies only $\Lambda$.
Table~\ref{tab:fireprone_contract_progression} reports the full global and fire-prone sweep for all retained backbones.
The same table is the numeric counterpart to Figure~\ref{fig:fireprone_contract_progression}.

\begin{table*}[t]
\centering
\scriptsize
\setlength{\tabcolsep}{4pt}
\caption{Occupancy \(F_1\) scores across global and fire-prone scopes. Global uses the full validation/test domain; top-\(k\) rows use train-defined fire-prone masks from historical fire frequency. Values are percentages from the same validation-selected strict threshold. Tolerance is spatial-only; union adds temporal and spatial matching. \(\Delta\) is union minus strict. Cells report five-seed mean with std in small type.}
\label{tab:fireprone_contract_progression}
\begin{tabular}{@{}llcccc@{}}
\toprule
Backbone & \(\Omega\) & Strict \(F_1\uparrow\) & Tol.\ \(F_1\uparrow\) & Union \(F_1\uparrow\) & \(\Delta\) \(\uparrow\) \\
\midrule
\ourfm & global & \ms{0.4546}{0.1412} & \ms{29.7484}{1.2868} & \ms{59.0656}{2.7372} & \ms{58.6109}{2.6945} \\
 & top 5\% & \ms{3.5604}{0.8809} & \ms{39.2617}{1.4011} & \ms{72.8280}{2.5784} & \ms{69.2676}{1.9960} \\
 & top 10\% & \ms{3.5575}{0.8799} & \ms{39.1665}{1.3906} & \ms{72.5204}{2.5670} & \ms{68.9629}{1.9888} \\
 & top 20\% & \ms{3.5300}{0.8700} & \ms{38.2849}{1.2952} & \ms{69.7228}{2.4664} & \ms{66.1928}{1.9273} \\
\addlinespace[1pt]
Prithvi-WxC & global & \ms{0.0552}{0.0039} & \ms{7.1649}{0.6557} & \ms{20.1853}{1.8299} & \ms{20.1301}{1.8297} \\
 & top 5\% & \ms{1.4119}{1.1635} & \ms{19.2636}{4.5019} & \ms{42.5793}{4.5495} & \ms{41.1674}{3.4846} \\
 & top 10\% & \ms{1.2376}{1.3201} & \ms{14.8780}{8.4429} & \ms{32.6913}{13.2085} & \ms{31.4536}{11.9053} \\
 & top 20\% & \ms{1.1520}{1.3770} & \ms{13.1512}{9.4556} & \ms{28.1319}{15.2866} & \ms{26.9800}{13.9224} \\
\addlinespace[1pt]
Aurora & global & \ms{0.0656}{0.0094} & \ms{8.5009}{1.9594} & \ms{23.1037}{4.9418} & \ms{23.0382}{4.9325} \\
 & top 5\% & \ms{0.9859}{0.9299} & \ms{15.1337}{6.0821} & \ms{35.4834}{11.0192} & \ms{34.4975}{10.3728} \\
 & top 10\% & \ms{0.7790}{1.0453} & \ms{12.7381}{6.5558} & \ms{30.5305}{10.8842} & \ms{29.7515}{9.8656} \\
 & top 20\% & \ms{0.6655}{1.1043} & \ms{10.5304}{7.4309} & \ms{24.9444}{12.5844} & \ms{24.2790}{11.4943} \\
\addlinespace[1pt]
ClimaX & global & \ms{0.3480}{0.0754} & \ms{29.7535}{3.6073} & \ms{60.1506}{7.5865} & \ms{59.8026}{7.5454} \\
 & top 5\% & \ms{1.2937}{0.1086} & \ms{34.5791}{2.3772} & \ms{69.2186}{5.7215} & \ms{67.9249}{5.7263} \\
 & top 10\% & \ms{1.2522}{0.1602} & \ms{34.3341}{2.2852} & \ms{68.5713}{5.5377} & \ms{67.3191}{5.5538} \\
 & top 20\% & \ms{1.0287}{0.2686} & \ms{30.2140}{4.2857} & \ms{60.0650}{7.5674} & \ms{59.0363}{7.5891} \\
\addlinespace[1pt]
StormCast & global & \ms{0.0626}{0.0119} & \ms{8.1951}{2.1895} & \ms{22.3817}{5.4294} & \ms{22.3191}{5.4178} \\
 & top 5\% & \ms{0.9573}{0.8011} & \ms{15.3219}{5.5337} & \ms{36.1857}{9.7331} & \ms{35.2284}{9.1816} \\
 & top 10\% & \ms{0.7284}{0.9280} & \ms{12.6669}{6.3290} & \ms{30.4748}{10.6527} & \ms{29.7464}{9.7494} \\
 & top 20\% & \ms{0.5795}{0.9104} & \ms{10.4157}{7.3437} & \ms{24.6598}{12.3973} & \ms{24.0803}{11.4988} \\
\addlinespace[1pt]
DLWP & global & \ms{0.1693}{0.0419} & \ms{14.9148}{3.2446} & \ms{28.1901}{6.9658} & \ms{28.0208}{6.9257} \\
 & top 5\% & \ms{1.8054}{0.4835} & \ms{31.7231}{3.2923} & \ms{55.4596}{5.2920} & \ms{53.6542}{5.4752} \\
 & top 10\% & \ms{1.6110}{0.5999} & \ms{27.6581}{5.9216} & \ms{47.1269}{8.0111} & \ms{45.5158}{7.7927} \\
 & top 20\% & \ms{1.5248}{0.8987} & \ms{20.9403}{4.7971} & \ms{34.9301}{7.8471} & \ms{33.4054}{7.8760} \\
\addlinespace[1pt]
FCN & global & \ms{0.2829}{0.0839} & \ms{19.5061}{3.3412} & \ms{40.0604}{9.3701} & \ms{39.7775}{9.3423} \\
 & top 5\% & \ms{1.6231}{0.5064} & \ms{29.3769}{2.7626} & \ms{54.3033}{7.4089} & \ms{52.6801}{7.4389} \\
 & top 10\% & \ms{1.1777}{0.5118} & \ms{22.4217}{3.9803} & \ms{43.4510}{9.2513} & \ms{42.2734}{9.0251} \\
 & top 20\% & \ms{0.9962}{0.4315} & \ms{16.9792}{3.9371} & \ms{34.0859}{8.2616} & \ms{33.0897}{7.9275} \\
\addlinespace[1pt]
FengWu & global & \ms{0.2613}{0.0757} & \ms{12.0050}{6.0239} & \ms{24.1022}{13.6293} & \ms{23.8410}{13.5736} \\
 & top 5\% & \ms{1.5695}{0.3592} & \ms{16.2763}{3.7024} & \ms{30.1055}{5.0103} & \ms{28.5360}{4.7696} \\
 & top 10\% & \ms{1.2427}{0.5333} & \ms{12.9503}{5.6052} & \ms{24.1854}{8.6854} & \ms{22.9427}{8.1863} \\
 & top 20\% & \ms{1.1192}{0.5023} & \ms{11.9508}{5.0745} & \ms{22.7860}{7.9115} & \ms{21.6668}{7.4438} \\
\addlinespace[1pt]
FuXi & global & \ms{0.3774}{0.1212} & \ms{21.0323}{4.8211} & \ms{37.2888}{9.4470} & \ms{36.9114}{9.4327} \\
 & top 5\% & \ms{2.0307}{0.6800} & \ms{31.8944}{4.7331} & \ms{53.9308}{8.3822} & \ms{51.9001}{8.6878} \\
 & top 10\% & \ms{1.6542}{0.7316} & \ms{24.0128}{5.7784} & \ms{40.2140}{9.9307} & \ms{38.5597}{9.7744} \\
 & top 20\% & \ms{1.3646}{0.6773} & \ms{21.9548}{5.8601} & \ms{36.7314}{10.0289} & \ms{35.3668}{9.9223} \\
\addlinespace[1pt]
Pangu-Weather & global & \ms{0.2755}{0.1089} & \ms{17.0909}{4.0477} & \ms{35.6386}{9.0327} & \ms{35.3630}{9.0774} \\
 & top 5\% & \ms{1.3656}{0.3064} & \ms{22.2222}{6.8613} & \ms{43.4234}{13.2383} & \ms{42.0578}{13.0599} \\
 & top 10\% & \ms{1.0931}{0.3535} & \ms{18.9337}{5.9329} & \ms{38.5325}{11.7221} & \ms{37.4394}{11.5261} \\
 & top 20\% & \ms{0.8844}{0.3601} & \ms{17.0172}{5.4859} & \ms{34.5688}{10.2932} & \ms{33.6844}{10.1334} \\
\addlinespace[1pt]
AlphaEarth & global & \ms{2.0606}{0.4404} & \ms{29.4476}{6.0064} & \ms{37.4286}{9.9458} & \ms{35.3679}{10.0271} \\
 & top 5\% & \ms{6.9133}{0.8450} & \ms{42.8790}{4.6087} & \ms{51.7449}{8.7321} & \ms{44.8315}{9.0763} \\
 & top 10\% & \ms{6.6366}{0.9901} & \ms{41.8981}{5.9454} & \ms{50.5712}{10.0057} & \ms{43.9346}{9.9156} \\
 & top 20\% & \ms{6.1908}{1.1330} & \ms{38.8325}{7.4966} & \ms{46.3833}{12.1697} & \ms{40.1925}{11.6788} \\
\bottomrule
\end{tabular}
\end{table*}

% ────────────────────────────────────────────────────────────
\subsection{Fixed-Feature Check: Selection Summary}
\label{sec:app_checks_feature}

The paper appendix keeps the fixed-feature result at the selection-summary level.
The full per-head rows are retained in the supplementary CSV files and are not repeated as a manuscript table because many degenerate heads produce identical zero decision scores.
The supplementary selection rows report the decision-score loss after changing only the head-selection metric.

% ────────────────────────────────────────────────────────────
\subsection{Selection Regret Under Matching Rules}
\label{sec:app_checks_regret}

The fixed-feature check trains the same head family $\mathcal{H}$ on a fixed feature source and changes only the selection metric.
Table~\ref{tab:appendix_selection_regret_tolerance} reports the same selection comparison under exact, tolerated, and union matching.
Here, \(h_R\) is selected by PR-AUC and \(h_D\) is selected by the decision metric.
The reported regret is \(D(h_D)-D(h_R)\).
Exact zero entries mean the two selectors give the same decision score for all five seeds.

\begin{table*}[!t]
\centering
\scriptsize
\setlength{\tabcolsep}{4pt}
\caption{Selection-regret values under exact, tolerated, and union matching. Values are percentage-point regret from selecting \(h_R\) by PR-AUC instead of \(h_D\) by the decision metric. Rows report mean with small std over five seeds; \(0.0000\) denotes exact zero regret.}
\label{tab:appendix_selection_regret_tolerance}
\begin{adjustbox}{max width=\textwidth}
\begin{tabular}{llccc}
\toprule
\textbf{Feature} & \textbf{\(\Omega\)} & \textbf{Exact regret} & \textbf{Tolerated regret} & \textbf{Union regret} \\
\midrule
\ourfm & global & 0.0000 & \ms{8.7830}{9.6705} & \ms{8.7830}{9.6705} \\
\ourfm & fire-prone & 0.0000 & \ms{3.4027}{3.2045} & \ms{3.4027}{3.2045} \\
Prithvi-WxC & global & 0.0000 & 0.0000 & 0.0000 \\
Prithvi-WxC & fire-prone & 0.0000 & 0.0000 & 0.0000 \\
Aurora & global & \ms{0.0200}{0.0267} & \ms{9.8520}{12.9878} & \ms{9.8520}{12.9878} \\
Aurora & fire-prone & \ms{0.8203}{1.8341} & \ms{14.3919}{32.1219} & \ms{14.3919}{32.1219} \\
ClimaX & global & \ms{0.0003}{0.0004} & \ms{0.1296}{0.1775} & \ms{0.1296}{0.1775} \\
ClimaX & fire-prone & 0.0000 & 0.0000 & 0.0000 \\
StormCast & global & 0.0000 & 0.0000 & 0.0000 \\
StormCast & fire-prone & 0.0000 & 0.0000 & 0.0000 \\
DLWP & global & 0.0000 & 0.0000 & 0.0000 \\
DLWP & fire-prone & \ms{0.0770}{0.1100} & \ms{4.3266}{4.3323} & \ms{4.3266}{4.3323} \\
FCN & global & 0.0000 & 0.0000 & 0.0000 \\
FCN & fire-prone & \ms{0.0006}{0.0013} & \ms{1.1680}{1.9872} & \ms{1.1680}{1.9872} \\
FengWu & global & 0.0000 & 0.0000 & 0.0000 \\
FengWu & fire-prone & \ms{0.0691}{0.1191} & \ms{0.5222}{0.6239} & \ms{0.5222}{0.6239} \\
FuXi & global & 0.0000 & 0.0000 & 0.0000 \\
FuXi & fire-prone & 0.0000 & \ms{0.1084}{0.1729} & \ms{0.1084}{0.1729} \\
Pangu-Weather & global & 0.0000 & 0.0000 & 0.0000 \\
Pangu-Weather & fire-prone & \ms{0.0728}{0.1179} & \ms{0.1849}{0.3263} & \ms{0.1849}{0.3263} \\
AlphaEarth & global & 0.0000 & \ms{17.2217}{8.8492} & \ms{17.2217}{8.8492} \\
AlphaEarth & fire-prone & 0.0000 & \ms{3.8804}{5.9483} & \ms{3.8804}{5.9483} \\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table*}

% ────────────────────────────────────────────────────────────
\subsection{Additional Value Tables}
\label{sec:app_checks_values}

Table~\ref{tab:app_occupancy_ppr_scope}
reports the predicted-positive rate behind the occupancy \(F_1\) sweep.

\begin{table*}[t]
\centering
\small
\setlength{\tabcolsep}{4pt}
\renewcommand{\arraystretch}{1.18}
\caption{For fixed occupancy \(\mathcal{T}\), this table reports predicted-positive rate.
Values are percentages under the same validation-selected strict threshold.
Scopes \(\Omega\) are fixed before test scoring; cells report five-seed mean with std in small type.}
\label{tab:app_occupancy_ppr_scope}
\begin{tabular}{lcccc}
\toprule
\textbf{Backbone} & \textbf{\(\Omega=\)global} & \textbf{\(\Omega=\)top 5\%} & \textbf{\(\Omega=\)top 10\%} & \textbf{\(\Omega=\)top 20\%} \\
\midrule
\ourfm & \ms{1.6808}{0.3684} & \ms{3.0619}{1.0925} & \ms{1.5310}{0.5463} & \ms{0.7655}{0.2732} \\
Prithvi-WxC & \ms{61.9711}{30.9101} & \ms{57.4117}{47.8987} & \ms{58.4565}{51.0897} & \ms{58.9788}{52.6991} \\
Aurora & \ms{55.5849}{19.7524} & \ms{57.2238}{35.3400} & \ms{68.7942}{37.6958} & \ms{67.2891}{38.3991} \\
ClimaX & \ms{5.6763}{3.9261} & \ms{24.0091}{9.2816} & \ms{11.8450}{4.5067} & \ms{5.7442}{4.1341} \\
StormCast & \ms{60.6507}{17.4895} & \ms{57.6017}{35.2921} & \ms{68.0766}{37.3899} & \ms{67.8397}{39.2410} \\
DLWP & \ms{4.3221}{1.5619} & \ms{9.4001}{5.0807} & \ms{4.9700}{3.6849} & \ms{1.9198}{1.4678} \\
FCN & \ms{1.5202}{1.3446} & \ms{4.7856}{2.9409} & \ms{2.7257}{1.6353} & \ms{0.8368}{0.2358} \\
FengWu & \ms{0.4277}{0.4830} & \ms{0.6004}{0.3041} & \ms{0.2609}{0.1935} & \ms{0.1501}{0.1206} \\
FuXi & \ms{0.4505}{0.2773} & \ms{2.9315}{2.6392} & \ms{0.5197}{0.6074} & \ms{0.3621}{0.4346} \\
Pangu-Weather & \ms{1.0801}{1.1308} & \ms{2.0549}{2.1893} & \ms{1.4029}{1.4739} & \ms{1.0103}{1.1084} \\
AlphaEarth & \ms{0.0691}{0.0499} & \ms{0.2826}{0.1497} & \ms{0.1524}{0.0770} & \ms{0.0656}{0.0414} \\
\bottomrule
\end{tabular}
\end{table*}

Tables~\ref{tab:app_spread_ap_by_scope}--\ref{tab:app_heat_event_pr}
report additional values that are not repeated in the main tables.
Each table fixes the task \(\mathcal{T}\) and reports a different \(\Omega\), metric, or event subset.

\begin{table*}[t]
\centering
\scriptsize
\setlength{\tabcolsep}{3pt}
\caption{For fixed spread \(\mathcal{T}\) and strict \(\Lambda\), this table reports AP under three \(\Omega\) scopes: full test, top-5\% train-fire area, and top-10\% train-fire area. Values are percentages; cells report mean with small std.}
\label{tab:app_spread_ap_by_scope}
\begin{tabular}{lccc}
\toprule
Backbone & full \(\Omega\) AP & top-5\% \(\Omega\) AP & top-10\% \(\Omega\) AP \\
\midrule
\ourfm & \ms{30.0197}{1.5651} & \ms{40.7452}{2.0542} & \ms{37.4096}{1.8731} \\
Prithvi-WxC & \ms{4.8319}{0.1731} & \ms{12.6086}{0.4468} & \ms{8.7051}{0.1889} \\
Aurora & \ms{17.7723}{0.4293} & \ms{30.3106}{0.9404} & \ms{26.4732}{0.6932} \\
ClimaX & \ms{11.1726}{0.2337} & \ms{25.7871}{1.2896} & \ms{19.9977}{1.2217} \\
StormCast & \ms{8.1147}{1.1569} & \ms{18.5461}{1.1727} & \ms{14.1286}{1.2956} \\
DLWP & \ms{9.2142}{2.6587} & \ms{19.3346}{2.3922} & \ms{14.9788}{2.6696} \\
FCN & \ms{6.6774}{1.3001} & \ms{16.7396}{3.2955} & \ms{11.9308}{2.3881} \\
FengWu & \ms{11.0046}{2.7092} & \ms{21.1506}{1.2163} & \ms{17.0113}{1.5778} \\
FuXi & \ms{13.5507}{0.3840} & \ms{22.5434}{0.4100} & \ms{19.1964}{0.3943} \\
Pangu-Weather & \ms{10.6250}{1.4643} & \ms{19.8294}{1.3044} & \ms{15.8013}{1.1602} \\
AlphaEarth & \ms{12.2847}{1.3562} & \ms{22.8692}{0.4915} & \ms{18.2992}{1.2110} \\
\bottomrule
\end{tabular}
\end{table*}

\begin{table*}[t]
\centering
\scriptsize
\setlength{\tabcolsep}{3pt}
\caption{For fixed final-area \(\mathcal{T}\) and \(\Omega\), this table reports median log error and acre-scale errors in addition to the main log-RMSE/log-MAE/Spearman metrics. Cells report mean with small std.}
\label{tab:app_burned_area_median_acre}
\begin{tabular}{lccc}
\toprule
Backbone & log median AE & acre median AE & acre MAPE \\
\midrule
\ourfm & \ms{1.0235}{0.0982} & \ms{4504.0692}{459.0483} & \ms{1.4525}{0.0254} \\
Prithvi-WxC & \ms{1.2184}{0.2107} & \ms{5375.8770}{788.7906} & \ms{1.9517}{0.2875} \\
Aurora & \ms{1.4547}{0.0301} & \ms{9904.9483}{457.4260} & \ms{6.8728}{3.0026} \\
ClimaX & \ms{1.6841}{0.1818} & \ms{18130.4820}{3248.3873} & \ms{8.2373}{2.8540} \\
StormCast & \ms{1.4522}{0.1519} & \ms{11155.7881}{2020.8656} & \ms{4.6142}{1.1500} \\
DLWP & \ms{1.0952}{0.1306} & \ms{4406.9315}{303.0944} & \ms{1.7357}{0.3625} \\
FCN & \ms{1.1688}{0.1139} & \ms{5166.9993}{213.0333} & \ms{2.0800}{0.4004} \\
FengWu & \ms{1.1589}{0.1772} & \ms{5137.2822}{628.7543} & \ms{2.0944}{0.4545} \\
FuXi & \ms{1.1855}{0.0612} & \ms{5697.7117}{796.8785} & \ms{2.4411}{0.5567} \\
Pangu-Weather & \ms{1.1221}{0.1470} & \ms{5092.3621}{483.8243} & \ms{1.9571}{0.3113} \\
AlphaEarth & \ms{1.7459}{0.6057} & \ms{15110.7573}{7106.3417} & \ms{9.7398}{2.7425} \\
\bottomrule
\end{tabular}
\end{table*}

\begin{table*}[t]
\centering
\scriptsize
\setlength{\tabcolsep}{3pt}
\caption{For fixed retrieval \(\mathcal{T}\) and \(\Omega\), this table reports nDCG@5, best log gap, and rank \(\rho\) in addition to the main nDCG@10/log-error metrics. Cells report mean with small std.}
\label{tab:app_analog_rank_depth}
\begin{tabular}{lccc}
\toprule
Backbone & nDCG@5 & best log gap & rank $\rho$ \\
\midrule
\ourfm & \ms{0.5175}{0.0445} & \ms{0.1868}{0.0285} & \ms{0.6019}{0.1460} \\
Prithvi-WxC & \ms{0.3591}{0.0107} & \ms{0.2151}{0.0594} & \ms{0.1514}{0.1489} \\
Aurora & \ms{0.4423}{0.0210} & \ms{0.1551}{0.0437} & \ms{0.2162}{0.1856} \\
ClimaX & \ms{0.4151}{0.0293} & \ms{0.2129}{0.0653} & \ms{0.1587}{0.2831} \\
StormCast & \ms{0.3960}{0.0240} & \ms{0.1714}{0.0310} & \ms{0.1258}{0.1625} \\
DLWP & \ms{0.3795}{0.0274} & \ms{0.1944}{0.0807} & \ms{-0.3865}{0.2802} \\
FCN & \ms{0.4250}{0.0112} & \ms{0.1856}{0.0846} & \ms{-0.1357}{0.2571} \\
FengWu & \ms{0.4228}{0.0310} & \ms{0.1870}{0.0858} & \ms{-0.1926}{0.2194} \\
FuXi & \ms{0.4544}{0.0356} & \ms{0.2171}{0.0806} & \ms{-0.1367}{0.2885} \\
Pangu-Weather & \ms{0.3988}{0.0506} & \ms{0.1901}{0.0838} & \ms{-0.1970}{0.2216} \\
AlphaEarth & \ms{0.5276}{0.0531} & \ms{0.1782}{0.0454} & \ms{0.4639}{0.2802} \\
\bottomrule
\end{tabular}
\end{table*}

\begin{table*}[t]
\centering
\scriptsize
\setlength{\tabcolsep}{3pt}
\caption{For fixed smoke \(\mathcal{T}\) and station \(\Omega\), this table reports RMSE, MAE, and 90th-percentile absolute error on test rows with observed PM$_{2.5}\ge35$; std uses a row bootstrap over those rows. Cells report mean with small std.}
\label{tab:app_smoke_high_event}
\begin{tabular}{lccc}
\toprule
Backbone & high-smoke RMSE & high-smoke MAE & high-smoke 90th AE \\
\midrule
\ourfm & \ms{47.4870}{0.6346} & \ms{34.3954}{0.7654} & \ms{65.6213}{3.8778} \\
Prithvi-WxC & \ms{57.2224}{1.7268} & \ms{47.3871}{0.3153} & \ms{74.9666}{3.2381} \\
Aurora & \ms{57.2752}{1.7248} & \ms{47.4368}{0.3149} & \ms{75.0755}{3.1074} \\
ClimaX & \ms{57.2828}{1.7239} & \ms{47.4407}{0.3140} & \ms{75.1012}{3.0777} \\
StormCast & \ms{56.6512}{1.7517} & \ms{46.7914}{0.3281} & \ms{74.0794}{3.4707} \\
DLWP & \ms{57.0075}{1.7359} & \ms{47.1971}{0.3198} & \ms{74.4936}{3.3826} \\
FCN & \ms{57.0582}{1.7339} & \ms{47.2401}{0.3187} & \ms{74.6431}{3.1982} \\
FengWu & \ms{57.0158}{1.7357} & \ms{47.1957}{0.3194} & \ms{74.5652}{3.2871} \\
FuXi & \ms{56.9622}{1.7371} & \ms{47.1508}{0.3201} & \ms{74.3278}{3.4435} \\
Pangu-Weather & \ms{57.1282}{1.7307} & \ms{47.3050}{0.3170} & \ms{74.6830}{3.2375} \\
AlphaEarth & \ms{48.0665}{0.7904} & \ms{35.6088}{0.7341} & \ms{66.7613}{3.9235} \\
\bottomrule
\end{tabular}
\end{table*}

\begin{table*}[t]
\centering
\scriptsize
\setlength{\tabcolsep}{3pt}
\caption{For fixed heat \(\mathcal{T}\) and heat-region \(\Omega\), this table reports precision and recall for the exceedance label used by the main \(F_1\). Cells report mean with small std.}
\label{tab:app_heat_event_pr}
\begin{tabular}{lcc}
\toprule
Backbone & precision & recall \\
\midrule
\ourfm & \ms{0.9767}{0.0117} & \ms{0.9330}{0.0299} \\
Prithvi-WxC & \ms{0.8260}{0.0030} & \ms{0.9173}{0.0033} \\
Aurora & \ms{0.5920}{0.0347} & \ms{0.0517}{0.0020} \\
ClimaX & \ms{0.7397}{0.0099} & \ms{0.7994}{0.0051} \\
StormCast & \ms{0.8840}{0.0237} & \ms{0.9320}{0.0165} \\
DLWP & \ms{0.9429}{0.0085} & \ms{0.8899}{0.0167} \\
FCN & \ms{0.9408}{0.0097} & \ms{0.9111}{0.0127} \\
FengWu & \ms{0.3808}{0.2719} & \ms{0.0266}{0.0267} \\
FuXi & \ms{0.3262}{0.1262} & \ms{0.1810}{0.0481} \\
Pangu-Weather & \ms{0.1159}{0.0743} & \ms{0.0112}{0.0032} \\
AlphaEarth & \ms{0.9824}{0.0040} & \ms{0.9278}{0.0178} \\
\bottomrule
\end{tabular}
\end{table*}

\clearpage

% ============================================================
% C COMPARATOR ELIGIBILITY NOTES
% ============================================================
\section{Comparator Eligibility Notes}
\label{sec:comparator_audit}

All numeric comparator rows in Tables~\ref{tab:primary_results} and~\ref{tab:supporting_results}
are included only after the task form, metric, matching rule, scope, and head family are fixed.
The appendix does not repeat those full matrices.
The key eligibility rule is simple: a row is reported only when it satisfies the same contract as the row block in which it appears, and a row is excluded only when its representation or output form cannot satisfy that contract.

\noindent\textbf{Reading rule.}
Exact-only, tolerated, union, ranking, retrieval, and regression scores answer different questions.
The fixed-contract reading is therefore to compare entries only within one row block and not to average across task forms.

\clearpage

% ============================================================
% D SEEDED AUDITS
% ============================================================
\section{Seeded Audits}
\label{sec:app_seeded_audits}

\subsection{Seed Robustness Summary}
\label{sec:app_seed_robustness}

Table~\ref{tab:app_seed_robustness} summarizes stochastic checks used to support the reported mean-with-std convention.
It is not a replacement for the main fixed-contract result tables.
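The mean-with-std cells in these seed summaries can be reproduced with a small helper; treating the std as the sample (ddof=1) estimator is an assumption, since the manuscript does not state which estimator backs the reported values.

```python
import numpy as np

def seed_summary(scores, ddof=1):
    """Five-seed mean-with-std summary; ddof=1 (sample std) is an
    assumed convention, rounded to the tables' four decimals."""
    a = np.asarray(scores, dtype=float)
    return round(float(a.mean()), 4), round(float(a.std(ddof=ddof)), 4)
```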

\begin{table}[h]
\centering
\small
\setlength{\tabcolsep}{5pt}
\renewcommand{\arraystretch}{1.2}
\caption{Seed summaries for stochastic checks. Values report mean with small std over completed seeds.}
\label{tab:app_seed_robustness}
\begin{adjustbox}{max width=\textwidth}
\begin{tabular}{p{0.28\textwidth}cllp{0.18\textwidth}}
\toprule
\textbf{\(\mathcal{T}\) check} & \textbf{Seeds} & \textbf{Primary value} & \textbf{Other value(s)} & \textbf{Reading} \\
\midrule
Final burned area &
5 & log-RMSE \ms{1.1657}{0.0126} &
log-MAE \ms{1.0423}{0.0081}; Spear.\ \ms{0.6298}{0.0338} &
stable across seeds \\
Smoke PM\(_{2.5}\) &
5 & RMSE \ms{4.4646}{0.0060} &
MAE \ms{2.4108}{0.0016}; \(r\) \ms{0.6368}{0.0013} &
stable at table precision \\
Extreme heat &
5 & RMSE-C \ms{0.2179}{0.0043} &
MAE-C \ms{0.1787}{0.0018}; exceed.\ \(F_1\) \ms{0.9541}{0.0164} &
stable across seeds \\
Fire spread &
5 & exact \(F_1\) \ms{37.6700}{0.9800} &
spatial \(F_1\) \ms{80.9700}{2.0200}; AP \ms{30.0900}{1.2500} &
stable across seeds \\
Aurora paired-head check &
5 & fire-prone score diff.\ \ms{6.3500}{13.2800} &
PR-AUC and union choices differ in 2/5 seeds &
variable across seeds \\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}

\clearpage

% ============================================================
% E LIGHTWEIGHT HEAD AND ADAPTATION DETAILS
% ============================================================
\section{Lightweight Head and Adaptation Details}
\label{sec:app_heads}

All frozen-transfer comparisons use the same five lightweight head architectures applied
on top of the frozen backbone representations.
Table~\ref{tab:app_head_architectures} summarizes each head family, its architecture,
approximate parameter count, and the adaptation procedure used.

\begin{table}[h]
\centering
\small
\setlength{\tabcolsep}{5pt}
\renewcommand{\arraystretch}{1.3}
\caption{Lightweight head architectures used in the fixed-contract transfer comparisons.
All heads are trained from random initialization on the frozen backbone features.
Parameter counts are approximate and depend on the feature dimensionality of each backbone.}
\label{tab:app_head_architectures}
\begin{tabular}{p{0.15\textwidth}p{0.30\textwidth}p{0.12\textwidth}p{0.33\textwidth}}
\toprule
\textbf{$\mathcal{A}$ head} & \textbf{Architecture} & \textbf{Approx.\ params} & \textbf{Notes} \\
\midrule
Constant prior &
Outputs a fixed bias vector, ignoring input features. &
Output dimension only &
Provides a degenerate baseline; selected when backbone features carry no useful signal. \\
Linear probe &
Single linear layer mapping backbone features to the output. No nonlinearity. &
$d\times c + c$ &
Standard frozen-representation baseline. \\
Pixel MLP &
Two-layer MLP applied independently per spatial unit. &
$d\times h + h\times c$ &
Captures per-pixel nonlinearity; ignores spatial context. \\
Shallow adapter &
Two-layer MLP with a spatial context window; uses a $3\times3$ convolution before the linear output. &
$9dh + hc$ &
Balances local spatial context with parameter efficiency. \\
Wide adapter &
Shallow adapter with a wider hidden dimension. &
$9dH + Hc$ &
Higher-capacity variant; can overfit on small fire-event sets. \\
\bottomrule
\end{tabular}
\end{table}

\noindent\textbf{Training protocol.}
Each occupancy head-control run uses seeds $\{1,7,42,99,123\}$, the five heads listed above, and the fixed variants identity, erode-r1, and close-r1.
The spread U-Net reference is trained for 4 epochs.
The threshold $\tau$ is selected on the validation split by maximizing union-$F_1$ (for occupancy) or spatial $F_1$ (for spread) and held fixed at test time.
Morphology parameters (spatial tolerance $k$, temporal tolerance $\Delta t$) are fixed as part of the evaluation contract and are not tuned after validation.
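The threshold-selection step can be sketched as follows; the candidate grid and the validation-metric callback are illustrative stand-ins for the contract's union or spatial $F_1$, not the released code.

```python
import numpy as np

def select_threshold(val_scores, val_f1, taus=None):
    """Sweep candidate thresholds on the validation split, keep the one
    maximizing the contract's F1, and freeze it for test scoring.
    `val_f1(scores, tau)` is a hypothetical stand-in for the contract
    metric; the grid of candidates is likewise illustrative."""
    if taus is None:
        taus = np.linspace(0.05, 0.95, 19)
    return float(max(taus, key=lambda t: val_f1(val_scores, t)))
```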

\noindent\textbf{Head selection procedure.}
For each (feature source, scope, seed) tuple, all five heads are trained independently.
The PR-AUC-based selector picks $h_R = \arg\max_{h \in \mathcal{H}} R(h)$ on the validation set;
the decision-based selector picks $h_D = \arg\max_{h \in \mathcal{H}} D(h)$ on the same set.
The selection regret $\delta = D(h_D) - D(h_R) \ge 0$ is computed on the held-out test set.
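The selection comparison can be sketched as follows, with the per-head metric dictionaries as hypothetical stand-ins for the validation and test scores.

```python
def selection_regret(heads, val_prauc, val_decision, test_decision):
    """h_R maximizes validation PR-AUC, h_D maximizes the validation
    decision metric; the regret is their test-set decision-score gap."""
    h_r = max(heads, key=val_prauc.get)
    h_d = max(heads, key=val_decision.get)
    return test_decision[h_d] - test_decision[h_r]
```

When both selectors pick the same head, the regret is exactly zero, which is how the zero entries in the regret tables arise.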
615
-
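The selection-regret arithmetic in the deleted paragraph above reduces to two argmax choices and one subtraction. The sketch below uses hypothetical per-head scores; the dictionary layout and function name are assumptions, with `R` standing for validation PR-AUC and `D` for the decision score.

```python
# Minimal sketch of the selection-regret arithmetic: h_R and h_D are chosen
# on the validation set, and the regret is read off the test decision scores.
def selection_regret(val_prauc, val_decision, test_decision):
    """delta = D_test(h_D) - D_test(h_R) for validation-chosen heads."""
    h_r = max(val_prauc, key=val_prauc.get)        # PR-AUC-based selector
    h_d = max(val_decision, key=val_decision.get)  # decision-based selector
    return test_decision[h_d] - test_decision[h_r]
```

When both selectors agree on the same head, the regret is exactly zero; the deleted text reports it as non-negative under its selection protocol.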
- \clearpage
-
- % ============================================================
- % F LIMITATIONS
- % ============================================================
- \section{Limitations}
- \label{sec:limitations}
-
- The conclusions apply to the task forms, scopes, evaluation rules, and comparator eligibility decisions used in this study.
- The evaluation covers selected wildfire decision tasks and supporting retrieval and regression task forms.
- Comparator eligibility is fixed before metric values are interpreted.
- This eligibility rule keeps each comparison within one task-form contract.
- It also leaves some model and task pairs outside the evaluated comparison set by design.
-
- The transfer comparison uses frozen backbones with lightweight heads.
- The results therefore describe frozen-backbone transfer under the allowed head families in each contract.
- Full fine-tuning, alternative adaptation procedures, and broader head families are outside the evaluated scope.
- The task-specific reference baselines serve as empirical anchors for same-contract comparison.
- \ourfm is a regional wildfire reference for the reported California fixed-contract experiments.
-
-
- The supporting retrieval and regression checks bound the primary spatial decision claim.
- They provide task-form evidence rather than a single score across all wildfire-related prediction tasks.
- The analysis focuses on the reported metric families, matching rules, and fixed comparison choices.
- Operational response rules, intervention costs, and deployment policies are part of wildfire early-warning use contexts~\cite{goldammer1999early,pickell2017early,farahmand2020fdeo}.
- They are outside the scope of this evaluation study and are not inferred from the reported scores.
-
- \clearpage
-
- % ============================================================
- % G REPRODUCIBILITY AND EVALUATION ARTIFACTS
- % ============================================================
- \section{Reproducibility and Evaluation Artifacts}
- \label{sec:repro_compute_impact}
- \subsection{External Assets and Terms of Use}
- \label{sec:external_assets_terms}
-
- We use external datasets and model assets only for research evaluation.
- Access to each asset follows the original provider's portal, license, or terms of use; this submission does not imply that all assets are openly redistributable.
- We do not redistribute raw external datasets, provider-hosted embeddings, or third-party model weights.
- Table~\ref{tab:external_assets_licenses} records the source and terms-of-use status used to interpret reproducibility.
-
- \begin{table}[h]
- \centering
- \small
- \setlength{\tabcolsep}{4pt}
- \renewcommand{\arraystretch}{1.18}
- \caption{External assets used by the study and their source or terms-of-use status.}
- \label{tab:external_assets_licenses}
- \begin{tabular}{p{0.25\textwidth}p{0.34\textwidth}p{0.34\textwidth}}
- \toprule
- \textbf{Asset family} & \textbf{Use in this study} & \textbf{Source and terms-of-use note} \\
- \midrule
- NOAA HRRR fields~\cite{noaa_hrrr_ncei,noaa_hrrr_emc}
- & Dynamic weather inputs for \ourfm and transfer tasks.
- & NOAA provider terms and citation requirements apply. \\
- NASA FIRMS~\cite{nasa_firms}
- & Active-fire occupancy supervision.
- & NASA Earthdata/FIRMS access terms and citation requirements apply. \\
- LANDFIRE and WRC layers~\cite{landfire_fbfm40,landfire_canopy_cover,usfs_wrc_housing_density}
- & Static fuel, canopy, and exposure context.
- & Original geospatial-product provider terms and citations apply. \\
- LandScan~\cite{ornl_landscan_2024}
- & Static population context.
- & ORNL/LandScan source-specific access terms apply; raw data are not redistributed. \\
- WFIGS and MTBS~\cite{nifc_wfigs_perimeters,mtbs_usgs_2025}
- & Event-level resources for burned-area and analog tasks.
- & Original incident/perimeter-product provider terms and citations apply. \\
- External Earth-FM baselines~\cite{schmude2024prithviwxc,bodnar2025aurora,nguyen2023climax,pathak2024stormcast,weyn2020dlwp,pathak2022fourcastnet,chen2023fengwu,chen2023fuxi,bi2023panguweather,brown2025alphaearth}
- & Frozen comparator representations or task-model baselines.
- & Original model-provider licenses and access terms apply; third-party weights are not redistributed. \\
- \bottomrule
- \end{tabular}
- \end{table}
-
- This note supports the NeurIPS checklist and identifies the files that support the reported claims.
- This file statement does not imply full raw-data release.
- The main claims can be checked from the manuscript contracts, metric
- definitions, and per-head result files, even if full raw-data release is
- delayed or limited. Sections~3 and~4 specify the contract components used by
- the main claims: task definition, split logic, label space, tolerance
- parameters, scope definitions, threshold or operating-point rules, and
- lightweight-head set.
-
- The supplementary source includes the check scripts, per-head and per-seed
- CSV result files, and \LaTeX{} result tables for the expanded check and matching-rule support.
- These files expose exact \(F_1\),
- tolerated \(F_1\), union-\(F_1\), PR-AUC, per-head selection,
- top-1 agreement, and selection-regret arithmetic. The manuscript also includes
- full figure and table reproduction values in result tables and appendix tables.
- These files provide a runnable check of the
- selection-regret arithmetic and the table-construction logic from fixed
- per-head rows. The seeded occupancy check uses seeds
- $\{1,7,42,99,123\}$, and the spread task-specific U-Net check uses repeated seeds; reported error bars are standard deviations over the completed
- seeded runs. Full raw wildfire inputs and large feature arrays are not
- released at submission because redistribution and storage constraints require a
- separate review.
-
- For stochastic results, the paper reports mean with standard deviation over repeated seeds.
- For fixed-output or fixed-feature controls, the table uses one fixed output or feature set; the changed item is the matching rule or selection metric.
-
- The reported experiments use two resource classes on a shared Slurm-managed
- cluster. Tabular retrieval/regression checks and same-feature head controls run
- on CPU workers with 4 to 8 cores, 24 to 64~GB host memory, and 2 to 4~hour wall-clock
- limits. Spread U-Net training and threshold calibration run on single-GPU jobs
- with one B200 GPU, 8 CPU cores, 96~GB host memory, and a 4~hour wall-clock
- limit. The seed/check waves reported in the appendix correspond to roughly
- 78 CPU job-hours and 12 GPU job-hours of scheduled wall-clock budget;
- exploratory runs are not included in the reported compute accounting.
-
- The raw-data limitation is separate from the selection-regret files.
- The supplementary source is sufficient to inspect the selection-regret arithmetic and reproduce the reported tables.
- Full end-to-end recomputation from raw wildfire inputs is not included at submission because redistribution review is still required.
- The broader impact is evaluation-facing rather than operational.
- Better reading of wildfire transfer evidence can reduce overconfident benchmark claims, while misread transfer results could still encourage inappropriate reliance on models with low decision scores.
- For that reason, the paper keeps its claims wildfire-centered, decision-task
- specific, and explicitly separate from any predictive deployment
- recommendation.

paper_outputs/figures/fig_selection_regret_rq2.tikz DELETED
@@ -1,120 +0,0 @@
- % Auto-generated by scripts/build_selection_regret_rq2_figure.py.
- \begin{tikzpicture}[x=1cm,y=1cm]
- \footnotesize
- \draw[black!12, line width=0.35pt] (2.450,-0.350) -- (2.450,4.530);
- \node[anchor=north, font=\scriptsize, text=black!70] at (2.450,-0.410) {-20};
- \draw[black!12, line width=0.35pt] (3.243,-0.350) -- (3.243,4.530);
- \node[anchor=north, font=\scriptsize, text=black!70] at (3.243,-0.410) {-10};
- \draw[wfgray, line width=0.55pt] (4.036,-0.350) -- (4.036,4.530);
- \node[anchor=north, font=\scriptsize, text=black!70] at (4.036,-0.410) {0};
- \draw[black!12, line width=0.35pt] (4.829,-0.350) -- (4.829,4.530);
- \node[anchor=north, font=\scriptsize, text=black!70] at (4.829,-0.410) {10};
- \draw[black!12, line width=0.35pt] (5.621,-0.350) -- (5.621,4.530);
- \node[anchor=north, font=\scriptsize, text=black!70] at (5.621,-0.410) {20};
- \draw[black!12, line width=0.35pt] (6.414,-0.350) -- (6.414,4.530);
- \node[anchor=north, font=\scriptsize, text=black!70] at (6.414,-0.410) {30};
- \draw[black!12, line width=0.35pt] (7.207,-0.350) -- (7.207,4.530);
- \node[anchor=north, font=\scriptsize, text=black!70] at (7.207,-0.410) {40};
- \draw[black!12, line width=0.35pt] (8.000,-0.350) -- (8.000,4.530);
- \node[anchor=north, font=\scriptsize, text=black!70] at (8.000,-0.410) {50};
- \draw[black!45, line width=0.4pt] (2.450,-0.350) -- (8.000,-0.350);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,4.350) {\textcolor{wfblue}{\textbf{FireWx-FM ref.}}};
- \draw[wfslate, line width=0.72pt] (4.030,4.220) -- (5.212,4.220);
- \draw[wfslate, line width=0.72pt] (4.030,4.185) -- (4.030,4.255);
- \draw[wfslate, line width=0.72pt] (5.212,4.185) -- (5.212,4.255);
- \filldraw[wfslate] (4.621,4.220) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (4.051,4.480) -- (4.487,4.480);
- \draw[wforange, line width=0.72pt] (4.051,4.445) -- (4.051,4.515);
- \draw[wforange, line width=0.72pt] (4.487,4.445) -- (4.487,4.515);
- \filldraw[wforange] (4.224,4.435) rectangle (4.314,4.525);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,3.940) {Prithvi-WxC};
- \draw[wfslate, line width=0.72pt] (4.036,3.810) -- (4.036,3.810);
- \draw[wfslate, line width=0.72pt] (4.036,3.775) -- (4.036,3.845);
- \draw[wfslate, line width=0.72pt] (4.036,3.775) -- (4.036,3.845);
- \filldraw[wfslate] (4.036,3.810) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (4.036,4.070) -- (4.036,4.070);
- \draw[wforange, line width=0.72pt] (4.036,4.035) -- (4.036,4.105);
- \draw[wforange, line width=0.72pt] (4.036,4.035) -- (4.036,4.105);
- \filldraw[wforange] (3.991,4.025) rectangle (4.081,4.115);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,3.530) {Aurora};
- \draw[wfslate, line width=0.72pt] (3.580,3.400) -- (5.276,3.400);
- \draw[wfslate, line width=0.72pt] (3.580,3.365) -- (3.580,3.435);
- \draw[wfslate, line width=0.72pt] (5.276,3.365) -- (5.276,3.435);
- \filldraw[wfslate] (4.428,3.400) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (2.627,3.660) -- (7.723,3.660);
- \draw[wforange, line width=0.72pt] (2.627,3.625) -- (2.627,3.695);
- \draw[wforange, line width=0.72pt] (7.723,3.625) -- (7.723,3.695);
- \filldraw[wforange] (5.130,3.615) rectangle (5.220,3.705);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,3.120) {ClimaX};
- \draw[wfslate, line width=0.72pt] (4.032,2.990) -- (4.060,2.990);
- \draw[wfslate, line width=0.72pt] (4.032,2.955) -- (4.032,3.025);
- \draw[wfslate, line width=0.72pt] (4.060,2.955) -- (4.060,3.025);
- \filldraw[wfslate] (4.046,2.990) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (4.036,3.250) -- (4.036,3.250);
- \draw[wforange, line width=0.72pt] (4.036,3.215) -- (4.036,3.285);
- \draw[wforange, line width=0.72pt] (4.036,3.215) -- (4.036,3.285);
- \filldraw[wforange] (3.991,3.205) rectangle (4.081,3.295);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,2.710) {StormCast};
- \draw[wfslate, line width=0.72pt] (4.036,2.580) -- (4.036,2.580);
- \draw[wfslate, line width=0.72pt] (4.036,2.545) -- (4.036,2.615);
- \draw[wfslate, line width=0.72pt] (4.036,2.545) -- (4.036,2.615);
- \filldraw[wfslate] (4.036,2.580) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (4.036,2.840) -- (4.036,2.840);
- \draw[wforange, line width=0.72pt] (4.036,2.805) -- (4.036,2.875);
- \draw[wforange, line width=0.72pt] (4.036,2.805) -- (4.036,2.875);
- \filldraw[wforange] (3.991,2.795) rectangle (4.081,2.885);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,2.300) {DLWP};
- \draw[wfslate, line width=0.72pt] (4.036,2.170) -- (4.036,2.170);
- \draw[wfslate, line width=0.72pt] (4.036,2.135) -- (4.036,2.205);
- \draw[wfslate, line width=0.72pt] (4.036,2.135) -- (4.036,2.205);
- \filldraw[wfslate] (4.036,2.170) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (4.044,2.430) -- (4.735,2.430);
- \draw[wforange, line width=0.72pt] (4.044,2.395) -- (4.044,2.465);
- \draw[wforange, line width=0.72pt] (4.735,2.395) -- (4.735,2.465);
- \filldraw[wforange] (4.345,2.385) rectangle (4.435,2.475);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,1.890) {FCN};
- \draw[wfslate, line width=0.72pt] (4.036,1.760) -- (4.036,1.760);
- \draw[wfslate, line width=0.72pt] (4.036,1.725) -- (4.036,1.795);
- \draw[wfslate, line width=0.72pt] (4.036,1.725) -- (4.036,1.795);
- \filldraw[wfslate] (4.036,1.760) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (3.971,2.020) -- (4.286,2.020);
- \draw[wforange, line width=0.72pt] (3.971,1.985) -- (3.971,2.055);
- \draw[wforange, line width=0.72pt] (4.286,1.985) -- (4.286,2.055);
- \filldraw[wforange] (4.083,1.975) rectangle (4.173,2.065);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,1.480) {FengWu};
- \draw[wfslate, line width=0.72pt] (4.036,1.350) -- (4.036,1.350);
- \draw[wfslate, line width=0.72pt] (4.036,1.315) -- (4.036,1.385);
- \draw[wfslate, line width=0.72pt] (4.036,1.315) -- (4.036,1.385);
- \filldraw[wfslate] (4.036,1.350) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (4.028,1.610) -- (4.127,1.610);
- \draw[wforange, line width=0.72pt] (4.028,1.575) -- (4.028,1.645);
- \draw[wforange, line width=0.72pt] (4.127,1.575) -- (4.127,1.645);
- \filldraw[wforange] (4.032,1.565) rectangle (4.122,1.655);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,1.070) {FuXi};
- \draw[wfslate, line width=0.72pt] (4.036,0.940) -- (4.036,0.940);
- \draw[wfslate, line width=0.72pt] (4.036,0.905) -- (4.036,0.975);
- \draw[wfslate, line width=0.72pt] (4.036,0.905) -- (4.036,0.975);
- \filldraw[wfslate] (4.036,0.940) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (4.029,1.200) -- (4.087,1.200);
- \draw[wforange, line width=0.72pt] (4.029,1.165) -- (4.029,1.235);
- \draw[wforange, line width=0.72pt] (4.087,1.165) -- (4.087,1.235);
- \filldraw[wforange] (4.013,1.155) rectangle (4.103,1.245);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,0.660) {Pangu-Weather};
- \draw[wfslate, line width=0.72pt] (4.036,0.530) -- (4.036,0.530);
- \draw[wfslate, line width=0.72pt] (4.036,0.495) -- (4.036,0.565);
- \draw[wfslate, line width=0.72pt] (4.036,0.495) -- (4.036,0.565);
- \filldraw[wfslate] (4.036,0.530) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (4.025,0.790) -- (4.076,0.790);
- \draw[wforange, line width=0.72pt] (4.025,0.755) -- (4.025,0.825);
- \draw[wforange, line width=0.72pt] (4.076,0.755) -- (4.076,0.825);
- \filldraw[wforange] (4.006,0.745) rectangle (4.096,0.835);
- \node[anchor=east, font=\scriptsize, text=black!82] at (2.320,0.250) {AlphaEarth};
- \draw[wfslate, line width=0.72pt] (4.700,0.120) -- (6.103,0.120);
- \draw[wfslate, line width=0.72pt] (4.700,0.085) -- (4.700,0.155);
- \draw[wfslate, line width=0.72pt] (6.103,0.085) -- (6.103,0.155);
- \filldraw[wfslate] (5.401,0.120) circle[radius=0.045];
- \draw[wforange, line width=0.72pt] (3.872,0.380) -- (4.815,0.380);
- \draw[wforange, line width=0.72pt] (3.872,0.345) -- (3.872,0.415);
- \draw[wforange, line width=0.72pt] (4.815,0.345) -- (4.815,0.415);
- \filldraw[wforange] (4.298,0.335) rectangle (4.388,0.425);
- \end{tikzpicture}

paper_outputs/tables/tab_app_analog_rank_depth.tex DELETED
@@ -1,24 +0,0 @@
- \begin{table*}[t]
- \centering
- \scriptsize
- \setlength{\tabcolsep}{3pt}
- \caption{For fixed retrieval \(\mathcal{T}\) and \(\Omega\), this table reports nDCG@5, best log gap, and rank \(\rho\) in addition to the main nDCG@10/log-error metrics. Cells report mean with small std.}
- \label{tab:app_analog_rank_depth}
- \begin{tabular}{lccc}
- \toprule
- Backbone & nDCG@5 & best log gap & rank $\rho$ \\
- \midrule
- FireWx-FM ref. & \ms{0.5175}{0.0445} & \ms{0.1868}{0.0285} & \ms{0.6019}{0.1460} \\
- Prithvi-WxC & \ms{0.3591}{0.0107} & \ms{0.2151}{0.0594} & \ms{0.1514}{0.1489} \\
- Aurora & \ms{0.4423}{0.0210} & \ms{0.1551}{0.0437} & \ms{0.2162}{0.1856} \\
- ClimaX & \ms{0.4151}{0.0293} & \ms{0.2129}{0.0653} & \ms{0.1587}{0.2831} \\
- StormCast & \ms{0.3960}{0.0240} & \ms{0.1714}{0.0310} & \ms{0.1258}{0.1625} \\
- DLWP & \ms{0.3795}{0.0274} & \ms{0.1944}{0.0807} & \ms{-0.3865}{0.2802} \\
- FCN & \ms{0.4250}{0.0112} & \ms{0.1856}{0.0846} & \ms{-0.1357}{0.2571} \\
- FengWu & \ms{0.4228}{0.0310} & \ms{0.1870}{0.0858} & \ms{-0.1926}{0.2194} \\
- FuXi & \ms{0.4544}{0.0356} & \ms{0.2171}{0.0806} & \ms{-0.1367}{0.2885} \\
- Pangu-Weather & \ms{0.3988}{0.0506} & \ms{0.1901}{0.0838} & \ms{-0.1970}{0.2216} \\
- AlphaEarth & \ms{0.5276}{0.0531} & \ms{0.1782}{0.0454} & \ms{0.4639}{0.2802} \\
- \bottomrule
- \end{tabular}
- \end{table*}

paper_outputs/tables/tab_app_burned_area_median_acre.tex DELETED
@@ -1,24 +0,0 @@
- \begin{table*}[t]
- \centering
- \scriptsize
- \setlength{\tabcolsep}{3pt}
- \caption{For fixed final-area \(\mathcal{T}\) and \(\Omega\), this table reports median log error and acre-scale errors in addition to the main log-RMSE/log-MAE/Spearman metrics. Cells report mean with small std.}
- \label{tab:app_burned_area_median_acre}
- \begin{tabular}{lccc}
- \toprule
- Backbone & log median AE & acre median AE & acre MAPE \\
- \midrule
- FireWx-FM ref. & \ms{1.0235}{0.0982} & \ms{4504.0692}{459.0483} & \ms{1.4525}{0.0254} \\
- Prithvi-WxC & \ms{1.2184}{0.2107} & \ms{5375.8770}{788.7906} & \ms{1.9517}{0.2875} \\
- Aurora & \ms{1.4547}{0.0301} & \ms{9904.9483}{457.4260} & \ms{6.8728}{3.0026} \\
- ClimaX & \ms{1.6841}{0.1818} & \ms{18130.4820}{3248.3873} & \ms{8.2373}{2.8540} \\
- StormCast & \ms{1.4522}{0.1519} & \ms{11155.7881}{2020.8656} & \ms{4.6142}{1.1500} \\
- DLWP & \ms{1.0952}{0.1306} & \ms{4406.9315}{303.0944} & \ms{1.7357}{0.3625} \\
- FCN & \ms{1.1688}{0.1139} & \ms{5166.9993}{213.0333} & \ms{2.0800}{0.4004} \\
- FengWu & \ms{1.1589}{0.1772} & \ms{5137.2822}{628.7543} & \ms{2.0944}{0.4545} \\
- FuXi & \ms{1.1855}{0.0612} & \ms{5697.7117}{796.8785} & \ms{2.4411}{0.5567} \\
- Pangu-Weather & \ms{1.1221}{0.1470} & \ms{5092.3621}{483.8243} & \ms{1.9571}{0.3113} \\
- AlphaEarth & \ms{1.7459}{0.6057} & \ms{15110.7573}{7106.3417} & \ms{9.7398}{2.7425} \\
- \bottomrule
- \end{tabular}
- \end{table*}

paper_outputs/tables/tab_app_contract_params_full.tex DELETED
@@ -1,22 +0,0 @@
- \begin{table}[h]
- \centering
- \scriptsize
- \setlength{\tabcolsep}{3.5pt}
- \renewcommand{\arraystretch}{1.2}
- \caption{Fixed scoring values used by each task-form contract.}
- \label{tab:app_contract_params_full}
- \begin{adjustbox}{max width=\textwidth}
- \begin{tabular}{llll}
- \toprule
- \textbf{\(\mathcal{T}\)} & \textbf{Scoring} & \textbf{Validation} & \textbf{\(\Omega\)} \\
- \midrule
- Occupancy & \(k=8,\Delta t=3\); exact/tol./union \(F_1\) & val. strict \(F_1\) & global; top-5/10/20\% fire-prone \\
- Fire spread & \(k=4,\Delta t=0\); exact/spatial \(F_1\), AP & val. spatial \(F_1\) & spread-region cells \\
- Final burned area & log-RMSE, log-MAE, Spearman \(\rho\) & val. log-RMSE & test events \\
- Analog retrieval & nDCG@10; retrieved-event log error & val. nDCG@10 & test events \\
- Smoke PM\(_{2.5}\) & RMSE, MAE, Pearson \(r\); exceedance 35 & val. RMSE & test stations \\
- Extreme heat & RMSE-C, MAE-C, exceedance \(F_1\) & val. threshold 27/30/33\(^{\circ}\)C & heat-region stations \\
- \bottomrule
- \end{tabular}
- \end{adjustbox}
- \end{table}

paper_outputs/tables/tab_app_head_architectures.tex DELETED
@@ -1,36 +0,0 @@
- \begin{table}[h]
- \centering
- \small
- \setlength{\tabcolsep}{5pt}
- \renewcommand{\arraystretch}{1.3}
- \caption{Lightweight head architectures used in the fixed-contract transfer comparisons.
- All heads are trained from random initialisation on the frozen backbone features.
- Parameter counts are approximate and depend on the feature dimensionality of each backbone.}
- \label{tab:app_head_architectures}
- \begin{tabular}{p{0.15\textwidth}p{0.30\textwidth}p{0.12\textwidth}p{0.33\textwidth}}
- \toprule
- \textbf{$\mathcal{A}$ head} & \textbf{Architecture} & \textbf{Approx.\ params} & \textbf{Notes} \\
- \midrule
- Constant prior &
- Outputs a fixed bias vector, ignoring input features. &
- Output dimension only &
- Provides a degenerate baseline; selected when backbone features carry no useful signal. \\
- Linear probe &
- Single linear layer mapping backbone features to output. No nonlinearity. &
- $d\times c + c$ &
- Standard frozen-representation baseline. \\
- Pixel MLP &
- Two-layer MLP applied independently per spatial unit. &
- $d\times h + h\times c$ &
- Captures per-pixel nonlinearity; ignores spatial context. \\
- Shallow adapter &
- Two-layer MLP with a spatial context window; uses $3\times3$ convolution before the linear output. &
- $9dh + hc$ &
- Balances local spatial context with parameter efficiency. \\
- Wide adapter &
- Shallow adapter with wider hidden dimension. &
- $9dH + Hc$ &
- Higher capacity variant; can overfit on small fire-event sets. \\
- \bottomrule
- \end{tabular}
- \end{table}
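The approximate parameter counts in the deleted head-architecture table above (output dimension only; d*c + c; d*h + h*c; 9dh + hc; 9dH + Hc, with d the feature dimension, c the output dimension, and h/H the hidden widths) can be reproduced with a small helper. The function name and default hidden widths below are illustrative assumptions, not values from the removed sources.

```python
# Hypothetical helper mirroring the deleted table's approximate counts.
# d = backbone feature dim, c = output dim, h/H = narrow/wide hidden dims.
def head_param_counts(d, c, h=64, H=256):
    return {
        "constant_prior": c,                    # fixed bias vector only
        "linear_probe": d * c + c,              # weight matrix + bias
        "pixel_mlp": d * h + h * c,             # two linears (biases omitted)
        "shallow_adapter": 9 * d * h + h * c,   # 3x3 conv, then linear output
        "wide_adapter": 9 * d * H + H * c,      # same shape, wider hidden dim
    }
```

As the table's caption notes, actual counts scale with each backbone's feature dimensionality d.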
paper_outputs/tables/tab_app_heat_event_pr.tex DELETED
@@ -1,24 +0,0 @@
- \begin{table*}[t]
- \centering
- \scriptsize
- \setlength{\tabcolsep}{3pt}
- \caption{For fixed heat \(\mathcal{T}\) and heat-region \(\Omega\), this table reports precision and recall for the exceedance label used by the main \(F_1\). Cells report mean with small std.}
- \label{tab:app_heat_event_pr}
- \begin{tabular}{lcc}
- \toprule
- Backbone & precision & recall \\
- \midrule
- FireWx-FM ref. & \ms{0.9767}{0.0117} & \ms{0.9330}{0.0299} \\
- Prithvi-WxC & \ms{0.8260}{0.0030} & \ms{0.9173}{0.0033} \\
- Aurora & \ms{0.5920}{0.0347} & \ms{0.0517}{0.0020} \\
- ClimaX & \ms{0.7397}{0.0099} & \ms{0.7994}{0.0051} \\
- StormCast & \ms{0.8840}{0.0237} & \ms{0.9320}{0.0165} \\
- DLWP & \ms{0.9429}{0.0085} & \ms{0.8899}{0.0167} \\
- FCN & \ms{0.9408}{0.0097} & \ms{0.9111}{0.0127} \\
- FengWu & \ms{0.3808}{0.2719} & \ms{0.0266}{0.0267} \\
- FuXi & \ms{0.3262}{0.1262} & \ms{0.1810}{0.0481} \\
- Pangu-Weather & \ms{0.1159}{0.0743} & \ms{0.0112}{0.0032} \\
- AlphaEarth & \ms{0.9824}{0.0040} & \ms{0.9278}{0.0178} \\
- \bottomrule
- \end{tabular}
- \end{table*}

paper_outputs/tables/tab_app_matching_rule_params.tex DELETED
@@ -1,17 +0,0 @@
- \begin{table}[h]
- \centering
- \small
- \setlength{\tabcolsep}{10pt}
- \renewcommand{\arraystretch}{1.2}
- \caption{Matching-rule values used in the evaluation contracts.}
- \label{tab:app_matching_rule_params}
- \begin{tabular}{lll}
- \toprule
- \textbf{Parameter} & \textbf{Occupancy} & \textbf{Fire spread} \\
- \midrule
- \(k\) & 8 cells & 4 cells \\
- \(\Delta t\) & 3 for union; 0 spatial-only & 0 \\
- \(\tau\) & val. strict \(F_1\) & val. spatial \(F_1\) \\
- \bottomrule
- \end{tabular}
- \end{table}
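The deleted matching-rule table above fixes a spatial tolerance k (8 cells for occupancy, 4 for spread) alongside the validation-selected threshold tau. A tolerance like this is commonly scored by dilating one mask before matching; the sketch below is an assumed reconstruction of such a spatially tolerant F1, not the removed implementation, and all names are illustrative.

```python
# Hypothetical tolerant-F1 sketch: a predicted positive counts as a hit if
# any true positive lies within a (2k+1) x (2k+1) window, and vice versa.
import numpy as np

def dilate(mask, k):
    """Binary dilation of a 2-D boolean mask with a square (2k+1)^2 window."""
    padded = np.pad(mask, k)  # pads with False
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def tolerant_f1(pred, target, k):
    tp_pred = np.logical_and(pred, dilate(target, k)).sum()    # matched predictions
    tp_true = np.logical_and(target, dilate(pred, k)).sum()    # covered targets
    precision = tp_pred / pred.sum() if pred.sum() else 0.0
    recall = tp_true / target.sum() if target.sum() else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0
```

With k = 0 this reduces to the strict (exact-match) F1, which is consistent with the exact/tolerated metric pairing listed in the deleted contract tables.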
paper_outputs/tables/tab_app_occupancy_ppr_scope.tex DELETED
@@ -1,27 +0,0 @@
- \begin{table*}[t]
- \centering
- \small
- \setlength{\tabcolsep}{4pt}
- \renewcommand{\arraystretch}{1.18}
- \caption{For fixed occupancy \(\mathcal{T}\), this table reports predicted-positive rate.
- Values are percentages under the same validation-selected strict threshold.
- Scopes \(\Omega\) are fixed before test scoring; cells report five-seed mean with std in small type.}
- \label{tab:app_occupancy_ppr_scope}
- \begin{tabular}{lcccc}
- \toprule
- \textbf{Backbone} & \textbf{\(\Omega=\)global} & \textbf{\(\Omega=\)top 5\%} & \textbf{\(\Omega=\)top 10\%} & \textbf{\(\Omega=\)top 20\%} \\
- \midrule
- FireWx-FM ref. & \ms{1.6808}{0.3684} & \ms{3.0619}{1.0925} & \ms{1.5310}{0.5463} & \ms{0.7655}{0.2732} \\
- Prithvi-WxC & \ms{61.9711}{30.9101} & \ms{57.4117}{47.8987} & \ms{58.4565}{51.0897} & \ms{58.9788}{52.6991} \\
- Aurora & \ms{55.5849}{19.7524} & \ms{57.2238}{35.3400} & \ms{68.7942}{37.6958} & \ms{67.2891}{38.3991} \\
- ClimaX & \ms{5.6763}{3.9261} & \ms{24.0091}{9.2816} & \ms{11.8450}{4.5067} & \ms{5.7442}{4.1341} \\
- StormCast & \ms{60.6507}{17.4895} & \ms{57.6017}{35.2921} & \ms{68.0766}{37.3899} & \ms{67.8397}{39.2410} \\
- DLWP & \ms{4.3221}{1.5619} & \ms{9.4001}{5.0807} & \ms{4.9700}{3.6849} & \ms{1.9198}{1.4678} \\
- FCN & \ms{1.5202}{1.3446} & \ms{4.7856}{2.9409} & \ms{2.7257}{1.6353} & \ms{0.8368}{0.2358} \\
- FengWu & \ms{0.4277}{0.4830} & \ms{0.6004}{0.3041} & \ms{0.2609}{0.1935} & \ms{0.1501}{0.1206} \\
- FuXi & \ms{0.4505}{0.2773} & \ms{2.9315}{2.6392} & \ms{0.5197}{0.6074} & \ms{0.3621}{0.4346} \\
- Pangu-Weather & \ms{1.0801}{1.1308} & \ms{2.0549}{2.1893} & \ms{1.4029}{1.4739} & \ms{1.0103}{1.1084} \\
- AlphaEarth & \ms{0.0691}{0.0499} & \ms{0.2826}{0.1497} & \ms{0.1524}{0.0770} & \ms{0.0656}{0.0414} \\
- \bottomrule
- \end{tabular}
- \end{table*}

paper_outputs/tables/tab_app_scope_params.tex DELETED
@@ -1,19 +0,0 @@
- \begin{table}[h]
- \centering
- \small
- \setlength{\tabcolsep}{8pt}
- \renewcommand{\arraystretch}{1.2}
- \caption{Scope values used in the evaluation contracts.}
- \label{tab:app_scope_params}
- \begin{tabular}{lcc}
- \toprule
- \textbf{\(\Omega\)} & \textbf{Definition} & \textbf{Units} \\
- \midrule
- Global & full domain & 8,085,000 test cells \\
- Fire-prone top-5\% & top 5\% by training-period fire frequency & 404,280 test cells \\
- Fire-prone top-10\% & top 10\% by training-period fire frequency & 808,560 test cells \\
- Fire-prone top-20\% & top 20\% by training-period fire frequency & 1,617,000 test cells \\
- Spread region & union of \(\widehat{B}\) and \(B\) & event-specific cells \\
- \bottomrule
- \end{tabular}
- \end{table}

paper_outputs/tables/tab_app_seed_robustness.tex DELETED
@@ -1,36 +0,0 @@
- \begin{table}[h]
- \centering
- \small
- \setlength{\tabcolsep}{5pt}
- \renewcommand{\arraystretch}{1.2}
- \caption{Seed summaries for stochastic checks. Values report mean with small std over completed seeds.}
- \label{tab:app_seed_robustness}
- \begin{adjustbox}{max width=\textwidth}
- \begin{tabular}{p{0.28\textwidth}cllp{0.18\textwidth}}
- \toprule
- \textbf{\(\mathcal{T}\) check} & \textbf{Seeds} & \textbf{Primary value} & \textbf{Other value(s)} & \textbf{Reading} \\
- \midrule
- Final burned area &
- 5 & log-RMSE \ms{1.1657}{0.0126} &
- log-MAE \ms{1.0423}{0.0081}; Spear.\ \ms{0.6298}{0.0338} &
- stable across seeds \\
- Smoke PM\(_{2.5}\) &
- 5 & RMSE \ms{4.4646}{0.0060} &
- MAE \ms{2.4108}{0.0016}; \(r\) \ms{0.6368}{0.0013} &
- stable at table precision \\
- Extreme heat &
- 5 & RMSE-C \ms{0.2179}{0.0043} &
- MAE-C \ms{0.1787}{0.0018}; exceed.\ \(F_1\) \ms{0.9541}{0.0164} &
- stable across seeds \\
- Fire spread &
- 5 & exact \(F_1\) \ms{37.6700}{0.9800} &
- spatial \(F_1\) \ms{80.9700}{2.0200}; AP \ms{30.0900}{1.2500} &
- stable across seeds \\
- Aurora paired-head check &
- 5 & fire-prone score diff.\ \ms{6.3500}{13.2800} &
- PR-AUC and union choices differ in 2/5 seeds &
- variable across seeds \\
- \bottomrule
- \end{tabular}
- \end{adjustbox}
- \end{table}

paper_outputs/tables/tab_app_smoke_high_event.tex DELETED
@@ -1,24 +0,0 @@
-  \begin{table*}[t]
-  \centering
-  \scriptsize
-  \setlength{\tabcolsep}{3pt}
-  \caption{For fixed smoke \(\mathcal{T}\) and station \(\Omega\), this table reports RMSE, MAE, and 90th-percentile absolute error on test rows with observed PM$_{2.5}\ge35$; std uses a row bootstrap over those rows. Cells report mean with small std.}
-  \label{tab:app_smoke_high_event}
-  \begin{tabular}{lccc}
-  \toprule
-  Backbone & high-smoke RMSE & high-smoke MAE & high-smoke 90th AE \\
-  \midrule
-  FireWx-FM ref. & \ms{47.4870}{0.6346} & \ms{34.3954}{0.7654} & \ms{65.6213}{3.8778} \\
-  Prithvi-WxC & \ms{57.2224}{1.7268} & \ms{47.3871}{0.3153} & \ms{74.9666}{3.2381} \\
-  Aurora & \ms{57.2752}{1.7248} & \ms{47.4368}{0.3149} & \ms{75.0755}{3.1074} \\
-  ClimaX & \ms{57.2828}{1.7239} & \ms{47.4407}{0.3140} & \ms{75.1012}{3.0777} \\
-  StormCast & \ms{56.6512}{1.7517} & \ms{46.7914}{0.3281} & \ms{74.0794}{3.4707} \\
-  DLWP & \ms{57.0075}{1.7359} & \ms{47.1971}{0.3198} & \ms{74.4936}{3.3826} \\
-  FCN & \ms{57.0582}{1.7339} & \ms{47.2401}{0.3187} & \ms{74.6431}{3.1982} \\
-  FengWu & \ms{57.0158}{1.7357} & \ms{47.1957}{0.3194} & \ms{74.5652}{3.2871} \\
-  FuXi & \ms{56.9622}{1.7371} & \ms{47.1508}{0.3201} & \ms{74.3278}{3.4435} \\
-  Pangu-Weather & \ms{57.1282}{1.7307} & \ms{47.3050}{0.3170} & \ms{74.6830}{3.2375} \\
-  AlphaEarth & \ms{48.0665}{0.7904} & \ms{35.6088}{0.7341} & \ms{66.7613}{3.9235} \\
-  \bottomrule
-  \end{tabular}
-  \end{table*}
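The deleted caption above states that the high-smoke std comes from a row bootstrap over test rows with observed PM$_{2.5}\ge35$. A minimal sketch of that procedure, assuming plain lists of observed/predicted values; the release does not ship the paper's actual bootstrap code, so function and argument names here are hypothetical:

```python
import math
import random

def bootstrap_rmse_std(y_true, y_pred, n_boot=1000, threshold=35.0, seed=0):
    """Std of RMSE under a row bootstrap, restricted to high-smoke rows."""
    # Keep only rows whose observed value meets the event threshold.
    rows = [(t, p) for t, p in zip(y_true, y_pred) if t >= threshold]
    if not rows:
        return float("nan")  # no qualifying rows to resample
    rng = random.Random(seed)
    n = len(rows)
    stats = []
    for _ in range(n_boot):
        # Resample rows with replacement, then recompute RMSE on the sample.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        stats.append(math.sqrt(sum((t - p) ** 2 for t, p in sample) / n))
    mean = sum(stats) / n_boot
    return math.sqrt(sum((s - mean) ** 2 for s in stats) / n_boot)
```

The MAE and 90th-percentile columns would follow the same resampling loop with a different per-sample statistic.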
 
paper_outputs/tables/tab_app_spread_ap_by_scope.tex DELETED
@@ -1,24 +0,0 @@
-  \begin{table*}[t]
-  \centering
-  \scriptsize
-  \setlength{\tabcolsep}{3pt}
-  \caption{For fixed spread \(\mathcal{T}\) and strict \(\Lambda\), this table reports AP under three \(\Omega\) scopes: full test, top-5\% train-fire area, and top-10\% train-fire area. Values are percentages; cells report mean with small std.}
-  \label{tab:app_spread_ap_by_scope}
-  \begin{tabular}{lccc}
-  \toprule
-  Backbone & full \(\Omega\) AP & top-5\% \(\Omega\) AP & top-10\% \(\Omega\) AP \\
-  \midrule
-  FireWx-FM ref. & \ms{30.0197}{1.5651} & \ms{40.7452}{2.0542} & \ms{37.4096}{1.8731} \\
-  Prithvi-WxC & \ms{4.8319}{0.1731} & \ms{12.6086}{0.4468} & \ms{8.7051}{0.1889} \\
-  Aurora & \ms{17.7723}{0.4293} & \ms{30.3106}{0.9404} & \ms{26.4732}{0.6932} \\
-  ClimaX & \ms{11.1726}{0.2337} & \ms{25.7871}{1.2896} & \ms{19.9977}{1.2217} \\
-  StormCast & \ms{8.1147}{1.1569} & \ms{18.5461}{1.1727} & \ms{14.1286}{1.2956} \\
-  DLWP & \ms{9.2142}{2.6587} & \ms{19.3346}{2.3922} & \ms{14.9788}{2.6696} \\
-  FCN & \ms{6.6774}{1.3001} & \ms{16.7396}{3.2955} & \ms{11.9308}{2.3881} \\
-  FengWu & \ms{11.0046}{2.7092} & \ms{21.1506}{1.2163} & \ms{17.0113}{1.5778} \\
-  FuXi & \ms{13.5507}{0.3840} & \ms{22.5434}{0.4100} & \ms{19.1964}{0.3943} \\
-  Pangu-Weather & \ms{10.6250}{1.4643} & \ms{19.8294}{1.3044} & \ms{15.8013}{1.1602} \\
-  AlphaEarth & \ms{12.2847}{1.3562} & \ms{22.8692}{0.4915} & \ms{18.2992}{1.2110} \\
-  \bottomrule
-  \end{tabular}
-  \end{table*}
 
paper_outputs/tables/tab_appendix_selection_regret_tolerance.tex DELETED
@@ -1,2 +0,0 @@
-  % Replaced by the all-backbone value table in sections/appendix.tex
-  % (Table~\ref{tab:appendix_selection_regret_tolerance}).
 
 
 
paper_outputs/tables/tab_fireprone_contract_progression.tex DELETED
@@ -1,69 +0,0 @@
-  \begin{table*}[t]
-  \centering
-  \scriptsize
-  \setlength{\tabcolsep}{4pt}
-  \caption{Occupancy scores across global and fire-prone scopes. Global uses the full validation/test domain; top-\(k\) rows use train-defined fire-prone masks from historical fire frequency. Values are \(F_1\) percentages from the same validation-selected strict threshold. Tolerance is spatial-only; union adds temporal and spatial matching. Difference is union minus strict. Rows report five-seed mean with small std. Values use four decimals.}
-  \label{tab:fireprone_contract_progression}
-  \begin{adjustbox}{max width=\textwidth}
-  \begin{tabular}{@{}llcccc@{}}
-  \toprule
-  Backbone & Scope & Strict \(F_1\uparrow\) & Tolerance \(F_1\uparrow\) & Union \(F_1\uparrow\) & Difference \(\uparrow\) \\
-  \midrule
-  \textcolor{blue}{FireWx-FM ref.} & global & \ms{0.4550}{0.1410} & \ms{29.7480}{1.2870} & \ms{59.0660}{2.7370} & \ms{58.6110}{2.6950} \\
-  & top 5\% & \ms{3.5600}{0.8810} & \ms{39.2620}{1.4010} & \ms{72.8280}{2.5780} & \ms{69.2680}{1.9960} \\
-  & top 10\% & \ms{3.5580}{0.8800} & \ms{39.1660}{1.3910} & \ms{72.5200}{2.5670} & \ms{68.9630}{1.9890} \\
-  & top 20\% & \ms{3.5300}{0.8700} & \ms{38.2850}{1.2950} & \ms{69.7230}{2.4660} & \ms{66.1930}{1.9270} \\
-  \addlinespace[1pt]
-  Prithvi-WxC & global & \ms{0.0550}{0.0040} & \ms{7.1600}{0.6600} & \ms{20.1900}{1.8300} & \ms{20.1300}{1.8300} \\
-  & top 5\% & \ms{1.4100}{1.1600} & \ms{19.2600}{4.5000} & \ms{42.5800}{4.5500} & \ms{41.1700}{3.4800} \\
-  & top 10\% & \ms{1.2400}{1.3200} & \ms{14.8800}{8.4400} & \ms{32.6900}{13.2100} & \ms{31.4500}{11.9100} \\
-  & top 20\% & \ms{1.1500}{1.3800} & \ms{13.1500}{9.4600} & \ms{28.1300}{15.2900} & \ms{26.9800}{13.9200} \\
-  \addlinespace[1pt]
-  Aurora & global & \ms{0.0700}{0.0100} & \ms{8.5000}{1.9600} & \ms{23.1000}{4.9400} & \ms{23.0400}{4.9300} \\
-  & top 5\% & \ms{0.9900}{0.9300} & \ms{15.1300}{6.0800} & \ms{35.4800}{11.0200} & \ms{34.5000}{10.3700} \\
-  & top 10\% & \ms{0.7800}{1.0500} & \ms{12.7400}{6.5600} & \ms{30.5300}{10.8800} & \ms{29.7500}{9.8700} \\
-  & top 20\% & \ms{0.6700}{1.1000} & \ms{10.5300}{7.4300} & \ms{24.9400}{12.5800} & \ms{24.2800}{11.4900} \\
-  \addlinespace[1pt]
-  ClimaX & global & \ms{0.3500}{0.0800} & \ms{29.7500}{3.6100} & \ms{60.1500}{7.5900} & \ms{59.8000}{7.5500} \\
-  & top 5\% & \ms{1.2900}{0.1100} & \ms{34.5800}{2.3800} & \ms{69.2200}{5.7200} & \ms{67.9200}{5.7300} \\
-  & top 10\% & \ms{1.2500}{0.1600} & \ms{34.3300}{2.2900} & \ms{68.5700}{5.5400} & \ms{67.3200}{5.5500} \\
-  & top 20\% & \ms{1.0300}{0.2700} & \ms{30.2100}{4.2900} & \ms{60.0600}{7.5700} & \ms{59.0400}{7.5900} \\
-  \addlinespace[1pt]
-  StormCast & global & \ms{0.0560}{0.0110} & \ms{8.2000}{2.1900} & \ms{22.3800}{5.4300} & \ms{22.3200}{5.4200} \\
-  & top 5\% & \ms{0.9600}{0.8000} & \ms{15.3200}{5.5300} & \ms{36.1900}{9.7300} & \ms{35.2300}{9.1800} \\
-  & top 10\% & \ms{0.7300}{0.9300} & \ms{12.6700}{6.3300} & \ms{30.4700}{10.6500} & \ms{29.7500}{9.7500} \\
-  & top 20\% & \ms{0.5800}{0.9100} & \ms{10.4200}{7.3400} & \ms{24.6600}{12.4000} & \ms{24.0800}{11.5000} \\
-  \addlinespace[1pt]
-  AlphaEarth & global & \ms{2.0600}{0.4400} & \ms{29.4500}{6.0100} & \ms{37.4300}{9.9500} & \ms{35.3700}{10.0300} \\
-  & top 5\% & \ms{6.9100}{0.8500} & \ms{42.8800}{4.6100} & \ms{51.7400}{8.7300} & \ms{44.8300}{9.0800} \\
-  & top 10\% & \ms{6.6400}{0.9900} & \ms{41.9000}{5.9500} & \ms{50.5700}{10.0100} & \ms{43.9300}{9.9200} \\
-  & top 20\% & \ms{6.1900}{1.1300} & \ms{38.8300}{7.5000} & \ms{46.3800}{12.1700} & \ms{40.1900}{11.6800} \\
-  \addlinespace[1pt]
-  DLWP & global & \ms{0.1700}{0.0400} & \ms{14.9100}{3.2400} & \ms{28.1900}{6.9700} & \ms{28.0200}{6.9300} \\
-  & top 5\% & \ms{1.8100}{0.4800} & \ms{31.7200}{3.2900} & \ms{55.4600}{5.2900} & \ms{53.6500}{5.4800} \\
-  & top 10\% & \ms{1.6100}{0.6000} & \ms{27.6600}{5.9200} & \ms{47.1300}{8.0100} & \ms{45.5200}{7.7900} \\
-  & top 20\% & \ms{1.5200}{0.9000} & \ms{20.9400}{4.8000} & \ms{34.9300}{7.8500} & \ms{33.4100}{7.8800} \\
-  \addlinespace[1pt]
-  FCN & global & \ms{0.2800}{0.0800} & \ms{19.5100}{3.3400} & \ms{40.0600}{9.3700} & \ms{39.7800}{9.3400} \\
-  & top 5\% & \ms{1.6200}{0.5100} & \ms{29.3800}{2.7600} & \ms{54.3000}{7.4100} & \ms{52.6800}{7.4400} \\
-  & top 10\% & \ms{1.1800}{0.5100} & \ms{22.4200}{3.9800} & \ms{43.4500}{9.2500} & \ms{42.2700}{9.0300} \\
-  & top 20\% & \ms{1.0000}{0.4300} & \ms{16.9800}{3.9400} & \ms{34.0900}{8.2600} & \ms{33.0900}{7.9300} \\
-  \addlinespace[1pt]
-  FengWu & global & \ms{0.2600}{0.0800} & \ms{12.0000}{6.0200} & \ms{24.1000}{13.6300} & \ms{23.8400}{13.5700} \\
-  & top 5\% & \ms{1.5700}{0.3600} & \ms{16.2800}{3.7000} & \ms{30.1100}{5.0100} & \ms{28.5400}{4.7700} \\
-  & top 10\% & \ms{1.2400}{0.5300} & \ms{12.9500}{5.6100} & \ms{24.1900}{8.6900} & \ms{22.9400}{8.1900} \\
-  & top 20\% & \ms{1.1200}{0.5000} & \ms{11.9500}{5.0700} & \ms{22.7900}{7.9100} & \ms{21.6700}{7.4400} \\
-  \addlinespace[1pt]
-  FuXi & global & \ms{0.3800}{0.1200} & \ms{21.0300}{4.8200} & \ms{37.2900}{9.4500} & \ms{36.9100}{9.4300} \\
-  & top 5\% & \ms{2.0300}{0.6800} & \ms{31.8900}{4.7300} & \ms{53.9300}{8.3800} & \ms{51.9000}{8.6900} \\
-  & top 10\% & \ms{1.6500}{0.7300} & \ms{24.0100}{5.7800} & \ms{40.2100}{9.9300} & \ms{38.5600}{9.7700} \\
-  & top 20\% & \ms{1.3600}{0.6800} & \ms{21.9500}{5.8600} & \ms{36.7300}{10.0300} & \ms{35.3700}{9.9200} \\
-  \addlinespace[1pt]
-  Pangu-Weather & global & \ms{0.2800}{0.1100} & \ms{17.0900}{4.0500} & \ms{35.6400}{9.0300} & \ms{35.3600}{9.0800} \\
-  & top 5\% & \ms{1.3700}{0.3100} & \ms{22.2200}{6.8600} & \ms{43.4200}{13.2400} & \ms{42.0600}{13.0600} \\
-  & top 10\% & \ms{1.0900}{0.3500} & \ms{18.9300}{5.9300} & \ms{38.5300}{11.7200} & \ms{37.4400}{11.5300} \\
-  & top 20\% & \ms{0.8800}{0.3600} & \ms{17.0200}{5.4900} & \ms{34.5700}{10.2900} & \ms{33.6800}{10.1300} \\
-  \bottomrule
-  \end{tabular}
-  \end{adjustbox}
-  \end{table*}
 
paper_outputs/tables/tab_primary_results.tex DELETED
@@ -1,62 +0,0 @@
-  \begin{table}[t]
-  \centering
-  \small
-  \setlength{\tabcolsep}{4pt}
-  \renewcommand{\arraystretch}{1.20}
-  \caption{%
-  \textbf{Primary fixed-contract transfer results (RQ3).}
-  Occupancy metrics: exact, tolerated, and union $F_1$ (\%).
-  Fire spread metrics: exact $F_1$, spatial $F_1$, and AP (\%).
-  Each block fixes $\mathcal{T}$, $\Lambda$, $\Omega$, $\mathcal{A}$.
-  \textbf{Bold} marks the best frozen backbone per metric.
-  }
-  \label{tab:primary_results}
-  \setlength{\arrayrulewidth}{0.4pt}
-  \resizebox{\textwidth}{!}{%
-  \begin{tabular}{lcccccc}
-  \toprule
-  & \multicolumn{3}{c}{\textbf{Occupancy}}
-  & \multicolumn{3}{c}{\textbf{Fire spread}} \\
-  \cmidrule(lr){2-4}\cmidrule(lr){5-7}
-  \textbf{Comparator}
-  & \textbf{Exact $F_1\uparrow$} & \textbf{Tol.\ $F_1\uparrow$} & \textbf{Union $F_1\uparrow$}
-  & \textbf{Exact $F_1\uparrow$} & \textbf{Spatial $F_1\uparrow$} & \textbf{AP$\uparrow$} \\
-  \midrule
-  \textcolor{blue}{FireWx-FM ref.}
-  & \ms{0.4546}{0.1412} & \ms{29.7484}{1.2868} & \ms{59.0656}{2.7372}
-  & \ms{37.6700}{0.9800} & \ms{80.9700}{2.0200} & \ms{30.0900}{1.2500} \\
-  \midrule
-  Prithvi-WxC
-  & \ms{0.0552}{0.0039} & \ms{7.1649}{0.6557} & \ms{20.1853}{1.8299}
-  & \ms{22.3500}{3.4500} & \ms{65.2600}{1.0700} & \ms{5.0000}{0.3000} \\
-  Aurora
-  & \ms{0.0656}{0.0094} & \ms{8.5009}{1.9594} & \ms{23.1037}{4.9418}
-  & \textbf{\ms{30.8757}{0.1343}} & \textbf{\ms{71.7329}{0.0141}} & \textbf{\ms{16.6221}{1.6965}} \\
-  ClimaX
-  & \ms{0.3480}{0.0754} & \textbf{\ms{29.7535}{3.6073}} & \textbf{\ms{60.1506}{7.5865}}
-  & \ms{27.9853}{2.0532} & \ms{69.0634}{2.3832} & \ms{11.1726}{0.2337} \\
-  StormCast
-  & \ms{0.0626}{0.0119} & \ms{8.1951}{2.1895} & \ms{22.3817}{5.4294}
-  & \ms{14.8387}{7.5791} & \ms{55.7568}{21.3003} & \ms{2.8114}{0.7377} \\
-  DLWP
-  & \ms{0.1693}{0.0419} & \ms{14.9148}{3.2446} & \ms{28.1901}{6.9658}
-  & \ms{5.9335}{10.0712} & \ms{22.8587}{22.3750} & \ms{5.9435}{5.5194} \\
-  FCN
-  & \ms{0.2829}{0.0839} & \ms{19.5061}{3.3412} & \ms{40.0604}{9.3701}
-  & \ms{3.1798}{2.6598} & \ms{15.6203}{12.4531} & \ms{2.3861}{1.2614} \\
-  FengWu
-  & \ms{0.2613}{0.0757} & \ms{12.0050}{6.0239} & \ms{24.1022}{13.6293}
-  & \ms{5.5189}{9.0883} & \ms{18.4774}{22.4703} & \ms{13.1658}{1.3408} \\
-  FuXi
-  & \ms{0.3774}{0.1212} & \ms{21.0323}{4.8211} & \ms{37.2888}{9.4470}
-  & \ms{19.9909}{2.1364} & \ms{56.1826}{3.0412} & \ms{14.3526}{0.3554} \\
-  Pangu-Weather
-  & \ms{0.2755}{0.1089} & \ms{17.0909}{4.0477} & \ms{35.6386}{9.0327}
-  & \ms{11.2583}{11.0719} & \ms{32.5081}{25.4969} & \ms{12.6881}{1.6790} \\
-  AlphaEarth
-  & \textbf{\ms{2.0606}{0.4404}} & \ms{29.4476}{6.0064} & \ms{37.4286}{9.9458}
-  & \ms{11.0995}{3.6088} & \ms{32.8316}{7.4634} & \ms{11.8343}{1.5050} \\
-  \bottomrule
-  \end{tabular}
-  }
-  \end{table}
 
paper_outputs/tables/tab_selection_regret_full_head.tex DELETED
@@ -1,2 +0,0 @@
-  % Full per-head rows are kept in the supplementary CSV files.
-  % The manuscript uses the all-backbone selection-regret summaries instead.
 
 
 
paper_outputs/tables/tab_selection_regret_scope.tex DELETED
@@ -1,24 +0,0 @@
-  \begin{table*}[!t]
-  \centering
-  \small
-  \setlength{\tabcolsep}{4pt}
-  \caption{Fixed-feature selection-regret check across evaluation scopes. Values are percentage-point regret \(\delta = D(h_D)-D(h_R)\) under union-\(F_1\), where \(h_R\) is selected by PR-AUC and \(h_D\) by the decision metric. Top-\(k\) columns use train-defined fire-prone scopes. Rows report mean with small std over five seeds; \(0.0000\) means the two selectors give the same decision score for all seeds.}
-  \label{tab:selection_regret_diagnostic}
-  \begin{tabular}{lcccc}
-  \toprule
-  \textbf{Feature source} & \textbf{\(\Omega=\)global} & \textbf{\(\Omega=\)top 5\%} & \textbf{\(\Omega=\)top 10\%} & \textbf{\(\Omega=\)top 20\%} \\
-  \midrule
-  \textcolor{blue}{FireWx-FM ref.} & \ms{7.3831}{7.4536} & \ms{0.3664}{0.6812} & \ms{1.2275}{1.2665} & \ms{2.9385}{2.7513} \\
-  Prithvi-WxC & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
-  Aurora & \ms{4.9455}{10.6974} & \ms{15.4283}{34.4987} & \ms{13.9934}{31.2903} & \ms{14.3706}{32.1337} \\
-  ClimaX & \ms{0.1296}{0.1775} & 0.0000 & 0.0000 & 0.0000 \\
-  StormCast & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
-  DLWP & 0.0000 & \ms{1.6716}{1.6079} & \ms{2.8465}{2.6938} & \ms{4.4634}{4.3561} \\
-  FCN & 0.0000 & \ms{0.4510}{1.0071} & \ms{0.4200}{0.9390} & \ms{1.1680}{1.9872} \\
-  FengWu & 0.0000 & \ms{0.8796}{0.5532} & \ms{0.4023}{0.5511} & \ms{0.5222}{0.6239} \\
-  FuXi & 0.0000 & \ms{1.3545}{2.0970} & \ms{0.1656}{0.3703} & \ms{0.2833}{0.3681} \\
-  Pangu-Weather & 0.0000 & \ms{0.7593}{0.8974} & \ms{0.3048}{0.5054} & \ms{0.1868}{0.3255} \\
-  AlphaEarth & \ms{17.2217}{8.8492} & \ms{6.3846}{4.9653} & \ms{6.5738}{6.8970} & \ms{3.8804}{5.9483} \\
-  \bottomrule
-  \end{tabular}
-  \end{table*}
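The deleted caption defines regret as \(\delta = D(h_D)-D(h_R)\), where \(h_R\) is the head chosen by the validation PR-AUC proxy and \(h_D\) the head chosen by the validation decision metric, both scored on the held-out decision metric. A minimal sketch of that selection rule; the field names `pr_auc`, `union_f1_val`, and `union_f1_test` are hypothetical, not the release's actual schema:

```python
def selection_regret(heads):
    """Percentage-point regret between proxy- and decision-selected heads.

    Each entry in `heads` describes one candidate head for a backbone/seed.
    """
    h_r = max(heads, key=lambda h: h["pr_auc"])        # proxy selector (PR-AUC)
    h_d = max(heads, key=lambda h: h["union_f1_val"])  # decision-metric selector
    # Regret is evaluated on the held-out decision metric (union F1 on test).
    return h_d["union_f1_test"] - h_r["union_f1_test"]
```

When both selectors pick the same head, the regret is exactly zero, which is why several table cells report 0.0000.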
 
paper_outputs/tables/tab_selection_regret_scope_sweep.tex DELETED
@@ -1,24 +0,0 @@
-  \begin{table*}[!t]
-  \centering
-  \small
-  \setlength{\tabcolsep}{4pt}
-  \caption{Fixed-feature selection-regret sweep across evaluation scopes. Values are percentage-point regret \(\delta = D(h_D)-D(h_R)\) under union-\(F_1\). Top-\(k\) scopes are train-defined fire-prone masks. Rows report mean with small std over five seeds.}
-  \label{tab:selection_regret_scope_sweep}
-  \begin{tabular}{lcccc}
-  \toprule
-  \textbf{Feature source} & \textbf{\(\Omega=\)global} & \textbf{\(\Omega=\)top 5\%} & \textbf{\(\Omega=\)top 10\%} & \textbf{\(\Omega=\)top 20\%} \\
-  \midrule
-  \textcolor{blue}{FireWx-FM ref.} & \ms{7.3831}{7.4536} & \ms{0.3664}{0.6812} & \ms{1.2275}{1.2665} & \ms{2.9385}{2.7513} \\
-  Prithvi-WxC & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
-  Aurora & \ms{4.9455}{10.6974} & \ms{15.4283}{34.4987} & \ms{13.9934}{31.2903} & \ms{14.3706}{32.1337} \\
-  ClimaX & \ms{0.1296}{0.1775} & 0.0000 & 0.0000 & 0.0000 \\
-  StormCast & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
-  DLWP & 0.0000 & \ms{1.6716}{1.6079} & \ms{2.8465}{2.6938} & \ms{4.4634}{4.3561} \\
-  FCN & 0.0000 & \ms{0.4510}{1.0071} & \ms{0.4200}{0.9390} & \ms{1.1680}{1.9872} \\
-  FengWu & 0.0000 & \ms{0.8796}{0.5532} & \ms{0.4023}{0.5511} & \ms{0.5222}{0.6239} \\
-  FuXi & 0.0000 & \ms{1.3545}{2.0970} & \ms{0.1656}{0.3703} & \ms{0.2833}{0.3681} \\
-  Pangu-Weather & 0.0000 & \ms{0.7593}{0.8974} & \ms{0.3048}{0.5054} & \ms{0.1868}{0.3255} \\
-  AlphaEarth & \ms{17.2217}{8.8492} & \ms{6.3846}{4.9653} & \ms{6.5738}{6.8970} & \ms{3.8804}{5.9483} \\
-  \bottomrule
-  \end{tabular}
-  \end{table*}
 
paper_outputs/tables/tab_supporting_results.tex DELETED
@@ -1,120 +0,0 @@
-  \begin{table}[t]
-  \centering
-  \small
-  \setlength{\tabcolsep}{3.5pt}
-  \renewcommand{\arraystretch}{1.18}
-  \caption{%
-  \textbf{Supporting task-metric matrix (RQ4).}
-  Top block: final burned area and analog retrieval.
-  Bottom block: smoke PM$_{2.5}$ and extreme heat.
-  Each block fixes $\mathcal{T}$, $\Lambda$, $\Omega$; backbone
-  column is shared across paired tasks. \textcolor{blue}{FireWx-FM reference row is}
-  separated by a rule as the empirical anchor. \textbf{Bold} marks
-  the largest atmospheric-FM heat error values. For error metrics
-  lower is better ($\downarrow$); for $F_1$, nDCG, and $r$ higher
-  is better ($\uparrow$).
-  }
-  \label{tab:supporting_results}
-  \resizebox{\textwidth}{!}{%
-  \begin{tabular}{lcccccc}
-  \toprule
-  & \multicolumn{3}{c}{\textbf{Burned area}}
-  & \multicolumn{3}{c}{\textbf{Analog retrieval}} \\
-  \cmidrule(lr){2-4}\cmidrule(lr){5-7}
-  \textbf{Backbone}
-  & \textbf{log-RMSE$\downarrow$} & \textbf{log-MAE$\downarrow$}
-  & \textbf{Spearman$\uparrow$}
-  & \textbf{nDCG@10$\uparrow$} & \textbf{log-RMSE$\downarrow$}
-  & \textbf{log-MAE$\downarrow$} \\
-  \midrule
-  \textcolor{blue}{FireWx-FM ref.}
-  & \ms{1.1657}{0.0126} & \ms{1.0423}{0.0081} & \ms{0.6298}{0.0338}
-  & \ms{0.5099}{0.0336} & \ms{1.1977}{0.1029} & \ms{1.0043}{0.0759} \\
-  \midrule
-  Prithvi-WxC
-  & \ms{1.3630}{0.0681} & \ms{1.2435}{0.0668} & \ms{0.1799}{0.3002}
-  & \ms{0.3857}{0.0189} & \ms{1.3908}{0.0938} & \ms{1.2585}{0.0865} \\
-  Aurora
-  & \ms{1.8658}{0.2009} & \ms{1.6717}{0.1245} & \ms{-0.1156}{0.2982}
-  & \ms{0.4046}{0.0144} & \ms{1.3659}{0.0792} & \ms{1.2596}{0.0968} \\
-  ClimaX
-  & \ms{2.0300}{0.2103} & \ms{1.8443}{0.1528} & \ms{-0.2515}{0.2688}
-  & \ms{0.4143}{0.0191} & \ms{1.4526}{0.0926} & \ms{1.2441}{0.1446} \\
-  StormCast
-  & \ms{1.6679}{0.1438} & \ms{1.4745}{0.1134} & \ms{0.1830}{0.1969}
-  & \ms{0.4076}{0.0094} & \ms{1.3663}{0.0781} & \ms{1.2371}{0.1078} \\
-  DLWP
-  & \ms{1.3070}{0.0980} & \ms{1.1769}{0.0834} & \ms{0.4888}{0.1368}
-  & \ms{0.3972}{0.0146} & \ms{1.5351}{0.0802} & \ms{1.3196}{0.0781} \\
-  FCN
-  & \ms{1.3693}{0.0885} & \ms{1.2599}{0.0723} & \ms{0.3484}{0.1662}
-  & \ms{0.4316}{0.0134} & \ms{1.4604}{0.1035} & \ms{1.2351}{0.0586} \\
-  FengWu
-  & \ms{1.3715}{0.1011} & \ms{1.2604}{0.0820} & \ms{0.3221}{0.2004}
-  & \ms{0.4246}{0.0237} & \ms{1.4179}{0.0986} & \ms{1.2233}{0.0915} \\
-  FuXi
-  & \ms{1.4068}{0.1011} & \ms{1.3023}{0.0789} & \ms{0.2663}{0.2561}
-  & \ms{0.4279}{0.0212} & \ms{1.4290}{0.0929} & \ms{1.2236}{0.0961} \\
-  Pangu-Weather
-  & \ms{1.3280}{0.0735} & \ms{1.2081}{0.0607} & \ms{0.4141}{0.1573}
-  & \ms{0.4017}{0.0245} & \ms{1.4235}{0.0731} & \ms{1.2225}{0.0847} \\
-  AlphaEarth
-  & \ms{2.4068}{0.2841} & \ms{2.0822}{0.2371} & \ms{-0.3428}{0.1716}
-  & \ms{0.5086}{0.0440} & \ms{1.2158}{0.1310} & \ms{1.0350}{0.1018} \\
-  \bottomrule
-  \end{tabular}
-  }
-
-  \vspace{4pt}
-
-  \resizebox{\textwidth}{!}{%
-  \begin{tabular}{lcccccc}
-  \toprule
-  & \multicolumn{3}{c}{\textbf{Smoke PM$_{2.5}$}}
-  & \multicolumn{3}{c}{\textbf{Extreme heat}} \\
-  \cmidrule(lr){2-4}\cmidrule(lr){5-7}
-  \textbf{Backbone}
-  & \textbf{RMSE$\downarrow$} & \textbf{MAE$\downarrow$}
-  & \textbf{Pearson $r\uparrow$}
-  & \textbf{RMSE-C$\downarrow$} & \textbf{MAE-C$\downarrow$}
-  & \textbf{Exceed.\ $F_1\uparrow$} \\
-  \midrule
-  \textcolor{blue}{FireWx-FM ref.}
-  & \ms{4.4646}{0.0060} & \ms{2.4108}{0.0016} & \ms{0.6368}{0.0013}
-  & \ms{0.2179}{0.0043} & \ms{0.1787}{0.0018} & \ms{0.9541}{0.0164} \\
-  \midrule
-  Prithvi-WxC
-  & \ms{6.0382}{0.0828} & \ms{3.7301}{0.0055} & \ms{0.0243}{0.0045}
-  & \ms{4.6225}{0.0192} & \ms{2.6315}{0.0128} & \ms{0.8693}{0.0023} \\
-  Aurora
-  & \ms{6.0384}{0.0828} & \ms{3.7265}{0.0055} & \ms{0.0193}{0.0043}
-  & \textbf{\ms{18.0474}{0.0708}} & \textbf{\ms{15.3747}{0.0594}}
-  & \ms{0.0951}{0.0038} \\
-  ClimaX
-  & \ms{6.0402}{0.0828} & \ms{3.7290}{0.0055} & \ms{0.0004}{0.0029}
-  & \ms{17.6492}{0.0347} & \ms{14.4938}{0.0319} & \ms{0.7684}{0.0068} \\
-  StormCast
-  & \ms{6.1230}{0.0830} & \ms{3.8182}{0.0073} & \ms{0.0183}{0.0041}
-  & \ms{1.7671}{0.2145} & \ms{1.3507}{0.1576} & \ms{0.9073}{0.0189} \\
-  DLWP
-  & \ms{5.9289}{0.1031} & \ms{3.7331}{0.0088} & \ms{0.0303}{0.0060}
-  & \ms{2.2662}{0.1106} & \ms{1.7153}{0.0748} & \ms{0.9156}{0.0112} \\
-  FCN
-  & \ms{5.9277}{0.1033} & \ms{3.7345}{0.0088} & \ms{0.0312}{0.0062}
-  & \ms{2.1657}{0.1800} & \ms{1.6033}{0.1039} & \ms{0.9257}{0.0096} \\
-  FengWu
-  & \ms{5.9297}{0.1032} & \ms{3.7395}{0.0088} & \ms{0.0304}{0.0063}
-  & \ms{2.1266}{0.1589} & \ms{1.5801}{0.1004} & \ms{0.0481}{0.0459} \\
-  FuXi
-  & \ms{5.9319}{0.1029} & \ms{3.7398}{0.0088} & \ms{0.0299}{0.0061}
-  & \ms{2.1282}{0.0969} & \ms{1.5759}{0.0719} & \ms{0.2268}{0.0623} \\
-  Pangu-Weather
-  & \ms{5.9270}{0.1036} & \ms{3.7320}{0.0088} & \ms{0.0301}{0.0060}
-  & \ms{2.2045}{0.1483} & \ms{1.6307}{0.0889} & \ms{0.0199}{0.0062} \\
-  AlphaEarth
-  & \ms{4.4403}{0.0488} & \ms{2.3992}{0.0056} & \ms{0.6347}{0.0066}
-  & \ms{0.2194}{0.0039} & \ms{0.1800}{0.0014} & \ms{0.9542}{0.0107} \\
-  \bottomrule
-  \end{tabular}
-  }
-  \end{table}
 
scripts/audit_release.py CHANGED
@@ -18,7 +18,6 @@ REQUIRED = [
     "models/wildfire_fm/README.md",
     "models/wildfire_fm/modeling_unet.py",
     "models/wildfire_fm/checkpoint_manifest.json",
-    "paper/manuscript_final.pdf",
     "paper_outputs/figures/overview_wildfire.pdf",
     "paper_outputs/figures/matching.pdf",
     "paper_outputs/figures/fig_task_contract_tiles.pdf",
@@ -34,14 +33,8 @@
     "scripts/check_paper_output_hashes.py",
 ]
 
-TABLE_LABELS = [
-    "tab_primary_results.tex",
-    "tab_supporting_results.tex",
-    "tab_fireprone_contract_progression.tex",
-    "tab_selection_regret_scope.tex",
-    "tab_selection_regret_scope_sweep.tex",
-    "tab_appendix_selection_regret_tolerance.tex",
-]
+FORBIDDEN_FILE_SUFFIXES = {".tex", ".bib", ".tikz"}
+FORBIDDEN_FILE_NAMES = {"manuscript_final.pdf"}
 
 FORBIDDEN_TEXT = [
     "/home/yx21e",
@@ -75,9 +68,11 @@ def main() -> None:
     for rel in REQUIRED:
         if not (ROOT / rel).exists():
             issues.append(f"missing required file: {rel}")
-    for table in TABLE_LABELS:
-        if not (ROOT / "paper_outputs/tables" / table).exists():
-            issues.append(f"missing paper table output: {table}")
+    for path in ROOT.rglob("*"):
+        if ".git" in path.parts or "__pycache__" in path.parts:
+            continue
+        if path.is_file() and (path.suffix in FORBIDDEN_FILE_SUFFIXES or path.name in FORBIDDEN_FILE_NAMES):
+            issues.append(f"forbidden manuscript/source artifact present: {path.relative_to(ROOT)}")
 
     for path in iter_text_files():
         text = path.read_text(errors="ignore")
@@ -127,7 +122,6 @@ def main() -> None:
     expected_paths = []
     for rel_root in ["paper_outputs", "assets"]:
         expected_paths.extend(str(p.relative_to(ROOT)) for p in (ROOT / rel_root).rglob("*") if p.is_file())
-    expected_paths.append("paper/manuscript_final.pdf")
     expected = sorted(set(expected_paths))
     if sorted(listed) != expected:
         missing = sorted(set(expected) - set(listed))
scripts/reproduce_paper_outputs.py CHANGED
@@ -1,10 +1,8 @@
 #!/usr/bin/env python3
-"""Verify the released WildFIRE-FM paper artifacts.
+"""Verify the released WildFIRE-FM public artifacts.
 
-The final paper figures/tables in this Hub release are copied from the current
-manuscript bundle. Raw-data reruns are intentionally outside this lightweight
-check because the public repository does not redistribute source data or local
-feature caches.
+The Hub release intentionally excludes manuscript TeX/PDF source. This check
+verifies public figure previews, sanitized summaries, and release hygiene.
 """
 
 from __future__ import annotations
@@ -25,7 +23,7 @@ def run(cmd: list[str]) -> None:
 def main() -> None:
     run([sys.executable, "scripts/check_paper_output_hashes.py"])
     run([sys.executable, "scripts/audit_release.py"])
-    print("Verified final paper outputs and release audit.")
+    print("Verified public release artifacts and release audit.")
 
 
 if __name__ == "__main__":