Eshit committed on
Commit
e762d42
·
1 Parent(s): a070e2f

Add frontend demo interface and project updates

assets/annotated_frame.gif ADDED

Git LFS Details

  • SHA256: 26592f216d657d05113bb745f7889d18262d576df973b39d5e7639dc6bce62e5
  • Pointer size: 131 Bytes
  • Size of remote file: 303 kB
assets/env_diagram.png ADDED
colab_prompts.md ADDED
@@ -0,0 +1,607 @@
1
+ # Colab training prompts (feed to Claude, in order)
2
+
3
+ Each prompt is self-contained: paste it as a fresh message with no prior context.
4
+
5
+ ---
6
+
7
+ ## Prompt 1 - SFT data generator script
8
+
9
+ ```
10
+ Write a standalone Python script `scripts/generate_sft_data.py` for the Wildfire Containment Simulator project.
11
+
12
+ PURPOSE: Generate supervised fine-tuning (SFT) training examples by running the HeuristicAgent through episodes and recording (prompt, action) pairs at every step.
13
+
14
+ REPO STRUCTURE (files that exist):
15
+ - env/wildfire_env.py: WildfireEnv with reset(task_id, seed) and step(action)
16
+ - env/serialization.py: serialize_observation(obs, step_num, max_steps, tier="", prev_cells_burning=0) -> str
17
+ - agents/heuristic_agent.py: HeuristicAgent with act(obs) -> Action
18
+ - env/models.py: TIER_EASY(episode_length=80), TIER_MEDIUM(episode_length=150), TIER_HARD(episode_length=300)
19
+ - env/action_parser.py: parse_action(text, obs) -> (Action, status)
20
+
21
+ SYSTEM_PROMPT constant to use in every example:
22
+ "You are an AI Incident Commander managing wildfire containment. You will receive a situation briefing each step. Respond with ONLY a valid JSON action object and nothing else. Example: {\"action_type\": \"idle\"}"
23
+
24
+ REQUIREMENTS:
25
+ 1. For each tier ("easy", "medium", "hard"), for each seed in a configurable range:
26
+ a. Reset the env
27
+ b. Run the heuristic for a random offset (0 to min(30, max_steps//4)) steps to get mid-episode states
28
+ c. Run the heuristic to EPISODE COMPLETION (env.done == True), recording every step
29
+ d. After the episode is complete, check env.state()["population_lost"] == 0. Only keep examples
30
+ from successful episodes (pop_lost == 0 at end). Discard the whole episode otherwise.
31
+ e. From the kept episodes, record every step as a training example EXCEPT: filter out IDLE
32
+ actions unless they represent more than 30% of the episode's actions (keep a realistic idle rate).
33
+ Concretely: keep all non-IDLE steps, then randomly sample IDLE steps to reach at most 20% of
34
+ total examples per episode.
35
+ f. Each example: {"messages": [{"role": "system", ...}, {"role": "user", "content": prompt_text}],
36
+ "completion": action_json_string, "tier": tier, "seed": seed, "step": step_num}
37
+ g. The "completion" field is the action serialised as compact JSON (action.model_dump_json(exclude_none=True))
38
+
39
+ 2. Track prev_cells_burning across steps to pass to serialize_observation for spread delta.
40
+
41
+ 3. Target counts after filtering: easy=2000 examples, medium=1500, hard=800.
42
+ Iterate seeds starting from 0, incrementing by 1, until targets are met.
43
+
44
+ 4. Save to training/sft_data.jsonl (one JSON object per line). Print progress every 50 seeds.
45
+ Print final tier distribution before exiting.
46
+
47
+ 5. Add argparse: --output (default training/sft_data.jsonl), --easy-seeds N (max seeds to try),
48
+ --medium-seeds N, --hard-seeds N
49
+
50
+ IMPORTANT:
51
+ - The script runs locally, not in Colab. Use sys.path.insert(0, project_root) to make env/ importable.
52
+ - No GPU needed.
53
+ - Do NOT filter mid-episode observations: they are intentionally included for training diversity.
54
+ The per-episode success filter (pop_lost==0) applies to the whole episode, not individual steps.
55
+ ```
56
+
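The IDLE rule in requirement 1e (keep every non-IDLE step, then sample IDLE steps up to a 20% share) can be sketched as below. This is a minimal illustration, not the project's script: `downsample_idle`, `max_idle_pct`, and the dict layout with `step` and `action_type` keys are assumptions made for the example, and `"deploy_crew"` is only a placeholder action name.

```python
import random


def downsample_idle(steps, max_idle_pct=20, seed=0):
    """Keep all non-idle steps and at most max_idle_pct percent idle steps.

    `steps` is a list of dicts with hypothetical keys "step" and "action_type".
    """
    non_idle = [s for s in steps if s["action_type"] != "idle"]
    idle = [s for s in steps if s["action_type"] == "idle"]

    # idle_kept / (idle_kept + len(non_idle)) <= pct / 100
    #   =>  idle_kept <= pct * len(non_idle) / (100 - pct)
    max_idle = (max_idle_pct * len(non_idle)) // (100 - max_idle_pct)

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    kept_idle = rng.sample(idle, min(len(idle), max_idle))

    # Restore episode order so examples stay sorted by step number.
    return sorted(non_idle + kept_idle, key=lambda s: s["step"])


# Ten non-idle and ten idle steps, interleaved.
episode = [
    {"step": i, "action_type": "idle" if i % 2 else "deploy_crew"}
    for i in range(20)
]
kept = downsample_idle(episode)
```

Integer arithmetic is used for the cap so the 20% bound holds exactly and does not depend on floating-point rounding.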
57
+ ---
58
+
59
+ ## Prompt 2 - SFT training notebook
60
+
61
+ ```
62
+ Write a complete Google Colab notebook `training/sft_colab.ipynb` for supervised fine-tuning of
63
+ Qwen2.5-7B-Instruct on wildfire incident command data.
64
+
65
+ CONTEXT:
66
+ - Input: training/sft_data.jsonl, where each line has:
67
+ {"messages": [{"role":"system","content":"..."}, {"role":"user","content":"..."}],
68
+ "completion": "{\"action_type\":...}", "tier": "easy", "seed": 42, "step": 5}
69
+ - Goal: teach the model to output valid JSON action objects given wildfire observations
70
+ - Hardware target: A100 40GB on Colab (HF credits)
71
+
72
+ NOTEBOOK SECTIONS:
73
+
74
+ Section 1 - Install
75
+ - pip install: unsloth[colab-new] from git, trl==0.15.2, datasets==3.4.1
76
+ - assert torch.cuda.is_available(), print GPU name and total memory
77
+
78
+ Section 2 - Load Model
79
+ - unsloth FastLanguageModel.from_pretrained("unsloth/Qwen2.5-7B-Instruct",
80
+ max_seq_length=2048, load_in_4bit=True)
81
+ - FastLanguageModel.get_peft_model with r=32, lora_alpha=64, lora_dropout=0.05
82
+ - target_modules=['q_proj','k_proj','v_proj','o_proj','gate_proj','up_proj','down_proj']
83
+ - Use pad_token = eos_token if no pad token exists
84
+
85
+ Section 3 - Load Data
86
+ - Read sft_data.jsonl
87
+ - Format each example: apply tokenizer.apply_chat_template to the messages list, then append the
88
+ completion string as the assistant turn. The final string is the full conversation for causal LM loss.
89
+ - Use datasets.Dataset.from_list
90
+ - Print tier distribution (counts per tier)
91
+ - Train/val split: 95/5
92
+
93
+ Section 4 - Train
94
+ - Use trl SFTTrainer with:
95
+ - per_device_train_batch_size=2, gradient_accumulation_steps=4 (effective batch 8)
96
+ - num_train_epochs=1
97
+ - learning_rate=2e-4, warmup_ratio=0.05, lr_scheduler_type="cosine"
98
+ - logging_steps=10, save_steps=100, save_total_limit=2
99
+ - output_dir="./sft_checkpoints"
100
+ - report_to="none"
101
+ - max_seq_length=2048, packing=True
102
+
103
+ Section 5 - Quick Eval (runs in Colab, requires env imports)
104
+ - Add sys.path and import WildfireEnv, serialize_observation, parse_action
105
+ - Run 10 full episodes (seeds 42-51) on easy tier with the trained model driving EVERY step:
106
+ - FastLanguageModel.for_inference(model)
107
+ - For each step: build messages, apply_chat_template, model.generate(max_new_tokens=128),
108
+ decode, parse_action(completion, obs), env.step(action)
109
+ - Accumulate total_reward; track parse_status counts
110
+ - Print: mean reward, std, json_success_rate, mean pop_saved_pct
111
+ - assert mean_reward > 2.0, "SFT warm-up insufficient - do not proceed to GRPO"
112
+ - FastLanguageModel.for_training(model) before returning
113
+
114
+ Section 6 - Save
115
+ - model.save_pretrained("./sft_final")
116
+ - tokenizer.save_pretrained("./sft_final")
117
+ - model.push_to_hub("YOUR_HF_USERNAME/wildfire-sft-7b") # leave as placeholder
118
+ - !zip -r sft_final.zip ./sft_final
119
+ - from google.colab import files; files.download("sft_final.zip")
120
+
121
+ IMPORTANT NOTES:
122
+ - parse_action(text, obs) requires a real obs object (it reads obs.grid). Always pass the current obs.
123
+ - serialize_observation signature: (obs, step_num, max_steps, tier="", prev_cells_burning=0)
124
+ - Instantiate a fresh HeuristicAgent (if used) for each episode: it has step_count state.
125
+ ```
126
+
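The data handling in Section 3 (append the completion as the assistant turn, report the tier distribution, split 95/5) can be sketched with the standard library alone. This is an illustrative sketch under assumptions: `to_conversation`, `tier_counts`, and `train_val_split` are hypothetical helper names, and the notebook itself would still pass the resulting message list through `tokenizer.apply_chat_template`.

```python
import json
import random
from collections import Counter


def to_conversation(row):
    """Return the full message list: system + user + assistant(completion)."""
    return row["messages"] + [{"role": "assistant", "content": row["completion"]}]


def tier_counts(rows):
    """Count examples per tier for the distribution printout."""
    return Counter(r["tier"] for r in rows)


def train_val_split(rows, val_pct=5, seed=0):
    """Shuffle deterministically, then hold out val_pct percent for validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = max(1, len(rows) * val_pct // 100)
    return rows[n_val:], rows[:n_val]


# One line in the sft_data.jsonl layout described in the prompt.
example_line = json.dumps({
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
    ],
    "completion": "{\"action_type\":\"idle\"}",
    "tier": "easy",
    "seed": 42,
    "step": 5,
})
example_row = json.loads(example_line)
conversation = to_conversation(example_row)
```

Splitting by row keeps the sketch short. Splitting by (tier, seed) instead would keep all steps of one episode on the same side of the split, which may be preferable if validation loss is meant to reflect unseen episodes.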
127
+ ---
128
+
129
+ ## Prompt 3 - GRPO training notebook
130
+
131
+ ```
132
+ Write a complete Google Colab notebook `training/grpo_v2_colab.ipynb` for GRPO reinforcement
133
+ learning of a wildfire incident command model. This is a redesigned version that fixes five
134
+ critical issues from the previous attempt.
135
+
136
+ FIVE ISSUES FIXED IN THIS VERSION (do not reintroduce them):
137
+
138
+ Issue 1 - Prompt/reward state mismatch (critical):
139
+ Previous: dataset used mid-episode prompts; reward_fn picked a random seed → model was scored
140
+ in a completely different env state than the one that produced its prompt.
141
+ Fix: Dataset uses step-0 prompts ONLY. Each row stores the seed used. The reward_fn resets the
142
+ env to that exact (tier, seed) pair before scoring the completion. Prompt state = reward state.
143
+
144
+ Issue 2 - Truncated rollout reward incomparable to curriculum thresholds (critical):
145
+ Previous: 15-step rollouts never reached min_active_steps=25, so terminal reward (+5.0) never
146
+ fired. GRPO rewards capped at ~1-2 while thresholds were set to 7.0/5.5. Promotion never happened.
147
+ Fix: The reward function runs the FULL episode to completion (model's 1 action at step 0, then
148
+ heuristic until env.done). Terminal reward is always included. Reward is comparable to baselines.
149
+
150
+ Issue 3 - Wasted inner model generations:
151
+ Previous: reward_fn called model.generate() 7 extra times per completion inside the reward loop.
152
+ GRPO gradients only flow through the originally sampled completion, making inner model steps
153
+ expensive noise with no gradient benefit.
154
+ Fix: MODEL_STEPS = 1. Only the sampled completion is applied. Heuristic drives everything after.
155
+
156
+ Issue 4 - GRPO loop too slow:
157
+ Consequence of Issue 3. Fix is same: MODEL_STEPS = 1 reduces reward_fn generate calls to 0.
158
+
159
+ Issue 5 - parse_action(text, None) crashes:
160
+ The parser reads obs.grid at line 1. Cannot pass None.
161
+ Fix: Use a standalone check_json_format(text) function in the format reward that does its own
162
+ JSON validation without needing an obs.
163
+
164
+ CORRECT FULL-EPISODE BASELINES (from scripts/results.json):
165
+ random: easy=+6.23 medium=+1.31 hard=+2.16
166
+ heuristic: easy=+7.53 medium=+6.31 hard=+4.74
167
+
168
+ STARTING POINT: SFT checkpoint at "YOUR_HF_USERNAME/wildfire-sft-7b" (or local sft_final.zip)
169
+
170
+ EXISTING ENV FILES (correct and working; do not reimplement):
171
+ - env/wildfire_env.py: WildfireEnv, reset(task_id, seed), step(action)->StepResult(observation,reward,done,info)
172
+ - env/serialization.py: serialize_observation(obs, step_num, max_steps, tier="", prev_cells_burning=0)->str
173
+ - env/action_parser.py: parse_action(text, obs)->(Action, status); status in ["json_success","regex_fallback","safe_idle"]
174
+ - agents/heuristic_agent.py: HeuristicAgent().act(obs)->Action [stateful: re-instantiate per episode]
175
+ - env/curriculum.py: CurriculumController(start_tier, thresholds); after_episode(reward)->Optional[str]; get_tier()->str
176
+ - env/models.py: TIER_EASY(episode_length=80), TIER_MEDIUM(episode_length=150), TIER_HARD(episode_length=300)
177
+
178
+ NOTEBOOK SECTIONS:
179
+
180
+ Section 1 - Install and assert GPU
181
+ - pip install: unsloth[colab-new] from git, trl==0.15.2, datasets==3.4.1, wandb
182
+ - assert torch.cuda.is_available()
183
+ - print GPU name and total VRAM
184
+
185
+ Section 2 - Load SFT checkpoint
186
+ - FastLanguageModel.from_pretrained("YOUR_HF_USERNAME/wildfire-sft-7b", load_in_4bit=True, max_seq_length=2048)
187
+ OR if loading from local zip: load base model first, then model.load_adapter(sft_path, adapter_name="default")
188
+ - Same LoRA: r=32, lora_alpha=64, target_modules=['q_proj','k_proj','v_proj','o_proj','gate_proj','up_proj','down_proj']
189
+
190
+ Section 3 - Constants and controller setup
191
+
192
+ ```python
193
+ import os, random, json
194
+ import torch
195
+ from env import WildfireEnv
196
+ from env.serialization import serialize_observation
197
+ from env.action_parser import parse_action
198
+ from agents.heuristic_agent import HeuristicAgent
199
+ from env.curriculum import CurriculumController
200
+ from datasets import Dataset
201
+
202
+ SEED_POOL = list(range(100)) # training seeds; eval uses 200+
203
+ TIER_MAX_STEPS = {'easy': 80, 'medium': 150, 'hard': 300}
204
+ SYSTEM_PROMPT = (
205
+ 'You are an AI Incident Commander managing wildfire containment. '
206
+ 'You will receive a situation briefing each step. '
207
+ 'Respond with ONLY a valid JSON action object and nothing else. '
208
+ 'Example: {"action_type": "idle"}'
209
+ )
210
+
211
+ # Thresholds calibrated to full-episode reward with heuristic continuation.
212
+ # Promote easy→medium once model's first action consistently beats random (+6.23).
213
+ # Promote medium→hard once model demonstrates meaningful improvement over random (+1.31).
214
+ controller = CurriculumController(
215
+ start_tier='easy',
216
+ thresholds={'easy': 6.5, 'medium': 3.5},
217
+ )
218
+
219
+ os.makedirs('training/samples', exist_ok=True)
220
+ _reward_call_count = 0
221
+ ```
222
+
223
+ Section 4 - Standalone JSON format checker (replaces parse_action for format reward)
224
+
225
+ ```python
+ import json as _json
+ import re
+ from env.models import ActionType as _AT
+
+ _VALID_ACTION_TYPES = {a.value for a in _AT}
+
+ def check_json_format(text: str) -> str:
+     """
+     Validate LLM output format without needing an obs object.
+     Returns "json_success", "regex_fallback", or "safe_idle".
+     Does NOT use parse_action, which avoids the obs.grid dependency.
+     """
+     # Strip code fences
+     text = re.sub(r"```(?:json)?\s*", "", text).replace("```", "")
+     start = text.find("{")
+     if start == -1:
+         return "safe_idle"
+     # Find the brace that closes the first "{"
+     depth = 0
+     end = -1
+     for i, ch in enumerate(text[start:], start=start):
+         if ch == "{":
+             depth += 1
+         elif ch == "}":
+             depth -= 1
+             if depth == 0:
+                 end = i
+                 break
+     if end == -1:
+         return "safe_idle"
+     try:
+         obj = _json.loads(text[start:end+1])
+         if not isinstance(obj, dict):
+             return "safe_idle"
+         at = str(obj.get("action_type", "")).lower()
+         if at in _VALID_ACTION_TYPES:
+             return "json_success"
+         return "regex_fallback"  # valid JSON but unrecognised action_type
+     except Exception:
+         return "regex_fallback"  # JSON parse failed but had braces
+ ```
265
+
266
+ Section 5 - Two reward functions
267
+
268
+ def reward_fn_outcome(completions, prompts, tier=None, seed=None, **kwargs):
+     """
+     Score each GRPO completion by:
+     1. Resetting the env to the EXACT (tier, seed) that generated the prompt (Issue 1 fix).
+     2. Applying the sampled completion as the single first action (MODEL_STEPS=1, Issue 3/4 fix).
+     3. Running HeuristicAgent until episode completion (Issue 2 fix: captures terminal reward).
+
+     tier and seed are dataset columns forwarded by GRPOTrainer.
+     """
+     global _reward_call_count
+     _reward_call_count += 1
+     rewards = []
+
+     for i, completion in enumerate(completions):
+         ep_tier = tier[i] if tier is not None else controller.get_tier()
+         ep_seed = seed[i] if seed is not None else random.choice(SEED_POOL)
+
+         env = WildfireEnv()
+         obs = env.reset(task_id=ep_tier, seed=ep_seed)  # step-0: matches prompt state exactly
+         total_reward = 0.0
+
+         # Apply the sampled completion as step 0
+         text = completion if isinstance(completion, str) else completion[0]['content']
+         action, _ = parse_action(text, obs)
+         result = env.step(action)
+         total_reward += result.reward
+         obs = result.observation
+
+         # Heuristic drives everything after (full episode to capture terminal reward)
+         heuristic = HeuristicAgent()  # fresh instance per episode (stateful step_count)
+         while not env.done:
+             action = heuristic.act(obs)
+             result = env.step(action)
+             total_reward += result.reward
+             obs = result.observation
+
+         rewards.append(total_reward)
+
+     # Update curriculum (once per batch, not per completion)
+     mean_r = sum(rewards) / len(rewards)
+     promoted = controller.after_episode(mean_r)
+     if promoted:
+         print(f" *** Curriculum promoted to: {promoted} (mean batch reward={mean_r:.2f}) ***")
+
+     # Sample completions to disk for inspection (Issue 4 in HACKATHON_ALIGNMENT.md)
+     if _reward_call_count % 10 == 0:
+         sample_path = f'training/samples/call_{_reward_call_count}.txt'
+         with open(sample_path, 'w') as f:
+             f.write(f"call={_reward_call_count} tier={tier[0] if tier else '?'} reward={rewards[0]:.3f}\n")
+             f.write("---\n")
+             c = completions[0]
+             f.write(c if isinstance(c, str) else c[0]['content'])
+             f.write("\n")
+
+     return rewards
323
+
324
+
325
+ def reward_fn_format(completions, prompts, **kwargs):
+     """
+     Scores JSON formatting quality using check_json_format() (no obs needed).
+     Runs independently of the env, so it is fast and always well-defined.
+     """
+     rewards = []
+     for completion in completions:
+         text = completion if isinstance(completion, str) else completion[0]['content']
+         status = check_json_format(text)
+         if status == "json_success":
+             r = 0.15
+         elif status == "regex_fallback":
+             r = 0.0
+         else:
+             r = -0.20  # safe_idle / garbage
+         rewards.append(r)
+     return rewards
339
+
340
+ Section 6 - Dataset builder (step-0 only; stores seed for reward alignment)
341
+
342
+ ```python
+ def build_prompt_dataset(n=200):
+     """
+     Build step-0 prompts for the current curriculum tier.
+     Stores the seed in each row so reward_fn can replay the exact same env state.
+     No mid-episode offset: GRPO prompt and reward state are always step-0.
+     Mid-episode diversity is handled by SFT, not GRPO.
+     """
+     rows = []
+     env_tmp = WildfireEnv()
+     tier = controller.get_tier()
+     max_steps = TIER_MAX_STEPS[tier]
+
+     for i in range(n):
+         seed = SEED_POOL[i % len(SEED_POOL)]
+         obs = env_tmp.reset(task_id=tier, seed=seed)  # step-0
+         prompt = serialize_observation(obs, 0, max_steps, tier=tier, prev_cells_burning=0)
+         rows.append({
+             'prompt': [
+                 {'role': 'system', 'content': SYSTEM_PROMPT},
+                 {'role': 'user', 'content': prompt},
+             ],
+             'tier': tier,
+             'seed': seed,  # forwarded to reward_fn_outcome for exact state replay
+         })
+     return rows
+ ```
369
+
370
+ Section 7 - CurriculumDatasetCallback
371
+
372
+ Implement a transformers TrainerCallback subclass that rebuilds the training dataset whenever the
373
+ curriculum controller promotes to a new tier:
374
+
375
+ ```python
+ from transformers import TrainerCallback  # TrainerCallback is defined in transformers, not trl
+
+ class CurriculumDatasetCallback(TrainerCallback):
+     def __init__(self, trainer_ref):
+         self._trainer = trainer_ref
+         self._last_tier = controller.get_tier()
+
+     def on_step_end(self, args, state, control, **kwargs):
+         current_tier = controller.get_tier()
+         if current_tier != self._last_tier:
+             print(f" Rebuilding dataset for tier: {current_tier}")
+             new_ds = Dataset.from_list(build_prompt_dataset(200))
+             self._trainer.train_dataset = new_ds
+             self._last_tier = current_tier
+ ```
391
+
392
+ Section 8 - GRPOTrainer setup
393
+
394
+ ```python
395
+ from trl import GRPOTrainer, GRPOConfig
396
+
397
+ grpo_config = GRPOConfig(
398
+ output_dir="./grpo_checkpoints",
399
+ num_generations=8,
400
+ learning_rate=3e-6,
401
+ max_steps=400,
402
+ save_steps=20,
403
+ per_device_train_batch_size=1,
404
+ gradient_accumulation_steps=4,
405
+ max_completion_length=192, # enough for any valid action JSON
406
+ logging_steps=1,
407
+ report_to="wandb",
408
+ )
409
+
410
+ FastLanguageModel.for_training(model)
411
+
412
+ dataset = Dataset.from_list(build_prompt_dataset(200))
413
+
414
+ trainer = GRPOTrainer(
415
+ model=model,
416
+ processing_class=tokenizer,
417
+ reward_funcs=[reward_fn_outcome, reward_fn_format],
418
+ args=grpo_config,
419
+ train_dataset=dataset,
420
+ )
421
+ trainer.add_callback(CurriculumDatasetCallback(trainer))
422
+ ```
423
+
424
+ Section 9 - Run training
425
+
426
+ ```python
+ import wandb
+ wandb.init(project="wildfire-grpo", name="qwen7b-v2")
+
+ print(f"Starting GRPO - {grpo_config.max_steps} steps, {grpo_config.num_generations} gen/prompt")
+ print(f"Reward: 1 model step at step-0, heuristic continuation to episode completion")
+ print(f"Start tier: {controller.get_tier()}")
+
+ trainer.train()
+ print("Training complete.")
+
+ history = controller.get_history()
+ stats = [{'step': ep, 'tier': t, 'mean_reward': r} for ep, t, r in history]
+ with open('./training_stats.json', 'w') as f:
+     json.dump(stats, f, indent=2)
+ print("Stats saved -> training_stats.json")
+ ```
443
+
444
+ Section 10 - Evaluate vs baselines
445
+
446
+ - Load scripts/results.json for heuristic and random baseline scores
447
+ - For each tier in [easy, medium, hard], run 15 full episodes (seeds 42-56):
448
+ - FastLanguageModel.for_inference(model)
449
+ - Instantiate a FRESH LLMAgent per episode (it is stateful: _step, _prev_burning, parse counters)
450
+ - Model drives every step until env.done
451
+ - Record total_reward, pop_saved_pct, json_success_rate
452
+ - Print comparison table: Trained vs Heuristic vs Random, including vs_heuristic delta
453
+ - Print JSON success rate per tier
454
+ - assert: for at least 1 tier, trained_mean > heuristic_mean - 1.0
455
+
456
+ LLMAgent class to implement:
457
+ ```python
+ class LLMAgent:
+     def __init__(self, model, tokenizer, tier, max_steps):
+         self.model = model
+         self.tokenizer = tokenizer
+         self.tier = tier
+         self.max_steps = max_steps
+         self._step = 0
+         self._prev_burning = 0
+         self.json_success = self.regex_fallback = self.safe_idle = 0
+
+     def act(self, obs):
+         prompt = serialize_observation(obs, self._step, self.max_steps,
+                                        tier=self.tier,
+                                        prev_cells_burning=self._prev_burning)
+         self._prev_burning = obs.stats.cells_burning
+         messages = [{"role": "system", "content": SYSTEM_PROMPT},
+                     {"role": "user", "content": prompt}]
+         # Use the instance's own model/tokenizer rather than notebook globals
+         input_ids = self.tokenizer.apply_chat_template(
+             messages, tokenize=True, add_generation_prompt=True, return_tensors='pt'
+         ).to(self.model.device)
+         with torch.no_grad():
+             out = self.model.generate(input_ids, max_new_tokens=128,
+                                       pad_token_id=self.tokenizer.eos_token_id)
+         text = self.tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True)
+         action, status = parse_action(text, obs)
+         if status == "json_success":
+             self.json_success += 1
+         elif status == "regex_fallback":
+             self.regex_fallback += 1
+         else:
+             self.safe_idle += 1
+         self._step += 1
+         return action
+ ```
489
+
490
+ Section 11 - Save and push
491
+
492
+ - model.save_pretrained("./grpo_final")
493
+ - tokenizer.save_pretrained("./grpo_final")
494
+ - model.push_to_hub("YOUR_HF_USERNAME/wildfire-grpo-7b")
495
+ - !zip -r grpo_final.zip ./grpo_final
496
+ - files.download("grpo_final.zip")
497
+
498
+ IMPLEMENTATION CHECKLIST:
499
+ [ ] reward_fn_outcome uses seed from dataset row, NOT random.choice(SEED_POOL)
500
+ [ ] reward_fn_outcome resets env with env.reset(task_id=ep_tier, seed=ep_seed) (step-0 only)
501
+ [ ] reward_fn_outcome runs heuristic until env.done (not a fixed step count)
502
+ [ ] reward_fn_format calls check_json_format(), NOT parse_action(text, None)
503
+ [ ] build_prompt_dataset has no step offset (always step-0) and always saves seed in the row
504
+ [ ] CurriculumDatasetCallback triggers dataset rebuild on tier change
505
+ [ ] LLMAgent instantiated FRESH per episode in the eval section
506
+ [ ] FastLanguageModel.for_inference/for_training toggled correctly around eval calls
507
+ [ ] WildfireEnv instantiated fresh per completion in reward_fn_outcome (not shared)
508
+ [ ] HeuristicAgent instantiated fresh per episode in reward_fn_outcome (it has step_count state)
509
+ ```
510
+
511
+ ---
512
+
513
+ ## Prompt 4 - Evaluation and comparison script
514
+
515
+ ```
516
+ Write a standalone Python script `scripts/eval_trained_model.py` that evaluates a trained HF
517
+ adapter model against the heuristic and random baselines on the Wildfire Containment Simulator.
518
+
519
+ PURPOSE: Source-of-truth comparison table after training is complete.
520
+ Saves results to scripts/trained_results.json.
521
+
522
+ INPUTS (argparse):
523
+ - --model-path: HF hub ID or local path to the trained adapter (e.g. "username/wildfire-grpo-7b")
524
+ - --base-model: base model (default "unsloth/Qwen2.5-7B-Instruct")
525
+ - --num-seeds: evaluation seeds per tier (default 15, uses seeds 200-214 to avoid train overlap)
526
+ - --tiers: space-separated list (default "easy medium hard")
527
+
528
+ EXISTING FILES:
529
+ - graders/grader_easy.py, grader_medium.py, grader_hard.py: grade(agent, seed) -> (float, details_dict)
530
+ - agents/heuristic_agent.py: HeuristicAgent
531
+ - agents/random_agent.py: RandomAgent(seed=N)
532
+ - scripts/results.json: existing baselines
533
+ - env/wildfire_env.py, env/serialization.py, env/action_parser.py
534
+
535
+ SYSTEM_PROMPT = (
536
+ 'You are an AI Incident Commander managing wildfire containment. '
537
+ 'You will receive a situation briefing each step. '
538
+ 'Respond with ONLY a valid JSON action object and nothing else. '
539
+ 'Example: {"action_type": "idle"}'
540
+ )
541
+
542
+ LLM AGENT CLASS (stateful: MUST be instantiated fresh per episode):
543
+ ```python
+ class LLMAgent:
+     """
+     Wraps the trained model for grader compatibility.
+     Must be re-instantiated for every episode: _step and _prev_burning
+     are per-episode state and will produce wrong prompts if reused.
+     """
+     def __init__(self, model, tokenizer, tier, max_steps):
+         self.model = model
+         self.tokenizer = tokenizer
+         self.tier = tier
+         self.max_steps = max_steps
+         self._step = 0
+         self._prev_burning = 0
+         self.json_success = self.regex_fallback = self.safe_idle = 0
+
+     def act(self, obs):
+         import torch
+         prompt = serialize_observation(obs, self._step, self.max_steps,
+                                        tier=self.tier,
+                                        prev_cells_burning=self._prev_burning)
+         self._prev_burning = obs.stats.cells_burning
+         messages = [{"role": "system", "content": SYSTEM_PROMPT},
+                     {"role": "user", "content": prompt}]
+         input_ids = self.tokenizer.apply_chat_template(
+             messages, tokenize=True, add_generation_prompt=True, return_tensors='pt'
+         ).to(self.model.device)
+         with torch.no_grad():
+             out = self.model.generate(input_ids, max_new_tokens=128,
+                                       pad_token_id=self.tokenizer.eos_token_id)
+         text = self.tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True)
+         action, status = parse_action(text, obs)
+         if status == "json_success":
+             self.json_success += 1
+         elif status == "regex_fallback":
+             self.regex_fallback += 1
+         else:
+             self.safe_idle += 1
+         self._step += 1
+         return action
+ ```
581
+
582
+ GRADER WRAPPER (because graders pass agent to grade(), so agent is shared across seeds by default):
583
+ For LLMAgent, override this by not using grade() directly. Instead inline the grader logic and
584
+ instantiate a fresh LLMAgent(model, tokenizer, tier, max_steps) before EACH episode.
585
+
586
+ OUTPUT FORMAT:
587
+ ```
588
+ === Evaluation: Trained Model vs Baselines ===
589
+ Model: username/wildfire-grpo-7b
590
+ Seeds: 200-214 (15 per tier)
591
+
592
+ Tier      Trained      Heuristic    Random       vs Heuristic
593
+ -------------------------------------------------------
594
+ easy      +7.21±0.3    +7.53±0.1    +6.23±3.1    -0.32
595
+ medium    +6.89±1.2    +6.31±2.8    +1.31±3.2    +0.58 ✓
596
+ hard      +4.12±2.1    +4.74±3.8    +2.16±3.0    -0.62
597
+
598
+ JSON success rate: easy=91.2% medium=88.4% hard=85.1%
599
+ Pop saved rate: easy=100% medium=97% hard=93%
600
+ ```
601
+
602
+ Also save to scripts/trained_results.json in the same format as scripts/results.json, with an
603
+ additional "json_success_rate" field per tier.
604
+ ```
605
+
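One row of the comparison table in OUTPUT FORMAT can be produced as sketched below. This is a sketch under assumptions: `format_row` is a hypothetical helper, the spread is taken to be the population standard deviation (the prompt does not say which estimator is meant), and the check mark is assumed to flag tiers where the trained model beats the heuristic.

```python
from statistics import mean, pstdev


def format_row(tier, trained, heuristic, random_scores):
    """Format one table row: mean±std per agent plus the delta vs the heuristic."""

    def cell(scores):
        # Signed mean with two decimals, spread with one decimal, e.g. "+7.20±0.2"
        return f"{mean(scores):+.2f}±{pstdev(scores):.1f}"

    delta = mean(trained) - mean(heuristic)
    mark = " ✓" if delta > 0 else ""  # assumed meaning: trained beats heuristic
    return (
        f"{tier:<10}{cell(trained):<13}{cell(heuristic):<13}"
        f"{cell(random_scores):<13}{delta:+.2f}{mark}"
    )


# Made-up per-episode rewards, used only to exercise the formatting.
row = format_row("easy", [7.0, 7.4], [7.5, 7.5], [6.0, 6.4])
print(row)
```

The scores passed in are illustrative placeholders, not measured results. In the evaluation script each list would hold the per-seed total rewards for one tier.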
606
+ ---
607
+
frontend/app.js CHANGED
@@ -11,69 +11,6 @@
11
 
12
  "use strict";
13
 
14
- // ── API field helpers (snake_case from Python; tolerate camelCase if ever used) ─
15
- function pickStat(obj, ...keys) {
16
- if (!obj) return undefined;
17
- for (const k of keys) {
18
- if (Object.prototype.hasOwnProperty.call(obj, k) && obj[k] != null) {
19
- return obj[k];
20
- }
21
- }
22
- return undefined;
23
- }
24
-
25
- /**
26
- * Build display-ready episode metrics from the latest observation.
27
- * Falls back to grid-visible cells for land % only when server omits area_saved_pct.
28
- */
29
- function normalizeEpisodeStats(obs) {
30
- const st = obs?.stats ?? {};
31
- const cellsBurned = pickStat(st, "cells_burned", "cellsBurned") ?? 0;
32
- const popLost = pickStat(st, "population_lost", "populationLost") ?? 0;
33
- const totalPop = pickStat(st, "total_population", "totalPopulation") ?? 0;
34
-
35
- let areaSaved = pickStat(st, "area_saved_pct", "areaSavedPct");
36
- let civSafe = pickStat(st, "civilians_saved_pct", "civiliansSavedPct");
37
-
38
- if (areaSaved == null && obs?.grid?.length) {
39
- let burnable = 0;
40
- let burnedVis = 0;
41
- for (const row of obs.grid) {
42
- for (const cell of row) {
43
- const f = cell.fuel_type;
44
- if (!f || f === "water" || f === "road") continue;
45
- if (cell.fire_state === "unknown") continue;
46
- burnable++;
47
- if (cell.fire_state === "burned_out") burnedVis++;
48
- }
49
- }
50
- if (burnable > 0) {
51
- areaSaved = Math.round(1000 * (burnable - burnedVis) / burnable) / 10;
52
- }
53
- }
54
-
55
- if (civSafe == null && totalPop > 0) {
56
- civSafe = Math.round(1000 * (totalPop - popLost) / totalPop) / 10;
57
- } else if (civSafe == null && popLost === 0) {
58
- civSafe = 100.0;
59
- }
60
-
61
- const containment = pickStat(st, "containment_pct", "containmentPct");
62
- if (areaSaved == null && containment != null) {
63
- areaSaved = containment;
64
- }
65
-
66
- return {
67
- areaSaved,
68
- civSafe,
69
- cellsBurned,
70
- popLost,
71
- totalPop,
72
- currentStep: pickStat(st, "current_step", "currentStep"),
73
- raw: st,
74
- };
75
- }
76
-
77
  // ── Simulation state ──────────────────────────────────────────────────────────
78
  const sim = {
79
  obs: null, // current Observation (agent's view)
@@ -224,28 +161,17 @@ function renderCanvas(obs, groundTruth = null) {
224
  }
225
 
226
  // ── Stats panel ───────────────────────────────────────────────────────────────
227
- function updateStats(obs, cumulativeReward, lastStepReward) {
228
- if (!obs?.stats) return;
229
- const stats = obs.stats;
230
-
231
- const cur = pickStat(stats, "current_step", "currentStep") ?? 0;
232
- const max = pickStat(stats, "max_steps", "maxSteps") ?? 1;
233
-
234
- setText("stat-step", `${cur} / ${max}`);
235
-
236
- const n = normalizeEpisodeStats(obs);
237
- setText(
238
- "stat-land-saved-val",
239
- n.areaSaved != null ? `${Number(n.areaSaved).toFixed(1)}%` : "\u2014"
240
- );
241
- setText(
242
- "stat-civilians-safe-val",
243
- n.civSafe != null ? `${Number(n.civSafe).toFixed(1)}%` : "\u2014"
244
- );
245
- setText("stat-cells-burned-val", n.cellsBurned);
246
- setText("stat-burning-val", pickStat(stats, "cells_burning", "cellsBurning") ?? 0);
247
- setText("stat-pop-threat-val", pickStat(stats, "population_threatened", "populationThreatened") ?? 0);
248
- setText("stat-pop-lost-val", n.popLost);
249
 
250
  // Cumulative reward
251
  setText("reward-total", cumulativeReward.toFixed(3));
@@ -372,53 +298,31 @@ function updateActionLog(action) {
372
  }
373
 
374
  // ── Terminal overlay ──────────────────────────────────────────────────────────
375
- async function showTerminal() {
376
  const overlay = document.getElementById("terminal-overlay");
377
  if (!overlay) return;
378
 
379
- const card = document.getElementById("terminal-card");
380
- if (!card) return;
 
381
 
382
- const n = normalizeEpisodeStats(sim.obs);
383
  const title = card.querySelector("h2");
384
 
385
- if (n.popLost === 0) {
386
- title.textContent = "✅ EPISODE COMPLETE";
387
  title.className = "win";
388
  } else {
389
  title.textContent = "⚠ EPISODE ENDED";
390
  title.className = "loss";
391
  }
392
 
393
- const landStr = n.areaSaved != null ? `${Number(n.areaSaved).toFixed(1)}%` : "—";
394
- const civStr = n.civSafe != null ? `${Number(n.civSafe).toFixed(1)}%` : "—";
395
- setText("terminal-land-saved", landStr);
396
- setText("terminal-civilians-safe", civStr);
397
- setText("terminal-cells-burned", String(n.cellsBurned));
398
- setText("terminal-pop-lost", n.popLost);
399
- setText("terminal-reward", sim.cumulativeReward.toFixed(3));
400
- setText("terminal-step", n.currentStep ?? "—");
401
 
402
  overlay.classList.add("show");
403
-
404
- // Authoritative end-game numbers (ground truth — fixes blank UI if observation JSON differed)
405
- try {
406
- const st = await apiGet("/state");
407
- if (st.error) return;
408
- const tb = st.total_burnable ?? 0;
409
- const burned = st.cells_burned ?? 0;
410
- const landPct = tb > 0 ? Math.round(1000 * (tb - burned) / tb) / 10 : 100;
411
- const tp = st.total_population ?? 0;
412
- const lost = st.population_lost ?? 0;
413
- const civPct = tp > 0 ? Math.round(1000 * (tp - lost) / tp) / 10 : 100;
414
- setText("terminal-land-saved", `${landPct}%`);
415
- setText("terminal-civilians-safe", `${civPct}%`);
416
- setText("terminal-cells-burned", String(burned));
417
- setText("terminal-pop-lost", String(lost));
418
- setText("terminal-step", st.current_step ?? "—");
419
- } catch (e) {
420
- console.warn("Could not refresh end-game stats from /state", e);
421
- }
422
  }
423
 
424
  function hideTerminal() {
@@ -452,7 +356,7 @@ async function apiGet(path) {
452
  function applyObservation(obs) {
453
  sim.obs = obs;
454
  renderCanvas(obs, sim.groundTruthData);
455
- updateStats(obs, sim.cumulativeReward, sim.lastStepReward);
456
  updateResources(obs.resources);
457
  updateWeather(obs.weather);
458
  updateEvents(obs.recent_events ?? []);
@@ -513,7 +417,7 @@ async function doAutoStep() {
513
 
514
  if (snap.done) {
515
  stopPlay();
516
- await showTerminal();
517
  break;
518
  }
519
  }
 
11
 
12
  "use strict";
13
 
14
  // ── Simulation state ──────────────────────────────────────────────────────────
15
  const sim = {
16
  obs: null, // current Observation (agent's view)
 
161
  }
162
 
163
  // ── Stats panel ───────────────────────────────────────────────────────────────
164
+ function updateStats(stats, cumulativeReward, lastStepReward) {
165
+ if (!stats) return;
166
+
167
+ const cur = stats.current_step ?? 0;
168
+ const max = stats.max_steps ?? 1;
169
+
170
+ setText("stat-step", `${cur} / ${max}`);
171
+ setText("stat-containment-val", `${(stats.containment_pct ?? 0).toFixed(1)}%`);
172
+ setText("stat-burning-val", stats.cells_burning ?? 0);
173
+ setText("stat-pop-threat-val", stats.population_threatened ?? 0);
174
+ setText("stat-pop-lost-val", stats.population_lost ?? 0);
175
 
176
  // Cumulative reward
177
  setText("reward-total", cumulativeReward.toFixed(3));
 
298
  }
299
 
300
  // ── Terminal overlay ──────────────────────────────────────────────────────────
301
+ function showTerminal(obs) {
302
  const overlay = document.getElementById("terminal-overlay");
303
  if (!overlay) return;
304
 
305
+ const stats = obs?.stats ?? {};
306
+ const popLost = stats.population_lost ?? 0;
307
+ const containment = stats.containment_pct ?? 0;
308
 
309
+ const card = document.getElementById("terminal-card");
310
  const title = card.querySelector("h2");
311
 
312
+ if (popLost === 0) {
313
+ title.textContent = "✅ FIRE CONTAINED";
314
  title.className = "win";
315
  } else {
316
  title.textContent = "⚠ EPISODE ENDED";
317
  title.className = "loss";
318
  }
319
 
320
+ setText("terminal-containment", `${containment.toFixed(1)}%`);
321
+ setText("terminal-pop-lost", popLost);
322
+ setText("terminal-reward", sim.cumulativeReward.toFixed(3));
323
+ setText("terminal-step", stats.current_step ?? "—");
324
 
325
  overlay.classList.add("show");
326
  }
327
 
328
  function hideTerminal() {
 
356
  function applyObservation(obs) {
357
  sim.obs = obs;
358
  renderCanvas(obs, sim.groundTruthData);
359
+ updateStats(obs.stats, sim.cumulativeReward, sim.lastStepReward);
360
  updateResources(obs.resources);
361
  updateWeather(obs.weather);
362
  updateEvents(obs.recent_events ?? []);
 
417
 
418
  if (snap.done) {
419
  stopPlay();
420
+ showTerminal(snap.observation);
421
  break;
422
  }
423
  }
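The removed `normalizeEpisodeStats` and `showTerminal` code in this diff derived "land saved" and "civilians safe" with the same one-decimal rounding expression. A minimal sketch of that calculation, assuming a hypothetical `pctRemaining` helper (the name and the `null` return for an empty total are illustrative; the removed code fell back to `100` in some branches):

```javascript
// Hypothetical helper (not part of app.js): share of `total` that is not `lost`,
// expressed as a percentage rounded to one decimal place.
function pctRemaining(total, lost) {
  // Assumption: with nothing to measure, report "no value" rather than 100.
  if (!(total > 0)) return null;
  // Scale to tenths of a percent, round, then shift back to one decimal.
  return Math.round(1000 * (total - lost) / total) / 10;
}

console.log(pctRemaining(200, 25)); // 87.5
console.log(pctRemaining(3, 1));    // 66.7
console.log(pctRemaining(0, 0));    // null
```

Rounding `1000 * ratio` and dividing by 10 keeps the result a number, so it can still be compared or passed to `toFixed(1)` later.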
frontend/index.html CHANGED
@@ -83,16 +83,8 @@
83
  <div id="terminal-card">
84
  <h2 class="win">✅ FIRE CONTAINED</h2>
85
  <div class="stat-row">
86
- <span>Land saved (unburned)</span>
87
- <span id="terminal-land-saved">—</span>
88
- </div>
89
- <div class="stat-row">
90
- <span>Civilians safe</span>
91
- <span id="terminal-civilians-safe">—</span>
92
- </div>
93
- <div class="stat-row">
94
- <span>Cells burned (total)</span>
95
- <span id="terminal-cells-burned">—</span>
96
  </div>
97
  <div class="stat-row">
98
  <span>Population lost</span>
@@ -112,10 +104,6 @@
112
  </div>
113
  </div>
114
  </div>
115
- <p id="map-legend" class="map-legend">
116
- <strong>Map:</strong> green dot / circle = ground crew · blue outline = populated zone ·
117
- bright blue cells = water Β· grey = roads
118
- </p>
119
  </main>
120
 
121
  <!-- Sidebar -->
@@ -129,17 +117,9 @@
129
  <span class="stat-label">STEP</span>
130
  <span class="stat-value" id="stat-step">— / —</span>
131
  </div>
132
- <div class="stat-item" id="stat-land-saved">
133
- <span class="stat-label">LAND SAVED</span>
134
- <span class="stat-value" id="stat-land-saved-val">—</span>
135
- </div>
136
- <div class="stat-item" id="stat-civilians-safe">
137
- <span class="stat-label">CIVILIANS SAFE</span>
138
- <span class="stat-value" id="stat-civilians-safe-val">—</span>
139
- </div>
140
- <div class="stat-item" id="stat-cells-burned">
141
- <span class="stat-label">CELLS BURNED</span>
142
- <span class="stat-value" id="stat-cells-burned-val">—</span>
143
  </div>
144
  <div class="stat-item" id="stat-burning">
145
  <span class="stat-label">BURNING</span>
@@ -294,6 +274,6 @@
294
  </span>
295
  </footer>
296
 
297
- <script src="app.js?v=4"></script>
298
  </body>
299
  </html>
 
83
  <div id="terminal-card">
84
  <h2 class="win">✅ FIRE CONTAINED</h2>
85
  <div class="stat-row">
86
+ <span>Containment</span>
87
+ <span id="terminal-containment">—</span>
88
  </div>
89
  <div class="stat-row">
90
  <span>Population lost</span>
 
104
  </div>
105
  </div>
106
  </div>
107
  </main>
108
 
109
  <!-- Sidebar -->
 
117
  <span class="stat-label">STEP</span>
118
  <span class="stat-value" id="stat-step">— / —</span>
119
  </div>
120
+ <div class="stat-item" id="stat-containment">
121
+ <span class="stat-label">CONTAINMENT</span>
122
+ <span class="stat-value" id="stat-containment-val">—</span>
123
  </div>
124
  <div class="stat-item" id="stat-burning">
125
  <span class="stat-label">BURNING</span>
 
274
  </span>
275
  </footer>
276
 
277
+ <script src="app.js"></script>
278
  </body>
279
  </html>
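The stat spans in this markup start out as a placeholder dash and are overwritten from `app.js` through `setText`. A minimal sketch of the formatting step, assuming a hypothetical `formatPct` helper (the name, the default placeholder, and the NaN guard are illustrative and not taken from the repository):

```javascript
// Hypothetical formatter (not part of app.js): one decimal place plus "%",
// or a placeholder string when the value is missing or not numeric.
function formatPct(value, placeholder = "—") {
  if (value == null || Number.isNaN(Number(value))) return placeholder;
  return `${Number(value).toFixed(1)}%`;
}

console.log(formatPct(87.5));      // "87.5%"
console.log(formatPct("12"));      // "12.0%"
console.log(formatPct(undefined)); // "—"
```

Checking for `null`/`undefined` before calling `toFixed` avoids a `TypeError` when a field such as `containment_pct` is absent from the observation.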
frontend/style.css CHANGED
@@ -250,16 +250,6 @@ input[type="range"]::-webkit-slider-thumb {
250
 
251
  #grid-canvas { display: block; image-rendering: pixelated; }
252
 
253
- .map-legend {
254
- margin: 8px 0 0;
255
- padding: 6px 10px;
256
- font-size: 11px;
257
- color: var(--text-muted);
258
- line-height: 1.45;
259
- max-width: 100%;
260
- }
261
- .map-legend strong { color: var(--text); }
262
-
263
  /* Tooltip overlay (shows cell info on hover) */
264
  #cell-tooltip {
265
  position: absolute;
@@ -366,10 +356,8 @@ input[type="range"]::-webkit-slider-thumb {
366
  .stat-item.step-item { grid-column: 1 / -1; }
367
  .stat-item.step-item .stat-value { font-size: 14px; }
368
 
369
- #stat-land-saved .stat-value { color: var(--safe); }
370
- #stat-civilians-safe .stat-value { color: var(--safe); }
371
- #stat-cells-burned .stat-value { color: var(--warn); }
372
- #stat-burning .stat-value { color: var(--fire); }
373
  #stat-pop-threat .stat-value { color: var(--warn); }
374
  #stat-pop-lost .stat-value { color: var(--crit); }
375
 
 
250
 
251
  #grid-canvas { display: block; image-rendering: pixelated; }
252
 
253
  /* Tooltip overlay (shows cell info on hover) */
254
  #cell-tooltip {
255
  position: absolute;
 
356
  .stat-item.step-item { grid-column: 1 / -1; }
357
  .stat-item.step-item .stat-value { font-size: 14px; }
358
 
359
+ #stat-containment .stat-value { color: var(--safe); }
360
+ #stat-burning .stat-value { color: var(--fire); }
361
  #stat-pop-threat .stat-value { color: var(--warn); }
362
  #stat-pop-lost .stat-value { color: var(--crit); }
363