# 09 — Risk Register & Mitigations
Ranked by likelihood × impact. Top of list = address first.
## R1 — Reward curve goes flat (HIGH likelihood, HIGH impact)
**Symptom**: After 100 GRPO steps, mean episode reward stays at baseline (~0.25).
**Causes**:
- Reward signal too sparse
- Per-step shaping too small relative to terminal reward
- Rollout parsing broken (model outputs gibberish, parser silently fails)
- KL coefficient (β) too high → policy can't move
**Mitigations**:
- Sanity-check rollout parser: print 5 random completions + parsed actions
- Verify shaping rewards firing: log per-step reward by action type
- Reduce β to 0.01
- Increase shaping reward magnitude (×2)
- Simplify rubric: drop InfoGain temporarily, use only FieldMatch
- Pre-warm with SFT on synthetic "ask first" trajectories (1-2 epochs)
**Time to detect**: 15 min (smoke test of 100 steps)
**Time to fix**: 30-60 min
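The parser sanity check from the mitigation list can be sketched as below. The `ASK:`/`SUBMIT:` action format is a placeholder assumption, not the project's actual grammar; swap in the real pattern.

```python
import random
import re

# Hypothetical action grammar: completions are assumed to contain either
# "ASK: <question>" or "SUBMIT: <json>". Adjust to the real output format.
ACTION_RE = re.compile(r"^(ASK|SUBMIT):\s*(.+)$", re.MULTILINE)

def parse_action(completion: str):
    """Return (action_type, payload), or None on a parse failure."""
    m = ACTION_RE.search(completion)
    return (m.group(1), m.group(2).strip()) if m else None

def audit_rollouts(completions, n_samples=5, seed=0):
    """Print a few raw completions next to their parsed actions and
    return the silent-failure rate (fraction the parser rejected)."""
    rng = random.Random(seed)
    for c in rng.sample(completions, min(n_samples, len(completions))):
        print(repr(c[:80]), "->", parse_action(c))
    failures = sum(parse_action(c) is None for c in completions)
    return failures / len(completions)
```

If the failure rate is high while the reward curve is flat, suspect the parser before touching β or the shaping weights.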
## R2 — Reward hacking (HIGH likelihood, MEDIUM impact)
**Symptom**: Reward curve climbs but qualitative outputs are gibberish/repetitive.
**Likely hacks**:
- Ask the same generic question 6 times, then submit an empty plan
- Submit JSON with all profile field keys but garbage values
- Output the same action token over and over
**Mitigations**:
- Duplicate-Q penalty (already in plan)
- HallucinationCheckRubric (already in plan)
- FormatCheck Gate with strict schema (already in plan)
- Add EntropyRubric: penalize repeated actions (component if needed)
- Manual inspection of 10 trained outputs every 100 steps
**Time to detect**: 100 GRPO steps + manual inspection (15 min)
**Time to fix**: 30 min (add penalty component)
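A minimal sketch of the duplicate-question penalty, assuming a per-duplicate weight of 0.2 (a placeholder, not the tuned rubric value). Normalization is deliberately crude so trivial rephrasings still count as duplicates.

```python
def normalize(q: str) -> str:
    """Crude normalization: lowercase, collapse whitespace, drop trailing '?'."""
    return " ".join(q.lower().split()).rstrip("?").strip()

def duplicate_penalty(questions, per_dupe=0.2):
    """Penalty grows with each repeated (normalized) question.
    Illustrative component; per_dupe is a placeholder weight."""
    seen = set()
    dupes = 0
    for q in questions:
        key = normalize(q)
        if key in seen:
            dupes += 1
        seen.add(key)
    return per_dupe * dupes
```

The same shape works for an entropy-style repeated-action penalty: count repeats of the normalized action instead of the question text.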
## R3 — Colab session times out mid-training (MEDIUM, MEDIUM)
**Symptom**: Long training run gets killed by Colab free-tier session limits.
**Mitigations**:
- Save LoRA checkpoint every 100 steps
- Always run training in resumable form (TRL supports resume from checkpoint)
- Plan training in 100-step chunks, not one mega-run
- Have second Google account ready for backup
**Time to detect**: live
**Time to fix**: 5 min (resume from last checkpoint)
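A small helper for the resume step, assuming the standard `checkpoint-<step>` directory layout that transformers/TRL trainers write. The result can be passed to the trainer's `resume_from_checkpoint` argument; treat that wiring as an assumption to verify against the TRL version in use.

```python
import os
import re

def latest_checkpoint(output_dir: str):
    """Find the newest `checkpoint-<step>` directory so a restarted Colab
    session can resume instead of retraining. Returns None if no
    checkpoint exists (i.e. start fresh)."""
    if not os.path.isdir(output_dir):
        return None
    pat = re.compile(r"^checkpoint-(\d+)$")
    steps = [
        int(m.group(1))
        for name in os.listdir(output_dir)
        if (m := pat.match(name)) and os.path.isdir(os.path.join(output_dir, name))
    ]
    if not steps:
        return None
    return os.path.join(output_dir, f"checkpoint-{max(steps)}")
```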
## R4 — HF Space build fails (MEDIUM, HIGH)
**Symptom**: `git push space main` succeeds but Space build errors out.
**Common causes**:
- Dockerfile issues (missing deps, wrong Python version)
- pyproject.toml resolution failure
- HF Space hardware mismatch
**Mitigations**:
- Test Docker build LOCALLY before pushing: `docker build -t clarify-rl . && docker run -p 8000:8000 clarify-rl`
- Mirror EXACT Dockerfile from working SRE env (which we know builds)
- Push minimal stub Space FIRST (just FastAPI hello world), confirm it builds, then layer on the env
- Keep Space build logs open in browser tab while pushing
**Time to detect**: 5-10 min (HF build logs)
**Time to fix**: 15-30 min (Docker iteration)
## R5 — Validator rejects submission (LOW likelihood, FATAL impact)
**Symptom**: Auto-validator marks submission incomplete; never reaches human judges.
**Mitigations**:
- Run through every item in `docs/07-deployment.md` checklist
- 1-hour pre-deadline buffer for fixes
- Test ALL deliverable links from incognito browser
- Make sure plots are committed as files, not just in notebook outputs
**Time to detect**: post-submission (TOO LATE — must validate before)
**Time to fix**: depends on what's missing
## R6 — Training takes too long on T4 (LOW, MEDIUM)
**Symptom**: 600 GRPO steps take >2 hours; eats into Day 2 schedule.
**Mitigations**:
- Use Unsloth (we already are)
- Use 4-bit quantization (we already are)
- Reduce max_seq_length to 2048 if needed
- Reduce num_generations to 2 (instead of 4)
- Stop at 300 steps if curve is good — quality > quantity
**Time to detect**: 30 min into training (extrapolate)
**Time to fix**: tune config, restart from checkpoint
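The throughput knobs above can live in one place so a restart only touches one dict. Field names mirror common Unsloth/TRL config parameters but are assumptions here; map them onto the real config objects.

```python
# Throughput knobs from the mitigation list above. Names mirror common
# Unsloth/TRL config fields but are assumptions; map to the real config.
FAST_T4 = {
    "load_in_4bit": True,    # 4-bit quantization (already in use)
    "max_seq_length": 2048,  # reduce if memory- or speed-bound
    "num_generations": 2,    # GRPO group size: 2 instead of 4
    "save_steps": 100,       # checkpoint cadence (also mitigates R3)
    "max_steps": 300,        # stop early if the curve looks good
}
```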
## R7 — Rubric doesn't separate good from bad (LOW, HIGH) — ✅ VERIFIED OK
**Symptom**: Even oracle policy gets ~0.5; even random policy gets ~0.5.
**Causes**:
- Weights wrong, components average out
- FormatCheck too lenient
- HallucinationCheck too punitive
**Mitigations**:
- Run sanity policies BEFORE training:
- Random: should get ~0.20
- Oracle (asks all critical Qs, perfect plan): should get ~0.95
- Blank plan: should get 0.0
- If gap is small, retune weights and component logic before training
**Current status**: Oracle scores ~0.89 via `smoke_env.py` (FormatCheck=1.0, FieldMatch=1.0, InfoGain=1.0, Efficiency=0.5, Hallucination=0.75). Gap is healthy.
**Time to detect**: 10 min (sanity script)
**Time to fix**: 30-60 min
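The sanity-policy check can be reduced to one predicate run before training. Thresholds come from the targets in this entry; the `scores` dict is a hypothetical interface (policy name to mean episode reward from `smoke_env.py` or equivalent).

```python
def rubric_separates(scores, min_gap=0.5):
    """True if the rubric separates the three probe policies.
    `scores` maps policy name -> mean episode reward over a batch.
    Thresholds follow the targets in this risk entry."""
    return (
        scores["oracle"] >= 0.7                      # oracle scores high (target ~0.95)
        and scores["blank"] == 0.0                   # blank plan must get nothing
        and scores["oracle"] - scores["random"] >= min_gap  # healthy gap
    )
```

With the current numbers (oracle ~0.89, random ~0.20, blank 0.0) this passes, matching the "gap is healthy" status above.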
## R8 — Profile generator produces unsolvable scenarios (LOW, MEDIUM) — ✅ MITIGATED
**Symptom**: Even oracle can't get high score on some scenarios.
**Causes**:
- Field vocabulary too sparse → user simulator returns wrong field
- Critical fields not always present
- Request template too vague to even hint at task type
**Mitigations**:
- Validate generator: 100 random scenarios → oracle scores them → all should be ≥0.7
- Add task_type hint to every request template (subtle, e.g. "dinner" → restaurant)
- Ensure FIELD_KEYWORDS covers all profile fields
**Fix applied**: `scenarios.py` now always includes `required_keys` in the profile for medium/hard difficulty. Hard range adjusted to (6,7) to match actual field pool sizes (max 7).
**Time to detect**: 5 min (sanity check)
**Time to fix**: 15-30 min
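The generator validation step can be sketched as below. `oracle_score` is a hypothetical callable (scenario to mean oracle reward); the 0.7 threshold is the one stated above.

```python
def unsolvable_scenarios(scenarios, oracle_score, threshold=0.7):
    """Flag scenarios where even the oracle policy scores below threshold.
    `oracle_score` is a hypothetical callable: scenario -> mean reward.
    Any hits indicate a broken scenario, not a weak policy."""
    flagged = []
    for i, scenario in enumerate(scenarios):
        score = oracle_score(scenario)
        if score < threshold:
            flagged.append((i, score))
    return flagged
```

Run this over 100 generated scenarios; an empty return list is the pass condition.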
## R9 — One team member becomes unavailable (LOW, HIGH)
**Symptom**: Anurag or Kanan can't continue (illness, technical issues, lost device).
**Mitigations**:
- Both can git-push to both remotes
- Both have HF + GitHub credentials
- Both have Colab access
- Pair-program critical sections (env, rubric)
**Time to detect**: live
**Time to fix**: depends, but project should continue
## R10 — Last-minute organizational changes (LOW, VARIABLE)
**Symptom**: Submission form changes, deadline shifts, theme reinterpretations announced.
**Mitigations**:
- Monitor Discord every 2 hours
- Both team members on Discord notifications
- Have a Plan B for each deliverable (video OR blog, not both required)
## Fallback Plans (graceful degradation)
If we run out of time:
1. **Cut difficulty levels**: Ship only "medium" task — still scores well on Storytelling
2. **Cut task types**: Ship 3 of 5 task types instead of all 5
3. **Cut training**: Use Unsloth pre-trained on synthetic SFT data, skip GRPO. Worse story but still ships.
4. **Cut video**: Ship blog post only.
5. **Cut blog**: Ship video only.
The core ship is: **HF Space + Colab + plots + README**. Everything else is bonus.
## Risk Score Summary
| ID | Risk | L | I | Score |
|----|------|---|---|-------|
| R1 | Reward curve flat | H | H | 9 |
| R2 | Reward hacking | H | M | 6 |
| R3 | Colab timeout | M | M | 4 |
| R4 | HF Space build fail | M | H | 6 |
| R5 | Validator rejection | L | F | 5 |
| R6 | Training too slow | L | M | 2 |
| R7 | Rubric doesn't separate | L | H | 3 |
| R8 | Bad scenarios | L | M | 2 |
| R9 | Team member down | L | H | 3 |
| R10 | Org changes | L | V | 1 |
L=likelihood, I=impact, F=fatal, V=variable.
**Top 3 to actively mitigate during build**: R1, R2, R4.
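The summary scores are consistent with a simple numeric mapping. The mapping below is inferred from the table, not stated anywhere in this doc, so treat it as an assumption.

```python
# Numeric mapping inferred from the summary table (not stated explicitly):
# L/M/H = 1/2/3 on both axes, with F(atal) = 5 and V(ariable) = 1.
LIKELIHOOD = {"L": 1, "M": 2, "H": 3}
IMPACT = {"L": 1, "M": 2, "H": 3, "F": 5, "V": 1}

def risk_score(likelihood: str, impact: str) -> int:
    """Score = likelihood x impact under the inferred mapping."""
    return LIKELIHOOD[likelihood] * IMPACT[impact]
```

Under this mapping every row in the table checks out, e.g. R1 (H, H) = 9 and R5 (L, F) = 5.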