risk_book.md: DriftCall Consolidated Risk Register
Version: v1.0
Owner: Person B (Rewards & Tests; CLAUDE.md §2.2)
Primary source: DESIGN.md §14 (12 risks) + CLAUDE.md §11 (escalation & stop conditions)
Consumes from: training.md §5, vendors.md §5, audio.md §5
Consumed by: env.md, training.md, audio.md, deploy_env_space.md, deploy_demo_space.md, pitch_demo.md, evaluation.md
1. Purpose
The Risk Book is the single consolidated risk register for DriftCall's 48-hour onsite execution (Apr 25-26, 2026) plus the ≈18 hours of pre-onsite doc + smoke work. It unifies three previously-separate concerns:
- DESIGN.md §14 static register: 12 risks known at spec-lock time, each with P × I × mitigation.
- Module-level error modes: concrete exceptions declared in `training.md §5`, `vendors.md §5`, `audio.md §5` that, when they fire at runtime, are RISK EVENTS this book owns routing for.
- CLAUDE.md §11 escalate / stop conditions: the gates that halt agent dispatch or terminate the plan.

Without this consolidation every teammate would independently rediscover stop conditions, escalation paths, and who owns which mitigation. This doc centralizes the answer so that, under the hackathon clock, decisions are table-lookups, not debates.
The Risk Book is live. It is consulted at three moments:
- Onsite start (hour 0 of Apr 25): Person B runs `Risk.assess()`, reviews every entry's owner + trigger_signal with that owner, and blesses the register as the active contract.
- Runtime (continuously): the orchestrator and module owners call `Risk.triage(signal)` whenever a module-level error mode fires or an observable trigger is crossed. Triage returns the routed mitigation + any escalation path.
- Post-mortem (hour 48+): the `RiskLog` (append-only) feeds the reward-hacking probe (DESIGN.md §13 deliverable #9) and the blog post (§15 Pitch).

Goal: every mitigation has a named owner and an observable trigger signal. No "we'll handle it when it comes up." If a risk fires and no one knows they own it, that is the design failure this book exists to prevent.
2. Interface
`driftcall/risk.py` (authored in Phase C by Person B) exposes the following. Phase D spec only; no code yet.
2.1 RiskEntry (frozen dataclass)

    from __future__ import annotations

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional


    @dataclass(frozen=True)
    class RiskEntry:
        id: str                          # "R01" ... "R15", "R01-STOP", "R06-STOP", "R99" (stable)
        description: str                 # one-sentence human-readable risk
        probability: Probability         # Low | Med | High
        impact: Impact                   # low | med | high | kills
        mitigation: str                  # concrete action; cites module doc §
        owner: str                       # "A" | "B" | "C" | "D" | "orchestrator" | "team"
        trigger_signal: TriggerSignal    # observable condition that fires mitigation
        stop_condition: Optional[StopCondition] = None  # None if not a hard-stop risk
        design_refs: tuple[str, ...] = ()  # e.g. ("DESIGN.md §14 #1", "training.md §7a")
2.2 Enums

    class Probability(str, Enum):
        LOW = "Low"
        MED = "Med"
        HIGH = "High"


    class Impact(str, Enum):
        LOW = "low"      # nuisance; deliverable unaffected
        MED = "med"      # one deliverable degraded
        HIGH = "high"    # ≥ 2 deliverables at risk
        KILLS = "kills"  # training or demo cannot ship


    class TriggerSignal(str, Enum):
        # Training-loop signals (mirrors training.md §5)
        GRAD_NORM_INF = "grad_norm_inf"
        POLICY_KL_OVER_10 = "policy_kl_over_10"
        R5_DROP_WITH_HACK_SPIKE = "r5_drop_with_hack_spike"
        CHECKPOINT_IO_ERROR = "checkpoint_io_error"
        OOM_AT_G8 = "oom_at_g8"
        STAGE1_R1_BELOW_0_4_AT_100 = "stage1_r1_below_0_4_at_step_100"
        # Vendor signals (mirrors vendors.md §5)
        UNEXPECTED_STATUS_VALUE = "vendor_status_not_in_five"
        ERROR_ENVELOPE_VIOLATION = "vendor_response_field_outside_catalogue"
        # Audio signals (mirrors audio.md §5)
        KOKORO_LOAD_FAILED = "kokoro_model_load_error"
        WHISPER_LOAD_FAILED = "whisper_model_load_error"
        INDIC_VOICE_PACK_MISSING = "indic_voice_pack_missing"
        # Infra / process signals
        V100_DOWN = "v100_unreachable"
        V100_DOWN_OVER_8H = "v100_unavailable_over_8h"
        HF_HUB_OUTAGE = "hf_hub_or_spaces_5xx"
        HF_HUB_OUTAGE_OVER_2H = "hf_hub_outage_over_2h"
        WANDB_STARTUP_FAIL = "wandb_init_failed_nonoffline"
        DOCKER_IMAGE_OVER_2GB = "docker_image_over_2gb"
        OPENENV_VALIDATE_FAIL = "openenv_validate_failure"
        GEMMA4_SMOKE_FAIL = "gemma4_e2b_smoke_test_failed"
        MERGE_CONFLICT_CROSS_OWNER = "merge_conflict_a_and_b_owned_files"
        TEAM_MEMBER_DROP = "team_member_dropped"
        TEAM_3PERSON_BELOW_GATE = "three_person_cannot_meet_phase_d_gate"
        JUDGE_LANGUAGE_MISMATCH = "judge_does_not_speak_indic"
        # Meta
        UNKNOWN = "unknown"


    class StopCondition(str, Enum):
        ESCALATE_TO_USER = "escalate"  # pause dispatch; ping user
        HARD_STOP = "hard_stop"        # terminate plan; re-plan with user
2.3 Assessment + triage API

    class Risk:
        @staticmethod
        def assess() -> list[RiskEntry]:
            """Return the frozen register (§10 below, as data).
            Called once at onsite start by Person B. No side effects.
            Invariant (§3.1): `len(assess()) >= 17` and every entry has a non-None owner."""

        @staticmethod
        def triage(signal: TriggerSignal, *, context: Optional[str] = None) -> TriageResult:
            """Look up the RiskEntry keyed by `signal`, return the handling plan.
            Idempotent. Never raises: if `signal == UNKNOWN` or no entry matches,
            returns a TriageResult pointing to R99 (unknown-class risk)."""


    @dataclass(frozen=True)
    class TriageResult:
        entry: RiskEntry
        action: str      # copy of entry.mitigation, ready to execute
        escalate: bool   # True iff entry.stop_condition == ESCALATE_TO_USER or HARD_STOP
        hard_stop: bool  # True iff entry.stop_condition == HARD_STOP
        log_line: str    # one-line summary for RiskLog append


    class RiskLog:
        """Append-only audit log. One line per triage call during the 48h.
        Serialized to `risk_log.jsonl` at episode end; feeds reward-hacking probe."""

        def append(self, t: TriageResult, ts_iso: str) -> None: ...
        def to_jsonl(self, path: str) -> None: ...
        def summary(self) -> dict[str, int]:
            """{risk_id: fire_count} for the post-mortem report."""
2.4 Wiring points
| Caller | Trigger | How |
|---|---|---|
| `training/train_grpo.py` callback | `KLDivergenceExplosion`, `RewardCollapseError`, `CheckpointIOError`, `OutOfMemoryError` | `Risk.triage(signal)` then honor `hard_stop` / `escalate` |
| `driftcall/env.py` | `AudioDecodeError`, `ModelLoadError`, `UnsupportedLanguageError` | `Risk.triage(signal)` for logging; env already has local mitigations |
| `driftcall/vendors/*.py` | envelope-shape test fires | `Risk.triage(ERROR_ENVELOPE_VIOLATION)`; always hard_stop |
| Orchestrator (this CLAUDE session) | openenv validate fail, smoke-test fail, team drop | `Risk.triage(signal)` before deciding to escalate |
| `deploy_env_space.md` health check | HF Hub 5xx, Docker image > 2 GB | `Risk.triage(signal)` for the deploy runbook |
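
Every caller in the table relies on the §2.3 contract that `Risk.triage` is a total lookup. A minimal sketch of that contract follows, assuming a dict keyed by trigger signal; the trimmed `Entry` and `Result` types and the three-entry register are illustrative stand-ins, not the real `RiskEntry`, `TriageResult`, or §10 data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TriggerSignal(str, Enum):  # subset of the §2.2 enum
    GRAD_NORM_INF = "grad_norm_inf"
    V100_DOWN_OVER_8H = "v100_unavailable_over_8h"
    HF_HUB_OUTAGE = "hf_hub_or_spaces_5xx"  # deliberately has no entry below
    UNKNOWN = "unknown"


class StopCondition(str, Enum):
    ESCALATE_TO_USER = "escalate"
    HARD_STOP = "hard_stop"


@dataclass(frozen=True)
class Entry:  # hypothetical trimmed stand-in for RiskEntry
    id: str
    mitigation: str
    trigger_signal: TriggerSignal
    stop_condition: Optional[StopCondition] = None


@dataclass(frozen=True)
class Result:  # hypothetical trimmed stand-in for TriageResult
    entry: Entry
    action: str
    escalate: bool
    hard_stop: bool
    log_line: str


# Illustrative register; the mitigation strings are placeholders.
_REGISTER = (
    Entry("R01", "clip gradients; lower learning rate; resume",
          TriggerSignal.GRAD_NORM_INF),
    Entry("R01-STOP", "terminate plan; re-plan with user",
          TriggerSignal.V100_DOWN_OVER_8H, StopCondition.HARD_STOP),
    Entry("R99", "unknown-class risk; notify orchestrator",
          TriggerSignal.UNKNOWN, StopCondition.ESCALATE_TO_USER),
)
_BY_SIGNAL = {entry.trigger_signal: entry for entry in _REGISTER}


def triage(signal: TriggerSignal, *, context: Optional[str] = None) -> Result:
    """Total lookup: a signal without its own entry routes to R99."""
    entry = _BY_SIGNAL.get(signal, _BY_SIGNAL[TriggerSignal.UNKNOWN])
    suffix = f" ({context})" if context else ""
    return Result(
        entry=entry,
        action=entry.mitigation,
        escalate=entry.stop_condition is not None,
        hard_stop=entry.stop_condition is StopCondition.HARD_STOP,
        log_line=f"{entry.id} fired on {signal.value}{suffix}",
    )
```

Because the fallback is a plain dictionary default, the function cannot raise on an unregistered signal, which is the property the "never crashes the run" policy depends on.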
3. Behavior Spec
3.1 Invariants
- Minimum 17 entries. `len(Risk.assess()) >= 17`: 12 from DESIGN.md §14 (R01 to R12) + 3 mandated here (R13 to R15) + R01-STOP + R06-STOP + R99 + room for post-onsite additions. Asserted in `tests/test_risk_book.py`.
- Every entry has a named owner. Owner ∈ {A, B, C, D, orchestrator, team}. No `None`, no "TBD". Asserted.
- Every entry has an observable trigger signal. Either a module error class (training.md §5, vendors.md §5, audio.md §5) or an explicit infra condition. No "we'll notice it." Asserted.
- Every CLAUDE.md §11 condition is represented. Both escalation items (5) AND hard-stop items (3) appear as `StopCondition`-bearing entries. Asserted by a coverage test.
- Mitigations cite their source. `design_refs` is non-empty for every entry. This lets a reviewer trace any mitigation back to the canonical spec.
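
The invariants above lend themselves to a single checker that the test suite could call. The sketch below is one possible shape, assuming a trimmed `Entry` stand-in for `RiskEntry`; the function name `register_violations` and the message wording are hypothetical, and it also folds in the unique-id check from the error-mode table.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

VALID_OWNERS = {"A", "B", "C", "D", "orchestrator", "team"}


@dataclass(frozen=True)
class Entry:  # hypothetical trimmed stand-in for RiskEntry
    id: str
    owner: Optional[str]
    trigger_signal: Optional[str]
    design_refs: Tuple[str, ...] = ()


def register_violations(entries: Sequence[Entry], minimum: int = 17) -> List[str]:
    """Return one human-readable line per violated invariant (empty if clean)."""
    problems = []
    if len(entries) < minimum:
        problems.append(f"register has {len(entries)} entries, need >= {minimum}")
    seen = set()
    for e in entries:
        if e.id in seen:
            problems.append(f"{e.id}: duplicate id")
        seen.add(e.id)
        if e.owner not in VALID_OWNERS:
            problems.append(f"{e.id}: owner {e.owner!r} is not a named owner")
        if not e.trigger_signal:
            problems.append(f"{e.id}: no observable trigger signal")
        if not e.design_refs:
            problems.append(f"{e.id}: design_refs is empty")
    return problems
```

Returning a list rather than asserting on the first failure lets one test run report every broken entry at once, which matters when the register is reviewed under time pressure.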
3.2 Triage loop (runtime)

    on signal s fired at time t:
      1. triage = Risk.triage(s)
      2. append triage to RiskLog with ts=t
      3. if triage.hard_stop:
           orchestrator halts all Agent dispatch
           SendMessage team-lead: "HARD_STOP fired for {triage.entry.id}: {triage.log_line}"
           await user intervention
      4. elif triage.escalate:
           orchestrator pauses new dispatch in affected scope
           SendMessage team-lead: "ESCALATE {triage.entry.id}: {triage.log_line}"
           continue other independent work
      5. else:
           owner executes triage.action in their worktree
           orchestrator continues
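
The loop above can be sketched as a small dispatch function. This is an illustration under assumptions: `Triage` is a flattened stand-in for `TriageResult`, `send_message` stands in for the SendMessage channel, and the returned strings are hypothetical labels for what the orchestrator does next.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Triage:  # hypothetical flattened stand-in for TriageResult
    risk_id: str
    action: str
    escalate: bool
    hard_stop: bool
    log_line: str


def handle(
    triage: Triage,
    ts_iso: str,
    log: List[Tuple[str, str]],
    send_message: Callable[[str], None],
) -> str:
    """Apply steps 2 to 5 of the triage loop and report the resulting mode."""
    log.append((ts_iso, triage.log_line))  # step 2: always log first
    if triage.hard_stop:                    # step 3: checked before escalate
        send_message(f"HARD_STOP fired for {triage.risk_id}: {triage.log_line}")
        return "halt_all_dispatch"
    if triage.escalate:                     # step 4
        send_message(f"ESCALATE {triage.risk_id}: {triage.log_line}")
        return "pause_affected_scope"
    return "owner_executes_action"          # step 5: no message sent
```

The ordering matters: a hard-stop result also has `escalate=True`, so testing `hard_stop` first is what keeps it from being handled as a mere escalation.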
3.3 Escalation path (from CLAUDE.md Β§11 verbatim)
| Signal | Scope | First responder | Escalates to |
|---|---|---|---|
| Gemma 3n E2B smoke fails on V100 | training | Person C | User (block; may downshift to Gemma 3 4B) |
| `openenv validate` fails 3× | deploy | Person D | User (block; may need schema deviation approval) |
| Stage-1 R1 < 0.4 at step 100 | training | Person C → Person B | User (curriculum/reward redesign required) |
| Critic flags consistent DESIGN.md flaw | any phase | Orchestrator | User (update spec before continuing) |
| Merge conflict across A-owned and B-owned files | any phase | Orchestrator | User: ownership was wrong (process fix) |
| V100 unavailable > 8h during onsite | training | Person C | Hard stop: re-plan with user |
| HF Hub / Spaces outage > 2h | deploy | Person D | Hard stop |
| Team member drops AND 3-person cannot meet Phase D gate | all | Orchestrator | Hard stop |
3.4 Decision thresholds
| Impact | Probability | Action |
|---|---|---|
| kills | any | Immediate surface on first trigger; mitigation is non-negotiable |
| high | High or Med | Immediate surface; mitigation inline |
| high | Low | Pre-stage mitigation; surface on first trigger |
| med | High | Pre-stage mitigation; surface on repeat |
| med | Med or Low | Defer to next batch boundary; log only on first trigger |
| low | any | Defer; log only |
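
The threshold table reduces to a small pure function. The sketch below encodes the six rows directly; the function name `decide` and the returned label strings are hypothetical, chosen only to make each row distinguishable.

```python
def decide(impact: str, probability: str) -> str:
    """Map (impact, probability) to the action row of the threshold table."""
    if impact == "kills":
        return "surface_immediately"  # any probability
    if impact == "high":
        if probability in ("High", "Med"):
            return "surface_immediately"
        return "prestage_surface_on_first_trigger"  # Low
    if impact == "med":
        if probability == "High":
            return "prestage_surface_on_repeat"
        return "defer_to_batch_boundary"  # Med or Low
    if impact == "low":
        return "defer_log_only"  # any probability
    raise ValueError(f"unknown impact: {impact!r}")
```

For example, `decide("high", "Low")` yields the pre-stage row, while `decide("kills", "Low")` still surfaces immediately.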
"Surface" = SendMessage team-lead + RiskLog.append. "Defer" = RiskLog.append only, no interruption.
3.5 Retry semantics
Mitigations that retry (e.g., HF Hub upload with 3× backoff) do so at their own layer (training.md §5 CheckpointIOError). This book does NOT implement retries; it routes the final failure after retries are exhausted. A CheckpointIOError in Risk.triage means 3 retries already failed.
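
To make the layering concrete, here is a sketch of what a module-level retry wrapper might look like, with the risk book only seeing the exhausted case. The helper name `run_with_retries`, the exponential delay schedule, and the use of `OSError` as the retryable class are all assumptions for illustration; training.md §5 owns the real behavior.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(
    op: Callable[[], T],
    on_exhausted: Callable[[OSError], None],
    attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Try op up to `attempts` times; hand off to the risk layer only at the end."""
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        try:
            return op()
        except OSError as err:
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)  # 1s, 2s, ... between tries
    # All attempts failed: this is the single point where triage is invoked,
    # e.g. on_exhausted = lambda err: Risk.triage(CHECKPOINT_IO_ERROR).
    assert last_error is not None
    on_exhausted(last_error)
    return None
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting.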
3.6 Post-mortem
At hour 48, Person B runs RiskLog.summary(). Any risk that fired ≥ 3× is a candidate spec bug: update DESIGN.md §14 + this doc in the post-hackathon merge (not during the 48h).
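
A sketch of the post-mortem computation, assuming the log can be reduced to a sequence of fired risk ids. The helper names are hypothetical; only the `{risk_id: fire_count}` shape and the 3× threshold come from this section.

```python
from collections import Counter
from typing import Dict, Iterable, List


def summary(fired_risk_ids: Iterable[str]) -> Dict[str, int]:
    """Build the {risk_id: fire_count} mapping for the post-mortem report."""
    return dict(Counter(fired_risk_ids))


def spec_bug_candidates(counts: Dict[str, int], threshold: int = 3) -> List[str]:
    """Risks that fired at least `threshold` times, in stable sorted order."""
    return sorted(rid for rid, n in counts.items() if n >= threshold)
```

With fires `["R01", "R05", "R01", "R03", "R01", "R05", "R05"]`, the candidates are R01 and R05; R03 fired once and is not flagged.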
4. Data Structures
4.1 RiskEntry: frozen, immutable, append-once at assess().
Serialization: `to_dict()` / `from_dict()` round-trip stable. `id` is the primary key; changing any other field requires a new entry (versioning by file revision, not in-memory mutation). Enforced by `frozen=True`.
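
A minimal sketch of the round-trip property, using a three-field stand-in for `RiskEntry`. The one subtlety it illustrates is that JSON has no tuple type, so `from_dict` has to convert `design_refs` back; the method bodies here are an assumption about how that could be done, not the authored implementation.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Entry:  # hypothetical trimmed stand-in for RiskEntry
    id: str
    owner: str
    design_refs: Tuple[str, ...] = ()

    def to_dict(self) -> Dict[str, Any]:
        data = asdict(self)
        data["design_refs"] = list(self.design_refs)  # JSON-friendly form
        return data

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Entry":
        return cls(
            id=data["id"],
            owner=data["owner"],
            design_refs=tuple(data["design_refs"]),  # restore the tuple
        )


entry = Entry("R01", "C", ("DESIGN.md #1", "training.md 7a"))
restored = Entry.from_dict(json.loads(json.dumps(entry.to_dict())))
```

Here `restored == entry` holds, and assigning to any field of `entry` raises `dataclasses.FrozenInstanceError`, which is the enforcement `frozen=True` provides.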
4.2 RiskLog: append-only in-memory list wrapped as immutable view per append.
- Storage format: JSONL (one `TriageResult.to_dict()` per line).
- Path: `logs/risk_log.jsonl` within the run directory (DESIGN.md §13 deliverable #6 traces).
- Never truncated during a run. A new run writes a new file (run_id prefix).
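
The storage format can be sketched as follows. `RiskLogSketch` is a hypothetical simplification that stores plain dicts instead of `TriageResult` objects; it also shows the stderr fallback that §5 prescribes when the write fails, under the assumption that write failures surface as `OSError`.

```python
import json
import sys
from typing import Any, Dict, List


class RiskLogSketch:
    """Append-only in-memory log, serialized as one JSON object per line."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []

    def append(self, record: Dict[str, Any], ts_iso: str) -> None:
        # Copy into a new dict so later caller mutations cannot alter the log.
        self._records.append({"ts": ts_iso, **record})

    def to_jsonl(self, path: str) -> bool:
        """Write the full log; on failure report to stderr and keep running."""
        try:
            with open(path, "w", encoding="utf-8") as fh:
                for record in self._records:
                    fh.write(json.dumps(record, sort_keys=True) + "\n")
            return True
        except OSError as err:
            print(f"risk log write failed: {err}", file=sys.stderr)
            return False
```

Returning a boolean instead of raising keeps the caller's control flow unchanged when the disk is full, matching the "the run is more important than the log line" policy.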
4.3 TriggerSignal: string Enum; stable wire format.
Adding a new signal requires (a) a new RiskEntry with that signal OR routing to R99 (unknown), and (b) a new line in the §2.2 enum above. Removing a signal is forbidden during the 48h.
4.4 StopCondition: three-valued sum type (None | ESCALATE | HARD_STOP).
None is the default (most risks have a local mitigation, not an escalation). ESCALATE and HARD_STOP are reserved for the CLAUDE.md §11 set.
5. Error Modes
These are the error classes this module itself can raise or encounter, distinct from the risks it catalogs.
| Situation | Handling |
|---|---|
| Unknown-class risk (not in register): `Risk.triage(UNKNOWN)` or a signal with no matching entry | Returns `TriageResult(entry=R99, escalate=True, log_line="unknown signal: <raw>")`. Does NOT raise. Orchestrator is notified via SendMessage. R99 is the catch-all so `triage()` is total. |
| Stop-condition met (V100 down > 8h, HF outage > 2h, team drop + gate-miss) | `triage().hard_stop == True`. Orchestrator halts dispatch. This is not an error in `risk.py`; it is the correct behavior. |
| Unresolvable mitigation: owner reports mitigation attempt failed | Owner calls `Risk.triage(signal)` a second time within 5 min with `context="mitigation_failed"`. If `stop_condition` is already ESCALATE, upgrade path to HARD_STOP is explicitly a user decision (not automatic). |
| Duplicate registration: same `id` appears twice in `assess()` | AssertionError at import-time test (`tests/test_risk_book.py::test_unique_ids`). The register is a static data structure; duplicates are a code bug. |
| `design_refs` cites a non-existent section | Caught by `tests/test_risk_book.py::test_design_refs_valid`, which greps the referenced files for the cited headings. Missing → test failure → fix before merge. |
| `RiskLog` write fails (disk full) | Log to stderr + continue. The run is more important than the log line. This is a silent-log fallback, not a silent-failure of the risk itself. |
| `Risk.triage` called before `Risk.assess` | Safe: `assess()` returns pure data; `triage()` internally calls it if not cached. No initialization-order coupling. |
Policy: risk.py never crashes the run. If in doubt, route to R99 and escalate.
6. Dependencies
6.1 Consumes
- `DESIGN.md §14`: the 12-risk table (R01 to R12 below).
- `CLAUDE.md §11`: escalation + stop conditions (R13 to R15 and `StopCondition` wiring).
- `docs/modules/training.md §5`: exception class names mapped to `TriggerSignal` values for training risks.
- `docs/modules/vendors.md §5`: envelope-shape invariant mapped to `ERROR_ENVELOPE_VIOLATION`.
- `docs/modules/audio.md §5`: model-load / voice-pack failures mapped to `KOKORO_LOAD_FAILED`, `WHISPER_LOAD_FAILED`, `INDIC_VOICE_PACK_MISSING`.
6.2 Consumed by
- `env.md`: catches audio errors; calls `Risk.triage` for logging. `env.py` already has local fallbacks (audio.md §5 table) so the triage call is informational.
- `training.md`: the training callback wiring (training.md §5) maps `KLDivergenceExplosion`, `RewardCollapseError`, `CheckpointIOError`, `OutOfMemoryError`, `LanguageCohortCollapseError`, `WandBStartupError` to triggers. Callback delegates to `Risk.triage`.
- `audio.md`: reports startup / missing-pack conditions but owns local mitigations.
- `deploy_env_space.md`, `deploy_demo_space.md`: the deploy runbooks reference R06 (ZeroGPU quota), R10 (Docker image size), R11 (openenv validate), R06-STOP (HF Hub/Spaces outage > 2h) for their Day-2 checklists.
- `pitch_demo.md`: references R06 (ZeroGPU) as the reason for the `gradio share=True` fallback demo path; references R12 (judge language) for the English-captions design choice.
- `evaluation.md`: references R13 (Stage 1 convergence) as the gate for final-eval vs retrain-decision.
6.3 Does not depend on
- The agent / LLM: this book is pure environment + process code.
- HF Hub at runtime: `assess()` reads from local Python data only.
- Any vendor module: vendors report errors to this book, not the other way.
7. Edge Cases
Minimum 5 required. Ten here for completeness.
7.1 Compound risk trigger (two risks fire in the same step)
Scenario: During Stage-3 training, at step 247, the V100 briefly disconnects (R01-V100-adjacent) AND train/policy_kl crosses 10.0 (R02 KL catastrophe) within a 30-second window.
Handling: Risk.triage is called twice, once per signal, in whatever order the callback observes them. Both append to RiskLog. The orchestrator sees two HARD_STOP / ESCALATE messages and treats them as independent incidents. RiskLog.summary() post-mortem will show both. We do NOT try to merge compound events into a single "meta-risk" entry, because that invites silent de-duplication bugs.
7.2 New risk surfaced mid-event
Scenario: Hour 22 of onsite, Person D discovers the HF Hub deprecated huggingface-cli upload in favor of hf upload and the Dockerfile command silently no-ops. This is not in the register.
Handling: Person D calls Risk.triage(UNKNOWN, context="hf-cli-deprecated"). Returns R99 (unknown-class). Person D messages the team lead. A one-line entry is appended to the doc (not the frozen register) under §11 "Live additions". The frozen register gets updated in the post-hackathon PR; during the 48h we never mutate assess() because test suites pin its length + contents.
7.3 Mitigation fails (first attempt)
Scenario: R01 (V100 FP16 instability) fires. Mitigation: learning_rate 5e-6 → 2e-6, resume from last checkpoint. Person C does that. Thirty steps later, grad_norm spikes to inf again.
Handling: Person C re-triages with context="mitigation_failed_round_1". R01's stop_condition is None (not originally escalate). But two consecutive mitigation failures promote this to ESCALATE. The promotion is a human decision, after which handling follows the §3.2 triage loop, step 4 ("elif triage.escalate"). Person C sends a SendMessage to the team lead with the promoted severity. Never silent, never automatic.
7.4 Team member reassign (A covers B)
Scenario: Person B gets a fever at hour 30. Person A picks up rewards + test plans. R09 (team drop) fires with TEAM_MEMBER_DROP signal.
Handling: R09's mitigation is "roles are additive: Person D covers A+env, Person C covers rewards". But in this specific case Person A takes B's load (A has capacity because env code is already in C-batch). Orchestrator updates the live-ownership mapping in memory (not in CLAUDE.md §2.2; that is a spec, not a runtime roster). If 3-person coverage cannot make the Phase D gate on time (TEAM_3PERSON_BELOW_GATE signal), R09 promotes to HARD_STOP.
7.5 Mitigation conflicts with deliverable
Scenario: R03 (Hinglish Whisper noise) mitigation: "score R3/R4 on semantic match not exact string". This was a DESIGN.md §14 decision pre-locked. At hour 36, Person B proposes tightening R4 to exact-string to fix a specific reward-hack exploit found in the probe (DESIGN.md §13 #9).
Handling: This is a spec conflict, not a triage event. Person B updates rewards.md AND DESIGN.md §14 AND this doc's R03 entry. Critic-gated change. NOT a Risk.triage call: triage is runtime, the spec conflict is authoring. The rule: mitigation edits go through the doc workflow, not through the runtime log.
7.6 False-positive stop (V100 dropped, came back in 4 min)
Scenario: V100 SSH drops at hour 20 minute 8. Returns at hour 20 minute 12. V100_DOWN fires; V100_DOWN_OVER_8H does NOT fire (not enough elapsed time).
Handling: R01-adjacent V100_DOWN has stop_condition=None; it's a nuisance. RiskLog.append only. Resume work. The HARD_STOP is only on the _OVER_8H variant, which requires a wall-clock threshold. This matches CLAUDE.md §11 literally ("V100 unavailable for > 8h").
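
The split between the nuisance signal and the hard-stop variant comes down to one wall-clock comparison. A sketch, assuming the watchdog remembers when the current outage began; the function name is hypothetical, and the inclusive `>=` follows the R01-STOP trigger wording in the register rather than the strict ">" quoted from CLAUDE.md §11.

```python
from datetime import datetime, timedelta
from typing import Optional

V100_DOWN = "v100_unreachable"
V100_DOWN_OVER_8H = "v100_unavailable_over_8h"


def v100_signal(
    first_down: Optional[datetime],
    now: datetime,
    threshold: timedelta = timedelta(hours=8),
) -> Optional[str]:
    """Return the signal value to triage, or None while the box is reachable."""
    if first_down is None:  # watchdog has seen no outage
        return None
    if now - first_down >= threshold:
        return V100_DOWN_OVER_8H
    return V100_DOWN
```

In the four-minute scenario above the elapsed time is far below the threshold, so only `V100_DOWN` is produced and the run continues.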
7.7 ZeroGPU quota depleted during demo day
Scenario: Saturday midday, demo Space has been hit by spectators; ZeroGPU quota for krrishchoudhary109 is exhausted.
Handling: R06 (HF Space ZeroGPU) trigger fires. Mitigation: fallback to gradio share=True local tunnel. Person D already has the $20 A10G budget reserved, so switch to that first (cheaper for real judge eval). If the A10G returns a 5xx, fall back to share=True on the local V100 training box (after Stage-3 completes). RiskLog records which path was taken for the blog.
7.8 openenv validate fails on a subtle schema issue
Scenario: openenv validate . reports observation.tool_results[].schema_version type is str but validator expects Literal["v1", "v2", "v3"].
Handling: R11 (openenv validate fails) fires. Mitigation: "Validate early (pre-onsite hour 16 gate)." If we are past that gate and still failing, after 3 fix attempts OPENENV_VALIDATE_FAIL promotes to ESCALATE. Person D + Orchestrator decide whether to (a) update our models.md to use the Literal, (b) ask user for schema-deviation approval, or (c) drop the validator check. Never (c) without user approval.
7.9 Reward-hacking spike during Stage 2
Scenario: train/R5_mean drops to -0.32 at step 180 of Stage 2. train/reward_mean stays flat, the classic hack indicator. training.md §5 RewardCollapseError fires.
Handling: R05 (reward hacking on R2) trigger. Mitigation chain: (i) training halts, RewardCollapseError raised; (ii) Person B runs reward-hacking probe on last 200 episodes per DESIGN.md §13 #9; (iii) if a new exploit pattern is found, update anti_hack_penalty logic in rewards.py per rewards.md §3.6; (iv) resume from pre-regression checkpoint, NOT the current one. This case is the single most important runtime mitigation in the book: reward hacking propagates through the model weights and cannot be fixed by just resuming.
7.10 Judge doesn't speak Hindi
Scenario: Demo day, the judge hearing our booth is fluent only in English + Mandarin (neither Indic).
Handling: R12 pre-staged mitigation: every audio clip in the deck has English captions, and the demo Gradio UI auto-translates agent replies to English for readability. No runtime trigger; this is a design-time mitigation surfaced at the pitch-script level. RiskLog records which judges the English path fired for (instrumentation lives in the demo, not in this module).
8. Examples
Three concrete worked examples, runnable if implemented.
8.1 R01: V100 FP16 gradient instability (detection + mitigation + rollback)
Setup. Hour 6 of onsite. Stage-1 training starts. Fresh V100 box; Unsloth 2026.4.5 pinned; use_bias_correction_kl=True. Everything looks right.
Detection (step 14). Training callback reports grad_norm=inf. NonFiniteGradientError would be raised on the 4th consecutive skip. At step 14, this is the first occurrence.

    signal = TriggerSignal.GRAD_NORM_INF
    triage = Risk.triage(signal)
    # → TriageResult(
    #     entry=R01,
    #     action="Unsloth 4-bit QLoRA + FP16 autocast; grad clip 1.0; loss-scale monitored every 10 steps; fallback to dtype=float16 explicit",
    #     escalate=False, hard_stop=False,
    #     log_line="R01 fired (step 14): grad_norm=inf; mitigation=fp16_autocast+gradclip1.0"
    #   )
    # risk_log is the run's RiskLog instance (append is an instance method).
    risk_log.append(triage, ts_iso="2026-04-25T15:22:04+05:30")
    # Person C inspects: loss-scale has halved twice in last 50 steps, a precursor to underflow.
    # Action (from training.md §7a mitigation 5): drop learning_rate 5e-6 → 2e-6, resume from last checkpoint.
Mitigation (step 15 onward). Person C: train(..., learning_rate=2e-6, resume_from=Path("checkpoints/stage1_step_10")). Resume starts cleanly.
Rollback guardrail. Training continues for 20 steps. grad_norm stays finite. train/skipped_updates==0 rolling mean. R01 does NOT fire again.
If it had fired again: Person C would triage with context="mitigation_failed_round_1". Second-round failure promotes R01 to escalation per §3.3; user is paged; we consider downshifting to Gemma 3 4B per CLAUDE.md §11 first-responder case.
8.2 R05: Reward hacking spike on R2 → probe → reward re-weighting
Setup. Hour 28 of onsite, Stage-2 (single-drift) training. train/R5_mean has floated near 0.0 for 170 steps. Then across steps 171 to 180, it drops to -0.31.
Detection (step 180). The training callback computes R5_mean_10 (10-step moving mean) = -0.31 AND checks train/hallucinated_field_count simultaneously (training.md §7d): hallucinations spiked. train/reward_mean is flat (not dropping proportionally). RewardCollapseError is raised.

    signal = TriggerSignal.R5_DROP_WITH_HACK_SPIKE
    triage = Risk.triage(signal)
    # → TriageResult(
    #     entry=R05,
    #     action="R2 requires specific field-name OR correct follow-up call; R5 penalizes bare assertions. "
    #            "Halt training; surface to Person B for probe inspection (training.md §7d).",
    #     escalate=True, hard_stop=False,
    #     log_line="R05 fired (step 180): R5_mean_10=-0.31; reward_mean flat; halt + probe"
    #   )
Probe (hour 28 + 30 min). Person B runs python3 eval/reward_hack_probe.py --episodes-from step_170_to_180.jsonl --top-k-suspicious 20. Output reveals: the agent learned to emit "drift detected on unknown_field" on EVERY turn, which scores R2 = +0.3 via the substring "drift" token match AND R5 = -0.3 for bare assertion, for a net of 0.0. The hallucinated R2 credit still feeds advantage variance, so the model is "rewarded by gradient" even if the mean reward is unchanged.
Fix (hour 29). rewards.md §3.6 update: tighten R2 to require the drift-pattern's canonical error_code token (from vendors.md §5.2) as substring, not a generic "drift" token. Bump config_sha256.
Resume (hour 30). Person C resumes from the pre-regression checkpoint checkpoints/stage2_step_160 (NOT 170) because weights from 160 onward already encoded the exploit pattern in latent gradient directions. Stage-2 restarts with updated rewards. Runs clean for 20 steps; R05 does not re-fire.
Book-keeping. The reward-hacking probe result is appended to reward_hacks.jsonl (DESIGN.md §13 deliverable #9). The blog post cites this incident as "one of three exploits caught by the probe, all fixed without silent re-weighting".
8.3 Stop-condition trigger: V100 down ≥ 8h
Setup. Hour 14 of onsite. V100 box drops SSH at hour 14:02. Root cause: data-center UPS maintenance neither DGX ops nor the hackathon organizers flagged.
Detection (continuous). Orchestrator watchdog `ssh -o ConnectTimeout=5 v100-box echo ok` fails. Signal V100_DOWN fires at hour 14:02.

    triage = Risk.triage(TriggerSignal.V100_DOWN)
    # → R01-adjacent entry, non-stop, log_only.
    # RiskLog.append. Continue non-training work (Person B on test plans, Person D on Dockerfile).
Escalation (hour 14:02 to hour 22:02). Watchdog polls every 2 min. Once 8 wall-clock hours have elapsed without a successful SSH, signal V100_DOWN_OVER_8H fires. CLAUDE.md §11 says: hard stop.

    triage = Risk.triage(TriggerSignal.V100_DOWN_OVER_8H)
    # → TriageResult(
    #     entry=R01-STOP variant (id="R01-STOP"),
    #     action="V100 unreachable > 8h: terminate plan; re-plan with user.",
    #     escalate=True, hard_stop=True,
    #     log_line="R01-STOP fired: V100 down 8h+; HARD_STOP"
    #   )
    # Orchestrator:
    #   - halts all Agent dispatch immediately
    #   - SendMessage team-lead: "HARD_STOP R01-STOP: V100 down ≥ 8h. Plan must be re-scoped with user."
    #   - awaits user decision: (a) cancel onsite, (b) pivot to Colab T4 at reduced scale, (c) submit baseline-only.
    # Person C preserves whatever checkpoint is on last-good-rsync to HF Hub.
Resolution. User decides: pivot to Colab T4 for Stage-1 only; skip Stage-3; record the incident in the blog. The blog-post honesty about this is, per DESIGN.md §14 #9-adjacent, an asset not a liability.
9. Open Questions
- Should `Risk.triage` auto-promote to ESCALATE on second-round mitigation failure, or stay manual? Current spec: manual (§7.3). Rationale: automated promotion has surprised us in prior incidents. But manual requires an attentive operator. DECISION during Batch D3 critic round: stay manual; revisit post-hackathon.
- Do we include supply-chain risks (PyPI malicious package)? Currently out of scope: we pin every dep in `requirements.txt`, and `pip-audit` is a Phase C C1 task. If a malicious package lands during onsite, it becomes R99 (unknown). Not worth a dedicated entry for a 48h event.
- `RiskLog` public vs private. We write `risk_log.jsonl` to the run directory. Do we push it to HF Hub as part of the training traces (DESIGN.md §13 #6)? Pros: full transparency for judges. Cons: the log names V100 incidents which is operationally sensitive. Proposed resolution: push with operator identifiers scrubbed to `team/A..D` only. Confirm with Person D before Phase C5.
- R14 threshold specifics. Currently "Stage 1 doesn't converge (R1 < 0.4 at step 100)". Is step 100 the right checkpoint, or should it be step 75? DESIGN.md §14 risk #4 says Stage 1 targets 100 steps total, so step 100 = end-of-stage, which is already too late to recover in-stage. Proposed: `STAGE1_R1_BELOW_0_4_AT_STEP_75` is the actual signal, but we call it "at step 100" in prose to match CLAUDE.md §11. Confirm with Person C during their training callback implementation.
- Judge language (R12) beyond Hindi. Our mitigation is "English captions on every audio clip." What if the judge is fluent only in Mandarin and English? Captions are still English. Demo voice reply is Hindi. Acceptable? YES: English is the hackathon lingua franca; Mandarin judges on an India-first hackathon is low-probability. Not worth adding a second caption track.
10. The Register (data for §2.3 Risk.assess())
Fifteen numbered entries, plus the stop variants R01-STOP and R06-STOP and the catch-all R99. R01 to R12 are DESIGN.md §14 verbatim, augmented with owner + trigger signal. R13 to R15 are CLAUDE.md §11 adds.
R01: V100 FP16 gradient instability (Gemma 4 is BF16-native)
- Probability: Med
- Impact: kills (training)
- Mitigation: Unsloth 4-bit QLoRA + FP16 autocast; `max_grad_norm=1.0`; loss-scale monitored every 10 steps; fallback to `dtype=torch.float16` explicit at `FastModel.from_pretrained`; if instability persists, `learning_rate 5e-6 → 2e-6` and resume from last checkpoint.
- Owner: C (Training & Data)
- Trigger signal: `GRAD_NORM_INF` (training callback observes `grad_norm` is inf or NaN; training.md §5 `NonFiniteGradientError` after 3 consecutive skips).
- Stop condition: None (local mitigation). See R01-STOP for the > 8h variant.
- design_refs: DESIGN.md §14 #1, training.md §5, training.md §7a, §7c.
R01-STOP: V100 unavailable for more than 8 hours
- Probability: Low
- Impact: kills (training AND demo)
- Mitigation: HARD STOP. Preserve last-good checkpoint to HF Hub. Re-plan with user: (a) cancel onsite, (b) pivot to Colab T4 with reduced scope, (c) submit baseline-only.
- Owner: Orchestrator → user
- Trigger signal: `V100_DOWN_OVER_8H` (wall-clock ≥ 8h since first `V100_DOWN` event, per CLAUDE.md §11).
- Stop condition: `HARD_STOP`.
- design_refs: CLAUDE.md §11 (hard-stop #1).
R02: TRL GRPOTrainer KL catastrophe
- Probability: Med
- Impact: kills (training)
- Mitigation: Pin TRL ≥ 0.23; `use_bias_correction_kl=True` (invariant asserted in `build_grpo_config`); `beta=0.04`; training callback raises `KLDivergenceExplosion` if `policy_kl` 10-step mean > 10.0; halt + dump last 20 rollout groups to `debug/kl_explosion_dump.jsonl` and escalate to user.
- Owner: C (Training & Data)
- Trigger signal: `POLICY_KL_OVER_10` (training.md §5 `KLDivergenceExplosion`).
- Stop condition: `ESCALATE_TO_USER` (halt training; no silent recovery attempt; root-cause is almost always config-invariant violation).
- design_refs: DESIGN.md §14 #2, training.md §5, training.md §7c, TRL issue #4637.
R03: Whisper transcription errors on Hinglish code-mixing
- Probability: High
- Impact: med (noisy observation)
- Mitigation: Use `faster-whisper-small` with `language="hi"`; accept noise, since it's realistic; score R3/R4 on semantic match not exact string. TTS/ASR is ONLY at env boundary (DESIGN.md §9.4), never in training loop, so noisy ASR cannot break gradient. Reflected in `DriftCallObservation.last_confidence`; agent may `CLARIFY` to re-prompt.
- Owner: C (Audio subsystem; env-side consumer is A)
- Trigger signal: observable as low `TranscriptResult.confidence` in observation; not an exception. Does not fire `Risk.triage` unless confidence < 0.3 on ≥ 30% of episodes over a 50-episode window (indicating systemic degradation, not natural variance).
- Stop condition: None.
- design_refs: DESIGN.md §14 #3, audio.md §5, audio.md §7.1, §7.4, rewards.md (semantic-match specification).
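
The R03 trigger is a rolling-window rate rather than a single event. One way it could be computed is sketched below; the class name `ConfidenceWindow` is hypothetical, and the choice to stay silent until a full 50-episode window has been observed is an assumption the entry does not state.

```python
from collections import deque
from typing import Deque


class ConfidenceWindow:
    """Track whether low-confidence transcripts dominate the recent window."""

    def __init__(self, window: int = 50, low: float = 0.3, fraction: float = 0.30) -> None:
        self._flags: Deque[bool] = deque(maxlen=window)  # oldest episode drops off
        self._low = low
        self._fraction = fraction

    def observe(self, confidence: float) -> bool:
        """Record one episode; return True when R03 should be triaged."""
        self._flags.append(confidence < self._low)
        if len(self._flags) < (self._flags.maxlen or 0):
            return False  # assumption: no verdict on a partial window
        return sum(self._flags) >= self._fraction * len(self._flags)
```

With the defaults, 15 low-confidence episodes out of the last 50 is the smallest count that fires; 14 does not.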
R04: 200 to 500 GRPO steps too few for 3-stage curriculum
- Probability: Med
- Impact: med (Stage 3 undertrains)
- Mitigation: Compressed curriculum: Stage 1: 100, Stage 2: 200, Stage 3: 100 (total 400). Prioritize Stage 2 depth. If time budget slips further, drop Stage 3 entirely and ship Stage-2-final as the LoRA; this is still better than untrained.
- Owner: C (Training & Data)
- Trigger signal: not a runtime signal per se; checked at each stage boundary via `train/step` vs plan. Triggers `Risk.triage` only if Stage-3 has < 50 steps available at its start.
- Stop condition: None.
- design_refs: DESIGN.md §14 #4, DESIGN.md §10.3, training.md §6.
R05: Reward hacking on R2 (spam "drift detected!")
- Probability: High
- Impact: high (R2 collapses as a signal)
- Mitigation: R2 requires specific `error_code` field-name substring OR correct follow-up call (rewards.md §3.6); R5 penalizes bare assertions (-0.3); penalties stack additively to -1.0 floor. Runtime: `RewardCollapseError` raises on `R5_mean_10 ≤ -0.3` AND `hallucinated_field_count` spike AND `reward_mean` flat → halt and run reward-hacking probe (DESIGN.md §13 deliverable #9); resume from pre-regression checkpoint, not the current one.
- Owner: B (Rewards & Tests)
- Trigger signal: `R5_DROP_WITH_HACK_SPIKE` (training.md §5 `RewardCollapseError`).
- Stop condition: `ESCALATE_TO_USER` (halt + probe; does not auto-hard-stop, but requires B attention).
- design_refs: DESIGN.md §14 #5, DESIGN.md §7.3, training.md §7d, rewards.md §3.6.
R06: HF Space ZeroGPU quota exhausted
- Probability: Low
- Impact: med (demo degrades)
- Mitigation: $20 A10G budget reserved for paid fallback; secondary fallback `gradio share=True` locally from the V100 training box after Stage-3 completes. Pre-generated demo audio means a static deck is the tertiary fallback.
- Owner: D (Deploy & Story)
- Trigger signal: HF Space returns quota-exceeded HTTP; surfaced by the deploy_demo_space.md health probe.
- Stop condition: None.
- design_refs: DESIGN.md §14 #6, deploy_demo_space.md §4.
R06-STOP: HF Hub / Spaces outage persisting > 2 hours
- Description: Hugging Face Hub or Spaces serving infrastructure is unreachable for > 2 hours during onsite; env Space / demo Space / Hub model pushes cannot be verified or redeployed.
- P: Low · I: kills
- Mitigation: pause all HF-dependent tasks; fall back to local `gradio share=True` demo and local-only Docker testing; if not restored by onsite hour −2 (2 hours before pitch), orchestrator declares `HARD_STOP` and team pivots to pre-recorded pitch video.
- Owner: orchestrator
- Trigger signal: `HF_HUB_OUTAGE_OVER_2H`
- Stop condition: `HARD_STOP`
- Design refs: `CLAUDE.md §11 Hard-stop #2`, `DESIGN.md §13` (deployment), `deploy_env_space.md §9 OQ3`
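The "outage persisting > 2 hours" condition needs a timer that resets whenever the Hub is reachable again. A minimal sketch, assuming a periodic health probe feeds it a boolean and a timestamp; the class `OutageTracker` and its method are hypothetical names, and only the 2-hour limit and the signal name come from the entry.

```python
from datetime import datetime, timedelta
from typing import Optional

OUTAGE_LIMIT = timedelta(hours=2)  # from the R06-STOP entry


class OutageTracker:
    """Hypothetical helper: tracks how long HF Hub / Spaces has been unreachable."""

    def __init__(self) -> None:
        self.down_since: Optional[datetime] = None

    def record_probe(self, ok: bool, now: datetime) -> Optional[str]:
        """Feed one probe result; return the trigger signal once the limit is passed."""
        if ok:
            self.down_since = None  # any successful probe resets the clock
            return None
        if self.down_since is None:
            self.down_since = now
        if now - self.down_since > OUTAGE_LIMIT:
            return "HF_HUB_OUTAGE_OVER_2H"
        return None
```

Resetting on every successful probe means intermittent blips never accumulate toward the hard stop; only a continuous outage does.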
R07: Indic Whisper quality too poor for live demo
- Probability: Med
- Impact: med (live demo weak)
- Mitigation: Fall back to English-only briefs for the live demo, with Indic briefs in the recorded video. Decision point is hour 40 (after Stage-3 complete), based on Whisper quality measured during final eval.
- Owner: D (Deploy & Story), decision co-signed by C.
- Trigger signal: final-eval Indic-cohort R4 < 0.3 (a proxy for "agent can't parse Indic transcripts reliably").
- Stop condition: None.
- design_refs: DESIGN.md §14 #7, audio.md §1.1, pitch_demo.md.
R08: Kokoro Indic voice quality insufficient
- Probability: Low
- Impact: med (demo sounds bad)
- Mitigation: Pre-generate ALL demo audio offline with careful voice-pack selection; A/B each generated clip against AI4Bharat reference clips before shipping; if still poor, use AI4Bharat's TTS for demo-only clips (training is text-only so this has no training-side impact).
- Owner: D (Deploy & Story), audio pipeline owned by C.
- Trigger signal: `INDIC_VOICE_PACK_MISSING` (audio.md §5) OR subjective quality fail at pre-demo rehearsal.
- Stop condition: None.
- design_refs: DESIGN.md §14 #8, audio.md §4.3.1 (voice-pack fallback chain), audio.md §7.2.
R09: Team member drops / sick
- Probability: Med
- Impact: high (≥ 8h slip)
- Mitigation: Roles are additive by design: Person D covers A + env if A drops; Person C covers rewards if B drops; Person A covers demo if D drops. Plan survives 3-person execution. If 3-person execution cannot hit Phase D gate on time, escalate.
- Owner: Orchestrator (re-planning); affected teammate for hand-off.
- Trigger signal: `TEAM_MEMBER_DROP` (self-reported via SendMessage to team-lead).
- Stop condition: `ESCALATE_TO_USER`. Promotes to `HARD_STOP` only if `TEAM_3PERSON_BELOW_GATE` also fires.
- design_refs: DESIGN.md §14 #9, CLAUDE.md §11 (hard-stop #3), CLAUDE.md §2.2.
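The R09 cover assignments and the promotion rule can be written as two table lookups. This is a sketch under stated assumptions: the function names are hypothetical, and because the entry names no cover for Person C, the map leaves C out rather than guessing.

```python
from typing import Optional, Set

# Cover pairs stated in the R09 mitigation (dropped person -> covering person).
# No cover is named for Person C, so C is deliberately absent.
COVER_FOR = {"A": "D", "B": "C", "D": "A"}


def cover_for(dropped: str) -> Optional[str]:
    """Who picks up the dropped person's role, or None if the entry does not say."""
    return COVER_FOR.get(dropped)


def r09_stop_condition(active_signals: Set[str]) -> Optional[str]:
    """ESCALATE_TO_USER on a drop; HARD_STOP only if the 3-person gate signal also fired."""
    if "TEAM_MEMBER_DROP" not in active_signals:
        return None
    if "TEAM_3PERSON_BELOW_GATE" in active_signals:
        return "HARD_STOP"
    return "ESCALATE_TO_USER"
```

A `None` from `cover_for` is itself a signal that the orchestrator has to re-plan by hand rather than apply a predefined swap.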
R10: Env Docker image too large for free CPU tier
- Probability: Low
- Impact: med (env Space fails to deploy)
- Mitigation: Trim Whisper/Kokoro models to int8; use Alpine base (or `python:3.11-slim` with aggressive layer pruning); target < 2 GB total (audio.md §6.4 already budgets ~450 MB for weights). Measured at `docker build` time from the image size reported by `docker images`.
- Owner: D (Deploy & Story)
- Trigger signal: `DOCKER_IMAGE_OVER_2GB`: Dockerfile build prints image size; CI lint fails if > 2 GB.
- Stop condition: None.
- design_refs: DESIGN.md §14 #10, deploy_env_space.md, audio.md §6.4.
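One way the CI lint for `DOCKER_IMAGE_OVER_2GB` might look is sketched below. It is not the project's lint: the function names are hypothetical, the image tag is whatever the build produces, and "2 GB" is read here as 2 GiB, which is an assumption. The size is read with `docker image inspect --format '{{.Size}}'`, which prints the image size in bytes.

```python
import subprocess

LIMIT_BYTES = 2 * 1024**3  # assumption: the 2 GB target is interpreted as 2 GiB


def docker_image_size_bytes(image: str) -> int:
    """Ask the local Docker daemon for an image's size in bytes."""
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return int(out.strip())


def image_over_limit(size_bytes: int, limit_bytes: int = LIMIT_BYTES) -> bool:
    """True when the image exceeds the R10 budget and the lint should fail."""
    return size_bytes > limit_bytes
```

Keeping the threshold comparison separate from the Docker call lets the rule be unit-tested without a daemon.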
R11: `openenv validate` fails on our spec
- Probability: Med
- Impact: high (disqualification risk)
- Mitigation: Validate early, at the pre-onsite hour 16 gate (CLAUDE.md §8 smoke-test checklist expansion). Keep `openenv`'s known-good example envs side-by-side for diffing. After 3 fix attempts without success, escalate to user for schema-deviation approval.
- Owner: D (Deploy & Story), env schema owned by A.
- Trigger signal: `OPENENV_VALIDATE_FAIL` after 3 fix attempts.
- Stop condition: `ESCALATE_TO_USER` (CLAUDE.md §11 escalate #2).
- design_refs: DESIGN.md §14 #11, CLAUDE.md §11 (escalate #2), env.md.
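The "escalate after 3 fix attempts" rule is a bounded retry loop. The sketch below assumes the caller supplies two callables, one that runs validation and one that applies the next fix; both are stand-ins, since how `openenv validate` is invoked is not described in this section.

```python
from typing import Callable


def validate_with_escalation(
    validate: Callable[[], bool],
    apply_fix: Callable[[int], None],
    max_fix_attempts: int = 3,  # from the R11 entry
) -> str:
    """Return "OK" once validation passes, or the R11 trigger signal after 3 failed fixes."""
    if validate():
        return "OK"
    for attempt in range(1, max_fix_attempts + 1):
        apply_fix(attempt)  # caller-supplied; receives the 1-based attempt number
        if validate():
            return "OK"
    return "OPENENV_VALIDATE_FAIL"
```

The initial validation run does not count as a fix attempt, so the signal fires only after three distinct fixes have each been tried and re-validated.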
R12: Judge doesn't speak Hindi / Indic, misses the nuance
- Probability: Med
- Impact: med (weaker 30% score component)
- Mitigation: Pitch deck has English captions on every audio clip; demo UI auto-translates agent replies to English; pitch script explicitly calls out schema-drift, which is language-agnostic, as the primary technical contribution (Indic is the substrate, drift is the innovation).
- Owner: D (Deploy & Story)
- Trigger signal: `JUDGE_LANGUAGE_MISMATCH`: observed at booth (not automated).
- Stop condition: None (design-time mitigation, not runtime).
- design_refs: DESIGN.md §14 #12, pitch_demo.md, DESIGN.md §15.
R13: Gemma 3n E2B smoke test fails on V100 (NEW per CLAUDE.md §11)
- Probability: Med
- Impact: kills (block; may need downshift to Gemma 3 4B)
- Mitigation: Pre-onsite smoke test per DESIGN.md §16.A.1; run BEFORE Batch D1 kickoff (CLAUDE.md §8 checklist). If fails: diagnose via `nvidia-smi`, CUDA driver version, Unsloth pin. Two fallbacks in priority order: (i) downshift to `unsloth/gemma-3-4b-it-bnb-4bit` (same GRPO pipeline, known-working on V100); (ii) use `unsloth/gemma-2-2b-it-bnb-4bit` (smaller, older, very stable). Either downshift requires updating DESIGN.md §0 model ref AND escalation to user (architecture-level decision).
- Owner: C (Training & Data); escalation via Orchestrator.
- Trigger signal: `GEMMA4_SMOKE_FAIL`.
- Stop condition: `ESCALATE_TO_USER` (CLAUDE.md §11 escalate #1). Hard-stop only if all three candidates (E2B, 3-4B, 2-2B) fail; then hardware is the problem.
- design_refs: CLAUDE.md §11 (escalate #1), DESIGN.md §16.A.1, training.md §6.3.
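The R13 fallback order can be sketched as a walk over the candidate list. Assumptions: `smoke_test` is a caller-supplied callable standing in for the DESIGN.md §16.A.1 procedure, `pick_model` is a hypothetical name, and `PRIMARY_E2B` is a placeholder label because this section does not give the primary model's repo id. The two fallback ids are the ones named in the entry.

```python
from typing import Callable, Optional, Sequence, Tuple

CANDIDATES = (
    "PRIMARY_E2B",                     # placeholder: primary repo id not given in this section
    "unsloth/gemma-3-4b-it-bnb-4bit",  # fallback (i) from the entry
    "unsloth/gemma-2-2b-it-bnb-4bit",  # fallback (ii) from the entry
)


def pick_model(
    smoke_test: Callable[[str], bool],
    candidates: Sequence[str] = CANDIDATES,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (chosen model, stop condition) following the R13 priority order."""
    for index, model in enumerate(candidates):
        if smoke_test(model):
            # Any downshift from the primary is an architecture-level decision.
            return model, (None if index == 0 else "ESCALATE_TO_USER")
    return None, "HARD_STOP"  # all candidates failed: hardware is the suspect
```

Returning the stop condition alongside the model keeps the escalation decision next to the selection logic, so a successful downshift cannot silently skip user sign-off.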
R14: Stage 1 training doesn't converge (NEW per CLAUDE.md §11)
- Probability: Med
- Impact: high (reward/curriculum redesign required; Stage 2/3 blocked)
- Mitigation: Detect at step 100 (end-of-Stage-1): if `eval/R1 < 0.4` on the 50-episode held-out set, halt. Diagnose via per-reward breakdown: is R1 flat (tool-use not learned) or R4 flat (format not learned)? Fix candidates: (i) raise `learning_rate` 5e-6 → 1e-5 for Stage 1 only (Stage 2/3 stay at 5e-6); (ii) over-weight R4 during Stage 1 only (rewards.md temporary override); (iii) extend Stage 1 from 100 → 150 steps by borrowing from Stage 3 budget. All three are reversible; pick one, re-run Stage 1.
- Owner: C (Training & Data), reward-side decisions co-signed by B.
- Trigger signal: `STAGE1_R1_BELOW_0_4_AT_100`.
- Stop condition: `ESCALATE_TO_USER` (CLAUDE.md §11 escalate #3).
- design_refs: CLAUDE.md §11 (escalate #3), training.md §6, DESIGN.md §10.3.
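The step-100 gate and the "which reward is flat" diagnosis can be sketched as two small functions. The gate threshold and step come from the entry; the flatness tolerance `flat_eps` and both function names are assumptions, since the entry does not define "flat" numerically.

```python
from typing import Dict, List, Optional, Sequence


def stage1_gate(step: int, eval_r1: float) -> Optional[str]:
    """Fire the R14 signal if eval/R1 is below 0.4 at the end of Stage 1 (step 100)."""
    if step == 100 and eval_r1 < 0.4:
        return "STAGE1_R1_BELOW_0_4_AT_100"
    return None


def flat_rewards(history: Dict[str, Sequence[float]], flat_eps: float = 0.02) -> List[str]:
    """Names of reward components whose eval values barely moved.

    flat_eps is an assumed tolerance, not a value from the spec.
    """
    return [name for name, xs in history.items() if max(xs) - min(xs) <= flat_eps]
```

For example, `flat_rewards({"R1": [0.20, 0.21, 0.20], "R4": [0.1, 0.5, 0.8]})` singles out R1, pointing toward the tool-use branch of the diagnosis.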
R15: Merge conflict across owned files (ownership violation) (NEW per CLAUDE.md §11)
- Probability: Low
- Impact: med (short slip; trust hit)
- Mitigation: First rule: orchestrator resolves merge conflicts, never delegates to an agent (CLAUDE.md §5.4). If a conflict touches both A-owned (env/vendors) and B-owned (rewards/tests) files, the ownership was wrong somewhere. Orchestrator: (i) halt both agents, (ii) identify the overlap file, (iii) reassign that file to a single owner, (iv) update CLAUDE.md §2.2 if the reassignment is permanent, (v) restart the affected batch.
- Owner: Orchestrator.
- Trigger signal: `MERGE_CONFLICT_CROSS_OWNER`.
- Stop condition: `ESCALATE_TO_USER` (process-level; CLAUDE.md §11 escalate #5). Never hard-stop; this is a process bug, not a technical one.
- design_refs: CLAUDE.md §11 (escalate #5), CLAUDE.md §5.4, CLAUDE.md §2.2.
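Detecting a cross-owner conflict amounts to mapping each conflicted path to an owner and checking whether more than one owner appears. The path prefixes below are hypothetical, chosen only to mirror the "A-owned (env/vendors), B-owned (rewards/tests)" split in the entry; the real layout lives in CLAUDE.md §2.2.

```python
from typing import Dict, Iterable, Set

# Hypothetical path prefixes; the authoritative ownership map is CLAUDE.md §2.2.
OWNERS: Dict[str, str] = {
    "env/": "A",
    "vendors/": "A",
    "rewards/": "B",
    "tests/": "B",
}


def owners_touched(conflicted_files: Iterable[str]) -> Set[str]:
    """Set of owners whose files appear in the conflict list."""
    found: Set[str] = set()
    for path in conflicted_files:
        for prefix, owner in OWNERS.items():
            if path.startswith(prefix):
                found.add(owner)
    return found


def is_cross_owner_conflict(conflicted_files: Iterable[str]) -> bool:
    """True when the conflict spans files owned by more than one person."""
    return len(owners_touched(conflicted_files)) > 1
```

A conflict confined to one owner's files is an ordinary merge for the orchestrator; only the multi-owner case raises `MERGE_CONFLICT_CROSS_OWNER`.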
R99: Unknown-class risk (not in register)
- Probability: Low (by construction; the register aims to be exhaustive)
- Impact: unknown (treated as high until classified)
- Mitigation: `Risk.triage(UNKNOWN)` routes here. Log line to `risk_log.jsonl`; SendMessage team-lead; human classifies within 15 min: (a) add a new entry to this doc (post-hackathon PR) and use R99's mitigation template for now; (b) map to an existing entry if the signal was mislabeled.
- Owner: Orchestrator → team lead.
- Trigger signal: `UNKNOWN` or any signal not matched by R01–R15.
- Stop condition: `ESCALATE_TO_USER` (default-pessimistic).
- design_refs: CLAUDE.md §11 (catch-all), §7.2 of this doc.
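The catch-all routing can be sketched as a dictionary lookup with R99 as the default. This is a standalone illustration, not the real `Risk.triage`: the function names, the partial signal map (only signals named in this section), and the JSON field names in the log line are all assumptions.

```python
import json
from datetime import datetime, timezone

# Partial map built from the trigger signals named in this section.
SIGNAL_TO_RISK = {
    "R5_DROP_WITH_HACK_SPIKE": "R05",
    "HF_HUB_OUTAGE_OVER_2H": "R06-STOP",
    "INDIC_VOICE_PACK_MISSING": "R08",
    "TEAM_MEMBER_DROP": "R09",
    "DOCKER_IMAGE_OVER_2GB": "R10",
    "OPENENV_VALIDATE_FAIL": "R11",
    "JUDGE_LANGUAGE_MISMATCH": "R12",
    "GEMMA4_SMOKE_FAIL": "R13",
    "STAGE1_R1_BELOW_0_4_AT_100": "R14",
    "MERGE_CONFLICT_CROSS_OWNER": "R15",
}


def triage(signal: str) -> str:
    """Route a signal to its risk id; anything unrecognized falls through to R99."""
    return SIGNAL_TO_RISK.get(signal, "R99")


def risk_log_line(signal: str, now: datetime) -> str:
    """One JSON line for an append-only log; the field names are assumed, not specified."""
    record = {"ts": now.isoformat(), "signal": signal, "risk": triage(signal)}
    return json.dumps(record, sort_keys=True)
```

Defaulting to R99 rather than raising keeps an unrecognized signal from being dropped: it still gets logged and escalated, which is the default-pessimistic behavior the entry asks for.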
11. Live additions (populated during the 48h)
Append-only notes about risks discovered after hour 0 of onsite. These go here, NOT into `Risk.assess()`, because the register's length is pinned in tests. A post-hackathon PR will promote these into numbered entries.
(empty at v1.0)
12. Cross-doc consistency table
| Risk | DESIGN.md §14 # | CLAUDE.md §11 | training.md §5 | vendors.md §5 | audio.md §5 | rewards.md |
|---|---|---|---|---|---|---|
| R01 | #1 | – | `NonFiniteGradientError` | – | – | – |
| R01-STOP | – | hard-stop #1 | – | – | – | – |
| R02 | #2 | – | `KLDivergenceExplosion` | – | – | – |
| R03 | #3 | – | – | – | confidence field | §3.6 semantic match |
| R04 | #4 | – | – | – | – | – |
| R05 | #5 | – | `RewardCollapseError` | – | – | §3.6 |
| R06 | #6 | – | – | – | – | – |
| R06-STOP | – | hard-stop #2 | – | – | – | – |
| R07 | #7 | – | – | – | – | – |
| R08 | #8 | – | – | – | `INDIC_VOICE_PACK_MISSING` | – |
| R09 | #9 | hard-stop #3 | – | – | – | – |
| R10 | #10 | – | – | – | §6.4 budget | – |
| R11 | #11 | escalate #2 | – | – | – | – |
| R12 | #12 | – | – | – | – | – |
| R13 | – | escalate #1 | – | – | – | – |
| R14 | – | escalate #3 | – | – | – | §3 (curriculum impact) |
| R15 | – | escalate #5 | – | – | – | – |
| R99 | – | catch-all | – | – | – | – |
Every filled cell in this table corresponds to an actual citation in the source doc. If a cell is filled but the cited section does not actually describe that risk, that is a spec-drift bug; fix the cite.