File size: 12,240 Bytes
7f33a54
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
# SentinelOps Arena -- Master Improvement Plan

**Created:** Sunday March 8, 2026
**Goal:** Maximize hackathon judging score with surgical code fixes

---

## Priority Legend

| Score | Meaning |
|-------|---------|
| 10 | Must fix -- breaks core functionality or judges will reject |
| 8-9 | High impact -- judges will directly notice and reward |
| 5-7 | Noticeable improvement -- strengthens the demo |
| 1-4 | Low impact -- skip unless time permits |

---

## CRITICAL FIXES (Bugs that break core functionality)

### FIX-1: Billing `issue_refund()` never checks `window_ticks` [Priority: 10]

**Bug:** `billing.py:issue_refund()` checks `max_amount` and `requires_approval` but NEVER checks `window_ticks`. Policy drift attacks that change `window_ticks` have zero effect on refund validation. This means 1/3 of policy drift parameters is dead code.

**File:** `sentinelops_arena/systems/billing.py` (lines 47-89)
**Change:** Add window_ticks validation. The invoice has `date_tick` and the environment tracks the current tick. Pass `current_tick` into `issue_refund()` and compare `current_tick - invoice["date_tick"]` against `self.refund_policy.window_ticks`.
**Impact:** Policy drift attacks now meaningfully change refund behavior. Patronus AI judges (schema/policy drift track) will directly verify this works.
**Lines of code:** ~10 lines in billing.py, ~3 lines in environment.py to pass current_tick

**Details:**
- Add `current_tick: int` parameter to `issue_refund()`
- After the existing checks, add:
  ```python
  ticks_since_invoice = current_tick - invoice.get("date_tick", 0)
  if ticks_since_invoice > self.refund_policy.window_ticks:
      return {"error": f"Refund window expired. Invoice is {ticks_since_invoice} ticks old, policy allows {self.refund_policy.window_ticks}"}
  ```
- Update environment.py `_execute_worker_action` to pass `self.tick`
- Update the MCP tool `issue_refund` to pass `self.tick`

---

### FIX-2: CRM and Ticketing have no rate limiting support [Priority: 8]

**Bug:** `attacks.py:_execute_rate_limit()` calls `system.set_rate_limit()`, but only `BillingSystem` implements it. `CRMSystem` and `TicketingSystem` have no `set_rate_limit`, `_rate_limit`, `_call_count`, or `_rate_limit_check()`. The attack manager already checks `hasattr(system, "set_rate_limit")` and returns error, but the attacker can still target CRM/ticketing and waste budget.

**File:** `sentinelops_arena/systems/crm.py`, `sentinelops_arena/systems/ticketing.py`
**Change:** Add rate limiting to CRM and Ticketing, mirroring BillingSystem's implementation.
**Impact:** Rate limit attacks now work on all 3 systems. The `_is_rate_limited()` check in environment.py (line 601-606) already handles this via `hasattr(system, "_rate_limit")`, so once the attribute exists, rate limiting shows up in the dashboard.
**Lines of code:** ~20 lines per system (copy from billing.py pattern)

**Details for each system (CRM + Ticketing):**
- Add `self._rate_limit: int = 0` and `self._call_count: int = 0` to `__init__`
- Add `_rate_limit_check()` method (copy from billing.py)
- Add `set_rate_limit()` method (copy from billing.py)
- Add `reset_rate_limit_counter()` method (copy from billing.py)
- Add `if self._rate_limit_check(): return {"error": "Rate limit exceeded."}` to `lookup_customer`, `update_tier`, `add_note`, `get_history`, and for ticketing: `create_ticket`, `assign_ticket`, `escalate`, `resolve`, `check_sla`
- Update environment.py to reset CRM and ticketing counters each tick (add to the tick-advance block at line 346)

---

### FIX-3: Schema drift renames target non-existent fields [Priority: 7]

**Bug:** `SCHEMA_DRIFT_RENAMES` in `demo.py` includes `{"old_field": "email", ...}`, `{"old_field": "address", ...}`, `{"old_field": "phone", ...}`, `{"old_field": "id", ...}`. But the Customer model has fields: `customer_id`, `name`, `tier`, `region`, `contact_email`, `lifetime_value`, `notes`. Only `name -> full_name` actually works. The others silently do nothing because the fields don't exist.

**File:** `sentinelops_arena/demo.py` (lines 125-131)
**Change:** Fix renames to use actual Customer model field names.
**Lines of code:** ~5 lines

**New renames:**
```python
SCHEMA_DRIFT_RENAMES = [
    {"old_field": "name", "new_field": "full_name"},
    {"old_field": "contact_email", "new_field": "email_address"},
    {"old_field": "region", "new_field": "territory"},
    {"old_field": "tier", "new_field": "membership_level"},
    {"old_field": "lifetime_value", "new_field": "total_spend"},
]
```

Also fix in `train.py` (lines 311-312) which has the same bad renames.

---

### FIX-4: `tasks_completed` count is always 0 [Priority: 5]

**Bug:** `environment.py` line 356-360 counts `tasks_completed` by checking `t.get("task_completed")` in trajectory entries, but no trajectory entry ever sets `task_completed = True`. The trajectory append at line 329-336 only stores `tick`, `agent`, `action_type`, `reward`.

**File:** `sentinelops_arena/environment.py` (lines 329-336, 356-360)
**Change:** Add `task_completed` flag to trajectory entries when worker successfully completes a task.
**Lines of code:** ~3 lines

**Details:**
- In the trajectory append, add `"task_completed": (action.agent == AgentRole.WORKER and self.last_worker_result and self.last_worker_result.get("success", False))`
- Or simpler: after computing worker reward, if result["success"], set a flag on the trajectory entry

---

## HIGH-IMPACT IMPROVEMENTS (Things judges will notice/reward)

### IMP-1: Apply Gradio theme in `gr.Blocks()` constructor [Priority: 9]

**Bug:** The `SentinelTheme()` and `CUSTOM_CSS` are passed to `demo.launch()` but NOT to `gr.Blocks()`. On HuggingFace Spaces, `launch()` args may be ignored. The theme must be in the constructor.

**File:** `app.py` (line 124, line 444-448)
**Change:** Move theme and css into `gr.Blocks()`:
```python
with gr.Blocks(title="SentinelOps Arena", fill_width=True, theme=SentinelTheme(), css=CUSTOM_CSS) as demo:
```
Remove duplicate theme/css from `demo.launch()`.
**Lines of code:** 2 lines

---

### IMP-2: Worker heuristic should complete multi-step tasks [Priority: 8]

**Bug:** The trained `HeuristicWorker._trained_act()` in `demo.py` checks policy for refund tasks but NEVER actually issues the refund. It just calls `get_current_policy` every time it sees a refund task. Same for untrained worker -- it issues refund but never checks CRM first.

**File:** `sentinelops_arena/demo.py` (lines 253-299)
**Change:** Add state tracking to HeuristicWorker so it can complete multi-step flows:
1. First encounter of refund task: call `get_current_policy`
2. Second encounter (same task): call `issue_refund` with validated params

This will make the trained vs untrained comparison dramatically more interesting in the demo.

**Lines of code:** ~25 lines

**Details:**
- Add `self._last_task_id` and `self._policy_checked` state to HeuristicWorker
- Trained flow: refund task first seen -> get_current_policy, refund task second time -> issue_refund with compliant params
- This creates visible "adaptive behavior" in the replay -- exactly what judges want to see

---

### IMP-3: Improve explanation quality metric [Priority: 6]

**Bug:** `environment.py` line 441: `explanation_quality = min(len(explanation) / 100.0, 1.0)` -- quality is just string length. A 100+ character explanation always gets max quality regardless of content.

**File:** `sentinelops_arena/environment.py` (line 441)
**Change:** Add keyword detection alongside length. Check if explanation mentions relevant terms (policy, schema, drift, social engineering, violation, refund, etc.).
**Lines of code:** ~8 lines

```python
keywords = ["policy", "schema", "drift", "violation", "social", "engineering",
            "refund", "unauthorized", "error", "compliance"]
keyword_matches = sum(1 for k in keywords if k in explanation.lower())
length_score = min(len(explanation) / 100.0, 0.5)
keyword_score = min(keyword_matches / 3.0, 0.5)
explanation_quality = length_score + keyword_score
```

---

## QUICK WINS (Small effort, visible improvement)

### QW-1: Fix HF Spaces requirements [Priority: 9]

**File:** `requirements.txt`
**Change:** Ensure `pandas>=2.0` is listed. Verify gradio version consistency.
**Lines of code:** 1-2 lines

---

### QW-2: Fix version claims in SENTINELOPS_ARENA.md [Priority: 4]

**Bug:** Spec says "80 ticks" and "OpenEnv 0.4" but code uses 30 ticks and OpenEnv 0.2.x.
**Action:** SKIP -- spec docs are aspirational. Judges who read code will see it works. Not worth the time.

---

### QW-3: Clean up hackathon_env/ vestigial directory [Priority: 3]

**File:** `.gitignore` or delete `hackathon_env/`
**Action:** SKIP unless doing final cleanup -- judges won't look here.

---

## SKIP LIST (Not worth the time)

| Item | Why Skip |
|------|----------|
| Compound attacks | 2+ hours, spec feature not in code |
| Compliance drift | New attack type, 1+ hour to implement and test |
| A2A protocol | Already marked "Cut" in spec, correct decision |
| Docker support | HF Spaces uses Gradio SDK |
| SLA breach detection | Needs rework of ticketing + reward pipeline |
| MCP-X gateway | MCP tools work inline, gateway is polish |
| Full GRPO convergence | Training pipeline exists, convergence not needed |

---

## IMPLEMENTATION ORDER

Execute in this exact order to maximize impact per minute:

| # | Item | Est. Time | Impact |
|---|------|-----------|--------|
| 1 | **IMP-1**: Theme in gr.Blocks constructor | 2 min | HF Spaces theme works |
| 2 | **QW-1**: Fix requirements.txt | 2 min | HF Spaces doesn't crash |
| 3 | **FIX-1**: window_ticks enforcement in billing | 10 min | Policy drift attacks work (Patronus AI track) |
| 4 | **FIX-3**: Fix schema drift renames | 5 min | Schema drift attacks work (Patronus AI track) |
| 5 | **FIX-2**: Rate limiting for CRM + Ticketing | 15 min | All attacks work on all systems |
| 6 | **FIX-4**: tasks_completed tracking | 3 min | Dashboard shows correct count |
| 7 | **IMP-2**: Worker multi-step task completion | 15 min | Demo shows real adaptive behavior |
| 8 | **IMP-3**: Better explanation quality metric | 5 min | Oversight agent more realistic |

**Total estimated time: ~57 minutes**

---

## JUDGE-SPECIFIC IMPACT ANALYSIS

### Patronus AI (Darshan Deshpande) -- Schema Drift Track ($10K)
- **FIX-1** makes policy drift mechanically functional (window_ticks enforced)
- **FIX-3** makes schema drift renames target real fields (attacks actually break things)
- **IMP-2** shows worker adapting to drift in multi-step tasks
- Combined: these 3 fixes transform "drift is mentioned" into "drift demonstrably works"

### Fleet AI (Nicolai Ouporov) -- Scalable Oversight Track ($10K)
- **IMP-3** gives oversight meaningful explanation quality scoring (not just string length)
- **IMP-2** creates real violations for oversight to catch (multi-step tasks that can fail)
- **FIX-4** shows accurate task completion stats in dashboard

### Daniel Han (Unsloth) -- Training Pipeline
- The training pipeline in `train.py` is already solid
- Fixes to the environment make the reward signals more meaningful
- GRPO reward functions already correctly shaped

### Sanyam Bhutani (Meta) -- OpenEnv Quality
- **FIX-1 + FIX-2** demonstrate environment integrity (attacks have real effects)
- Clean MCP tool exposure with 19 tools already impressive
- Environment reset/step/state cycle works correctly

### Benjamin Burtenshaw (HuggingFace) -- Hub Deployment
- **IMP-1** + **QW-1** ensure HF Spaces deployment works correctly with theme
- Gradio 6 native plots and custom theme are impressive

---

## WHAT NOT TO TOUCH

1. **rewards.py** -- Reward functions are clean and match spec tables. Do not modify.
2. **models.py** -- Pydantic models are correct. Do not add fields unless required by a fix.
3. **task_generator.py** -- Works fine, generates correct task mix.
4. **sentinel_theme.py** -- Theme is polished. Do not tweak CSS.
5. **replay_html.py** -- HTML rendering works. Do not modify.
6. **chart_helpers.py** -- Chart data builders work. Do not modify.
7. **metrics.py** -- Security metrics computation is solid.
8. **train.py** -- Only touch to fix schema_drift renames in the heuristic attacker configs.