coredipper committed on
Commit
800bb0a
·
verified ·
1 Parent(s): 5bb600f

Initial deploy: Escalation Lab

Files changed (4)
  1. README.md +33 -5
  2. __pycache__/app.cpython-311.pyc +0 -0
  3. app.py +301 -0
  4. requirements.txt +3 -0
README.md CHANGED
@@ -1,12 +1,40 @@
  ---
  title: Operon Escalation Lab
- emoji: 🐢
- colorFrom: purple
- colorTo: purple
+ emoji: "\U0001F9EA"
+ colorFrom: yellow
+ colorTo: red
  sdk: gradio
- sdk_version: 6.12.0
+ sdk_version: "6.5.1"
  app_file: app.py
  pinned: false
+ license: mit
+ short_description: Quality-based model escalation demo
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Operon Escalation Lab
+
+ Explore **quality-based escalation**: the VerifierComponent (adaptive immunity) scores each stage's output against a rubric, and the WatcherComponent (innate immunity) escalates from the fast model to the deep model when quality drops below threshold.
+
+ ## What to Try
+
+ 1. Click **Run** with "Shallow bug fix" -- the fast model scores 0.25 (below 0.50 threshold), triggering escalation to the deep model.
+ 2. Try "Adequate response" -- the fast model scores 0.85 (above threshold), so no escalation occurs.
+ 3. Adjust the **Quality Threshold** slider to see how changing the threshold affects escalation behavior.
+ 4. Try "Vague summary" with different thresholds to find the tipping point.
+
+ ## How It Works
+
+ 1. **VerifierComponent** evaluates output quality via a rubric function (0.0-1.0)
+ 2. If quality < threshold, it emits a `WatcherSignal(category=EPISTEMIC, source="verifier")`
+ 3. **WatcherComponent** detects the low-quality signal on the fast model
+ 4. Watcher decides to **ESCALATE** -- re-runs the stage with the deep nucleus
+ 5. Final output comes from the deep model
+
+ ## Biological Analogy
+
+ - **Innate immunity** (WatcherComponent): generic anomaly detection via baseline deviations
+ - **Adaptive immunity** (VerifierComponent): specific quality assessment via rubric, like B-cells producing antibodies tailored to an antigen
+
+ ## Learn More
+
+ [GitHub](https://github.com/coredipper/operon) | [PyPI](https://pypi.org/project/operon-ai/) | [Paper](https://github.com/coredipper/operon/tree/main/article)
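
Review note: the five "How It Works" steps in the README can be sketched in plain Python without the `operon_ai` package. Everything below (`QualitySignal`, `verify`, `run_stage`) is a hypothetical stand-in invented for illustration, not the library's API; it only assumes that escalation happens when the rubric score is strictly below the threshold, as the README states.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QualitySignal:
    """Hypothetical stand-in for the signal the verifier emits."""
    stage: str
    quality: float
    below_threshold: bool


def verify(output: str, stage: str,
           rubric: Callable[[str, str], float],
           threshold: float) -> QualitySignal:
    """Steps 1-2: score the output and flag it if it falls below threshold."""
    quality = rubric(output, stage)
    return QualitySignal(stage, quality, quality < threshold)


def run_stage(task: str,
              fast: Callable[[str], str],
              deep: Callable[[str], str],
              rubric: Callable[[str, str], float],
              threshold: float) -> tuple[str, bool]:
    """Steps 3-5: run the fast model, re-run with the deep model on a low signal."""
    output = fast(task)
    signal = verify(output, "respond", rubric, threshold)
    if signal.below_threshold:
        return deep(task), True   # escalated
    return output, False          # fast output accepted


# Usage with the "Shallow bug fix" values from this commit.
FAST = "Add try/except around the login call."
DEEP = "Root-cause analysis: the session token is not refreshed on 401 retry."


def demo_rubric(output: str, stage: str) -> float:
    # Preset scores, mirroring the demo's mock rubric.
    return 0.25 if output == FAST else 0.95


final, escalated = run_stage(
    "Fix the login crash after session timeout",
    fast=lambda task: FAST,
    deep=lambda task: DEEP,
    rubric=demo_rubric,
    threshold=0.5,
)
print(escalated, final)
```

With the threshold at 0.5 the fast score of 0.25 triggers escalation; lowering the threshold to 0.2 would accept the fast output instead.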
__pycache__/app.cpython-311.pyc ADDED
Binary file (13.8 kB).
app.py ADDED
@@ -0,0 +1,301 @@
+ """
+ Operon Escalation Lab -- Quality-Based Model Escalation
+ ========================================================
+
+ Interactive demo of the adaptive immune layer:
+ VerifierComponent evaluates output quality via a rubric, and
+ WatcherComponent escalates from fast -> deep model when quality
+ falls below threshold.
+
+ Run locally: pip install gradio && python space-escalation-lab/app.py
+ """
+
+ import sys
+ from dataclasses import dataclass
+ from pathlib import Path
+
+ import gradio as gr
+
+ _repo_root = Path(__file__).resolve().parents[2]
+ if str(_repo_root) not in sys.path:
+     sys.path.insert(0, str(_repo_root))
+
+ from operon_ai import ATP_Store, MockProvider, Nucleus, SkillStage, skill_organism
+ from operon_ai.patterns.verifier import VerifierComponent, VerifierConfig
+ from operon_ai.patterns.watcher import WatcherComponent, WatcherConfig
+
+ # ---------------------------------------------------------------------------
+ # Scenario definitions
+ # ---------------------------------------------------------------------------
+
+ @dataclass
+ class Scenario:
+     name: str
+     task: str
+     fast_response: str
+     deep_response: str
+     fast_quality: float  # expected quality of fast response
+     description: str
+
+
+ SCENARIOS = {
+     "Shallow bug fix": Scenario(
+         name="Shallow bug fix",
+         task="Fix the login crash after session timeout",
+         fast_response="Add try/except around the login call.",
+         deep_response=(
+             "Root-cause analysis: the session token is not refreshed on 401 "
+             "retry. Fix: add token refresh in the retry interceptor with "
+             "exponential backoff. Added regression test for expired-token path."
+         ),
+         fast_quality=0.25,
+         description="Fast model produces a shallow patch; deep model finds root cause.",
+     ),
+     "Vague summary": Scenario(
+         name="Vague summary",
+         task="Summarize the Q3 performance report",
+         fast_response="Performance was good in Q3.",
+         deep_response=(
+             "Q3 highlights: revenue up 12% YoY driven by enterprise segment "
+             "(+23%). Churn decreased from 4.1% to 3.2% after onboarding "
+             "redesign. Two risks: APAC pipeline softening (-8%) and delayed "
+             "SOC2 certification (ETA pushed to Q4)."
+         ),
+         fast_quality=0.15,
+         description="Fast model gives a vague one-liner; deep model gives structured detail.",
+     ),
+     "Adequate response": Scenario(
+         name="Adequate response",
+         task="List the three main HTTP status code categories",
+         fast_response=(
+             "1xx Informational, 2xx Success, 3xx Redirection, 4xx Client Error, "
+             "5xx Server Error. The three main categories are 2xx, 4xx, and 5xx."
+         ),
+         deep_response=(
+             "The three main HTTP status code categories are 2xx (Success), "
+             "4xx (Client Error), and 5xx (Server Error)."
+         ),
+         fast_quality=0.85,
+         description="Fast model gives a good enough answer. No escalation expected.",
+     ),
+ }
+
+ # ---------------------------------------------------------------------------
+ # Core logic
+ # ---------------------------------------------------------------------------
+
+ def _badge(text, color):
+     return (f'<span style="background:{color};color:white;padding:3px 10px;'
+             f'border-radius:4px;font-size:0.85em;font-weight:600;">{text}</span>')
+
+
+ def _card(title, content, border_color="#e5e7eb"):
+     return (
+         f'<div style="border:2px solid {border_color};border-radius:8px;'
+         f'margin-bottom:12px;overflow:hidden;">'
+         f'<div style="padding:8px 14px;background:{border_color}15;'
+         f'border-bottom:1px solid {border_color};">'
+         f'<span style="font-weight:700;">{title}</span></div>'
+         f'<div style="padding:12px 14px;">{content}</div></div>'
+     )
+
+
+ def run_escalation(scenario_name, threshold):
+     scenario = SCENARIOS.get(scenario_name)
+     if scenario is None:
+         return "<p>Select a scenario.</p>"
+
+     threshold = float(threshold)
+
+     # Rubric stub: returns the scenario's preset score for the fast response
+     def rubric(output: str, stage_name: str) -> float:
+         if stage_name != "respond":
+             return 0.8
+         if output == scenario.fast_response:
+             return scenario.fast_quality
+         return 0.95  # deep response always scores high
+
+     # Build organism
+     fast = Nucleus(provider=MockProvider(responses={
+         "respond": scenario.fast_response,
+     }))
+     deep = Nucleus(provider=MockProvider(responses={
+         "respond": scenario.deep_response,
+     }))
+
+     watcher = WatcherComponent(config=WatcherConfig())
+     verifier = VerifierComponent(
+         rubric=rubric,
+         config=VerifierConfig(quality_low_threshold=threshold),
+     )
+
+     org = skill_organism(
+         stages=[
+             SkillStage(
+                 name="respond",
+                 role="Responder",
+                 instructions="Respond to the task.",
+                 mode="fixed",
+             ),
+         ],
+         fast_nucleus=fast,
+         deep_nucleus=deep,
+         budget=ATP_Store(budget=1000, silent=True),
+         components=[watcher, verifier],
+     )
+
+     result = org.run(scenario.task)
+
+     # Collect results
+     escalated = any(
+         i.kind.value == "escalate" for i in watcher.interventions
+     )
+     fix_scores = [(s, q) for s, q in verifier.quality_scores if s == "respond"]
+     initial_quality = fix_scores[0][1] if fix_scores else 0.0
+
+     verifier_signals = [s for s in watcher.signals if s.source == "verifier"]
+
+     # Build HTML output
+     html_parts = []
+
+     # Scenario info
+     html_parts.append(_card(
+         f"Scenario: {scenario.name}",
+         f'<p style="color:#6b7280;">{scenario.description}</p>'
+         f'<p><b>Task:</b> {scenario.task}</p>'
+         f'<p><b>Threshold:</b> {threshold:.2f}</p>',
+         "#6366f1",
+     ))
+
+     # Fast model output
+     fast_badge = _badge(f"quality: {initial_quality:.2f}",
+                         "#ef4444" if initial_quality < threshold else "#22c55e")
+     below = initial_quality < threshold
+     html_parts.append(_card(
+         f"Fast Model Output {fast_badge}",
+         f'<p style="font-family:monospace;white-space:pre-wrap;">'
+         f'{scenario.fast_response}</p>'
+         f'<p style="margin-top:8px;color:#6b7280;">'
+         f'{"Below threshold" if below else "Above threshold"} '
+         f'({initial_quality:.2f} {"<" if below else ">="} {threshold:.2f})</p>',
+         "#ef4444" if below else "#22c55e",
+     ))
+
+     # Escalation decision
+     if escalated:
+         intv = watcher.interventions[0]
+         html_parts.append(_card(
+             f"Watcher Decision: {_badge('ESCALATE', '#f59e0b')}",
+             f'<p><b>Reason:</b> {intv.reason}</p>'
+             f'<p style="color:#6b7280;">Fast model quality ({initial_quality:.2f}) '
+             f'fell below threshold ({threshold:.2f}). '
+             f'Watcher escalated to deep model.</p>',
+             "#f59e0b",
+         ))
+
+         html_parts.append(_card(
+             f"Deep Model Output {_badge('quality: 0.95', '#22c55e')}",
+             f'<p style="font-family:monospace;white-space:pre-wrap;">'
+             f'{scenario.deep_response}</p>',
+             "#22c55e",
+         ))
+     else:
+         html_parts.append(_card(
+             f"Watcher Decision: {_badge('NO ESCALATION', '#22c55e')}",
+             f'<p>Quality ({initial_quality:.2f}) met threshold ({threshold:.2f}). '
+             f'Fast model output accepted.</p>',
+             "#22c55e",
+         ))
+
+     # Final output
+     final_badge = _badge("ESCALATED", "#f59e0b") if escalated else _badge("DIRECT", "#22c55e")
+     html_parts.append(_card(
+         f"Final Output {final_badge}",
+         f'<p style="font-family:monospace;white-space:pre-wrap;font-weight:600;">'
+         f'{result.final_output}</p>',
+         "#3b82f6",
+     ))
+
+     # Signal trace
+     sig_rows = ""
+     for sig in verifier_signals:
+         q = sig.detail.get("quality", 0)
+         bt = sig.detail.get("below_threshold", False)
+         status = _badge("BELOW", "#ef4444") if bt else _badge("OK", "#22c55e")
+         sig_rows += (
+             f'<tr style="border-bottom:1px solid #f3f4f6;">'
+             f'<td style="padding:4px 8px;">{sig.stage_name}</td>'
+             f'<td style="padding:4px 8px;">{q:.2f}</td>'
+             f'<td style="padding:4px 8px;">{sig.value:.2f}</td>'
+             f'<td style="padding:4px 8px;">{status}</td></tr>')
+
+     if sig_rows:
+         html_parts.append(_card(
+             "Signal Trace",
+             '<table style="width:100%;border-collapse:collapse;">'
+             '<tr style="border-bottom:2px solid #e5e7eb;color:#6b7280;">'
+             '<th style="text-align:left;padding:4px 8px;">Stage</th>'
+             '<th style="text-align:left;padding:4px 8px;">Quality</th>'
+             '<th style="text-align:left;padding:4px 8px;">Severity</th>'
+             '<th style="text-align:left;padding:4px 8px;">Status</th></tr>'
+             f'{sig_rows}</table>',
+             "#8b5cf6",
+         ))
+
+     return "\n".join(html_parts)
+
+
+ def load_scenario(name):
+     s = SCENARIOS.get(name)
+     if s:
+         return s.description
+     return ""
+
+
+ # ---------------------------------------------------------------------------
+ # Gradio UI
+ # ---------------------------------------------------------------------------
+
+ def build_app() -> gr.Blocks:
+     with gr.Blocks(title="Operon Escalation Lab") as app:
+         gr.Markdown(
+             "# Operon Escalation Lab\n"
+             "Explore **quality-based escalation**: the VerifierComponent scores "
+             "each stage's output, and the WatcherComponent escalates from the "
+             "fast model to the deep model when quality drops below threshold.\n\n"
+             "**Biological analogy:** Innate immunity (Watcher) detects generic anomalies. "
+             "Adaptive immunity (Verifier) evaluates against a specific rubric.\n\n"
+             "[GitHub](https://github.com/coredipper/operon) | "
+             "[Paper](https://github.com/coredipper/operon/tree/main/article)")
+
+         with gr.Row():
+             scenario_dd = gr.Dropdown(
+                 choices=list(SCENARIOS.keys()),
+                 value="Shallow bug fix",
+                 label="Scenario", scale=2)
+             run_btn = gr.Button("Run", variant="primary", scale=1)
+
+         scenario_desc = gr.Markdown("Fast model produces a shallow patch; deep model finds root cause.")
+
+         threshold_slider = gr.Slider(
+             minimum=0.1, maximum=0.95, value=0.5, step=0.05,
+             label="Quality Threshold (below this = escalate)")
+
+         gr.Markdown("### Results")
+         results_output = gr.HTML()
+
+         run_btn.click(
+             fn=run_escalation,
+             inputs=[scenario_dd, threshold_slider],
+             outputs=[results_output])
+         scenario_dd.change(
+             fn=load_scenario,
+             inputs=[scenario_dd],
+             outputs=[scenario_desc])
+
+     return app
+
+
+ if __name__ == "__main__":
+     app = build_app()
+     app.launch(theme=gr.themes.Soft())
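
Review note: the "tipping point" behaviour the slider is meant to expose can be checked offline with a small sweep over the three preset `fast_quality` values from `SCENARIOS`. This is a sketch only: `FAST_QUALITY`, `escalates`, and `sweep` are hypothetical helpers, and the strict `quality < threshold` comparison is taken from the display logic in `run_escalation`; whether `VerifierComponent` compares the same way internally is an assumption.

```python
# Preset fast-model scores copied from the SCENARIOS dict in app.py.
FAST_QUALITY = {
    "Shallow bug fix": 0.25,
    "Vague summary": 0.15,
    "Adequate response": 0.85,
}


def escalates(quality: float, threshold: float) -> bool:
    """Assumed rule: escalate only when quality is strictly below threshold."""
    return quality < threshold


def sweep(thresholds: list[float]) -> dict[float, list[str]]:
    """Map each threshold to the scenarios that would escalate at it."""
    return {
        t: [name for name, q in FAST_QUALITY.items() if escalates(q, t)]
        for t in thresholds
    }


for threshold, names in sweep([0.10, 0.20, 0.50, 0.90]).items():
    print(f"{threshold:.2f}: {names or 'no escalation'}")
```

Under these assumptions "Vague summary" starts escalating once the threshold exceeds 0.15, "Shallow bug fix" once it exceeds 0.25, and "Adequate response" only once it exceeds 0.85.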
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ gradio>=4.0
+ operon-ai>=0.33.0
+ pydantic>=2.0