VibecoderMcSwaggins committed on
Commit e5e44dc · 1 Parent(s): a5b5479

docs(spec): Add regression prevention strategy with smoke tests

Files changed (1)
  1. SPEC_ARCHITECTURAL_DEBT.md +96 -0
SPEC_ARCHITECTURAL_DEBT.md CHANGED
@@ -226,6 +226,102 @@ class WorkflowState:
 
  ---
 
+ ## Regression Prevention Strategy
+
+ **CRITICAL**: Each phase MUST pass smoke tests before merge. Unit tests alone are insufficient.
+
+ ### Smoke Test Infrastructure
+
+ Add to `Makefile`:
+ ```makefile
+ # Smoke tests - run against real APIs (slow, not for CI)
+ # --timeout is provided by the pytest-timeout plugin
+ .PHONY: smoke smoke-free smoke-paid
+
+ smoke-free:
+ 	@echo "Running Free Tier smoke test..."
+ 	uv run python -m pytest tests/e2e/test_smoke.py::test_free_tier_synthesis -v -s --timeout=600
+
+ smoke-paid:
+ 	@echo "Running Paid Tier smoke test (requires OPENAI_API_KEY)..."
+ 	uv run python -m pytest tests/e2e/test_smoke.py::test_paid_tier_synthesis -v -s --timeout=300
+
+ smoke: smoke-free # Default to free tier
+ ```
+
+ ### Smoke Test Implementation
+
+ Create `tests/e2e/test_smoke.py`:
+ ```python
+ """
+ Smoke tests for regression prevention.
+
+ These tests run against REAL APIs and verify end-to-end functionality.
+ They are slow (2-5 minutes) and should NOT run in CI.
+
+ Usage:
+     make smoke-free  # Test Free Tier (HuggingFace)
+     make smoke-paid  # Test Paid Tier (OpenAI BYOK)
+ """
+ import os
+
+ import pytest
+
+ from src.orchestrators.advanced import AdvancedOrchestrator
+
+
+ @pytest.mark.e2e
+ @pytest.mark.asyncio  # Assumes pytest-asyncio; redundant if asyncio_mode = "auto"
+ @pytest.mark.timeout(600)  # 10 minute timeout for Free Tier
+ async def test_free_tier_synthesis():
+     """Verify Free Tier produces actual synthesis (not just 'Research complete.')"""
+     orch = AdvancedOrchestrator(max_rounds=2)
+
+     events = []
+     async for event in orch.run("What is libido?"):
+         events.append(event)
+
+     # MUST have a complete event
+     complete_events = [e for e in events if e.type == "complete"]
+     assert len(complete_events) >= 1, "No complete event received"
+
+     # Complete event MUST have substantive content (not just signal).
+     # Check for the bare signal first so that case gets its own failure message;
+     # the original `or len(...) > 50` clause could never fail after the length check.
+     final = complete_events[-1]
+     assert final.message.strip() != "Research complete.", \
+         "Got empty synthesis signal instead of actual report"
+     assert len(final.message) > 100, f"Synthesis too short: {len(final.message)} chars"
+
+     # Should NOT have duplicate content
+     messages = [e.message for e in events if e.message]
+     # Check for exact duplicates of long content
+     long_messages = [m for m in messages if len(m) > 200]
+     assert len(long_messages) == len(set(long_messages)), "Duplicate content detected"
+
+
+ @pytest.mark.e2e
+ @pytest.mark.asyncio  # Assumes pytest-asyncio; redundant if asyncio_mode = "auto"
+ @pytest.mark.timeout(300)  # 5 minute timeout for Paid Tier
+ async def test_paid_tier_synthesis():
+     """Verify Paid Tier (BYOK) produces synthesis."""
+     api_key = os.environ.get("OPENAI_API_KEY")
+     if not api_key:
+         pytest.skip("OPENAI_API_KEY not set")
+
+     orch = AdvancedOrchestrator(max_rounds=2, api_key=api_key)
+
+     events = []
+     async for event in orch.run("What is libido?"):
+         events.append(event)
+
+     complete_events = [e for e in events if e.type == "complete"]
+     assert len(complete_events) >= 1, "No complete event received"
+     assert len(complete_events[-1].message) > 100, "Synthesis too short"
+ ```
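The duplicate check in `test_free_tier_synthesis` only reports that some duplicate exists. A minimal sketch of a helper that also returns which long messages repeat, assuming event messages are plain strings; `find_duplicate_messages` and its `min_length` parameter are hypothetical names that do not appear in the commit:

```python
from collections import Counter


def find_duplicate_messages(messages, min_length=200):
    """Return long messages that occur more than once, in first-seen order."""
    # Only long messages are considered, mirroring the 200-character
    # threshold in the smoke test; short status lines may repeat legitimately.
    long_messages = [m for m in messages if len(m) > min_length]
    counts = Counter(long_messages)
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return [m for m in dict.fromkeys(long_messages) if counts[m] > 1]
```

In the smoke test this could replace the length comparison with `assert not find_duplicate_messages(messages), "Duplicate content detected"`, so a failure shows the repeated text.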
+
+ ### Phase Gate Checklist
+
+ Before merging ANY refactoring PR:
+
+ ```
+ [ ] make check           # All 318+ unit tests pass
+ [ ] make smoke-free      # Free Tier produces real synthesis
+ [ ] make smoke-paid      # Paid Tier works (if you have a key)
+ [ ] CodeRabbit approved  # No blocking issues
+ ```
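The automatable part of this checklist can be sketched as a small runner. This is an illustration only, assuming the `make` targets named in the checklist exist; `phase_gate_commands` and `run_phase_gate` are hypothetical names, and the CodeRabbit approval remains a manual step:

```python
import os
import subprocess


def phase_gate_commands(environ):
    """Return the checklist commands to run, in order."""
    commands = [["make", "check"], ["make", "smoke-free"]]
    # The paid-tier smoke test is optional: include it only when a key is set.
    if environ.get("OPENAI_API_KEY"):
        commands.append(["make", "smoke-paid"])
    return commands


def run_phase_gate(environ=None, run=subprocess.run):
    """Run each gate in order and stop at the first failure.

    Returns True only if every command exits with status 0.
    """
    environ = os.environ if environ is None else environ
    for command in phase_gate_commands(environ):
        print("Running:", " ".join(command))
        if run(command).returncode != 0:
            print("FAILED:", " ".join(command))
            return False
    return True
```

Calling `run_phase_gate()` from the repository root runs the gates against the real targets. The `run` parameter exists so the ordering and stop-on-failure logic can be exercised without invoking `make`.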
+
+ ---
+
  ## Execution Strategy
 
  ### Phase 1: Current PR (REQUIRED)