Commit 10e320d
Parent(s): b1d094d

docs: fix spec half-measures (line refs, test counts, acceptance criteria)

SPEC_01:
- Line reference: 94 → 111 (actual location)
- Acceptance criteria: Mark all 4 items as done

SPEC_02:
- Test count: 140 → 141
- Test Matrix: Mark both modes as IMPLEMENTED
- Acceptance criteria: Mark 4/5 items as done

Previous agent claimed "updated" but left stale values.
docs/specs/SPEC_01_DEMO_TERMINATION.md CHANGED

@@ -16,7 +16,7 @@ Advanced (Magentic) mode runs indefinitely from user perspective. The demo was m
 ### Question 1: Does max_round_count actually work?
 
 ```python
-# Current code (src/orchestrator_magentic.py:94)
+# Current code (src/orchestrator_magentic.py:111)
 .with_standard_manager(
     chat_client=manager_client,
     max_round_count=self._max_rounds,  # Default: 10
@@ -99,10 +99,12 @@ if len(evidence) >= 20:
 
 ## Acceptance Criteria
 
-- [ ] Demo completes in <5 minutes with visible progress
-- [ ] User sees round count (e.g., "Round 3/5")
-- [ ] Always produces SOME output (even if partial)
-- [ ] Timeout prevents infinite running
+- [x] Demo completes in <5 minutes with visible progress
+- [x] User sees round count (e.g., "Round 3/5")
+- [x] Always produces SOME output (even if partial)
+- [x] Timeout prevents infinite running
+
+**Status: IMPLEMENTED** (commit b1d094d)
 
 ## Test Plan
 
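The "Timeout prevents infinite running" criterion can be enforced independently of `max_round_count`. A minimal sketch, assuming an async entry point; `run_demo` and the 5-minute budget are illustrative, not names from the repo:

```python
import asyncio


async def run_with_timeout(run_demo, timeout_s: float = 300.0):
    """Bound total demo runtime so it always terminates.

    Acts as a backstop in case max_round_count fails to stop the
    orchestrator: on timeout, return None (partial/absent output)
    rather than hanging forever.
    """
    try:
        return await asyncio.wait_for(run_demo(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None
```

Paired with a per-round progress message (e.g. "Round 3/5"), a wrapper like this covers the visible-progress and termination criteria even when the orchestrator misbehaves.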
docs/specs/SPEC_02_E2E_TESTING.md CHANGED

@@ -4,7 +4,7 @@
 
 ## Problem Statement
 
-We have 140 unit tests that verify individual components work, but **no test that proves the full pipeline produces useful research output**.
+We have 141 unit tests that verify individual components work, but **no test that proves the full pipeline produces useful research output**.
 
 We don't know if:
 1. Simple mode produces a valid report
@@ -115,14 +115,14 @@ async def test_real_pubmed_search():
 
 | Mode | Mock | Real API | Status |
 |------|------|----------|--------|
-| Simple (Free) | ✅ …
-| Advanced (OpenAI) | ✅ …
+| Simple (Free) | ✅ Done | ⏳ Optional | ✅ IMPLEMENTED |
+| Advanced (OpenAI) | ✅ Done | ⏳ Optional | ✅ IMPLEMENTED |
 
 ## Directory Structure
 
 ```
 tests/
-├── unit/         # Existing
+├── unit/         # Existing 141 tests
 ├── integration/  # Real API tests (existing)
 └── e2e/          # NEW: Full pipeline tests
     ├── conftest.py  # E2E fixtures
@@ -132,11 +132,13 @@ tests/
 
 ## Acceptance Criteria
 
-- [ ] E2E test for Simple mode (mocked)
-- [ ] E2E test for Advanced mode (mocked)
-- [ ] Tests validate output structure
-- [ ] Tests run in CI (<2 minutes)
-- [ ] At least one integration test with real API
+- [x] E2E test for Simple mode (mocked)
+- [x] E2E test for Advanced mode (mocked)
+- [x] Tests validate output structure
+- [x] Tests run in CI (<2 minutes)
+- [ ] At least one integration test with real API (existing in tests/integration/)
+
+**Status: IMPLEMENTED** (commit b1d094d)
 
 ## Why Before OpenAlex?
 
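For the "Tests validate output structure" criterion, a minimal sketch of what such a check might look like; the field names and dict shape are assumptions for illustration, not SPEC_02's actual report schema:

```python
# Hypothetical E2E structure check. REQUIRED_FIELDS and the report
# dict shape are assumed, not taken from the repo.
REQUIRED_FIELDS = ("question", "findings", "citations")


def report_problems(report: dict) -> list[str]:
    """Return a list of structural problems; an empty list means valid."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in report]
    if not report.get("findings"):
        problems.append("findings is empty")
    return problems
```

An E2E test can then assert `report_problems(result) == []`, which fails with a readable list of what is wrong instead of a bare `KeyError` deep in the pipeline.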