Commit 10e320d
Parent(s): b1d094d

docs: fix spec half-measures (line refs, test counts, acceptance criteria)

SPEC_01:
- Line reference: 94 → 111 (actual location)
- Acceptance criteria: Mark all 4 items as done

SPEC_02:
- Test count: 140 → 141
- Test Matrix: Mark both modes as IMPLEMENTED
- Acceptance criteria: Mark 4/5 items as done

Previous agent claimed "updated" but left stale values.
docs/specs/SPEC_01_DEMO_TERMINATION.md CHANGED

@@ -16,7 +16,7 @@ Advanced (Magentic) mode runs indefinitely from user perspective. The demo was m
 ### Question 1: Does max_round_count actually work?
 
 ```python
-# Current code (src/orchestrator_magentic.py:94)
+# Current code (src/orchestrator_magentic.py:111)
 .with_standard_manager(
     chat_client=manager_client,
     max_round_count=self._max_rounds,  # Default: 10
@@ -99,10 +99,12 @@ if len(evidence) >= 20:
 
 ## Acceptance Criteria
 
-- [ ] Demo completes in <5 minutes with visible progress
-- [ ] User sees round count (e.g., "Round 3/5")
-- [ ] Always produces SOME output (even if partial)
-- [ ] Timeout prevents infinite running
+- [x] Demo completes in <5 minutes with visible progress
+- [x] User sees round count (e.g., "Round 3/5")
+- [x] Always produces SOME output (even if partial)
+- [x] Timeout prevents infinite running
+
+**Status: IMPLEMENTED** (commit b1d094d)
 
 ## Test Plan
 
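The "Timeout prevents infinite running" criterion can be enforced independently of `max_round_count`. A minimal sketch, assuming an async entry point; `run_demo` and the 5-minute budget are illustrative, not names from the repo:

```python
import asyncio


async def run_with_timeout(run_demo, timeout_s: float = 300.0):
    """Bound total demo runtime so it always terminates.

    Acts as a backstop in case max_round_count fails to stop the
    orchestrator: on timeout, return None (partial/absent output)
    rather than hanging forever.
    """
    try:
        return await asyncio.wait_for(run_demo(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None
```

Paired with a per-round progress message (e.g. "Round 3/5"), a wrapper like this covers the visible-progress and termination criteria even when the orchestrator misbehaves.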
docs/specs/SPEC_02_E2E_TESTING.md CHANGED

@@ -4,7 +4,7 @@
 
 ## Problem Statement
 
-We have 140 unit tests that verify individual components work, but **no test that proves the full pipeline produces useful research output**.
+We have 141 unit tests that verify individual components work, but **no test that proves the full pipeline produces useful research output**.
 
 We don't know if:
 1. Simple mode produces a valid report
@@ -115,14 +115,14 @@ async def test_real_pubmed_search():
 
 | Mode | Mock | Real API | Status |
 |------|------|----------|--------|
-| Simple (Free) | ✅ …
-| Advanced (OpenAI) | ✅ …
+| Simple (Free) | ✅ Done | ⏳ Optional | ✅ IMPLEMENTED |
+| Advanced (OpenAI) | ✅ Done | ⏳ Optional | ✅ IMPLEMENTED |
 
 ## Directory Structure
 
 ```
 tests/
-├── unit/         # Existing
+├── unit/         # Existing 141 tests
 ├── integration/  # Real API tests (existing)
 └── e2e/          # NEW: Full pipeline tests
     ├── conftest.py  # E2E fixtures
@@ -132,11 +132,13 @@ tests/
 
 ## Acceptance Criteria
 
-- [ ] E2E test for Simple mode (mocked)
-- [ ] E2E test for Advanced mode (mocked)
-- [ ] Tests validate output structure
-- [ ] Tests run in CI (<2 minutes)
-- [ ] At least one integration test with real API
+- [x] E2E test for Simple mode (mocked)
+- [x] E2E test for Advanced mode (mocked)
+- [x] Tests validate output structure
+- [x] Tests run in CI (<2 minutes)
+- [ ] At least one integration test with real API (existing in tests/integration/)
+
+**Status: IMPLEMENTED** (commit b1d094d)
 
 ## Why Before OpenAlex?
 
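For the "Tests validate output structure" criterion, a minimal sketch of what such a check might look like; the field names and dict shape are assumptions for illustration, not SPEC_02's actual report schema:

```python
# Hypothetical E2E structure check. REQUIRED_FIELDS and the report
# dict shape are assumed, not taken from the repo.
REQUIRED_FIELDS = ("question", "findings", "citations")


def report_problems(report: dict) -> list[str]:
    """Return a list of structural problems; an empty list means valid."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in report]
    if not report.get("findings"):
        problems.append("findings is empty")
    return problems
```

An E2E test can then assert `report_problems(result) == []`, which fails with a readable list of what is wrong instead of a bare `KeyError` deep in the pipeline.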