# Tests Map - `tests/`

> 365 tests across 18 files. All passing.
>
> **Last verified:** 2026-03-08

## Summary

| File | Tests | What it covers |
|------|-------|----------------|
| `test_api_rest_isolation.py` | 11 | `API 14` REST session isolation and replay separation |
| `test_cache.py` | 2 | Oracle scenario caching and reuse |
| `test_client.py` | 24 | `TRN 13` reusable client over REST and WebSocket |
| `test_config.py` | 3 | Shared constants and config consistency |
| `test_env.py` | 56 | `ENV 01-08`, `ENV 10`, `ENV 11`, `OBS 04`, `JDG 04-05`, `TST 01-03` |
| `test_judge_policy.py` | 10 | `JDG 11` structured judge audit payload |
| `test_lab_manager_policy.py` | 37 | `AGT 05-07` plus `AGT 09` determinism coverage |
| `test_logging.py` | 11 | `MOD 07` replay persistence and `JDG 07` CSV logging helpers |
| `test_models.py` | 21 | Action, observation, step, state, and log contracts |
| `test_oracle.py` | 5 | Oracle hybrid wrapper, structured parsing, and env reset adapter |
| `test_prompts.py` | 7 | `AGT 10` prompt files and Oracle prompt asset loading |
| `test_reward.py` | 40 | `JDG 01-06`, `JDG 08`, and reward regression coverage |
| `test_rollout.py` | 12 | `TRN 03` rollout worker behavior |
| `test_rollout_traces.py` | 2 | `TRN 04` bounded tool trace aggregation and batched collection |
| `test_scenarios.py` | 14 | `SCN 01-13` scenario generation, determinism, and Oracle scenario adaptation |
| `test_scientist_policy.py` | 46 | `MOD 09`, `AGT 01-04`, `AGT 08` |
| `test_server.py` | 44 | `API 01-04`, `API 06-08`, `API 13-14`, replay audit propagation, and root landing page |
| `test_validation.py` | 20 | `MOD 05-06` semantic validation |
| **Total** | **365** | |
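
The per-file counts in a table like this one tend to drift from the stated total as tests are added. A minimal sketch of a consistency check follows, assuming the table keeps its current row shape (a backticked `test_*.py` name followed by an integer count). The helper name `sum_test_counts` and the two-row sample are illustrative only and are not part of the repository.

```python
import re


def sum_test_counts(markdown_table: str) -> tuple[int, int]:
    """Return (file_count, total_tests) for rows shaped like
    | `test_x.py` | 11 | description |.

    Header, separator, and Total rows do not match the pattern and are skipped.
    """
    row = re.compile(r"^\|\s*`(test_\w+\.py)`\s*\|\s*(\d+)\s*\|")
    counts: dict[str, int] = {}
    for line in markdown_table.splitlines():
        match = row.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return len(counts), sum(counts.values())


# Illustrative two-row excerpt, not the full summary table.
sample = """
| File | Tests | What it covers |
|------|-------|----------------|
| `test_cache.py` | 2 | Oracle scenario caching and reuse |
| `test_config.py` | 3 | Shared constants and config consistency |
| **Total** | **5** | |
"""

files, total = sum_test_counts(sample)
print(f"{files} files, {total} tests")
```

Run against the full table, the result can be compared with the stated total and with the count that `pytest tests/ --collect-only -q` reports, although parametrized tests may make the collected count differ from a hand-maintained one.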

## Coverage Notes

- The environment stack is covered end to end:
  - `test_env.py` validates reset, step, invalid action, termination, reward integration, deep state snapshots, close/reopen lifecycle behavior, terminal judge-audit propagation, and seeded replay determinism across all scenario families.
- The API/server stack is covered end to end:
  - `test_server.py` covers REST reset/step/scenarios, WebSocket session handling, idle timeout cleanup, CORS behavior, and replay audit propagation.
- The scientist stack is covered end to end:
  - `test_scientist_policy.py`, `test_prompts.py`, `test_rollout.py`, and `test_rollout_traces.py` together cover prompt construction, observation formatting, parse/retry, baseline policy, rollout collection, and bounded tool trace capture.
- The judge stack is covered end to end:
  - `test_reward.py` covers rubric scores and reward math, while `test_judge_policy.py` covers structured audit payload generation.
- The Oracle hybrid layer is covered additively:
  - `test_oracle.py`, `test_cache.py`, and `test_prompts.py` cover Oracle scenario generation wrappers, cache reuse, and prompt asset loading without changing the deterministic reward contract.
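
The seeded replay determinism mentioned for `test_env.py` can be illustrated with a toy example. This is only a sketch of the general pattern: `ToyEnv`, `run_episode`, and the three-step episode length are hypothetical stand-ins, not the project's environment API or its actual tests.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment; NOT the project's real API."""

    def __init__(self) -> None:
        self._rng = random.Random()
        self._steps = 0

    def reset(self, seed: int) -> float:
        # Re-seeding a private RNG makes the whole episode reproducible.
        self._rng = random.Random(seed)
        self._steps = 0
        return self._rng.random()

    def step(self, action: int) -> tuple[float, bool]:
        self._steps += 1
        reward = self._rng.random() + action
        done = self._steps >= 3  # assumed fixed episode length
        return reward, done


def run_episode(seed: int, actions: list[int]) -> list[float]:
    """Replay a fixed action list and record the initial observation and rewards."""
    env = ToyEnv()
    trace = [env.reset(seed)]
    for action in actions:
        reward, done = env.step(action)
        trace.append(reward)
        if done:
            break
    return trace


def test_seeded_replay_is_deterministic() -> None:
    actions = [0, 1, 0, 1]
    # Same seed and same actions must give an identical trace.
    assert run_episode(7, actions) == run_episode(7, actions)
    # A different seed should give a different trace.
    assert run_episode(7, actions) != run_episode(8, actions)


test_seeded_replay_is_deterministic()
```

Comparing whole traces rather than only the final reward is the design choice here: it catches divergence at any step, which is what a replay guarantee needs.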

## Remaining Gaps

| Planned test work | Why it still matters |
|-------------------|----------------------|
| `TST 09` notebook smoke coverage | Fresh-runtime validation for the judged training notebook |

## Task-to-Test Mapping

| Area | Primary test files |
|------|--------------------|
| Models and contracts | `test_models.py`, `test_validation.py` |
| Scenarios | `test_scenarios.py` |
| Oracle integration and cache | `test_oracle.py`, `test_cache.py`, `test_prompts.py` |
| Scientist policy | `test_scientist_policy.py`, `test_prompts.py` |
| Lab Manager policy | `test_lab_manager_policy.py` |
| Judge and reward | `test_reward.py`, `test_judge_policy.py` |
| Environment | `test_env.py` |
| API and deployment-facing server behavior | `test_server.py`, `test_api_rest_isolation.py` |
| Client and training rollouts | `test_client.py`, `test_rollout.py`, `test_rollout_traces.py` |