Spaces:

openenv-community
/

replicalab

Running

App Files Files Community

replicalab / docs /map /tests.md

maxxie114

Initial HF Spaces deployment

80d8c84 2 days ago

preview code

raw

history blame contribute delete

3.77 kB

	# Tests Map - `tests/`

	> 365 tests across 18 files. All passing.
	>
	> Last verified: 2026-03-08

	## Summary

	\| File \| Tests \| What it covers \|
	\|------\|-------\|----------------\|
	\| `test_api_rest_isolation.py` \| 11 \| `API 14` REST session isolation and replay separation \|
	\| `test_cache.py` \| 2 \| Oracle scenario caching and reuse \|
	\| `test_client.py` \| 24 \| `TRN 13` reusable client over REST and WebSocket \|
	\| `test_config.py` \| 3 \| Shared constants and config consistency \|
	\| `test_env.py` \| 56 \| `ENV 01-08`, `ENV 10`, `ENV 11`, `OBS 04`, `JDG 04-05`, `TST 01-03` \|
	\| `test_judge_policy.py` \| 10 \| `JDG 11` structured judge audit payload \|
	\| `test_lab_manager_policy.py` \| 37 \| `AGT 05-07` plus `AGT 09` determinism coverage \|
	\| `test_models.py` \| 21 \| Action, observation, step, state, and log contracts \|
	\| `test_logging.py` \| 11 \| `MOD 07` replay persistence and `JDG 07` CSV logging helpers \|
	\| `test_oracle.py` \| 5 \| Oracle hybrid wrapper, structured parsing, and env reset adapter \|
	\| `test_prompts.py` \| 7 \| `AGT 10` prompt files and Oracle prompt asset loading \|
	\| `test_reward.py` \| 40 \| `JDG 01-06`, `JDG 08`, and reward regression coverage \|
	\| `test_rollout.py` \| 12 \| `TRN 03` rollout worker behavior \|
	\| `test_rollout_traces.py` \| 2 \| `TRN 04` bounded tool trace aggregation and batched collection \|
	\| `test_scenarios.py` \| 14 \| `SCN 01-13` scenario generation, determinism, and Oracle scenario adaptation \|
	\| `test_scientist_policy.py` \| 46 \| `MOD 09`, `AGT 01-04`, `AGT 08` \|
	\| `test_server.py` \| 44 \| `API 01-04`, `API 06-08`, `API 13-14`, replay audit propagation, and root landing page \|
	\| `test_validation.py` \| 20 \| `MOD 05-06` semantic validation \|
	\| Total \| 365 \| \|

	## Coverage Notes

	- The environment stack is covered end to end:
	- `test_env.py` validates reset, step, invalid action, termination, reward integration, deep state snapshots, close/reopen lifecycle behavior, terminal judge-audit propagation, and seeded replay determinism across all scenario families.
	- The API/server stack is covered end to end:
	- `test_server.py` covers REST reset/step/scenarios, WebSocket session handling, idle timeout cleanup, CORS behavior, and replay audit propagation.
	- The scientist stack is covered end to end:
	- `test_scientist_policy.py`, `test_prompts.py`, `test_rollout.py`, and `test_rollout_traces.py` together cover prompt construction, observation formatting, parse/retry, baseline policy, rollout collection, and bounded tool trace capture.
	- The judge stack is covered end to end:
	- `test_reward.py` covers rubric scores and reward math, while `test_judge_policy.py` covers structured audit payload generation.
	- The Oracle hybrid layer is covered additively:
	- `test_oracle.py`, `test_cache.py`, and `test_prompts.py` cover Oracle scenario generation wrappers, cache reuse, and prompt asset loading without changing the deterministic reward contract.

	## Remaining Gaps

	\| Planned test work \| Why it still matters \|
	\|-------------------\|----------------------\|
	\| `TST 09` notebook smoke coverage \| Fresh-runtime validation for the judged training notebook \|

	## Task-to-Test Mapping

	\| Area \| Primary test files \|
	\|------\|--------------------\|
	\| Models and contracts \| `test_models.py`, `test_validation.py` \|
	\| Scenarios \| `test_scenarios.py` \|
	\| Oracle integration and cache \| `test_oracle.py`, `test_cache.py`, `test_prompts.py` \|
	\| Scientist policy \| `test_scientist_policy.py`, `test_prompts.py` \|
	\| Lab Manager policy \| `test_lab_manager_policy.py` \|
	\| Judge and reward \| `test_reward.py`, `test_judge_policy.py` \|
	\| Environment \| `test_env.py` \|
	\| API and deployment-facing server behavior \| `test_server.py` \|
	\| Client and training rollouts \| `test_client.py`, `test_rollout.py`, `test_rollout_traces.py` \|

	# Tests Map - `tests/`

	> 365 tests across 18 files. All passing.
	>
	> Last verified: 2026-03-08

	## Summary

	\| File \| Tests \| What it covers \|
	\|------\|-------\|----------------\|
	\| `test_api_rest_isolation.py` \| 11 \| `API 14` REST session isolation and replay separation \|
	\| `test_cache.py` \| 2 \| Oracle scenario caching and reuse \|
	\| `test_client.py` \| 24 \| `TRN 13` reusable client over REST and WebSocket \|
	\| `test_config.py` \| 3 \| Shared constants and config consistency \|
	\| `test_env.py` \| 56 \| `ENV 01-08`, `ENV 10`, `ENV 11`, `OBS 04`, `JDG 04-05`, `TST 01-03` \|
	\| `test_judge_policy.py` \| 10 \| `JDG 11` structured judge audit payload \|
	\| `test_lab_manager_policy.py` \| 37 \| `AGT 05-07` plus `AGT 09` determinism coverage \|
	\| `test_models.py` \| 21 \| Action, observation, step, state, and log contracts \|
	\| `test_logging.py` \| 11 \| `MOD 07` replay persistence and `JDG 07` CSV logging helpers \|
	\| `test_oracle.py` \| 5 \| Oracle hybrid wrapper, structured parsing, and env reset adapter \|
	\| `test_prompts.py` \| 7 \| `AGT 10` prompt files and Oracle prompt asset loading \|
	\| `test_reward.py` \| 40 \| `JDG 01-06`, `JDG 08`, and reward regression coverage \|
	\| `test_rollout.py` \| 12 \| `TRN 03` rollout worker behavior \|
	\| `test_rollout_traces.py` \| 2 \| `TRN 04` bounded tool trace aggregation and batched collection \|
	\| `test_scenarios.py` \| 14 \| `SCN 01-13` scenario generation, determinism, and Oracle scenario adaptation \|
	\| `test_scientist_policy.py` \| 46 \| `MOD 09`, `AGT 01-04`, `AGT 08` \|
	\| `test_server.py` \| 44 \| `API 01-04`, `API 06-08`, `API 13-14`, replay audit propagation, and root landing page \|
	\| `test_validation.py` \| 20 \| `MOD 05-06` semantic validation \|
	\| Total \| 365 \| \|

	## Coverage Notes

	- The environment stack is covered end to end:
	- `test_env.py` validates reset, step, invalid action, termination, reward integration, deep state snapshots, close/reopen lifecycle behavior, terminal judge-audit propagation, and seeded replay determinism across all scenario families.
	- The API/server stack is covered end to end:
	- `test_server.py` covers REST reset/step/scenarios, WebSocket session handling, idle timeout cleanup, CORS behavior, and replay audit propagation.
	- The scientist stack is covered end to end:
	- `test_scientist_policy.py`, `test_prompts.py`, `test_rollout.py`, and `test_rollout_traces.py` together cover prompt construction, observation formatting, parse/retry, baseline policy, rollout collection, and bounded tool trace capture.
	- The judge stack is covered end to end:
	- `test_reward.py` covers rubric scores and reward math, while `test_judge_policy.py` covers structured audit payload generation.
	- The Oracle hybrid layer is covered additively:
	- `test_oracle.py`, `test_cache.py`, and `test_prompts.py` cover Oracle scenario generation wrappers, cache reuse, and prompt asset loading without changing the deterministic reward contract.

	## Remaining Gaps

	\| Planned test work \| Why it still matters \|
	\|-------------------\|----------------------\|
	\| `TST 09` notebook smoke coverage \| Fresh-runtime validation for the judged training notebook \|

	## Task-to-Test Mapping

	\| Area \| Primary test files \|
	\|------\|--------------------\|
	\| Models and contracts \| `test_models.py`, `test_validation.py` \|
	\| Scenarios \| `test_scenarios.py` \|
	\| Oracle integration and cache \| `test_oracle.py`, `test_cache.py`, `test_prompts.py` \|
	\| Scientist policy \| `test_scientist_policy.py`, `test_prompts.py` \|
	\| Lab Manager policy \| `test_lab_manager_policy.py` \|
	\| Judge and reward \| `test_reward.py`, `test_judge_policy.py` \|
	\| Environment \| `test_env.py` \|
	\| API and deployment-facing server behavior \| `test_server.py` \|
	\| Client and training rollouts \| `test_client.py`, `test_rollout.py`, `test_rollout_traces.py` \|