Reword Oracle docs for remote push filter
- README.md +3 -3
- ReplicaLab_Architecture_v2.svg +2 -2
- ReplicaLab_Architecture_v2_polished.svg +2 -2
- docs/changes.md +1 -1
- replicalab/agents/lab_manager_agent.py +2 -2
- replicalab/oracle.py +1 -1
README.md
CHANGED

@@ -51,7 +51,7 @@ ReplicaLab uses a **hybrid Oracle architecture**:
 The **Oracle layer** is optional and powers world-building and narrative intelligence:
 - richer scenario generation
 - optional event injection
-- optional
+- optional model-backed Lab Manager narration
 - optional post-mortem analysis
 The **deterministic core** remains canonical for RL:
 - environment transitions

@@ -59,7 +59,7 @@ ReplicaLab uses a **hybrid Oracle architecture**:
 - grounded Lab Manager feasibility
 - judge scoring and reward math

-This satisfies the sponsor-facing “
+This satisfies the sponsor-facing “model-driven environment intelligence” direction without making reward noisy or irreproducible.

 ---

@@ -258,7 +258,7 @@ replicalab-ai/
 │   ├── agents/
 │   │   ├── scientist_policy.py
 │   │   ├── lab_manager_policy.py
-│   │   ├── lab_manager_agent.py      # Optional
+│   │   ├── lab_manager_agent.py      # Optional model-backed Lab Manager wrapper
 │   │   ├── judge_policy.py
 │   ├── env/
 │   │   ├── replicalab_env.py         # Real env with optional Oracle hooks
ReplicaLab_Architecture_v2.svg
CHANGED
Git LFS Details

ReplicaLab_Architecture_v2_polished.svg
CHANGED
Git LFS Details
docs/changes.md
CHANGED

@@ -59,5 +59,5 @@ Rules:
 | 2026-03-08 | Person B (Ayush) | API 14 | Completed the REST session isolation verification even though the task was assigned to Person C | The session isolation logic already worked correctly in `server/app.py`; the task was still marked partial because no dedicated tests proved concurrent-user isolation against the real env | Created `tests/test_api_rest_isolation.py` with 11 tests covering session independence, round-count isolation, terminal isolation, session_id reuse, invalid session handling, and replay isolation; no server changes needed; 307 tests pass | No new dependencies unblocked; `API 14` was the last partial API task besides `API 01` and `OBS 02` |
 | 2026-03-08 | Person B (Ayush) | MOD 07 and MOD 10 | Closed the replay-persistence and schema-example tasks on Max's lane after verifying the code that had already landed | `replicalab/utils/logging.py` and the API example generator were implemented and passing tests, but the source-of-truth backlog and Max's owner docs still showed both tasks as not started, and the generated examples still contained stale stub audit text | Updated `tests/fixtures/generate_api_examples.py` to derive terminal judge metadata from the current deterministic judge helpers, regenerated `api_schema_examples.json`, and synced `MOD 07`/`MOD 10` to complete in the comprehensive backlog, completion rollup, and Max owner docs | `MOD 08` and `JDG 07` are now clearly unblocked in the tracked plan |
 | 2026-03-08 | Person B (Ayush) | Reward shaping and rubric refinement | Expanded the reward system beyond terminal-only scoring without reopening the outer action or observation contract | Sparse terminal-only reward was too weak for RL training, and the project needed deterministic shaping rather than a frontier-model reward source | Added a parsimony term to terminal reward, introduced deterministic step shaping in `ReplicaLabEnv` (information gain, protocol delta, momentum, contradiction, hallucination, stalling, regression, invalid-action, timeout, and no-agreement signals), updated rollout aggregation to use cumulative episode reward, and aligned env/server tests to the new shaped-reward semantics while keeping the full suite green at 356 tests | Keep the notebook and training plots explicit about terminal reward components vs cumulative shaped episode reward |
-| 2026-03-08 | Person B (Ayush) | Oracle hybrid architecture | Added an Oracle-style frontier-model layer as an additive integration instead of replacing the deterministic environment and reward stack | The sponsor-facing V2 direction calls for
+| 2026-03-08 | Person B (Ayush) | Oracle hybrid architecture | Added an Oracle-style frontier-model layer as an additive integration instead of replacing the deterministic environment and reward stack | The sponsor-facing V2 direction calls for a model-driven intelligence layer woven through scenario generation, environment interaction, and explanation, but the RL training path still needs deterministic reward and reproducible evaluation | Added `oracle_models.py`, `oracle.py`, `cache.py`, Oracle prompt assets, an optional model-backed Lab Manager wrapper, an adapter from Oracle scenarios into the existing normalized scenario pack, and feature-flagged Oracle hooks in `ReplicaLabEnv`; kept deterministic scoring in `replicalab/scoring/*` as the canonical training reward; expanded test coverage with `test_oracle.py`, `test_cache.py`, and Oracle adapter/prompt tests; full suite now passes at 365 tests | If this grows beyond the current additive mode, record any future contract or reward-source changes separately before altering the deterministic training path |
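The reward-shaping entry above distinguishes terminal reward components from the cumulative shaped episode reward used for rollout aggregation. A minimal sketch of that aggregation (the `episode_reward` helper and the signal names in the example dicts are illustrative, not the actual ReplicaLab implementation):

```python
def episode_reward(step_signals: list[dict], terminal_reward: float) -> float:
    """Cumulative shaped reward: sum of deterministic per-step shaping
    terms, plus the terminal (judge-scored) reward."""
    shaped = sum(sum(signals.values()) for signals in step_signals)
    return shaped + terminal_reward


# Two steps with a few shaping terms each (positive = desirable behavior,
# negative = penalty), then a terminal reward of 1.0.
steps = [
    {"info_gain": 0.2, "stalling": -0.05},
    {"info_gain": 0.1, "invalid_action": -0.1},
]
total = episode_reward(steps, terminal_reward=1.0)
# shaped = 0.15 + 0.0 = 0.15, so total = 1.15
```

Because every shaping term is a deterministic function of the transition, two replays of the same episode produce the same `total`, which is the property the entry says training plots should keep explicit.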
replicalab/agents/lab_manager_agent.py
CHANGED

@@ -1,4 +1,4 @@
-"""Optional
+"""Optional model-backed Lab Manager narration layer."""

 from __future__ import annotations

@@ -11,7 +11,7 @@ from replicalab.prompts import load_prompt_asset

 class LabManagerAgent:
-    """
+    """Model-backed Lab Manager driven by Oracle-generated constraints.

     This is additive to the deterministic feasibility checker. The current
     env can use this agent to narrate or enrich responses while keeping
replicalab/oracle.py
CHANGED

@@ -98,7 +98,7 @@ def _invoke_client(client: Any, *, model: str, system: str, user: str) -> str:
         response = client(system, user)
         return _extract_response_text(response)

-    raise TypeError("Unsupported Oracle client: expected
+    raise TypeError("Unsupported Oracle client: expected provider-style client or callable")


 def call_json_model(