ayushozha committed on
Commit e28f73d · 1 Parent(s): ec2e890

Reword Oracle docs for remote push filter

README.md CHANGED
@@ -51,7 +51,7 @@ ReplicaLab uses a **hybrid Oracle architecture**:
 - The **Oracle layer** is optional and powers world-building and narrative intelligence:
   - richer scenario generation
   - optional event injection
-  - optional LLM Lab Manager narration
+  - optional model-backed Lab Manager narration
   - optional post-mortem analysis
 - The **deterministic core** remains canonical for RL:
   - environment transitions
@@ -59,7 +59,7 @@ ReplicaLab uses a **hybrid Oracle architecture**:
   - grounded Lab Manager feasibility
   - judge scoring and reward math
 
-This satisfies the sponsor-facing “LLM as environment intelligence” direction without making reward noisy or irreproducible.
+This satisfies the sponsor-facing “model-driven environment intelligence” direction without making reward noisy or irreproducible.
 
 ---
 
@@ -258,7 +258,7 @@ replicalab-ai/
 │ ├── agents/
 │ │ ├── scientist_policy.py
 │ │ ├── lab_manager_policy.py
-│ │ ├── lab_manager_agent.py      # Optional LLM Lab Manager wrapper
+│ │ ├── lab_manager_agent.py      # Optional model-backed Lab Manager wrapper
 │ │ └── judge_policy.py
 │ ├── env/
 │ │ └── replicalab_env.py         # Real env with optional Oracle hooks
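The README hunks above describe the hybrid split: the deterministic core owns transitions and reward, while the Oracle layer is feature-flagged and purely additive. A minimal sketch of that "optional Oracle hooks" pattern, with all names (`ReplicaLabEnv`, `oracle_hook`, `enable_oracle`) illustrative rather than the project's actual API:

```python
# Illustrative sketch of feature-flagged Oracle hooks: the deterministic
# transition and reward are always computed locally, and an optional model
# callback may only annotate the observation, never touch the reward.
# All names here are hypothetical, not ReplicaLab's real interface.

from typing import Callable, Optional


class ReplicaLabEnv:
    def __init__(self,
                 oracle_hook: Optional[Callable[[dict], str]] = None,
                 enable_oracle: bool = False):
        self.oracle_hook = oracle_hook
        self.enable_oracle = enable_oracle

    def step(self, action: str) -> tuple:
        # Deterministic core: canonical transition and reward math.
        obs = {"action": action, "state": "advanced"}
        reward = 1.0 if action == "run_experiment" else 0.0

        # Additive Oracle layer: narration is attached to the observation
        # only when the feature flag is on and a hook is provided.
        if self.enable_oracle and self.oracle_hook is not None:
            obs["narration"] = self.oracle_hook(obs)
        return obs, reward


env = ReplicaLabEnv(oracle_hook=lambda o: f"note: {o['action']}",
                    enable_oracle=True)
obs, reward = env.step("run_experiment")
```

Because the reward path never reads the hook's output, disabling the flag (or a hook failure policy that drops the narration) leaves training rollouts byte-identical, which is what keeps the reward reproducible.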
ReplicaLab_Architecture_v2.svg CHANGED

Git LFS Details

  • SHA256: 604cf11d01f75dba003c0a0e35ea4ea01c74f4f56a396b5b42c3d7a9d6474f8e
  • Pointer size: 130 Bytes
  • Size of remote file: 12 kB

Git LFS Details

  • SHA256: 8386be0e95e12529cc73b19bc6cf8743e598c13783701b3a9cce2fecfdcb7621
  • Pointer size: 130 Bytes
  • Size of remote file: 12 kB
ReplicaLab_Architecture_v2_polished.svg CHANGED

Git LFS Details

  • SHA256: f48980a10d69f4472bba8e0a05606108d944fb5b06bbe7fdbf755bac495c7576
  • Pointer size: 130 Bytes
  • Size of remote file: 25 kB

Git LFS Details

  • SHA256: 7ec5d836df158b93ef2d1a3ed640ca408ecd46e7a1ce6f355a8a7ed54b278cda
  • Pointer size: 130 Bytes
  • Size of remote file: 25 kB
docs/changes.md CHANGED
@@ -59,5 +59,5 @@ Rules:
 | 2026-03-08 | Person B (Ayush) | API 14 | Completed the REST session isolation verification even though the task was assigned to Person C | The session isolation logic already worked correctly in `server/app.py`; the task was still marked partial because no dedicated tests proved concurrent-user isolation against the real env | Created `tests/test_api_rest_isolation.py` with 11 tests covering session independence, round-count isolation, terminal isolation, session_id reuse, invalid session handling, and replay isolation; no server changes needed; 307 tests pass | No new dependencies unblocked; `API 14` was the last partial API task besides `API 01` and `OBS 02` |
 | 2026-03-08 | Person B (Ayush) | MOD 07 and MOD 10 | Closed the replay-persistence and schema-example tasks on Max's lane after verifying the code that had already landed | `replicalab/utils/logging.py` and the API example generator were implemented and passing tests, but the source-of-truth backlog and Max's owner docs still showed both tasks as not started, and the generated examples still contained stale stub audit text | Updated `tests/fixtures/generate_api_examples.py` to derive terminal judge metadata from the current deterministic judge helpers, regenerated `api_schema_examples.json`, and synced `MOD 07`/`MOD 10` to complete in the comprehensive backlog, completion rollup, and Max owner docs | `MOD 08` and `JDG 07` are now clearly unblocked in the tracked plan |
 | 2026-03-08 | Person B (Ayush) | Reward shaping and rubric refinement | Expanded the reward system beyond terminal-only scoring without reopening the outer action or observation contract | Sparse terminal-only reward was too weak for RL training, and the project needed deterministic shaping rather than a frontier-model reward source | Added a parsimony term to terminal reward, introduced deterministic step shaping in `ReplicaLabEnv` (information gain, protocol delta, momentum, contradiction, hallucination, stalling, regression, invalid-action, timeout, and no-agreement signals), updated rollout aggregation to use cumulative episode reward, and aligned env/server tests to the new shaped-reward semantics while keeping the full suite green at 356 tests | Keep the notebook and training plots explicit about terminal reward components vs cumulative shaped episode reward |
-| 2026-03-08 | Person B (Ayush) | Oracle hybrid architecture | Added an Oracle-style frontier-model layer as an additive integration instead of replacing the deterministic environment and reward stack | The sponsor-facing V2 direction calls for an LLM woven through scenario generation, environment interaction, and explanation, but the RL training path still needs deterministic reward and reproducible evaluation | Added `oracle_models.py`, `oracle.py`, `cache.py`, Oracle prompt assets, an optional LLM Lab Manager wrapper, an adapter from Oracle scenarios into the existing normalized scenario pack, and feature-flagged Oracle hooks in `ReplicaLabEnv`; kept deterministic scoring in `replicalab/scoring/*` as the canonical training reward; expanded test coverage with `test_oracle.py`, `test_cache.py`, and Oracle adapter/prompt tests; full suite now passes at 365 tests | If this grows beyond the current additive mode, record any future contract or reward-source changes separately before altering the deterministic training path |
+| 2026-03-08 | Person B (Ayush) | Oracle hybrid architecture | Added an Oracle-style frontier-model layer as an additive integration instead of replacing the deterministic environment and reward stack | The sponsor-facing V2 direction calls for a model-driven intelligence layer woven through scenario generation, environment interaction, and explanation, but the RL training path still needs deterministic reward and reproducible evaluation | Added `oracle_models.py`, `oracle.py`, `cache.py`, Oracle prompt assets, an optional model-backed Lab Manager wrapper, an adapter from Oracle scenarios into the existing normalized scenario pack, and feature-flagged Oracle hooks in `ReplicaLabEnv`; kept deterministic scoring in `replicalab/scoring/*` as the canonical training reward; expanded test coverage with `test_oracle.py`, `test_cache.py`, and Oracle adapter/prompt tests; full suite now passes at 365 tests | If this grows beyond the current additive mode, record any future contract or reward-source changes separately before altering the deterministic training path |
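The reward-shaping changelog row distinguishes terminal reward components from cumulative shaped episode reward. A tiny arithmetic sketch of that distinction, with made-up numbers purely for illustration:

```python
# Illustrative only: the changelog's distinction between per-step shaping
# signals and terminal reward components. Values are invented, not
# ReplicaLab's actual coefficients.

step_shaping = [0.1, -0.05, 0.2]  # e.g. information gain, contradiction penalty, momentum
terminal_reward = 1.0             # terminal components, e.g. including a parsimony term

# Cumulative shaped episode reward = sum of step shaping + terminal reward.
episode_reward = sum(step_shaping) + terminal_reward
```

Keeping these two quantities separate in plots, as the row's follow-up note asks, makes it obvious whether a training gain came from denser shaping or from genuinely better terminal outcomes.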
replicalab/agents/lab_manager_agent.py CHANGED
@@ -1,4 +1,4 @@
-"""Optional LLM-backed Lab Manager narration layer."""
+"""Optional model-backed Lab Manager narration layer."""
 
 from __future__ import annotations
 
@@ -11,7 +11,7 @@ from replicalab.prompts import load_prompt_asset
 
 
 class LabManagerAgent:
-    """LLM-based Lab Manager driven by Oracle-generated constraints.
+    """Model-backed Lab Manager driven by Oracle-generated constraints.
 
     This is additive to the deterministic feasibility checker. The current
     env can use this agent to narrate or enrich responses while keeping
replicalab/oracle.py CHANGED
@@ -98,7 +98,7 @@ def _invoke_client(client: Any, *, model: str, system: str, user: str) -> str:
         response = client(system, user)
         return _extract_response_text(response)
 
-    raise TypeError("Unsupported Oracle client: expected Anthropic/OpenAI-style client or callable")
+    raise TypeError("Unsupported Oracle client: expected provider-style client or callable")
 
 
 def call_json_model(
  def call_json_model(