Reword Oracle docs for remote push filter
- README.md +3 -3
- ReplicaLab_Architecture_v2.svg +2 -2
- ReplicaLab_Architecture_v2_polished.svg +2 -2
- docs/changes.md +1 -1
- replicalab/agents/lab_manager_agent.py +2 -2
- replicalab/oracle.py +1 -1
README.md
CHANGED

@@ -51,7 +51,7 @@ ReplicaLab uses a **hybrid Oracle architecture**:
 The **Oracle layer** is optional and powers world-building and narrative intelligence:
 - richer scenario generation
 - optional event injection
-- optional
+- optional model-backed Lab Manager narration
 - optional post-mortem analysis
 The **deterministic core** remains canonical for RL:
 - environment transitions

@@ -59,7 +59,7 @@ ReplicaLab uses a **hybrid Oracle architecture**:
 - grounded Lab Manager feasibility
 - judge scoring and reward math

-This satisfies the sponsor-facing “
+This satisfies the sponsor-facing “model-driven environment intelligence” direction without making reward noisy or irreproducible.

 ---

@@ -258,7 +258,7 @@ replicalab-ai/
 │   ├── agents/
 │   │   ├── scientist_policy.py
 │   │   ├── lab_manager_policy.py
-│   │   ├── lab_manager_agent.py      # Optional
+│   │   ├── lab_manager_agent.py      # Optional model-backed Lab Manager wrapper
 │   │   ├── judge_policy.py
 │   ├── env/
 │   │   ├── replicalab_env.py         # Real env with optional Oracle hooks
ReplicaLab_Architecture_v2.svg
CHANGED
Git LFS Details

ReplicaLab_Architecture_v2_polished.svg
CHANGED
Git LFS Details
docs/changes.md
CHANGED

@@ -59,5 +59,5 @@ Rules:
 | 2026-03-08 | Person B (Ayush) | API 14 | Completed the REST session isolation verification even though the task was assigned to Person C | The session isolation logic already worked correctly in `server/app.py`; the task was still marked partial because no dedicated tests proved concurrent-user isolation against the real env | Created `tests/test_api_rest_isolation.py` with 11 tests covering session independence, round-count isolation, terminal isolation, session_id reuse, invalid session handling, and replay isolation; no server changes needed; 307 tests pass | No new dependencies unblocked; `API 14` was the last partial API task besides `API 01` and `OBS 02` |
 | 2026-03-08 | Person B (Ayush) | MOD 07 and MOD 10 | Closed the replay-persistence and schema-example tasks on Max's lane after verifying the code that had already landed | `replicalab/utils/logging.py` and the API example generator were implemented and passing tests, but the source-of-truth backlog and Max's owner docs still showed both tasks as not started, and the generated examples still contained stale stub audit text | Updated `tests/fixtures/generate_api_examples.py` to derive terminal judge metadata from the current deterministic judge helpers, regenerated `api_schema_examples.json`, and synced `MOD 07`/`MOD 10` to complete in the comprehensive backlog, completion rollup, and Max owner docs | `MOD 08` and `JDG 07` are now clearly unblocked in the tracked plan |
 | 2026-03-08 | Person B (Ayush) | Reward shaping and rubric refinement | Expanded the reward system beyond terminal-only scoring without reopening the outer action or observation contract | Sparse terminal-only reward was too weak for RL training, and the project needed deterministic shaping rather than a frontier-model reward source | Added a parsimony term to terminal reward, introduced deterministic step shaping in `ReplicaLabEnv` (information gain, protocol delta, momentum, contradiction, hallucination, stalling, regression, invalid-action, timeout, and no-agreement signals), updated rollout aggregation to use cumulative episode reward, and aligned env/server tests to the new shaped-reward semantics while keeping the full suite green at 356 tests | Keep the notebook and training plots explicit about terminal reward components vs cumulative shaped episode reward |
-| 2026-03-08 | Person B (Ayush) | Oracle hybrid architecture | Added an Oracle-style frontier-model layer as an additive integration instead of replacing the deterministic environment and reward stack | The sponsor-facing V2 direction calls for
+| 2026-03-08 | Person B (Ayush) | Oracle hybrid architecture | Added an Oracle-style frontier-model layer as an additive integration instead of replacing the deterministic environment and reward stack | The sponsor-facing V2 direction calls for a model-driven intelligence layer woven through scenario generation, environment interaction, and explanation, but the RL training path still needs deterministic reward and reproducible evaluation | Added `oracle_models.py`, `oracle.py`, `cache.py`, Oracle prompt assets, an optional model-backed Lab Manager wrapper, an adapter from Oracle scenarios into the existing normalized scenario pack, and feature-flagged Oracle hooks in `ReplicaLabEnv`; kept deterministic scoring in `replicalab/scoring/*` as the canonical training reward; expanded test coverage with `test_oracle.py`, `test_cache.py`, and Oracle adapter/prompt tests; full suite now passes at 365 tests | If this grows beyond the current additive mode, record any future contract or reward-source changes separately before altering the deterministic training path |
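The reward-shaping entry above distinguishes terminal reward components from the cumulative shaped episode reward used for rollout aggregation. A minimal sketch of that aggregation (the `episode_reward` helper and the signal names in the example dicts are illustrative, not the actual ReplicaLab implementation):

```python
def episode_reward(step_signals: list[dict], terminal_reward: float) -> float:
    """Cumulative shaped reward: sum of deterministic per-step shaping
    terms, plus the terminal (judge-scored) reward."""
    shaped = sum(sum(signals.values()) for signals in step_signals)
    return shaped + terminal_reward


# Two steps with a few shaping terms each (positive = desirable behavior,
# negative = penalty), then a terminal reward of 1.0.
steps = [
    {"info_gain": 0.2, "stalling": -0.05},
    {"info_gain": 0.1, "invalid_action": -0.1},
]
total = episode_reward(steps, terminal_reward=1.0)
# shaped = 0.15 + 0.0 = 0.15, so total = 1.15
```

Because every shaping term is a deterministic function of the transition, two replays of the same episode produce the same `total`, which is the property the entry says training plots should keep explicit.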
replicalab/agents/lab_manager_agent.py
CHANGED

@@ -1,4 +1,4 @@
-"""Optional
+"""Optional model-backed Lab Manager narration layer."""

 from __future__ import annotations

@@ -11,7 +11,7 @@ from replicalab.prompts import load_prompt_asset

 class LabManagerAgent:
-    """
+    """Model-backed Lab Manager driven by Oracle-generated constraints.

     This is additive to the deterministic feasibility checker. The current
     env can use this agent to narrate or enrich responses while keeping
replicalab/oracle.py
CHANGED

@@ -98,7 +98,7 @@ def _invoke_client(client: Any, *, model: str, system: str, user: str) -> str:
         response = client(system, user)
         return _extract_response_text(response)

-    raise TypeError("Unsupported Oracle client: expected
+    raise TypeError("Unsupported Oracle client: expected provider-style client or callable")


 def call_json_model(