Addyk24/Project-Polymath

Addyk24 committed 20 days ago

Commit b87e31c (verified) · 1 parent: 3b8351d

Update README.md

Files changed (1)

README.md +20 -0

@@ -58,7 +58,25 @@ An agent is placed in a simulated corporate workspace as a **Product Manager**.
 │                                                     │
 │  Reward: Dense (discovery) + Sparse (harmonic mean) │
 └─────────────────────────────────────────────────────┘
 ```
 ### Hidden Constraints (what the agent must discover)
@@ -70,6 +88,8 @@ An agent is placed in a simulated corporate workspace as a **Product Manager**.
 The agent never sees these directly. It must ask the right questions, interpret expert responses, and synthesize a draft that addresses all three.
 ### Actions
 ```python

 │                                                     │
 │  Reward: Dense (discovery) + Sparse (harmonic mean) │
 └─────────────────────────────────────────────────────┘
 ```
+### 🏛️ System Architecture: The State-Based Sieve
+The architecture is a closed-loop state machine. Unlike standard LLM "chat" wrappers, Project Polymath adds a strict enforcement layer that separates reasoning from execution.
+![architecture](system_architecture.png)
+Architectural Highlights:
+- The 40-Token Critical Sieve: Drawn as a diamond gate between the Agent and the Workspace. It acts as a hard bandwidth filter: the model is penalized whenever its output exceeds the survivor-mode threshold.
+- Expert Constraints Database: A persistent state container holding hidden stakeholder variables. The Environment unlocks these variables only in response to specific, targeted queries from the agent.
+- Closed-Loop Reward Engine: The "Judge" monitors state changes in the environment and returns a real-time floating-point reward to the GRPO trainer, which iteratively sharpens the agent's "Sniper" logic.
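
The sieve and the constraints database described in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the names `TOKEN_LIMIT`, `sieve_penalty`, and `ConstraintStore` are hypothetical, the limit of 40 is inferred from the sieve's name, whitespace splitting stands in for a real tokenizer, and keyword matching stands in for whatever rule the Environment actually uses to decide that a query is "targeted".

```python
import re
from dataclasses import dataclass, field

TOKEN_LIMIT = 40  # assumed threshold, inferred from the "40-Token" name


def sieve_penalty(message: str, limit: int = TOKEN_LIMIT,
                  per_token: float = 0.05) -> float:
    """Return 0.0 within the limit, else a negative penalty per extra token.

    Whitespace splitting is a stand-in for the real tokenizer (assumption).
    """
    overflow = max(0, len(message.split()) - limit)
    return 0.0 if overflow == 0 else -per_token * overflow


@dataclass
class ConstraintStore:
    """Hidden stakeholder variables, revealed only by targeted queries."""
    hidden: dict[str, str]          # constraint name -> hidden value
    keywords: dict[str, set[str]]   # constraint name -> trigger words (assumed rule)
    unlocked: set[str] = field(default_factory=set)

    def query(self, question: str) -> list[str]:
        """Unlock and return constraints whose trigger words occur in the question."""
        words = set(re.findall(r"[a-z0-9]+", question.lower()))
        newly = [name for name, triggers in self.keywords.items()
                 if name not in self.unlocked and triggers & words]
        self.unlocked.update(newly)
        return [self.hidden[name] for name in newly]
```

In this sketch a vague question unlocks nothing, a question containing a trigger word reveals the matching constraint once, and repeating the same question reveals nothing new.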
 ### Hidden Constraints (what the agent must discover)
 The agent never sees these directly. It must ask the right questions, interpret expert responses, and synthesize a draft that addresses all three.
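
The diagram above labels the reward as dense (discovery) plus sparse (harmonic mean). One way to read that, sketched here under assumptions: a small bonus each time a hidden constraint is discovered, and a terminal reward equal to the harmonic mean of per-constraint satisfaction scores in [0, 1]. The function names, the bonus size, and the score range are hypothetical and not taken from the repository.

```python
def dense_reward(newly_unlocked: int, bonus: float = 0.1) -> float:
    """Per-step reward: a fixed bonus for each constraint discovered this step."""
    return bonus * newly_unlocked


def sparse_reward(scores: list[float]) -> float:
    """Terminal reward: harmonic mean of per-constraint satisfaction scores.

    Returns 0.0 if the list is empty or any constraint is fully unmet,
    so a draft cannot score well by satisfying only some stakeholders.
    """
    if not scores or any(s <= 0.0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


def episode_reward(unlocks_per_step: list[int], final_scores: list[float]) -> float:
    """Total reward for one episode: summed dense bonuses plus the sparse term."""
    return sum(dense_reward(n) for n in unlocks_per_step) + sparse_reward(final_scores)
```

The harmonic mean is dominated by the lowest score: satisfaction scores of 0.5, 1.0, and 1.0 give 0.75 here, while an arithmetic mean would give about 0.83. That fits the requirement that the draft address all three constraints rather than excel at two.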
+```
+```
 ### Actions
 ```python