│ │
│ Reward: Dense (discovery) + Sparse (harmonic mean) │
└─────────────────────────────────────────────────────┘
```
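The reward line in the diagram combines a dense term paid during discovery with a sparse, harmonic-mean term paid on the final draft. The block below is one minimal reading of that line, not the project's implementation. It assumes a fixed bonus for each newly discovered constraint and one score in [0, 1] per hidden constraint for the submitted draft. `DISCOVERY_BONUS`, `discovery_reward`, `harmonic_mean_score`, and `total_reward` are hypothetical names.

```python
from typing import Optional, Sequence

DISCOVERY_BONUS = 0.1  # hypothetical dense reward per newly discovered constraint


def discovery_reward(newly_unlocked: int) -> float:
    """Dense term: paid on any step that uncovers previously hidden constraints."""
    return DISCOVERY_BONUS * newly_unlocked


def harmonic_mean_score(scores: Sequence[float]) -> float:
    """Sparse term: harmonic mean of per-constraint scores in [0, 1].

    Any score of zero (or below) collapses the result to 0.0, so a draft
    that ignores one constraint gets no sparse reward at all.
    """
    if not scores:
        raise ValueError("need at least one constraint score")
    if any(s <= 0.0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


def total_reward(newly_unlocked: int, final_scores: Optional[Sequence[float]] = None) -> float:
    """Dense reward on every step; sparse reward added only when a draft is scored."""
    reward = discovery_reward(newly_unlocked)
    if final_scores is not None:  # terminal step: the draft has been submitted
        reward += harmonic_mean_score(final_scores)
    return reward
```

Because the harmonic mean is dominated by its smallest term, a draft scored `[1.0, 1.0, 0.0]` earns nothing from the sparse term, whereas an arithmetic mean would still award about 0.67. That property is presumably why a harmonic mean suits a task in which every constraint must be addressed.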
### 🏛️ System Architecture: The State-Based Sieve
Our architecture is a closed-loop state machine. Unlike standard LLM "chat" wrappers, Project Polymath places a strict enforcement layer between the agent and the workspace, which keeps the agent's reasoning separate from the execution of its actions.

Architectural Highlights:

- The 40-Token Critical Sieve: a hard gate between the agent and the workspace. It acts as a bandwidth filter: any action that exceeds the 40-token "survivor-mode" threshold is penalized.

- Expert Constraints Database: a persistent state container that holds the hidden stakeholder variables. The environment "unlocks" a variable only in response to a specific, targeted query from the agent.

- Closed-Loop Reward Engine: the "Judge" monitors state changes in the environment and sends a real-time, floating-point reward signal back to the GRPO (Group Relative Policy Optimization) trainer, which iteratively sharpens the agent's "Sniper" logic.
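The three highlights can be sketched as one environment step followed by the trainer-side normalization. This is a minimal sketch under stated assumptions: tokens are approximated with a regex word split rather than the policy's tokenizer, an over-budget action is rejected outright with a fixed penalty, and a hidden constraint unlocks when a query mentions one of its trigger keywords. `SimpleWorkspace`, `OVERFLOW_PENALTY`, `DISCOVERY_BONUS`, the trigger sets, and `group_relative_advantages` are hypothetical; only the 40-token budget and the use of GRPO come from this README. The last function shows the general GRPO idea of standardizing each rollout's reward against the other rollouts sampled for the same prompt.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from statistics import mean, pstdev

TOKEN_LIMIT = 40         # the 40-token "critical sieve" budget
OVERFLOW_PENALTY = -1.0  # hypothetical penalty for an over-budget action
DISCOVERY_BONUS = 0.1    # hypothetical dense reward per newly unlocked constraint


@dataclass
class SimpleWorkspace:
    """Hypothetical stand-in for the Expert Constraints Database plus the sieve."""

    # Hidden constraint name -> keywords a targeted query must mention to unlock it.
    triggers: dict[str, set[str]]
    unlocked: set[str] = field(default_factory=set)

    def step(self, action: str) -> tuple[str, float]:
        # Crude token count: regex word split instead of the model's tokenizer.
        words = re.findall(r"\w+", action.lower())

        # 1. Sieve: reject over-budget actions before they reach the workspace.
        if len(words) > TOKEN_LIMIT:
            return "rejected: over token budget", OVERFLOW_PENALTY

        # 2. Unlock: reveal only constraints whose trigger keywords the query names.
        vocab = set(words)
        newly = sorted(
            name
            for name, keywords in self.triggers.items()
            if name not in self.unlocked and keywords & vocab
        )
        self.unlocked.update(newly)

        # 3. Judge: dense reward for new discoveries, sent back to the trainer.
        reply = f"unlocked: {newly}" if newly else "no new information"
        return reply, DISCOVERY_BONUS * len(newly)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize each reward within its rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    ws = SimpleWorkspace(triggers={"budget": {"budget", "cost"}})
    print(ws.step("What is the budget ceiling for this feature?"))
    # -> ("unlocked: ['budget']", 0.1)
```

Rejecting over-budget actions, rather than truncating them, is a design choice made here for clarity; it makes verbosity cost the agent both the penalty and the information the query would have unlocked.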
### Hidden Constraints (what the agent must discover)
The agent never sees these directly. It must ask the right questions, interpret expert responses, and synthesize a draft that addresses all three.
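One way to make "addresses all three" concrete is to score the final draft once per hidden constraint. The block below uses simple keyword coverage purely as an illustrative assumption; the project's Judge may grade drafts quite differently, for example with a model-based rubric. `RUBRIC`, its three placeholder constraints and keywords, `constraint_scores`, and `addresses_all` are all hypothetical. Per-constraint scores of this kind are the sort of input the harmonic-mean term in the reward diagram would need.

```python
from __future__ import annotations

import re

# Hypothetical rubric: terms a draft must mention to address each hidden constraint.
# The real constraints are not reproduced here; these three are placeholders.
RUBRIC: dict[str, set[str]] = {
    "budget": {"budget", "cost"},
    "deadline": {"deadline", "q3"},
    "compliance": {"gdpr", "consent"},
}


def constraint_scores(draft: str, rubric: dict[str, set[str]]) -> dict[str, float]:
    """Per-constraint score: fraction of that constraint's terms found in the draft."""
    words = set(re.findall(r"\w+", draft.lower()))
    return {name: len(terms & words) / len(terms) for name, terms in rubric.items()}


def addresses_all(scores: dict[str, float]) -> bool:
    """True only if every constraint is at least partly addressed (no zero scores)."""
    return all(score > 0.0 for score in scores.values())


if __name__ == "__main__":
    draft = "Ship before the Q3 deadline within budget."
    scores = constraint_scores(draft, RUBRIC)
    print(scores, addresses_all(scores))
    # -> {'budget': 0.5, 'deadline': 1.0, 'compliance': 0.0} False
```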
### Actions
```python