Addyk24 committed on
Commit
b87e31c
·
verified ·
1 Parent(s): 3b8351d

Update README.md

Files changed (1)
  1. README.md +20 -0
README.md CHANGED
@@ -58,7 +58,25 @@ An agent is placed in a simulated corporate workspace as a **Product Manager**.
58
  │ │
59
  │ Reward: Dense (discovery) + Sparse (harmonic mean) │
60
  └─────────────────────────────────────────────────────┘
 
 
61
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
62
 
63
  ### Hidden Constraints (what the agent must discover)
64
 
@@ -70,6 +88,8 @@ An agent is placed in a simulated corporate workspace as a **Product Manager**.
70
 
71
  The agent never sees these directly. It must ask the right questions, interpret expert responses, and synthesize a draft that addresses all three.
72
 
 
 
73
  ### Actions
74
 
75
  ```python
 
58
  │ │
59
  │ Reward: Dense (discovery) + Sparse (harmonic mean) │
60
  └─────────────────────────────────────────────────────┘
61
+
62
+
63
  ```
64
+ ### 🏛️ System Architecture: The State-Based Sieve
65
+
66
+ Project Polymath is designed as a closed-loop state machine. Unlike a standard LLM "chat" wrapper, it implements a strict enforcement layer that separates the agent's reasoning from execution in the workspace.
67
+
68
+
69
+ ![architecture](system_architecture.png)
70
+
71
+
72
+ Architectural Highlights:
73
+
74
+ - The 40-Token Critical Sieve: Drawn as a diamond gate between the Agent and the Workspace. It acts as a hard bandwidth filter: the model is penalized whenever its output exceeds the 40-token survivor-mode threshold.
75
+
76
+ - Expert Constraints Database: A persistent state container that holds the hidden stakeholder variables. The environment "unlocks" a variable only when the agent asks a specific, targeted query.
77
+
78
+ - Closed-Loop Reward Engine: The "Judge" monitors state changes in the environment and sends a real-time floating-point reward signal back to the GRPO trainer, which iteratively sharpens the agent's "Sniper" logic.
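
The sieve and the reward signal described in the highlights above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names, the whitespace token count, and the penalty and bonus coefficients are all assumptions made for the example. Only the 40-token limit and the dense-plus-harmonic-mean reward shape come from the README.

```python
from statistics import harmonic_mean
from typing import Sequence

TOKEN_LIMIT = 40  # threshold named in the README; all other values are assumed


def sieve_penalty(message: str, limit: int = TOKEN_LIMIT, per_token: float = 0.05) -> float:
    """Return a non-positive penalty proportional to tokens beyond the limit.

    Tokens are approximated by whitespace splitting (an assumption; the real
    environment may count tokens with a model tokenizer).
    """
    overflow = max(0, len(message.split()) - limit)
    return -per_token * overflow


def step_reward(message: str, newly_discovered: int, discovery_bonus: float = 0.1) -> float:
    """Dense reward: a bonus per newly unlocked constraint plus the sieve penalty."""
    return discovery_bonus * newly_discovered + sieve_penalty(message)


def final_reward(constraint_scores: Sequence[float]) -> float:
    """Sparse reward: harmonic mean of per-constraint scores in [0, 1].

    The harmonic mean is pulled toward the weakest score and is 0 if any
    single score is 0, so a draft cannot ignore one constraint.
    """
    return harmonic_mean(constraint_scores)
```

Under these assumptions, scores of 1.0, 1.0, and 0.25 give a harmonic mean of 0.5, well below their arithmetic mean of 0.75, which is why a harmonic mean suits a task where every constraint must be addressed.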
79
+
80
 
81
  ### Hidden Constraints (what the agent must discover)
82
 
 
88
 
89
  The agent never sees these directly. It must ask the right questions, interpret expert responses, and synthesize a draft that addresses all three.
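
One way to picture this discovery loop is a store that reveals a constraint only when a question to the right expert contains a relevant term. The sketch below is hypothetical: the class names, the keyword-matching rule, and the idea of a per-expert keyword set are assumptions for illustration, not the environment's actual unlocking logic.

```python
import re
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class HiddenConstraint:
    expert: str               # stakeholder who holds the constraint (hypothetical)
    keywords: FrozenSet[str]  # terms that count as a "targeted" query (assumed rule)
    detail: str               # text revealed to the agent once unlocked
    unlocked: bool = False


@dataclass
class ConstraintStore:
    constraints: List[HiddenConstraint] = field(default_factory=list)

    def ask(self, expert: str, question: str) -> List[str]:
        """Unlock and return constraints held by `expert` whose keywords appear in the question."""
        words = set(re.findall(r"[a-z0-9]+", question.lower()))
        revealed = []
        for c in self.constraints:
            if c.expert == expert and not c.unlocked and c.keywords & words:
                c.unlocked = True
                revealed.append(c.detail)
        return revealed

    def all_discovered(self) -> bool:
        """True once every hidden constraint has been unlocked."""
        return all(c.unlocked for c in self.constraints)
```

A vague question returns an empty list, while a question that names the relevant topic reveals the constraint exactly once, so the agent gains nothing by repeating a query.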
90
 
91
+ ```
92
+ ```
93
  ### Actions
94
 
95
  ```python