Addyk24 committed
Commit 6686755 · verified · 1 Parent(s): d1cefdf

Update README.md

Files changed (1):
  1. README.md +4 -3
README.md CHANGED
@@ -218,7 +218,7 @@ Average (last 10): 0.74
 
 *Cumulative reward per episode*
 
-### Before vs After — Agent Behavior
+### 📄 Before vs After — Agent Behavior
 
 **Before training (episode 3):**
 ```
@@ -244,6 +244,7 @@ Turn 7: submit_final → "Budget capped at $50k. Biometric 2FA required.
 ```
 
 ---
+## 🛠 Training Logs
 * 📄 **[View the Raw GRPO Training Metrics](artifacts/grpo_state_based/grpo_metrics.json)**
 
 
@@ -317,7 +318,7 @@ python grpo_train.py --output-dir artifacts/grpo_state_based_v2 --model Qwen/Qwe
 
 ---
 
-## Architecture
+## Architecture
 
 ```
 expert-negotiation-env/
@@ -340,7 +341,7 @@ expert-negotiation-env/
 
 ---
 
-## Why This Matters
+## 🔍 Why This Matters
 
 Multi-stakeholder alignment is one of the hardest unsolved problems in enterprise AI deployment. An LLM that can reliably discover hidden constraints, track multiple parties' requirements, and synthesize a balanced output would be immediately useful for: