Update README.md
README.md
CHANGED
@@ -218,7 +218,7 @@ Average (last 10): 0.74
 
 *Cumulative reward per episode*
 
-### Before vs After — Agent Behavior
+### 📄 Before vs After — Agent Behavior
 
 **Before training (episode 3):**
 ```
@@ -244,6 +244,7 @@ Turn 7: submit_final → "Budget capped at $50k. Biometric 2FA required.
 ```
 
 ---
+## 🛠 Training Logs
 * 📄 **[View the Raw GRPO Training Metrics](artifacts/grpo_state_based/grpo_metrics.json)**
 
 
@@ -317,7 +318,7 @@ python grpo_train.py --output-dir artifacts/grpo_state_based_v2 --model Qwen/Qwe
 
 ---
 
-## Architecture
+## ✨ Architecture
 
 ```
 expert-negotiation-env/
@@ -340,7 +341,7 @@ expert-negotiation-env/
 
 ---
 
-## Why This Matters
+## 🔍 Why This Matters
 
 Multi-stakeholder alignment is one of the hardest unsolved problems in enterprise AI deployment. An LLM that can reliably discover hidden constraints, track multiple parties' requirements, and synthesize a balanced output would be immediately useful for:
 