Addyk24 committed
Commit 6686755 · verified · 1 Parent(s): d1cefdf

Update README.md

Files changed (1):
  1. README.md +4 -3
README.md CHANGED
@@ -218,7 +218,7 @@ Average (last 10): 0.74
 
 *Cumulative reward per episode*
 
-### Before vs After — Agent Behavior
+### 📄 Before vs After — Agent Behavior
 
 **Before training (episode 3):**
 ```
@@ -244,6 +244,7 @@ Turn 7: submit_final → "Budget capped at $50k. Biometric 2FA required.
 ```
 
 ---
+## 🛠 Training Logs
 * 📄 **[View the Raw GRPO Training Metrics](artifacts/grpo_state_based/grpo_metrics.json)**
 
 
@@ -317,7 +318,7 @@ python grpo_train.py --output-dir artifacts/grpo_state_based_v2 --model Qwen/Qwe
 
 ---
 
-## Architecture
+## Architecture
 
 ```
 expert-negotiation-env/
@@ -340,7 +341,7 @@ expert-negotiation-env/
 
 ---
 
-## Why This Matters
+## 🔍 Why This Matters
 
 Multi-stakeholder alignment is one of the hardest unsolved problems in enterprise AI deployment. An LLM that can reliably discover hidden constraints, track multiple parties' requirements, and synthesize a balanced output would be immediately useful for: