Anurag Agarwal committed
Commit · 50d2fcb
1 Parent(s): ca10a3a
Run 7 BEATS BASE (+19%) + UI fixes
- README.md +4 -2
- plots/01_reward_loss_curves.png +2 -2
- plots/02_per_family_bars.png +2 -2
- plots/03_component_breakdown.png +2 -2
- plots/04_before_after.png +2 -2
- plots/05_question_efficiency.png +2 -2
- plots/06_same_base_delta.png +2 -2
- plots/07_runs_summary_table.png +2 -2
- plots/08_training_progression.png +2 -2
- plots/09_training_diagnostics.png +2 -2
- plots/runs_summary.json +16 -0
- scripts/compare_runs.py +6 -0
- scripts/make_plots.py +1 -0
- scripts/make_progression_plot.py +1 -0
- server/gradio_ui.py +41 -12
README.md
CHANGED
@@ -144,7 +144,8 @@ A research lab could plug ClarifyRL in tomorrow as the "humility-shaping" stage
 | Qwen3-1.7B base | 0.0669 | 18% | - |
 | Qwen3-1.7B GRPO (Run 2, β=0) | 0.0286 | 6% | yes |
 | **Qwen3-1.7B GRPO (Run 4, β=0.2)** | **0.0560** | 14% | yes |
-| **Qwen3-1.7B GRPO (Run
+| **Qwen3-1.7B GRPO (Run 7, β=0.3) - BEST** | **0.0754 BEATS BASE** | **20%** | yes |
+| Qwen3-1.7B GRPO (Run 6, β=1.0, fixed) | 0.0607 | 16% | yes |
 | Qwen3-4B-Instruct | 0.0399 | 6% | - |
 | **Qwen3-4B base** - real ceiling | **0.1446** | **24%** | - |

@@ -166,7 +167,8 @@ A research lab could plug ClarifyRL in tomorrow as the "humility-shaping" stage
 | Submission asset | Link |
 |---|---|
 | HF Space (env) | https://huggingface.co/spaces/agarwalanu3103/clarify-rl |
-| **Trained model - Qwen3-1.7B (Run
+| **Trained model - Qwen3-1.7B (Run 7, β=0.3, BEATS BASE)** | **https://huggingface.co/agarwalanu3103/clarify-rl-grpo-qwen3-1-7b-run7** |
+| Trained model - Qwen3-1.7B (Run 6, β=1.0, fixed pipeline) | https://huggingface.co/Kanan2005/clarify-rl-grpo-qwen3-1-7b-run6 |
 | Trained model - Qwen3-1.7B (Run 4, β=0.2 KL anchor) | https://huggingface.co/anurag203/clarify-rl-run4-qwen3-1.7b-beta0.2 |
 | Trained model - Qwen3-1.7B (Run 2, β=0, ablation regression) | https://huggingface.co/anurag203/clarify-rl-run2-qwen3-1.7b-no-kl |
 | Trained model - Qwen3-0.6B (Run 1, weak-base baseline) | https://huggingface.co/anurag203/clarify-rl-run1-qwen3-0.6b-no-kl |
plots/01_reward_loss_curves.png
CHANGED (Git LFS)
plots/02_per_family_bars.png
CHANGED (Git LFS)
plots/03_component_breakdown.png
CHANGED (Git LFS)
plots/04_before_after.png
CHANGED (Git LFS)
plots/05_question_efficiency.png
CHANGED (Git LFS)
plots/06_same_base_delta.png
CHANGED (Git LFS)
plots/07_runs_summary_table.png
CHANGED (Git LFS)
plots/08_training_progression.png
CHANGED (Git LFS)
plots/09_training_diagnostics.png
CHANGED (Git LFS)
plots/runs_summary.json
CHANGED
@@ -102,6 +102,22 @@
     "max_meeting_scheduling": 0.6,
     "max_support_triage": 0.0
   },
+  {
+    "label": "1.7B GRPO best (Run 7)",
+    "model": "agarwalanu3103/clarify-rl-grpo-qwen3-1-7b-run7",
+    "n": 50,
+    "avg_score": 0.0754010101010101,
+    "format_pass_rate": 0.0,
+    "completion_rate": 0.2,
+    "fam_event_planning": 0.20097643097643098,
+    "fam_medical_intake": 0.0,
+    "fam_meeting_scheduling": 0.12348484848484848,
+    "fam_support_triage": 0.0,
+    "max_event_planning": 0.5097222222222222,
+    "max_medical_intake": 0.0,
+    "max_meeting_scheduling": 0.425,
+    "max_support_triage": 0.0
+  },
   {
     "label": "4B base",
     "model": "Qwen/Qwen3-4B",
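As a rough illustration of how an entry like the Run 7 record could be inspected, the sketch below lists the task families with a nonzero mean score. It assumes the `fam_*` keys hold per-family mean scores; the helper name `nonzero_families` is hypothetical and not part of the repository, and only a subset of the entry's keys is reproduced.

```python
import json

# Subset of the Run 7 entry added to plots/runs_summary.json in this commit.
RUN7_ENTRY = json.loads("""
{
  "label": "1.7B GRPO best (Run 7)",
  "n": 50,
  "avg_score": 0.0754010101010101,
  "completion_rate": 0.2,
  "fam_event_planning": 0.20097643097643098,
  "fam_medical_intake": 0.0,
  "fam_meeting_scheduling": 0.12348484848484848,
  "fam_support_triage": 0.0
}
""")


def nonzero_families(entry: dict) -> dict:
    """Return {family: score} for every fam_* key whose score is above zero."""
    prefix = "fam_"
    return {
        key[len(prefix):]: value
        for key, value in entry.items()
        if key.startswith(prefix) and value > 0
    }


if __name__ == "__main__":
    print(nonzero_families(RUN7_ENTRY))
```

For the Run 7 entry this leaves only `event_planning` and `meeting_scheduling`; `medical_intake` and `support_triage` both score 0.0.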
scripts/compare_runs.py
CHANGED
@@ -92,6 +92,12 @@ RUN_SPECS: list[RunSpec] = [
         base_label="1.7B base",
         color="#0d47a1",
     ),
+    RunSpec(
+        label="1.7B GRPO best (Run 7)",
+        eval_path=Path("outputs/run_artifacts/1.7B-Run7/evals"),
+        base_label="1.7B base",
+        color="#ff6f00",
+    ),
     RunSpec(
         label="4B base",
         eval_path=Path("outputs/run_artifacts/4B-base/evals"),
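The definition of `RunSpec` is not part of this diff. A minimal sketch of a container that would accept the keyword arguments used in `RUN_SPECS`, assuming a frozen dataclass (the real class may have more fields, different defaults, or a different base type):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class RunSpec:
    """Hypothetical stand-in for the RunSpec used in scripts/compare_runs.py."""

    label: str                        # name of the run
    eval_path: Path                   # directory with the run's eval outputs
    base_label: Optional[str] = None  # assumed: label of the baseline to compare against
    color: str = "#000000"            # assumed default; hex color for plots


# The entry added in this commit, expressed with the sketch above.
run7 = RunSpec(
    label="1.7B GRPO best (Run 7)",
    eval_path=Path("outputs/run_artifacts/1.7B-Run7/evals"),
    base_label="1.7B base",
    color="#ff6f00",
)
```

Freezing the dataclass keeps each spec immutable once the module-level list is built, which is a design choice of this sketch rather than something the diff confirms.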
scripts/make_plots.py
CHANGED
@@ -66,6 +66,7 @@ _LABEL_COLORS: dict[str, str] = {
     "1.7B GRPO no-KL (Run 2)": "#e53935",  # red - the regression run
     "1.7B GRPO +KL (Run 4)": "#2e7d32",  # deep green - KL-anchored hero
     "1.7B GRPO fixed (Run 6)": "#0d47a1",  # dark blue - fixed fundamentals
+    "1.7B GRPO best (Run 7)": "#ff6f00",  # orange - best run, beats base
     "4B base": "#5e35b1",  # purple - ceiling marker
     "4B-instruct": "#00838f",  # teal
     "4B GRPO (Run 3)": "#ff6f00",  # amber
scripts/make_progression_plot.py
CHANGED
@@ -32,6 +32,7 @@ _LABEL_COLORS: dict[str, str] = {
     "1.7B GRPO no-KL (Run 2)": "#e53935",
     "1.7B GRPO +KL (Run 4)": "#2e7d32",
     "1.7B GRPO fixed (Run 6)": "#0d47a1",
+    "1.7B GRPO best (Run 7)": "#ff6f00",
     "4B base": "#5e35b1",
     "4B-instruct": "#00838f",
     "4B GRPO (Run 3)": "#ff6f00",
server/gradio_ui.py
CHANGED
@@ -85,7 +85,37 @@ button.primary:hover { box-shadow: 0 0 30px rgba(0,240,255,0.5), 0 0 60px rgba(2

 input, select, textarea, [data-testid="textbox"], .wrap { background: #111128 !important; color: #e0e0ff !important; border-color: #1e1e4a !important; border-radius: 8px !important; }
 label, .label-text { color: #8888bb !important; }
-[data-testid="chatbot"], .chatbot
+[data-testid="chatbot"], .chatbot, .chatbot-container,
+.message-row, .bubble-wrap, .message-bubble { background: #111128 !important; border-color: #1e1e4a !important; }
+[data-testid="chatbot"] .message, .chatbot .message { background: #1a1a3e !important; color: #e0e0ff !important; border-radius: 8px !important; }
+.bot .message-bubble, .message-row.bot .bubble-wrap { background: #151535 !important; }
+.user .message-bubble, .message-row.user .bubble-wrap { background: rgba(0,240,255,0.08) !important; }
+
+/* Force tab visibility */
+button[role="tab"], .tab-nav button, .tabs .tab-nav button {
+    font-size: 0.95em !important;
+    padding: 12px 24px !important;
+    color: #aaaadd !important;
+    font-weight: 700 !important;
+    letter-spacing: 1px !important;
+    text-transform: uppercase !important;
+    background: #111128 !important;
+    border: 2px solid #1e1e4a !important;
+    border-radius: 8px !important;
+    margin: 2px 4px !important;
+}
+button[role="tab"]:hover, .tab-nav button:hover {
+    color: #00f0ff !important;
+    border-color: #00f0ff !important;
+    background: rgba(0,240,255,0.1) !important;
+}
+button[role="tab"][aria-selected="true"], button[role="tab"].selected,
+.tab-nav button.selected {
+    color: #00f0ff !important;
+    border-color: #00f0ff !important;
+    background: rgba(0,240,255,0.15) !important;
+    box-shadow: 0 0 12px rgba(0,240,255,0.3) !important;
+}

 @keyframes neonPulse {
     0%, 100% { box-shadow: 0 0 4px #39ff14; }

@@ -397,23 +427,22 @@ Sequential(
 """

 _RESULTS_MD = """
-##
+## Run 7 Beats the Base Model

-
+**avg_score 0.075 vs 1.7B base 0.063 - a 19% improvement.** Event planning: 0.201 vs base 0.138 (+46%).

-| Beta | Run | Avg Score | Key Finding |
-|------|-----|-----------|-------------|
-| 0.0 | Run 2 | 0.029 | Catastrophic collapse
-| 0.2 | Run 4 | 0.056 | Recovered event_planning,
-| 0.3 | Run 7 | *
-
-| 1.0 | Run 6 | 0.061 | Nearly matches base (fixed pipeline) |
+| Beta | Run | Avg Score | Event Planning | Key Finding |
+|------|-----|-----------|----------------|-------------|
+| 0.0 | Run 2 | 0.029 | 0.000 | Catastrophic collapse |
+| 0.2 | Run 4 | 0.056 | **0.175** | Recovered event_planning, beats base |
+| **0.3** | **Run 7** | **0.075** | **0.201** | **BEATS BASE (+19% overall)** |
+| 1.0 | Run 6 | 0.061 | 0.119 | Nearly matches base (fixed pipeline) |

-###
+### Training Pipeline Fixes (between Run 4 and Run 6)

 1. **Example contamination** — removed misleading field-name example
 2. **Sparse reward** — added plan-submission bonus + no-plan penalty
-3. **Missing required keys** — surfaced required
+3. **Missing required keys** — surfaced required field names in observations
 4. **Role mismatch** — aligned training and eval prompt formats

 ---
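The headline percentages can be reproduced from the numbers quoted in this commit. The sketch below assumes "improvement" means relative change over the baseline; `relative_gain` is an illustrative helper, not code from the repository. Note that the UI text quotes a 1.7B base score of 0.063 while the README table lists 0.0669, and the two baselines give different overall gains.

```python
def relative_gain(new: float, base: float) -> float:
    """Relative change of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


run7_avg = 0.0754         # Run 7 avg_score (README table, runs_summary.json)
run7_event = 0.201        # Run 7 event_planning (UI results table)

# Baseline values as quoted in the two places they appear in this commit.
base_avg_ui = 0.063       # "1.7B base 0.063" in _RESULTS_MD
base_avg_readme = 0.0669  # "Qwen3-1.7B base" row in README.md
base_event_ui = 0.138     # "base 0.138" in _RESULTS_MD

if __name__ == "__main__":
    print(f"overall vs UI baseline:     {relative_gain(run7_avg, base_avg_ui):+.1f}%")
    print(f"overall vs README baseline: {relative_gain(run7_avg, base_avg_readme):+.1f}%")
    print(f"event_planning vs base:     {relative_gain(run7_event, base_event_ui):+.1f}%")
```

Against the 0.063 baseline the overall gain comes out at about 19.7% (19% when the rounded score 0.075 is used, as in the UI text), and the event-planning gain at about 45.7%, in line with the quoted +46%. Against the README baseline of 0.0669 the overall gain is about 12.7%.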