Upload _paper_results/reasoning_rl_multiseed_summary.md with huggingface_hub
_paper_results/reasoning_rl_multiseed_summary.md
@@ -5,7 +5,7 @@ Headline TF-grounding rate (TFG) across seeds for T1/T2/T3 reasoning grounded-RL
 | Task | n seeds | TFG mean ± std | TFG min, max | per-seed values |
 |------|--------:|----------------|--------------|------------------|
 | T1 | 3 | 0.4115 ± 0.0233 | [0.3964, 0.4384] | s=2: 0.3964, s=3: 0.3996, s=42: 0.4384 |
-| T2 |
+| T2 | 3 | 0.3235 ± 0.0510 | [0.2666, 0.3650] | s=2: 0.3390, s=3: 0.2666, s=42: 0.3650 |
 | T3 | 3 | 0.2247 ± 0.0257 | [0.1951, 0.2397] | s=2: 0.2397, s=3: 0.1951, s=42: 0.2393 |
 
 ## T1 per-seed details
@@ -20,6 +20,8 @@ Headline TF-grounding rate (TFG) across seeds for T1/T2/T3 reasoning grounded-RL
 
 | seed | TFG | n_cited | n_grounded | n_halluc | reasoning_tags_rate |
 |-----:|----:|--------:|-----------:|---------:|---------------------:|
+| 2 | 0.3390 | 16.22 | 5.84 | 10.22 | 0.8200 |
+| 3 | 0.2666 | 15.20 | 4.70 | 10.16 | 0.7400 |
 | 42 | 0.3650 | 20.58 | 9.84 | 10.08 | 0.4400 |
 
 ## T3 per-seed details
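
The T2 summary row in this change can be checked against its per-seed TFG values. The sketch below is a minimal illustration, not the script that produced the file: the helper names `summarize_tfg` and `format_row` are hypothetical, and it assumes the reported "std" is the sample standard deviation (n - 1 denominator), which is the choice that reproduces 0.0510 for T2.

```python
import statistics


def summarize_tfg(per_seed):
    """Return (mean, sample std, min, max) for a dict mapping seed -> TFG.

    Assumption: "std" in the summary table is the sample standard deviation
    (n - 1 denominator), computed here with statistics.stdev.
    """
    values = list(per_seed.values())
    return (
        statistics.mean(values),
        statistics.stdev(values),
        min(values),
        max(values),
    )


def format_row(task, per_seed):
    """Render one markdown row in the layout of the headline summary table."""
    mean, std, lo, hi = summarize_tfg(per_seed)
    seeds = ", ".join(f"s={s}: {v:.4f}" for s, v in sorted(per_seed.items()))
    return (
        f"| {task} | {len(per_seed)} | {mean:.4f} ± {std:.4f} "
        f"| [{lo:.4f}, {hi:.4f}] | {seeds} |"
    )


# Per-seed TFG values taken from the T2 per-seed details table.
t2 = {2: 0.3390, 3: 0.2666, 42: 0.3650}
row = format_row("T2", t2)
print(row)
```

With the three T2 values, the printed row matches the line added to the headline table. Using the population standard deviation instead would give about 0.0416, which does not match the reported figure.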