Update README.md
README.md
CHANGED
@@ -39,7 +39,7 @@ We provide a rich, structured QA dataset derived from Wikipedia knowledge graphs
 In the supervised fine-tuning (SFT) stage, a **turn-level judge** identifies which interaction turns should be used for training.
 In the reinforcement learning (RL) stage, a **turn-level reward** provides feedback to refine the model's multi-turn search and tool invocation behavior.
 
-*Training data, full report, and evaluation details will be released soon.*
+> **Note:** *Training data, full report, and evaluation details will be released soon.*
 
 # Benchmark Results
 