Lil2J committed · verified
Commit abf58ec · 1 Parent(s): f74f78c

Update README.md

Files changed (1): README.md (+5 -1)
README.md CHANGED
@@ -1,3 +1,7 @@
+---
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+---
 Reproduces the core idea of [AgentFlow](https://arxiv.org/abs/2510.05592): extending single-step LLM inference into a multi-turn **Planner → Executor → Verifier** agent loop, applying RL signals (GRPO) to the Planner's generation trajectory. This allows the model to improve its tool-use and reasoning capabilities without requiring manually annotated intermediate steps.
 
 #### Architecture
@@ -38,4 +42,4 @@ Rewarder.compute_reward() ← LLM-as-Judge: compare model answer with gr
 |---|---|---|---|---|
 | Qwen2.5-7B-Instruct | AIME 2024 | 10.0% | 26.7% | +16.7% |
 
-> **Note:** Due to limited training resources, the AgentFlow model was only trained for 100 steps.
+> **Note:** Due to limited training resources, the AgentFlow model was only trained for 100 steps.
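
The multi-turn Planner → Executor → Verifier loop that this README describes can be sketched as below. This is a minimal illustration, not the repository's actual code: the function names (`plan`, `execute`, `verify`) and the stub bodies are hypothetical stand-ins for the Planner/Executor/Verifier model calls, and the final outcome-only reward mirrors the GRPO setup, where no manually annotated intermediate steps are needed.

```python
# Hypothetical sketch of the Planner -> Executor -> Verifier agent loop.
# All names below are illustrative stubs, not the repository's API.

def plan(question: str, history: list) -> dict:
    # Planner stub: an LLM would choose the next tool and its arguments.
    return {"tool": "calculator", "args": question}

def execute(action: dict) -> str:
    # Executor stub: runs the chosen tool and returns an observation.
    return f"result of {action['tool']}({action['args']})"

def verify(question: str, observation: str) -> bool:
    # Verifier stub: decides whether the observation answers the question.
    return "result" in observation

def agent_loop(question: str, max_turns: int = 3):
    """Multi-turn loop: plan -> execute -> verify, stop once verified."""
    history = []
    for _ in range(max_turns):
        action = plan(question, history)
        observation = execute(action)
        history.append((action, observation))
        if verify(question, observation):
            return observation, history
    return None, history

answer, trajectory = agent_loop("2 + 2")
# GRPO-style signal: only the final outcome is scored (LLM-as-Judge in the
# paper); intermediate steps in `trajectory` carry no per-step labels.
reward = 1.0 if answer is not None else 0.0
```

The key point the sketch illustrates is that the reward attaches to the whole generation trajectory of the Planner, so tool-use improves from outcome feedback alone.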