LMIS-ORG/AgentFlow_Slime_Agentic_Qwen2.5_7B


Lil2J committed on Mar 10

Commit abf58ec · verified · 1 Parent(s): f74f78c

Update README.md

Files changed (1)

README.md +5 -1


@@ -1,3 +1,7 @@
+---
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+---
 Reproduces the core idea of [AgentFlow](https://arxiv.org/abs/2510.05592): extending single-step LLM inference into a multi-turn **Planner → Executor → Verifier** agent loop, applying RL signals (GRPO) to the Planner's generation trajectory. This allows the model to improve its tool-use and reasoning capabilities without requiring manually annotated intermediate steps.
 #### Architecture
@@ -38,4 +42,4 @@ Rewarder.compute_reward()         ← LLM-as-Judge: compare model answer with gr
 |---|---|---|---|---|
 | Qwen2.5-7B-Instruct | AIME 2024 | 10.0% | 26.7% | +16.7% |
-> **Note:** Due to limited training resources, the AgentFlow model was only trained for 100 steps.
+> **Note:** Due to limited training resources, the AgentFlow model was only trained for 100 steps.
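
The Planner → Executor → Verifier loop and the GRPO signal mentioned in the README text can be illustrated with a minimal sketch. Everything named here (`Trajectory`, `run_episode`, `group_advantages`, the callable signatures, `max_turns`) is hypothetical and does not come from the repository. The advantage computation assumes the common group-relative form, reward minus group mean divided by group standard deviation, using the population standard deviation; the actual training code may differ.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, List


@dataclass
class Trajectory:
    """Planner outputs collected over one multi-turn episode (hypothetical type)."""
    plans: List[str] = field(default_factory=list)
    answer: str = ""


def run_episode(
    planner: Callable[[str], str],
    executor: Callable[[str], str],
    verifier: Callable[[str], bool],
    question: str,
    max_turns: int = 3,
) -> Trajectory:
    """Run a Planner -> Executor -> Verifier loop until the verifier accepts."""
    traj = Trajectory()
    context = question
    for _ in range(max_turns):
        plan = planner(context)        # Planner proposes the next step
        traj.plans.append(plan)
        result = executor(plan)        # Executor carries the step out (e.g. a tool call)
        traj.answer = result
        context = f"{context}\n{plan} -> {result}"
        if verifier(context):          # Verifier decides whether to stop
            break
    return traj


def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantages: (r - mean) / (std + eps) within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Toy stand-ins for the three roles; a real system would call an LLM and tools.
    traj = run_episode(
        planner=lambda ctx: "compute 6 * 7",
        executor=lambda plan: "42",
        verifier=lambda ctx: True,
        question="What is 6 * 7?",
    )
    print(traj.plans, traj.answer)
    # One group of four sampled trajectories with binary judge rewards.
    print(group_advantages([1.0, 0.0, 0.0, 1.0]))
```

In this sketch each trajectory in a group receives a single scalar reward, for instance from a judge comparing the final answer with the reference, and the resulting advantage would be applied to the Planner's generated tokens for that trajectory.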