This model reproduces the core idea of AgentFlow: it extends single-step LLM inference into a multi-turn Planner β†’ Executor β†’ Verifier agent loop and applies RL signals (GRPO) to the Planner's generation trajectory. This lets the model improve its tool-use and reasoning capabilities without requiring manually annotated intermediate steps.
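The two RL ingredients mentioned above can be sketched in a few lines: GRPO normalizes each rollout's reward against its sampling group (no learned value model), and a loss mask restricts the policy loss to Planner-generated tokens. This is a minimal illustrative sketch, not the training code; the function names are hypothetical.

```python
def grpo_advantages(rewards):
    """Group-relative advantage: normalize each rollout's reward
    against the mean/std of its sampling group (the GRPO idea)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # guard against identical rewards in a group
    return [(r - mean) / std for r in rewards]

def masked_token_loss(per_token_loss, loss_mask):
    """Average the policy loss over Planner tokens only (loss_mask == 1);
    Executor/Verifier tokens are excluded from the gradient."""
    kept = [l for l, m in zip(per_token_loss, loss_mask) if m == 1]
    return sum(kept) / max(len(kept), 1)
```

With this masking, tool outputs and verifier decisions still shape the trajectory (and hence the reward) but contribute no gradient themselves.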

Architecture

Input question
  β”‚
  β–Ό
Planner.plan()              ← Analyze the problem and devise a solution strategy (loss_mask=1)
  β”‚
  └─► for step in range(max_steps):
        β”‚
        β”œβ”€ Planner.generate_next_step()             ← Select next tool and sub-goal (loss_mask=1)
        β”œβ”€ Executor.generate_tool_command()
        β”‚  + execute_command()                       ← Invoke tool (excluded from sequence)
        β”œβ”€ Verifier.verificate_context()             ← Decide whether to continue (excluded)
        └─ Memory.add_action()                       ← Record execution result
  β”‚
  β–Ό
Planner.generate_final_output()   ← Summarize results and produce final answer (loss_mask=0)
  β”‚
  β–Ό
Rewarder.compute_reward()         ← LLM-as-Judge: compare model answer with ground truth
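The control flow above can be sketched as a single episode function. The component objects are stand-ins for the real Planner/Executor/Verifier/Memory/Rewarder classes (whose exact interfaces may differ); only the loop structure and loss-mask annotations are taken from the diagram.

```python
def run_episode(question, planner, executor, verifier, memory, rewarder,
                ground_truth, max_steps=10):
    """One Planner -> Executor -> Verifier rollout, mirroring the diagram."""
    plan = planner.plan(question)                        # loss_mask=1
    for _ in range(max_steps):
        step = planner.generate_next_step(plan, memory)  # loss_mask=1
        command = executor.generate_tool_command(step)   # excluded
        result = executor.execute_command(command)       # excluded
        memory.add_action(step, command, result)         # record result
        if verifier.verificate_context(memory):          # excluded
            break                                        # stop when done
    answer = planner.generate_final_output(memory)       # loss_mask=0
    # LLM-as-Judge: compare the model's answer with the ground truth
    return rewarder.compute_reward(answer, ground_truth)
```

Note that only the Planner's plan and next-step tokens carry gradient; the final summary, tool invocations, and verifier checks influence the reward but are masked out of the loss.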

Tools (tools/)

| Tool | Description |
| --- | --- |
| base_generator | General-purpose text generation tool; answers sub-tasks directly via the LLM |
| python_coder | Python code generation and execution tool for math computation and algorithmic problem solving |
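A python_coder-style tool typically runs the generated code in a fresh interpreter and feeds the captured output back into Memory. This is a hypothetical sketch of that pattern (the real tool's interface and sandboxing may differ):

```python
import subprocess
import sys

def run_python_tool(code: str, timeout: float = 10.0) -> str:
    """Execute model-generated Python in a subprocess and return its stdout,
    or a tagged error string on failure."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        return f"[error] {proc.stderr.strip()}"
    return proc.stdout.strip()
```

Running in a separate process keeps broken generated code from crashing the agent loop and makes a per-call timeout straightforward.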

Results

| Model | Dataset | Baseline | AgentFlow (Ours) | Improvement |
| --- | --- | --- | --- | --- |
| Qwen2.5-7B-Instruct | AIME 2024 | 10.0% | 26.7% | +16.7% |

Note: Due to limited training resources, the AgentFlow model was trained for only 100 steps.

Model size: 8B params (Safetensors, BF16)

Model tree for LMIS-ORG/AgentFlow_Slime_Agentic_Qwen2.5_7B

Base model: Qwen/Qwen2.5-7B (this model is a finetune of it)
