FlowSteer: Interactive Agentic Workflow Orchestration via End-to-End Reinforcement Learning



Overview

FlowSteer addresses critical challenges in agentic workflow orchestration: high manual cost, dependence on specific operators and LLMs, and sparse reward signals. It is an end-to-end reinforcement learning (RL) framework that pairs a lightweight policy model (the agent) with an executable canvas environment, automating workflow orchestration through multi-turn interaction. At each turn, the policy model analyzes the current execution state and selects an editing action, while the canvas executes the resulting operators and returns feedback for iterative refinement.

By integrating Canvas Workflow Relative Policy Optimization (CWRPO) with diversity-constrained rewards, FlowSteer offers a plug-and-play framework that supports diverse operator libraries and interchangeable LLM backends.
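
The diversity constraint can be pictured as a penalty on workflows that collapse onto too few distinct operators. The sketch below is purely illustrative; the function name, threshold, and penalty value are assumptions, not FlowSteer's actual CWRPO implementation.

```python
# Hypothetical sketch of a diversity-constrained reward. All names and
# constants here are illustrative assumptions, not the FlowSteer API.
from collections import Counter

def diversity_constrained_reward(task_score: float,
                                 workflow_ops: list[str],
                                 min_distinct: int = 2,
                                 penalty: float = 0.5) -> float:
    """Combine task success with a penalty when the sampled workflow
    uses fewer distinct operators than the diversity threshold."""
    distinct = len(Counter(workflow_ops))
    if distinct < min_distinct:
        return task_score - penalty  # discourage degenerate workflows
    return task_score
```

In this shape, a workflow that repeats a single operator is scored below an equally successful workflow that mixes operators, nudging the policy toward exploring the operator library.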

Key Features

  • End-to-End RL Training: Learns workflow orchestration through real execution feedback
  • Plug-and-Play Design: Supports diverse operator libraries and interchangeable LLM backends
  • CWRPO Algorithm: Novel training algorithm with diversity-constrained rewards and conditional release
  • Multi-Turn Interaction: Iteratively builds and refines workflows through canvas environment
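
The multi-turn interaction above can be sketched as a simple policy/canvas loop. The class and method names below are assumptions for illustration, not the actual FlowSteer interfaces.

```python
# Illustrative sketch of the multi-turn policy/canvas loop; the
# `policy` and `canvas` interfaces are hypothetical placeholders.
def orchestrate(policy, canvas, max_turns: int = 8):
    """Build a workflow by alternating policy edits and canvas feedback."""
    state = canvas.reset()                    # initial execution state
    for _ in range(max_turns):
        action = policy.select_edit(state)    # choose a workflow edit
        state, done = canvas.apply(action)    # execute operators, get feedback
        if done:                              # workflow accepted / converged
            break
    return canvas.workflow()
```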

Model Details

  • Base Model: Qwen/Qwen3-8B
  • Training Method: Canvas Workflow Relative Policy Optimization (CWRPO)
  • LoRA Rank: 64
  • Training Steps: 300
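
The details above would typically appear in the training config. The fragment below is only an illustration of how `config/training_interactive.yaml` might encode them; the actual key names may differ.

```yaml
# Illustrative config shape; real keys in config/training_interactive.yaml
# may differ.
model:
  base: Qwen/Qwen3-8B
training:
  method: cwrpo
  lora_rank: 64
  max_steps: 300
```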

Quick Start

1. Install Environment

conda create -n flowsteer python=3.10 -y
conda activate flowsteer
pip install -r requirements.txt
pip install "vllm>=0.6.0"  # quote so the shell doesn't treat >= as a redirect

2. Download Base Model

# Using huggingface
huggingface-cli download Qwen/Qwen3-8B

# Or using modelscope
pip install modelscope
python -c "from modelscope import snapshot_download; snapshot_download('Qwen/Qwen3-8B')"

3. Start vLLM Server

CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
    --model /path/to/Qwen3-8B \
    --served-model-name Qwen3-8B \
    --port 8003 \
    --gpu-memory-utilization 0.85 \
    --max-model-len 16384 \
    --enable-lora \
    --max-loras 2 \
    --max-lora-rank 64 \
    --trust-remote-code \
    --dtype bfloat16
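
Once the server is up, it exposes an OpenAI-compatible API. The snippet below builds a chat-completion request body matching the command above (port 8003, served model name Qwen3-8B); the prompt and `max_tokens` value are arbitrary examples.

```python
# Sketch of a request against the OpenAI-compatible endpoint the vLLM
# server above exposes (port 8003, served model name "Qwen3-8B").
import json

ENDPOINT = "http://localhost:8003/v1/chat/completions"

def chat_payload(prompt: str, model: str = "Qwen3-8B") -> dict:
    """Build a chat-completion request body for the served model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

# POST this as JSON to ENDPOINT once the server is running, e.g.:
#   curl $ENDPOINT -H "Content-Type: application/json" -d '<payload>'
print(json.dumps(chat_payload("Hello from FlowSteer!")))
```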

4. Start Training

CUDA_VISIBLE_DEVICES=2 python train_interactive.py \
    --config config/training_interactive.yaml

5. Evaluation

python eval_only.py --config config/training_interactive.yaml \
    --checkpoint checkpoints/interactive/checkpoint_step_100

For more details, please refer to our GitHub repository.

License

This project is released for research purposes only.
