FlowSteer: Interactive Agentic Workflow Orchestration via End-to-End Reinforcement Learning



Overview

FlowSteer addresses critical challenges in agentic workflow orchestration: high manual cost, dependence on specific operators and LLMs, and sparse reward signals. It is an end-to-end reinforcement learning (RL) framework that pairs a lightweight policy model (the agent) with an executable canvas environment, automating workflow orchestration through multi-turn interaction. At each turn, the policy model analyzes the current execution state and selects an editing action, while the canvas executes the resulting operators and returns feedback for iterative refinement.

By integrating Canvas Workflow Relative Policy Optimization (CWRPO) with diversity-constrained rewards, FlowSteer offers a plug-and-play framework that supports diverse operator libraries and interchangeable LLM backends.
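
The diversity constraint can be pictured as a penalty on workflows that collapse onto too few distinct operators. The sketch below is purely illustrative; the function name, threshold, and penalty value are assumptions, not FlowSteer's actual CWRPO implementation.

```python
# Hypothetical sketch of a diversity-constrained reward. All names and
# constants here are illustrative assumptions, not the FlowSteer API.
from collections import Counter

def diversity_constrained_reward(task_score: float,
                                 workflow_ops: list[str],
                                 min_distinct: int = 2,
                                 penalty: float = 0.5) -> float:
    """Combine task success with a penalty when the sampled workflow
    uses fewer distinct operators than the diversity threshold."""
    distinct = len(Counter(workflow_ops))
    if distinct < min_distinct:
        return task_score - penalty  # discourage degenerate workflows
    return task_score
```

In this shape, a workflow that repeats a single operator is scored below an equally successful workflow that mixes operators, nudging the policy toward exploring the operator library.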

Key Features

  • End-to-End RL Training: Learns workflow orchestration through real execution feedback
  • Plug-and-Play Design: Supports diverse operator libraries and interchangeable LLM backends
  • CWRPO Algorithm: Novel training algorithm with diversity-constrained rewards and conditional release
  • Multi-Turn Interaction: Iteratively builds and refines workflows through canvas environment
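
The multi-turn interaction above can be sketched as a simple policy/canvas loop. The class and method names below are assumptions for illustration, not the actual FlowSteer interfaces.

```python
# Illustrative sketch of the multi-turn policy/canvas loop; the
# `policy` and `canvas` interfaces are hypothetical placeholders.
def orchestrate(policy, canvas, max_turns: int = 8):
    """Build a workflow by alternating policy edits and canvas feedback."""
    state = canvas.reset()                    # initial execution state
    for _ in range(max_turns):
        action = policy.select_edit(state)    # choose a workflow edit
        state, done = canvas.apply(action)    # execute operators, get feedback
        if done:                              # workflow accepted / converged
            break
    return canvas.workflow()
```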

Model Details

  • Base Model: Qwen/Qwen3-8B
  • Training Method: Canvas Workflow Relative Policy Optimization (CWRPO)
  • LoRA Rank: 64
  • Training Steps: 300
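
The details above would typically appear in the training config. The fragment below is only an illustration of how `config/training_interactive.yaml` might encode them; the actual key names may differ.

```yaml
# Illustrative config shape; real keys in config/training_interactive.yaml
# may differ.
model:
  base: Qwen/Qwen3-8B
training:
  method: cwrpo
  lora_rank: 64
  max_steps: 300
```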

Quick Start

1. Install Environment

conda create -n flowsteer python=3.10 -y
conda activate flowsteer
pip install -r requirements.txt
pip install "vllm>=0.6.0"  # quote so the shell doesn't treat >= as a redirect

2. Download Base Model

# Using huggingface
huggingface-cli download Qwen/Qwen3-8B

# Or using modelscope
pip install modelscope
python -c "from modelscope import snapshot_download; snapshot_download('Qwen/Qwen3-8B')"

3. Start vLLM Server

CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
    --model /path/to/Qwen3-8B \
    --served-model-name Qwen3-8B \
    --port 8003 \
    --gpu-memory-utilization 0.85 \
    --max-model-len 16384 \
    --enable-lora \
    --max-loras 2 \
    --max-lora-rank 64 \
    --trust-remote-code \
    --dtype bfloat16
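
Once the server is up, it exposes an OpenAI-compatible API. The snippet below builds a chat-completion request body matching the command above (port 8003, served model name Qwen3-8B); the prompt and `max_tokens` value are arbitrary examples.

```python
# Sketch of a request against the OpenAI-compatible endpoint the vLLM
# server above exposes (port 8003, served model name "Qwen3-8B").
import json

ENDPOINT = "http://localhost:8003/v1/chat/completions"

def chat_payload(prompt: str, model: str = "Qwen3-8B") -> dict:
    """Build a chat-completion request body for the served model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

# POST this as JSON to ENDPOINT once the server is running, e.g.:
#   curl $ENDPOINT -H "Content-Type: application/json" -d '<payload>'
print(json.dumps(chat_payload("Hello from FlowSteer!")))
```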

4. Start Training

CUDA_VISIBLE_DEVICES=2 python train_interactive.py \
    --config config/training_interactive.yaml

5. Evaluation

python eval_only.py --config config/training_interactive.yaml \
    --checkpoint checkpoints/interactive/checkpoint_step_100

For more details, please refer to our GitHub repository.

License

This project is released for research purposes only.
