Deploy 2026-01-27 14:43:06
c1ec9a0
metadata
title: Flow
emoji: 🔄
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false

Flow

Evaluate and Optimize Coding Agent Configurations

Flow is a framework for running experiments on LLM coding agents. Compare context engineering strategies (message compaction, agent memory, sub-agents), evaluate results with LLM-as-Judge, and find optimal configurations that balance quality and token cost.

[Screenshot: Flow UI]

Features

  • Ablation Studies: Test different agent configurations side-by-side
  • LLM-as-Judge Evaluation: Automatically score agent outputs for correctness
  • Pareto Analysis: Find optimal quality vs. cost tradeoffs
  • Web UI: Visual interface for managing experiments and viewing results
  • Config Export: Export winning configurations for production use
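To make the Pareto analysis idea concrete, here is a minimal sketch of selecting non-dominated configurations by quality and token cost. It is not Flow's implementation: the `Result` class, the `pareto_front` function, and all scores and token counts below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical summary of one experiment run (not a Flow type)."""
    name: str
    quality: float  # higher is better, e.g. a judge score in [0, 1]
    tokens: int     # lower is better


def pareto_front(results):
    """Return results that no other result beats on both quality and tokens."""
    front = []
    for r in results:
        dominated = any(
            o.quality >= r.quality
            and o.tokens <= r.tokens
            and (o.quality > r.quality or o.tokens < r.tokens)
            for o in results
        )
        if not dominated:
            front.append(r)
    # Sort by cost so the tradeoff curve reads from cheapest to most expensive.
    return sorted(front, key=lambda r: r.tokens)


# Made-up numbers for illustration only.
results = [
    Result("baseline", 0.80, 12000),
    Result("compaction", 0.78, 7000),
    Result("full", 0.85, 9000),
    Result("wasteful", 0.75, 15000),
]
front_names = [r.name for r in pareto_front(results)]
print(front_names)
```

In this made-up data, "baseline" drops out because "full" scores higher while using fewer tokens. "compaction" stays on the front because nothing cheaper matches its quality.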

Quick Start

1. Install

# Clone and install with uv
git clone https://github.com/victordibia/flow
cd flow
uv sync

2. Configure Azure OpenAI

export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
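Before starting a long run, it may help to confirm that the three variables above are set. The following is a small standalone check, not part of Flow; the helper name `missing_azure_vars` is hypothetical.

```python
import os

# The three variables named in the configuration step above.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
)


def missing_azure_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_azure_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Azure OpenAI environment looks complete.")
```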

3. Run Optimization

# Run with built-in task suite
uv run flow optimize --suite coding

# Or with custom tasks
uv run flow optimize --tasks my_tasks.jsonl

4. Launch Web UI

uv run flow serve
# Opens at http://localhost:8091

CLI Commands

flow optimize [OPTIONS]   # Run optimization experiments
flow serve                # Start the web UI
flow run [TASK]           # Run a single agent task
flow config               # Show current configuration
flow init                 # Initialize Flow directories

What Gets Optimized

Flow tests different context engineering strategies:

Strategy            | Description
Message Compaction  | Keep first N + last M messages, discard middle
Agent Memory        | Persistent storage the agent controls
Sub-Agent Isolation | Delegate research to isolated sub-agent
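The message compaction rule (keep the first N and last M messages, discard the middle) can be sketched as follows. This is an illustration of the idea rather than Flow's code: the function `compact_messages` and the `tail_size` parameter are hypothetical names, and Flow's actual behavior may differ, for example in how it treats system messages.

```python
def compact_messages(messages, head_size, tail_size):
    """Keep the first `head_size` and last `tail_size` messages.

    If the history is short enough that head and tail would overlap,
    it is returned unchanged.
    """
    if head_size < 0 or tail_size < 0:
        raise ValueError("head_size and tail_size must be non-negative")
    if len(messages) <= head_size + tail_size:
        return list(messages)
    head = messages[:head_size]
    # Index from the front so that tail_size == 0 yields an empty tail.
    tail = messages[len(messages) - tail_size:]
    return list(head) + list(tail)


history = [f"message {i}" for i in range(10)]
compacted = compact_messages(history, head_size=2, tail_size=3)
print(compacted)
```

Here a 10-message history is reduced to messages 0, 1, 7, 8 and 9, so five messages from the middle are dropped.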

Example configurations:

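from flow.experiments.ablation import AblationConfig

configs = [
    # No context engineering: the reference point for comparison
    AblationConfig(name="baseline", enable_message_compaction=False),
    # Message compaction only, with a head size of 10
    AblationConfig(name="compaction", enable_message_compaction=True, compaction_head_size=10),
    # Message compaction plus the agent memory tool
    AblationConfig(name="full", enable_message_compaction=True, enable_memory_tool=True),
]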

Task Format

Tasks are defined in JSONL format:

{"name": "fizzbuzz", "prompt": "Create fizzbuzz.py and run it", "criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}]}

Development

# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Type checking
uv run pyright src/

# Linting
uv run ruff check src/
uv run ruff format src/

License

MIT License - see LICENSE for details.