victordibia

Deploy 2026-01-27 14:43:06

title: Flow
emoji: 🔄
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false

Flow

Evaluate and Optimize Coding Agent Configurations

Flow is a framework for running experiments on LLM coding agents. Compare context engineering strategies (message compaction, agent memory, sub-agents), evaluate results with LLM-as-Judge, and find optimal configurations that balance quality and token cost.

Features

Ablation Studies: Test different agent configurations side-by-side
LLM-as-Judge Evaluation: Automatically score agent outputs for correctness
Pareto Analysis: Find optimal quality vs. cost tradeoffs
Web UI: Visual interface for managing experiments and viewing results
Config Export: Export winning configurations for production use
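
To make the Pareto Analysis feature concrete, here is a minimal sketch of how a quality vs. cost Pareto front can be computed. It is not Flow's implementation: the function name `pareto_front` and the scores and token counts are made up purely for illustration. A configuration is kept if no other configuration is at least as good on both axes and strictly better on one.

```python
def pareto_front(results):
    """Return names of configurations that no other configuration dominates.

    `results` is a list of (name, quality, cost) tuples, where higher
    quality is better and lower cost is better.
    """
    front = []
    for name, quality, cost in results:
        dominated = any(
            (q >= quality and c <= cost) and (q > quality or c < cost)
            for _, q, c in results
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative numbers only, not measured results.
results = [
    ("baseline", 0.70, 1000),
    ("compaction", 0.72, 600),
    ("full", 0.80, 900),
    ("wasteful", 0.65, 1200),
]
print(pareto_front(results))  # ['compaction', 'full']
```

In this made-up example, "baseline" drops out because "compaction" scores higher at lower cost, while "compaction" and "full" both stay because neither beats the other on both axes.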

Quick Start

1. Install

# Clone and install with uv
git clone https://github.com/victordibia/flow
cd flow
uv sync

2. Configure Azure OpenAI

export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
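
Before running an experiment, it can help to confirm that all three variables are visible to the process. The snippet below is a small standalone check, not part of Flow; the helper name `missing_azure_vars` is hypothetical, and it only assumes the three variable names shown above.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
)


def missing_azure_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_azure_vars()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Azure OpenAI environment looks complete.")
```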

3. Run Optimization

# Run with built-in task suite
uv run flow optimize --suite coding

# Or with custom tasks
uv run flow optimize --tasks my_tasks.jsonl

4. Launch Web UI

uv run flow serve
# Opens at http://localhost:8091

CLI Commands

flow optimize [OPTIONS]   # Run optimization experiments
flow serve               # Start the web UI
flow run [TASK]          # Run a single agent task
flow config              # Show current configuration
flow init                # Initialize Flow directories

What Gets Optimized

Flow tests different context engineering strategies:

| Strategy | Description |
| --- | --- |
| Message Compaction | Keep first N + last M messages, discard middle |
| Agent Memory | Persistent storage the agent controls |
| Sub-Agent Isolation | Delegate research to isolated sub-agent |
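
The message compaction row describes a "keep first N + last M" rule. A minimal sketch of that rule, assuming the history is a plain Python list, might look like the following. The function `compact_messages` and its parameter names are hypothetical and are not taken from Flow's code.

```python
def compact_messages(messages, head_size, tail_size):
    """Keep the first `head_size` and last `tail_size` messages, drop the middle.

    Histories that already fit within head_size + tail_size are returned
    unchanged (as a copy).
    """
    if head_size < 0 or tail_size < 0:
        raise ValueError("head_size and tail_size must be non-negative")
    if len(messages) <= head_size + tail_size:
        return list(messages)
    # messages[-0:] would return the whole list, so handle tail_size == 0 explicitly.
    tail = list(messages[-tail_size:]) if tail_size else []
    return list(messages[:head_size]) + tail


history = [f"msg-{i}" for i in range(8)]
print(compact_messages(history, head_size=2, tail_size=3))
# ['msg-0', 'msg-1', 'msg-5', 'msg-6', 'msg-7']
```

Keeping the head preserves the original task instructions, and keeping the tail preserves the most recent work. The messages in between are what this strategy gives up to save tokens.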

Example configurations:

from flow.experiments.ablation import AblationConfig

configs = [
    AblationConfig(name="baseline", enable_message_compaction=False),
    AblationConfig(name="compaction", enable_message_compaction=True, compaction_head_size=10),
    AblationConfig(name="full", enable_message_compaction=True, enable_memory_tool=True),
]

Task Format

Tasks are defined in JSONL format:

{"name": "fizzbuzz", "prompt": "Create fizzbuzz.py and run it", "criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}]}
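
A tasks file can be sanity-checked before it is passed to `--tasks`. The sketch below is a standalone helper, not Flow's own loader. It assumes every task needs the `name`, `prompt`, and `criteria` fields seen in the example above, and that each criterion needs `name` and `instruction`. Flow itself may accept other or optional fields.

```python
import json


def parse_tasks(jsonl_text):
    """Parse JSONL text into task dicts, checking the fields shown in the example."""
    tasks = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between tasks
        task = json.loads(line)
        for key in ("name", "prompt", "criteria"):
            if key not in task:
                raise ValueError(f"line {line_no}: missing '{key}'")
        for criterion in task["criteria"]:
            if "name" not in criterion or "instruction" not in criterion:
                raise ValueError(f"line {line_no}: criterion needs 'name' and 'instruction'")
        tasks.append(task)
    return tasks


sample = (
    '{"name": "fizzbuzz", "prompt": "Create fizzbuzz.py and run it", '
    '"criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}]}\n'
)
tasks = parse_tasks(sample)
print(tasks[0]["name"], len(tasks[0]["criteria"]))  # fizzbuzz 1
```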

Development

# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Type checking
uv run pyright src/

# Linting
uv run ruff check src/
uv run ruff format src/

License

MIT License - see LICENSE for details.