---
title: Flow
emoji: 🔄
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# Flow

**Evaluate and Optimize Coding Agent Configurations**

Flow is a framework for running experiments on LLM coding agents. Compare context engineering strategies (message compaction, agent memory, sub-agents), evaluate results with LLM-as-Judge, and find optimal configurations that balance quality and token cost.

![Flow UI](docs/flow.png)

## Features

- **Ablation Studies**: Test different agent configurations side-by-side
- **LLM-as-Judge Evaluation**: Automatically score agent outputs for correctness
- **Pareto Analysis**: Find optimal quality vs. cost tradeoffs
- **Web UI**: Visual interface for managing experiments and viewing results
- **Config Export**: Export winning configurations for production use

## Quick Start

### 1. Install

```bash
# Clone and install with uv
git clone https://github.com/victordibia/flow
cd flow
uv sync
```

### 2. Configure Azure OpenAI

```bash
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```

### 3. Run Optimization

```bash
# Run with built-in task suite
uv run flow optimize --suite coding

# Or with custom tasks
uv run flow optimize --tasks my_tasks.jsonl
```

### 4. Launch Web UI

```bash
uv run flow serve
# Opens at http://localhost:8091
```

## CLI Commands

```bash
flow optimize [OPTIONS]   # Run optimization experiments
flow serve               # Start the web UI
flow run [TASK]          # Run a single agent task
flow config              # Show current configuration
flow init                # Initialize Flow directories
```

## What Gets Optimized

Flow tests different **context engineering strategies**:

| Strategy | Description |
|----------|-------------|
| **Message Compaction** | Keep first N + last M messages, discard middle |
| **Agent Memory** | Persistent storage the agent controls |
| **Sub-Agent Isolation** | Delegate research to isolated sub-agent |

Example configurations:

```python
from flow.experiments.ablation import AblationConfig

configs = [
    AblationConfig(name="baseline", enable_message_compaction=False),
    AblationConfig(name="compaction", enable_message_compaction=True, compaction_head_size=10),
    AblationConfig(name="full", enable_message_compaction=True, enable_memory_tool=True),
]
```

## Task Format

Tasks are defined in JSONL format:

```json
{"name": "fizzbuzz", "prompt": "Create fizzbuzz.py and run it", "criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}]}
```

## Development

```bash
# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Type checking
uv run pyright src/

# Linting
uv run ruff check src/
uv run ruff format src/
```

## License

MIT License - see [LICENSE](LICENSE) for details.