flow / README.md
victordibia's picture
Deploy 2026-01-27 14:43:06
c1ec9a0
|
raw
history blame
2.9 kB
---
title: Flow
emoji: ๐Ÿ”„
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# Flow
**Evaluate and Optimize Coding Agent Configurations**
Flow is a framework for running experiments on LLM coding agents. Compare context engineering strategies (message compaction, agent memory, sub-agents), evaluate results with LLM-as-Judge, and find optimal configurations that balance quality and token cost.
![Flow UI](docs/flow.png)
## Features
- **Ablation Studies**: Test different agent configurations side-by-side
- **LLM-as-Judge Evaluation**: Automatically score agent outputs for correctness
- **Pareto Analysis**: Find optimal quality vs. cost tradeoffs
- **Web UI**: Visual interface for managing experiments and viewing results
- **Config Export**: Export winning configurations for production use
## Quick Start
### 1. Install
```bash
# Clone and install with uv
git clone https://github.com/victordibia/flow
cd flow
uv sync
```
### 2. Configure Azure OpenAI
```bash
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```
### 3. Run Optimization
```bash
# Run with built-in task suite
uv run flow optimize --suite coding
# Or with custom tasks
uv run flow optimize --tasks my_tasks.jsonl
```
### 4. Launch Web UI
```bash
uv run flow serve
# Opens at http://localhost:8091
```
## CLI Commands
```bash
flow optimize [OPTIONS] # Run optimization experiments
flow serve # Start the web UI
flow run [TASK] # Run a single agent task
flow config # Show current configuration
flow init # Initialize Flow directories
```
## What Gets Optimized
Flow tests different **context engineering strategies**:
| Strategy | Description |
|----------|-------------|
| **Message Compaction** | Keep first N + last M messages, discard middle |
| **Agent Memory** | Persistent storage the agent controls |
| **Sub-Agent Isolation** | Delegate research to isolated sub-agent |
Example configurations:
```python
from flow.experiments.ablation import AblationConfig
configs = [
AblationConfig(name="baseline", enable_message_compaction=False),
AblationConfig(name="compaction", enable_message_compaction=True, compaction_head_size=10),
AblationConfig(name="full", enable_message_compaction=True, enable_memory_tool=True),
]
```
## Task Format
Tasks are defined in JSONL format:
```json
{"name": "fizzbuzz", "prompt": "Create fizzbuzz.py and run it", "criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}]}
```
## Development
```bash
# Install dev dependencies
uv sync --dev
# Run tests
uv run pytest tests/ -v
# Type checking
uv run pyright src/
# Linting
uv run ruff check src/
uv run ruff format src/
```
## License
MIT License - see [LICENSE](LICENSE) for details.