---
title: Flow
emoji: 🔄
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# Flow
**Evaluate and Optimize Coding Agent Configurations**
Flow is a framework for running experiments on LLM coding agents. Compare context engineering strategies (message compaction, agent memory, sub-agents), evaluate results with LLM-as-Judge, and find optimal configurations that balance quality and token cost.

## Features
- **Ablation Studies**: Test different agent configurations side-by-side
- **LLM-as-Judge Evaluation**: Automatically score agent outputs for correctness
- **Pareto Analysis**: Find optimal quality vs. cost tradeoffs
- **Web UI**: Visual interface for managing experiments and viewing results
- **Config Export**: Export winning configurations for production use
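
The Pareto analysis idea can be illustrated with a minimal sketch. This is not Flow's implementation: the `Result` record and `pareto_frontier` function are hypothetical names, and the scores and token counts are made-up illustrative numbers, not measured results. A configuration is on the frontier if no other configuration is at least as good on both quality and cost and strictly better on one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One evaluated configuration (hypothetical record, not Flow's own type)."""
    name: str
    score: float  # quality, higher is better
    tokens: int   # cost, lower is better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Return the results that no other result dominates."""
    def dominates(a: Result, b: Result) -> bool:
        # a is at least as good on both axes and strictly better on one
        return (a.score >= b.score and a.tokens <= b.tokens
                and (a.score > b.score or a.tokens < b.tokens))

    return [r for r in results
            if not any(dominates(other, r) for other in results)]


# Illustrative numbers only
results = [
    Result("baseline", score=0.70, tokens=12000),
    Result("compaction", score=0.72, tokens=8000),
    Result("full", score=0.80, tokens=15000),
]
frontier_names = [r.name for r in pareto_frontier(results)]
print(frontier_names)  # ['compaction', 'full']
```

Here `baseline` drops out because `compaction` scores higher while using fewer tokens; `compaction` and `full` both stay because each wins on one axis.
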
## Quick Start
### 1. Install
```bash
# Clone and install with uv
git clone https://github.com/victordibia/flow
cd flow
uv sync
```
### 2. Configure Azure OpenAI
```bash
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```
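
To catch a missing variable before starting a long run, a small pre-flight check along these lines may help. It is a hypothetical helper, not part of Flow, and it assumes the three variables above are the ones that matter.

```python
import os

# Variable names taken from the export commands above
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
)


def missing_azure_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Azure OpenAI environment looks complete.")
```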
### 3. Run Optimization
```bash
# Run with built-in task suite
uv run flow optimize --suite coding
# Or with custom tasks
uv run flow optimize --tasks my_tasks.jsonl
```
### 4. Launch Web UI
```bash
uv run flow serve
# Opens at http://localhost:8091
```
## CLI Commands
```bash
flow optimize [OPTIONS] # Run optimization experiments
flow serve # Start the web UI
flow run [TASK] # Run a single agent task
flow config # Show current configuration
flow init # Initialize Flow directories
```
## What Gets Optimized
Flow tests different **context engineering strategies**:
| Strategy | Description |
|----------|-------------|
| **Message Compaction** | Keep first N + last M messages, discard middle |
| **Agent Memory** | Persistent storage the agent controls |
| **Sub-Agent Isolation** | Delegate research to isolated sub-agent |

Example configurations:
```python
from flow.experiments.ablation import AblationConfig

configs = [
    AblationConfig(name="baseline", enable_message_compaction=False),
    AblationConfig(name="compaction", enable_message_compaction=True, compaction_head_size=10),
    AblationConfig(name="full", enable_message_compaction=True, enable_memory_tool=True),
]
```
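
For intuition, the message compaction strategy from the table (keep first N + last M, discard the middle) can be sketched as a plain list operation. This is an illustrative sketch, not Flow's code: `compact_messages` and its `tail_size` parameter are hypothetical, and the real implementation may differ, for example in how it marks the discarded span.

```python
def compact_messages(messages, head_size, tail_size):
    """Keep the first head_size and last tail_size messages; drop the middle."""
    if head_size < 0 or tail_size < 0:
        raise ValueError("head_size and tail_size must be non-negative")
    if len(messages) <= head_size + tail_size:
        return list(messages)  # short history: nothing to discard
    tail = messages[len(messages) - tail_size:]
    return list(messages[:head_size]) + list(tail)


history = [f"m{i}" for i in range(8)]
compacted = compact_messages(history, head_size=2, tail_size=3)
print(compacted)  # ['m0', 'm1', 'm5', 'm6', 'm7']
```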
## Task Format
Tasks are defined in JSONL format:
```json
{"name": "fizzbuzz", "prompt": "Create fizzbuzz.py and run it", "criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}]}
```
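
When generating task files programmatically, it can be useful to check each line before passing the file to `--tasks`. The sketch below uses only the standard `json` module. `parse_tasks` is a hypothetical helper, and treating every field shown above as required is an assumption rather than Flow's documented schema.

```python
import json

# Field names taken from the example task above
REQUIRED_TASK_KEYS = {"name", "prompt", "criteria"}
REQUIRED_CRITERION_KEYS = {"name", "instruction"}


def parse_tasks(jsonl_text):
    """Parse JSONL text into task dicts, checking the fields shown above."""
    tasks = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        task = json.loads(line)
        missing = REQUIRED_TASK_KEYS - set(task)
        if missing:
            raise ValueError(f"line {line_no}: missing {sorted(missing)}")
        for criterion in task["criteria"]:
            if not REQUIRED_CRITERION_KEYS <= set(criterion):
                raise ValueError(f"line {line_no}: incomplete criterion")
        tasks.append(task)
    return tasks


task = {
    "name": "fizzbuzz",
    "prompt": "Create fizzbuzz.py and run it",
    "criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}],
}
jsonl_text = json.dumps(task) + "\n"  # one JSON object per line
parsed = parse_tasks(jsonl_text)
print(parsed[0]["name"])  # fizzbuzz
```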
## Development
```bash
# Install dev dependencies
uv sync --dev
# Run tests
uv run pytest tests/ -v
# Type checking
uv run pyright src/
# Linting
uv run ruff check src/
uv run ruff format src/
```
## License
MIT License - see [LICENSE](LICENSE) for details.