Spaces:

victordibia
/

flow

Sleeping

File size: 2,899 Bytes

034c2ac
c1ec9a0
034c2ac
 
 
 
 
 
 
 
 
 
c1ec9a0
034c2ac
c1ec9a0
034c2ac
c1ec9a0
034c2ac
c1ec9a0
034c2ac
c1ec9a0
 
 
 
 
034c2ac
c1ec9a0
034c2ac
c1ec9a0
034c2ac
c1ec9a0
 
 
 
 
034c2ac
 
c1ec9a0
034c2ac
 
 
 
 
 
 
c1ec9a0
034c2ac
 
c1ec9a0
 
 
 
 
034c2ac
 
c1ec9a0
034c2ac
 
c1ec9a0
 
034c2ac
 
 
 
 
c1ec9a0
 
 
 
 
034c2ac
 
c1ec9a0
034c2ac
c1ec9a0
034c2ac
c1ec9a0
 
 
 
 
034c2ac
c1ec9a0
034c2ac
c1ec9a0
 
034c2ac
c1ec9a0
 
 
 
 
034c2ac
 
c1ec9a0
034c2ac
c1ec9a0
034c2ac
c1ec9a0
 
034c2ac
 
 
 
 
c1ec9a0
 
034c2ac
 
c1ec9a0
034c2ac
 
c1ec9a0
034c2ac
 
c1ec9a0
 
034c2ac

---
title: Flow
emoji: 🔄
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# Flow

**Evaluate and Optimize Coding Agent Configurations**

Flow is a framework for running experiments on LLM coding agents. Compare context engineering strategies (message compaction, agent memory, sub-agents), evaluate results with LLM-as-Judge, and find optimal configurations that balance quality and token cost.

![Flow UI](docs/flow.png)

## Features

- **Ablation Studies**: Test different agent configurations side-by-side
- **LLM-as-Judge Evaluation**: Automatically score agent outputs for correctness
- **Pareto Analysis**: Find optimal quality vs. cost tradeoffs
- **Web UI**: Visual interface for managing experiments and viewing results
- **Config Export**: Export winning configurations for production use

## Quick Start

### 1. Install

```bash
# Clone and install with uv
git clone https://github.com/victordibia/flow
cd flow
uv sync
```

### 2. Configure Azure OpenAI

```bash
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```

### 3. Run Optimization

```bash
# Run with built-in task suite
uv run flow optimize --suite coding

# Or with custom tasks
uv run flow optimize --tasks my_tasks.jsonl
```

### 4. Launch Web UI

```bash
uv run flow serve
# Opens at http://localhost:8091
```

## CLI Commands

```bash
flow optimize [OPTIONS]   # Run optimization experiments
flow serve               # Start the web UI
flow run [TASK]          # Run a single agent task
flow config              # Show current configuration
flow init                # Initialize Flow directories
```

## What Gets Optimized

Flow tests different **context engineering strategies**:

| Strategy | Description |
|----------|-------------|
| **Message Compaction** | Keep first N + last M messages, discard middle |
| **Agent Memory** | Persistent storage the agent controls |
| **Sub-Agent Isolation** | Delegate research to isolated sub-agent |

Example configurations:

```python
from flow.experiments.ablation import AblationConfig

configs = [
    AblationConfig(name="baseline", enable_message_compaction=False),
    AblationConfig(name="compaction", enable_message_compaction=True, compaction_head_size=10),
    AblationConfig(name="full", enable_message_compaction=True, enable_memory_tool=True),
]
```

## Task Format

Tasks are defined in JSONL format:

```json
{"name": "fizzbuzz", "prompt": "Create fizzbuzz.py and run it", "criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}]}
```

## Development

```bash
# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Type checking
uv run pyright src/

# Linting
uv run ruff check src/
uv run ruff format src/
```

## License

MIT License - see [LICENSE](LICENSE) for details.