Spaces:

victordibia
/

flow

Sleeping

App Files Files Community

flow / README.md

victordibia

Deploy 2026-01-27 14:43:06

c1ec9a0 2 months ago

preview code

raw

history blame

2.9 kB

	---
	title: Flow
	emoji: 🔄
	colorFrom: blue
	colorTo: purple
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# Flow

	Evaluate and Optimize Coding Agent Configurations

	Flow is a framework for running experiments on LLM coding agents. Compare context engineering strategies (message compaction, agent memory, sub-agents), evaluate results with LLM-as-Judge, and find optimal configurations that balance quality and token cost.

	![Flow UI](docs/flow.png)

	## Features

	- Ablation Studies: Test different agent configurations side-by-side
	- LLM-as-Judge Evaluation: Automatically score agent outputs for correctness
	- Pareto Analysis: Find optimal quality vs. cost tradeoffs
	- Web UI: Visual interface for managing experiments and viewing results
	- Config Export: Export winning configurations for production use

	## Quick Start

	### 1. Install

	```bash
	# Clone and install with uv
	git clone https://github.com/victordibia/flow
	cd flow
	uv sync
	```

	### 2. Configure Azure OpenAI

	```bash
	export AZURE_OPENAI_API_KEY="your-api-key"
	export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
	export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
	```

	### 3. Run Optimization

	```bash
	# Run with built-in task suite
	uv run flow optimize --suite coding

	# Or with custom tasks
	uv run flow optimize --tasks my_tasks.jsonl
	```

	### 4. Launch Web UI

	```bash
	uv run flow serve
	# Opens at http://localhost:8091
	```

	## CLI Commands

	```bash
	flow optimize [OPTIONS] # Run optimization experiments
	flow serve # Start the web UI
	flow run [TASK] # Run a single agent task
	flow config # Show current configuration
	flow init # Initialize Flow directories
	```

	## What Gets Optimized

	Flow tests different context engineering strategies:

	\| Strategy \| Description \|
	\|----------\|-------------\|
	\| Message Compaction \| Keep first N + last M messages, discard middle \|
	\| Agent Memory \| Persistent storage the agent controls \|
	\| Sub-Agent Isolation \| Delegate research to isolated sub-agent \|

	Example configurations:

	```python
	from flow.experiments.ablation import AblationConfig

	configs = [
	AblationConfig(name="baseline", enable_message_compaction=False),
	AblationConfig(name="compaction", enable_message_compaction=True, compaction_head_size=10),
	AblationConfig(name="full", enable_message_compaction=True, enable_memory_tool=True),
	]
	```

	## Task Format

	Tasks are defined in JSONL format:

	```json
	{"name": "fizzbuzz", "prompt": "Create fizzbuzz.py and run it", "criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}]}
	```

	## Development

	```bash
	# Install dev dependencies
	uv sync --dev

	# Run tests
	uv run pytest tests/ -v

	# Type checking
	uv run pyright src/

	# Linting
	uv run ruff check src/
	uv run ruff format src/
	```

	## License

	MIT License - see [LICENSE](LICENSE) for details.