---
title: Flow
emoji: 🔄
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# Flow

**Evaluate and Optimize Coding Agent Configurations**

Flow is a framework for running experiments on LLM coding agents. Compare context engineering strategies (message compaction, agent memory, sub-agents), evaluate results with LLM-as-Judge, and find optimal configurations that balance quality and token cost.

![Flow UI](docs/flow.png)

## Features

- **Ablation Studies**: Test different agent configurations side-by-side
- **LLM-as-Judge Evaluation**: Automatically score agent outputs for correctness
- **Pareto Analysis**: Find optimal quality vs. cost tradeoffs
- **Web UI**: Visual interface for managing experiments and viewing results
- **Config Export**: Export winning configurations for production use

## Quick Start

### 1. Install

```bash
# Clone and install with uv
git clone https://github.com/victordibia/flow
cd flow
uv sync
```

### 2. Configure Azure OpenAI

```bash
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```

### 3. Run Optimization

```bash
# Run with built-in task suite
uv run flow optimize --suite coding

# Or with custom tasks
uv run flow optimize --tasks my_tasks.jsonl
```

### 4. Launch Web UI

```bash
uv run flow serve
# Opens at http://localhost:8091
```

## CLI Commands

```bash
flow optimize [OPTIONS]  # Run optimization experiments
flow serve               # Start the web UI
flow run [TASK]          # Run a single agent task
flow config              # Show current configuration
flow init                # Initialize Flow directories
```

## What Gets Optimized

Flow tests different **context engineering strategies**:

| Strategy | Description |
|----------|-------------|
| **Message Compaction** | Keep first N + last M messages, discard middle |
| **Agent Memory** | Persistent storage the agent controls |
| **Sub-Agent Isolation** | Delegate research to isolated sub-agent |

Example configurations:

```python
from flow.experiments.ablation import AblationConfig

configs = [
    AblationConfig(name="baseline", enable_message_compaction=False),
    AblationConfig(name="compaction", enable_message_compaction=True, compaction_head_size=10),
    AblationConfig(name="full", enable_message_compaction=True, enable_memory_tool=True),
]
```

## Task Format

Tasks are defined in JSONL format:

```json
{"name": "fizzbuzz", "prompt": "Create fizzbuzz.py and run it", "criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}]}
```

## Development

```bash
# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Type checking
uv run pyright src/

# Linting
uv run ruff check src/
uv run ruff format src/
```

## License

MIT License - see [LICENSE](LICENSE) for details.
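
To illustrate the message compaction strategy listed under "What Gets Optimized" (keep the first N and last M messages, discard the middle), here is a minimal sketch. It is not Flow's implementation: the function name `compact_messages` and the parameters `head_size` and `tail_size` are hypothetical, chosen only to show the idea on a plain Python list.

```python
from typing import TypeVar

T = TypeVar("T")


def compact_messages(messages: list[T], head_size: int, tail_size: int) -> list[T]:
    """Keep the first `head_size` and last `tail_size` messages, dropping the middle.

    Hypothetical helper for illustration only; not part of Flow's API.
    """
    if head_size < 0 or tail_size < 0:
        raise ValueError("head_size and tail_size must be non-negative")
    # Nothing to discard when the history already fits within head + tail.
    if len(messages) <= head_size + tail_size:
        return list(messages)
    # Slice from an explicit index so tail_size == 0 yields an empty tail.
    tail = messages[len(messages) - tail_size:]
    return messages[:head_size] + tail


history = [f"msg-{i}" for i in range(8)]
compacted = compact_messages(history, head_size=2, tail_size=3)
print(compacted)  # ['msg-0', 'msg-1', 'msg-5', 'msg-6', 'msg-7']
```

Keeping the head preserves the original task instructions, while keeping the tail preserves the most recent working context; the discarded middle is where the token savings come from.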