Spaces:
Sleeping
Sleeping
| title: Flow | |
| emoji: ๐ | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| # Flow | |
| **Evaluate and Optimize Coding Agent Configurations** | |
| Flow is a framework for running experiments on LLM coding agents. Compare context engineering strategies (message compaction, agent memory, sub-agents), evaluate results with LLM-as-Judge, and find optimal configurations that balance quality and token cost. | |
|  | |
| ## Features | |
| - **Ablation Studies**: Test different agent configurations side-by-side | |
| - **LLM-as-Judge Evaluation**: Automatically score agent outputs for correctness | |
| - **Pareto Analysis**: Find optimal quality vs. cost tradeoffs | |
| - **Web UI**: Visual interface for managing experiments and viewing results | |
| - **Config Export**: Export winning configurations for production use | |
| ## Quick Start | |
| ### 1. Install | |
| ```bash | |
| # Clone and install with uv | |
| git clone https://github.com/victordibia/flow | |
| cd flow | |
| uv sync | |
| ``` | |
| ### 2. Configure Azure OpenAI | |
| ```bash | |
| export AZURE_OPENAI_API_KEY="your-api-key" | |
| export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" | |
| export AZURE_OPENAI_DEPLOYMENT="gpt-4o" | |
| ``` | |
| ### 3. Run Optimization | |
| ```bash | |
| # Run with built-in task suite | |
| uv run flow optimize --suite coding | |
| # Or with custom tasks | |
| uv run flow optimize --tasks my_tasks.jsonl | |
| ``` | |
| ### 4. Launch Web UI | |
| ```bash | |
| uv run flow serve | |
| # Opens at http://localhost:8091 | |
| ``` | |
| ## CLI Commands | |
| ```bash | |
| flow optimize [OPTIONS] # Run optimization experiments | |
| flow serve # Start the web UI | |
| flow run [TASK] # Run a single agent task | |
| flow config # Show current configuration | |
| flow init # Initialize Flow directories | |
| ``` | |
| ## What Gets Optimized | |
| Flow tests different **context engineering strategies**: | |
| | Strategy | Description | | |
| |----------|-------------| | |
| | **Message Compaction** | Keep first N + last M messages, discard middle | | |
| | **Agent Memory** | Persistent storage the agent controls | | |
| | **Sub-Agent Isolation** | Delegate research to isolated sub-agent | | |
| Example configurations: | |
| ```python | |
| from flow.experiments.ablation import AblationConfig | |
| configs = [ | |
| AblationConfig(name="baseline", enable_message_compaction=False), | |
| AblationConfig(name="compaction", enable_message_compaction=True, compaction_head_size=10), | |
| AblationConfig(name="full", enable_message_compaction=True, enable_memory_tool=True), | |
| ] | |
| ``` | |
| ## Task Format | |
| Tasks are defined in JSONL format: | |
| ```json | |
| {"name": "fizzbuzz", "prompt": "Create fizzbuzz.py and run it", "criteria": [{"name": "correct", "instruction": "Output shows FizzBuzz pattern"}]} | |
| ``` | |
| ## Development | |
| ```bash | |
| # Install dev dependencies | |
| uv sync --dev | |
| # Run tests | |
| uv run pytest tests/ -v | |
| # Type checking | |
| uv run pyright src/ | |
| # Linting | |
| uv run ruff check src/ | |
| uv run ruff format src/ | |
| ``` | |
| ## License | |
| MIT License - see [LICENSE](LICENSE) for details. | |