| # CLAUDE.md | |
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | |
| ## Project Overview | |
| **CodePilot** - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests. | |
| **Tech Stack:** Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph, Chainlit UI | |
| **Current Phase:** Phase 5 Complete (Chainlit UI with multi-agent visualization) | |
| ## Architecture | |
| ### Multi-Agent Workflow System | |
| CodePilot uses a **dual-mode orchestrator** that routes tasks to different workflows: | |
| ``` | |
| ┌─────────────────────────────────────────────────────────┐ | |
| │                      ORCHESTRATOR                       │ | |
| │                  (Task Classification)                  │ | |
| └─────────────────────────────────────────────────────────┘ | |
|                              │ | |
|              ┌───────────────┴────────────────┐ | |
|              │                                │ | |
|          "explore"                         "code" | |
|              │                                │ | |
|              v                                v | |
|     ┌────────────────┐        ┌──────────────────────────────┐ | |
|     │ ExplorerAgent  │        │  Full Multi-Agent Pipeline   │ | |
|     │    (Direct)    │        │  Explorer → Clarify → Plan   │ | |
|     └────────────────┘        │              ↓               │ | |
|                               │       Coder ⟷ Reviewer       │ | |
|                               │         (iterative)          │ | |
|                               └──────────────────────────────┘ | |
| ``` | |
| **Task Classification Logic** (see `orchestrator.py:92-201`): | |
| - **Explore tasks**: Questions starting with "find", "where", "what", "how", "explain" → uses ExplorerAgent only | |
| - **Code tasks**: Commands starting with "add", "create", "implement", "fix" → runs the full pipeline | |
| - Short queries (<100 chars) default to explore; long queries default to code | |
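The routing rules above can be sketched as a small function. This is a minimal illustration, not the code in `orchestrator.py`: the name `classify_task`, the constants, and the first-word matching strategy are all assumptions made for the example.

```python
# Hypothetical sketch of the routing rules; names are illustrative only.
EXPLORE_PREFIXES = ("find", "where", "what", "how", "explain")
CODE_PREFIXES = ("add", "create", "implement", "fix")
SHORT_QUERY_LIMIT = 100  # characters, per the length rule above


def classify_task(query: str) -> str:
    """Return "explore" or "code" for a user query."""
    text = query.strip().lower()
    first_word = text.split(maxsplit=1)[0] if text else ""
    if first_word in EXPLORE_PREFIXES:
        return "explore"
    if first_word in CODE_PREFIXES:
        return "code"
    # No keyword matched: fall back on query length.
    return "explore" if len(text) < SHORT_QUERY_LIMIT else "code"
```

Matching only the first word keeps the sketch simple; the real classifier may weigh keywords differently.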
| **Full Pipeline Flow** (code tasks): | |
| 1. **Explorer** - Gathers codebase context using token-efficient tools | |
| 2. **Clarifier** - Planner generates questions, pauses for user answers (v3.3+) | |
| 3. **Planner** - Creates implementation plan (NO tools, pure LLM reasoning) | |
| 4. **Coder** - Implements code, tests in sandbox (NO search, uses Explorer's context) | |
| 5. **Reviewer** - Reviews code, approves or sends back to Coder with feedback | |
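As a rough sketch of how the five stages chain together, the loop below treats each agent as a plain callable. The function name `run_code_pipeline`, the callable signatures, and the `MAX_REVIEW_ROUNDS` cap are assumptions for illustration; the source does not state how many Coder/Reviewer iterations are allowed.

```python
# Hypothetical sketch; the real orchestrator is a state machine in orchestrator.py.
MAX_REVIEW_ROUNDS = 3  # assumption: the source does not state an iteration cap


def run_code_pipeline(task, explore, clarify, plan, code, review):
    """Run the five stages in order; each stage is a caller-supplied callable."""
    context = explore(task)               # 1. Explorer gathers context once
    answers = clarify(task, context)      # 2. Clarifier collects user answers
    steps = plan(task, context, answers)  # 3. Planner reasons without tools
    feedback = None
    for round_number in range(1, MAX_REVIEW_ROUNDS + 1):
        change = code(steps, context, feedback)  # 4. Coder reuses Explorer's context
        approved, feedback = review(change)      # 5. Reviewer approves or returns feedback
        if approved:
            return change, round_number
    raise RuntimeError(f"Not approved after {MAX_REVIEW_ROUNDS} review rounds")
```

Passing the Reviewer's feedback back into the next Coder call is what makes the last two stages iterative.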
| ### Context Engineering (Hybrid Retrieval) | |
| The core differentiator is **Reciprocal Rank Fusion (RRF)** combining two search methods: | |
| ``` | |
| Query → ┌─ BM25 (keyword) ──────┐ | |
|         │                       │ | |
|         ├─ Embeddings (semantic)┤ → RRF Fusion → Top K Results | |
|         │  (sentence-transformers) | |
|         └───────────────────────┘ | |
| ``` | |
| **Implementation**: `codepilot/context/hybrid_retriever.py` | |
| - BM25: Exact matches (function names, variable names) | |
| - Embeddings: Semantic matches (related concepts) | |
| - RRF formula: `score = Σ(weight_i / (k + rank_i))` where k=60 | |
| - Default weights: 50% BM25, 50% embeddings | |
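To make the RRF formula concrete, here is a minimal sketch of weighted fusion over two ranked lists. The function name `rrf_fuse`, its signature, and the example file names are illustrative assumptions, not the API of `hybrid_retriever.py`.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights, k=60, top_k=5):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    rankings: dict mapping retriever name -> list of doc IDs, best first.
    weights:  dict mapping retriever name -> weight.
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weights[name] / (k + rank)
    # Highest score first; ties broken by doc ID so the order is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


# Made-up rankings purely for illustration.
fused = rrf_fuse(
    rankings={
        "bm25": ["auth.py", "utils.py", "db.py"],
        "embeddings": ["session.py", "auth.py", "utils.py"],
    },
    weights={"bm25": 0.5, "embeddings": 0.5},
)
```

A document that appears in both lists accumulates two terms, so `auth.py` (ranks 1 and 2) outscores `session.py`, which is ranked first by only one retriever.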
| ### Token-Efficient Tools | |
| **Critical for cost management** - agents should prefer: | |
| 1. `get_file_outline(path)` - Shows class/function signatures (~50 tokens vs ~2000 for full file) | |
| 2. `get_code_chunk(path, name)` - Extracts specific function/class by name | |
| 3. `search_repository(query)` - Hybrid search (use BEFORE reading files) | |
| Only use `read_file` when you need complete file contents. | |
| ### Agent Tool Access (v3.0+ separation) | |
| Each agent has **restricted tool access** so that no agent repeats work another agent has already done: | |
| - **ExplorerAgent**: `search_repository`, `get_file_outline`, `get_code_chunk`, `search_code`, `list_files` | |
| - **PlannerAgent**: **NO TOOLS** (pure LLM reasoning, receives exploration context) | |
| - **CoderAgent**: `write_file`, `get_code_chunk`, `read_file` (NO search tools) | |
| - **ReviewerAgent**: `get_file_outline`, `get_code_chunk`, `read_file` | |
| **Key insight**: v3.0 removed duplicate searching. Explorer searches once, all agents reuse that context. | |
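One way to express the access rules above is an allow-list keyed by agent. The tool names come from the list above; the `AGENT_TOOLS` mapping, the short agent keys, and the `tools_for` helper are assumptions for illustration rather than the structure used in `codepilot/tools/registry.py`.

```python
# Hypothetical allow-list; tool names are taken from the access rules above.
AGENT_TOOLS = {
    "explorer": {"search_repository", "get_file_outline", "get_code_chunk",
                 "search_code", "list_files"},
    "planner": set(),  # pure LLM reasoning, no tools
    "coder": {"write_file", "get_code_chunk", "read_file"},
    "reviewer": {"get_file_outline", "get_code_chunk", "read_file"},
}


def tools_for(agent: str, registry: dict) -> dict:
    """Return only the registry entries the given agent may call."""
    allowed = AGENT_TOOLS[agent]
    return {name: fn for name, fn in registry.items() if name in allowed}
```

Filtering the registry before it reaches the model means a Coder never sees a search tool schema at all, rather than being told not to use one.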
| ## Development Commands | |
| **Setup:** | |
| ```bash | |
| python -m venv venv | |
| source venv/bin/activate # Windows: venv\Scripts\activate | |
| pip install -r requirements.txt | |
| ``` | |
| **Verify installation:** | |
| ```bash | |
| python test_setup.py # Checks API keys are loaded | |
| ``` | |
| **Run Chainlit UI (Primary interface):** | |
| ```bash | |
| chainlit run chainlit_app.py | |
| # Opens at http://localhost:8000 | |
| # Ctrl+C to stop, then pkill -f chainlit to clean up background processes | |
| ``` | |
| **Test individual components:** | |
| ```bash | |
| # Context Engineering (Phase 2) | |
| python test_context.py | |
| # Multi-Agent Workflow (Phase 3) | |
| python test_multi_agent.py | |
| # E2B Sandbox (Phase 4) | |
| python test_sandbox.py | |
| python test_workflow_with_sandbox.py | |
| ``` | |
| **Environment variables** (create `.env` file): | |
| ``` | |
| ANTHROPIC_API_KEY=sk-ant-... | |
| E2B_API_KEY=e2b_... | |
| ``` | |
| ## Current Implementation Status | |
| **✅ COMPLETED (Phases 1-5):** | |
| - Phase 1: LLM client, tool registry, base agent, core tools | |
| - Phase 2: Hybrid retrieval (BM25 + embeddings), AST-aware parsing, codebase indexing | |
| - Phase 3: Multi-agent architecture (Explorer, Planner, Coder, Reviewer, Orchestrator) | |
| - Phase 4: E2B sandbox integration for isolated code execution | |
| - Phase 5: Chainlit UI with real-time agent progress visualization | |
| **🚧 NEXT PHASES:** | |
| - Phase 6 (Weeks 17-18): GitHub Integration - webhooks, automated PR creation | |
| - Phase 7 (Weeks 19-21): Evals & Benchmarks - SWE-bench evaluation | |
| - Phase 8 (Weeks 22-24): Production Hardening - error handling, monitoring, deployment | |
| See `devon-project-plan.md` for complete 24-week roadmap. | |
| ## Key Design Principles | |
| 1. **Context Engineering is the Differentiator** - The value lies in hybrid retrieval and AST-aware chunking, not in UI/UX | |
| 2. **ReAct Pattern** - All agents use: Reason → Act (with tools) → Observe → Repeat | |
| 3. **AST-Aware Processing** - Parse code structurally using tree-sitter, not as text | |
| 4. **Sandboxed Execution** - All code runs in E2B containers, never on host machine | |
| 5. **Single-Search Architecture** - Explorer searches once, all downstream agents reuse context (v3.0+) | |
| 6. **Clarification Before Action** - Planner asks questions before creating plan (v3.3+) | |
| ## Important Implementation Details | |
| ### Tool Schema Format | |
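| All tools are declared in the OpenAI-style function-calling format shown below. Note that Anthropic's native tool format differs: it uses top-level `name`, `description`, and `input_schema` keys rather than a nested `function` object. | |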
| ```python | |
| { | |
| "type": "function", | |
| "function": { | |
| "name": "tool_name", | |
| "description": "Clear description for LLM", | |
| "parameters": { | |
| "type": "object", | |
| "properties": {...}, | |
| "required": [...] | |
| } | |
| } | |
| } | |
| ``` | |
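The schema above nests the definition under a `function` key, which is the shape used by OpenAI-style function calling. The Anthropic Messages API instead takes tools with top-level `name`, `description`, and `input_schema` keys. If a conversion step is needed, it might look like the sketch below; the helper name `to_anthropic_tool` is hypothetical, and whether the repository already performs this conversion elsewhere is not stated in this file.

```python
def to_anthropic_tool(tool: dict) -> dict:
    """Convert an OpenAI-style tool schema to the Anthropic tool shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn["description"],
        "input_schema": fn["parameters"],  # same JSON Schema, different key
    }
```

Only the envelope changes; the JSON Schema describing the parameters is passed through untouched.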
| ### Path Handling (Critical for Coder) | |
| - **Planner must provide FULL ABSOLUTE PATHS** (e.g., `/tmp/codepilot_repos/flask_abc123/examples/app.py`) | |
| - **Coder uses paths EXACTLY as written** in the plan | |
| - Repository path is injected in Chainlit context (see `chainlit_app.py:661-672`) | |
| ### File Operations | |
| - `write_file` auto-creates parent directories | |
| - `run_command` has 30-second timeout | |
| - All tool functions return formatted strings (success messages or errors) | |
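The three behaviours above can be sketched as follows. This is a local illustration only: it uses `subprocess` on the host, whereas the project runs commands inside an E2B sandbox, and the exact signatures and message wording are assumptions.

```python
import subprocess
from pathlib import Path

COMMAND_TIMEOUT_SECONDS = 30  # matches the timeout stated above


def write_file(path: str, content: str) -> str:
    """Write content to path, creating parent directories; always return a string."""
    try:
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        return f"Wrote {len(content)} characters to {path}"
    except OSError as exc:
        return f"Error writing {path}: {exc}"


def run_command(command: list[str]) -> str:
    """Run a command locally (illustration only) and return output or an error string."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=COMMAND_TIMEOUT_SECONDS
        )
    except subprocess.TimeoutExpired:
        return f"Error: command timed out after {COMMAND_TIMEOUT_SECONDS} seconds"
    except OSError as exc:
        return f"Error: {exc}"
    if result.returncode != 0:
        return f"Error (exit {result.returncode}): {result.stderr.strip()}"
    return result.stdout
```

Returning strings instead of raising lets the agent read a failure as an ordinary observation and decide what to try next.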
| ### Version Tracking | |
| Files include version constants for debugging hot-reload issues: | |
| - `orchestrator.py:12` - `ORCHESTRATOR_VERSION` | |
| - `chainlit_app.py:25-26` - `APP_VERSION`, `BUILD_ID` | |
| ### Conversation Management | |
| Agents use `ConversationManager` (`codepilot/agents/conversation.py`) to maintain message history in OpenAI/Anthropic format. This handles: | |
| - System/user/assistant messages | |
| - Tool calls and tool results | |
| - Proper formatting for both Claude and OpenAI APIs | |
| ## Critical Files | |
| - `codepilot/agents/orchestrator.py` - Task classification and multi-agent state machine | |
| - `codepilot/agents/planner_agent.py` - Pure LLM planning (no tools) + clarification questions | |
| - `codepilot/agents/coder_agent.py` - Code implementation (no search tools) | |
| - `codepilot/agents/explorer_agent.py` - Codebase exploration (search tools only) | |
| - `codepilot/context/hybrid_retriever.py` - RRF fusion algorithm | |
| - `codepilot/tools/registry.py` - Tool schemas and function mappings | |
| - `chainlit_app.py` - Interactive UI with GitHub repo cloning and progress visualization | |
| - `requirements.txt` - Python dependencies | |
| ## Project Structure | |
| ``` | |
| codepilot/ | |
| ├── llm/       # LLM client wrappers (Claude, OpenAI) | |
| ├── agents/    # Multi-agent system (Orchestrator, Planner, Coder, Reviewer, Explorer) | |
| ├── tools/     # Tool implementations (file ops, context search, GitHub) | |
| ├── context/   # Hybrid retrieval (BM25, embeddings, parser, indexer) | |
| └── sandbox/   # E2B sandbox integration | |
| chainlit_app.py # Main UI application | |
| ``` | |