# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
**CodePilot** - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests.
**Tech Stack:** Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph, Chainlit UI
**Current Phase:** Phase 5 Complete (Chainlit UI with multi-agent visualization)
## Architecture
### Multi-Agent Workflow System
CodePilot uses a **dual-mode orchestrator** that routes tasks to different workflows:
```
┌─────────────────────────────────────────────────────────┐
│                      ORCHESTRATOR                       │
│                  (Task Classification)                  │
└─────────────────────────────────────────────────────────┘
                             │
             ┌───────────────┴────────────────┐
             │                                │
         "explore"                         "code"
             │                                │
             v                                v
     ┌────────────────┐       ┌──────────────────────────────┐
     │ ExplorerAgent  │       │  Full Multi-Agent Pipeline   │
     │    (Direct)    │       │  Explorer → Clarify → Plan   │
     └────────────────┘       │              ↓               │
                              │       Coder ⟷ Reviewer       │
                              │         (iterative)          │
                              └──────────────────────────────┘
```
**Task Classification Logic** (see `orchestrator.py:92-201`):
- **Explore tasks**: Questions starting with "find", "where", "what", "how", "explain" → Uses ExplorerAgent only
- **Code tasks**: Commands starting with "add", "create", "implement", "fix" → Full pipeline
- Short queries (<100 chars) default to explore; long queries default to code
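The routing rules above can be sketched as a small function. This is a minimal illustration, not the code in `orchestrator.py`: the names `classify_task`, `EXPLORE_PREFIXES`, `CODE_PREFIXES`, and `SHORT_QUERY_LIMIT` are hypothetical, and matching on the first word only is an assumption.

```python
# Hypothetical constants mirroring the keyword lists described above.
EXPLORE_PREFIXES = ("find", "where", "what", "how", "explain")
CODE_PREFIXES = ("add", "create", "implement", "fix")
SHORT_QUERY_LIMIT = 100  # characters


def classify_task(query: str) -> str:
    """Return "explore" or "code" for a user query."""
    text = query.strip()
    words = text.lower().split()
    first_word = words[0] if words else ""
    if first_word in EXPLORE_PREFIXES:
        return "explore"
    if first_word in CODE_PREFIXES:
        return "code"
    # No keyword matched: fall back on query length.
    return "explore" if len(text) < SHORT_QUERY_LIMIT else "code"
```

With this sketch, `classify_task("Where is the router defined?")` routes to the explorer, while `classify_task("Fix the login bug")` routes to the full pipeline.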
**Full Pipeline Flow** (code tasks):
1. **Explorer** - Gathers codebase context using token-efficient tools
2. **Clarifier** - Planner generates questions, pauses for user answers (v3.3+)
3. **Planner** - Creates implementation plan (NO tools, pure LLM reasoning)
4. **Coder** - Implements code, tests in sandbox (NO search, uses Explorer's context)
5. **Reviewer** - Reviews code, approves or sends back to Coder with feedback
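As a rough sketch of how the five stages might be chained, the function below takes each stage as a plain callable. Everything here is illustrative: `run_code_pipeline`, the callable signatures, and the `max_rounds` cap on Coder/Reviewer iterations are assumptions, not the orchestrator's actual interface.

```python
def run_code_pipeline(task, explore, clarify, plan, code, review, max_rounds=3):
    """Run Explorer -> Clarifier -> Planner once, then loop Coder <-> Reviewer."""
    context = explore(task)                  # 1. gather codebase context once
    answers = clarify(task, context)         # 2. ask the user clarifying questions
    steps = plan(task, context, answers)     # 3. plan without tools

    feedback = None
    changes = None
    for round_number in range(1, max_rounds + 1):
        changes = code(steps, context, feedback)   # 4. implement (reuses context)
        approved, feedback = review(changes)       # 5. approve or send back
        if approved:
            return {"approved": True, "rounds": round_number, "changes": changes}
    # Reviewer never approved within the (assumed) iteration budget.
    return {"approved": False, "rounds": max_rounds, "changes": changes}
```

Passing the stages in as callables keeps the loop easy to test with stubs; the real system may well use a LangGraph state machine instead.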
### Context Engineering (Hybrid Retrieval)
The core differentiator is **Reciprocal Rank Fusion (RRF)** combining two search methods:
```
Query → ┌─ BM25 (keyword) ─────────┐
        │                          │
        ├─ Embeddings (semantic) ──┤ → RRF Fusion → Top K Results
        │  (sentence-transformers) │
        └──────────────────────────┘
```
**Implementation**: `codepilot/context/hybrid_retriever.py`
- BM25: Exact matches (function names, variable names)
- Embeddings: Semantic matches (related concepts)
- RRF formula: `score = Σ(weight_i / (k + rank_i))` where k=60
- Default weights: 50% BM25, 50% embeddings
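To make the formula concrete, here is a minimal sketch of weighted RRF over two ranked lists. The function name `rrf_fuse` and its argument shapes are hypothetical, and ranks are assumed to start at 1; the code in `hybrid_retriever.py` may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights, k=60, top_k=5):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    rankings: dict mapping retriever name -> list of doc IDs, best first.
    weights:  dict mapping retriever name -> weight for that retriever.
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # score = sum over retrievers of weight_i / (k + rank_i)
            scores[doc_id] += weights[name] / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


fused = rrf_fuse(
    {"bm25": ["a.py", "b.py", "c.py"], "embeddings": ["b.py", "d.py", "a.py"]},
    {"bm25": 0.5, "embeddings": 0.5},
)
print([doc_id for doc_id, _ in fused])  # ['b.py', 'a.py', 'd.py', 'c.py']
```

`b.py` wins because it ranks near the top of both lists, even though neither retriever alone puts it far ahead of `a.py`.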
### Token-Efficient Tools
**Critical for cost management** - agents should prefer:
1. `get_file_outline(path)` - Shows class/function signatures (~50 tokens vs ~2000 for full file)
2. `get_code_chunk(path, name)` - Extracts specific function/class by name
3. `search_repository(query)` - Hybrid search (use BEFORE reading files)
Only use `read_file` when you need complete file contents.
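To illustrate why an outline is so much cheaper than a full read, the sketch below lists signatures only. It uses Python's standard `ast` module as a stand-in for the project's tree-sitter parsing, so it handles Python sources only; `outline_source` is a hypothetical name, not the real `get_file_outline`.

```python
import ast


def _signature(node, indent=""):
    """Format one function node as a single outline line."""
    args = ", ".join(arg.arg for arg in node.args.args)
    return f"{indent}def {node.name}({args})  # line {node.lineno}"


def outline_source(source: str) -> str:
    """List top-level classes, their methods, and functions, without bodies."""
    function_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, function_types):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}  # line {node.lineno}")
            for item in node.body:
                if isinstance(item, function_types):
                    lines.append(_signature(item, indent="    "))
    return "\n".join(lines)


sample = "class A:\n    def run(self, x):\n        return x\n\ndef helper(a, b):\n    return a + b\n"
print(outline_source(sample))
```

The output has one line per definition regardless of how long each body is, which is where the token savings come from.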
### Agent Tool Access (v3.0+ separation)
Each agent has **restricted tool access** to prevent inefficiency:
- **ExplorerAgent**: `search_repository`, `get_file_outline`, `get_code_chunk`, `search_code`, `list_files`
- **PlannerAgent**: **NO TOOLS** (pure LLM reasoning, receives exploration context)
- **CoderAgent**: `write_file`, `get_code_chunk`, `read_file` (NO search tools)
- **ReviewerAgent**: `get_file_outline`, `get_code_chunk`, `read_file`
**Key insight**: v3.0 removed duplicate searching. Explorer searches once, all agents reuse that context.
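One way to express the per-agent restrictions listed above is an allow-list consulted when building each agent's tool set. The mapping copies the list above; the names `AGENT_TOOLS` and `tools_for`, and the assumption that schemas carry their name under `schema["function"]["name"]`, are illustrative only.

```python
# Allow-list per agent, copied from the tool access list above.
AGENT_TOOLS = {
    "ExplorerAgent": ["search_repository", "get_file_outline", "get_code_chunk",
                      "search_code", "list_files"],
    "PlannerAgent": [],  # pure LLM reasoning, no tools
    "CoderAgent": ["write_file", "get_code_chunk", "read_file"],
    "ReviewerAgent": ["get_file_outline", "get_code_chunk", "read_file"],
}


def tools_for(agent_name, all_schemas):
    """Return only the tool schemas this agent is allowed to call."""
    allowed = set(AGENT_TOOLS[agent_name])
    return [s for s in all_schemas if s["function"]["name"] in allowed]
```

Filtering at construction time means a Coder never even sees a search tool in its prompt, rather than relying on instructions not to call it.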
## Development Commands
**Setup:**
```bash
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
```
**Verify installation:**
```bash
python test_setup.py # Checks API keys are loaded
```
**Run Chainlit UI (Primary interface):**
```bash
chainlit run chainlit_app.py
# Opens at http://localhost:8000
# Ctrl+C to stop, then pkill -f chainlit to clean up background processes
```
**Test individual components:**
```bash
# Context Engineering (Phase 2)
python test_context.py
# Multi-Agent Workflow (Phase 3)
python test_multi_agent.py
# E2B Sandbox (Phase 4)
python test_sandbox.py
python test_workflow_with_sandbox.py
```
**Environment variables** (create `.env` file):
```
ANTHROPIC_API_KEY=sk-ant-...
E2B_API_KEY=e2b_...
```
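A check for these two keys might look like the sketch below. It is not the contents of `test_setup.py`; `missing_keys` is a hypothetical helper, and it assumes the `.env` values have already been loaded into the process environment.

```python
import os

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "E2B_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    print("All API keys loaded" if not missing else f"Missing: {', '.join(missing)}")
```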
## Current Implementation Status
**✅ COMPLETED (Phases 1-5):**
- Phase 1: LLM client, tool registry, base agent, core tools
- Phase 2: Hybrid retrieval (BM25 + embeddings), AST-aware parsing, codebase indexing
- Phase 3: Multi-agent architecture (Explorer, Planner, Coder, Reviewer, Orchestrator)
- Phase 4: E2B sandbox integration for isolated code execution
- Phase 5: Chainlit UI with real-time agent progress visualization
**🚧 NEXT PHASES:**
- Phase 6 (Weeks 17-18): GitHub Integration - webhooks, automated PR creation
- Phase 7 (Weeks 19-21): Evals & Benchmarks - SWE-bench evaluation
- Phase 8 (Weeks 22-24): Production Hardening - error handling, monitoring, deployment
See `devon-project-plan.md` for complete 24-week roadmap.
## Key Design Principles
1. **Context Engineering is the Differentiator** - The value lies in hybrid retrieval and AST-aware chunking, not in UI/UX
2. **ReAct Pattern** - All agents use: Reason → Act (with tools) → Observe → Repeat
3. **AST-Aware Processing** - Parse code structurally using tree-sitter, not as text
4. **Sandboxed Execution** - All code runs in E2B containers, never on host machine
5. **Single-Search Architecture** - Explorer searches once, all downstream agents reuse context (v3.0+)
6. **Clarification Before Action** - Planner asks questions before creating plan (v3.3+)
## Important Implementation Details
### Tool Schema Format
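All tools are declared in the function-calling layout below. This is the OpenAI-style shape; Anthropic's native tool format instead uses top-level `name`, `description`, and `input_schema` keys.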
```python
{
    "type": "function",
    "function": {
        "name": "tool_name",
        "description": "Clear description for LLM",
        "parameters": {
            "type": "object",
            "properties": {...},
            "required": [...]
        }
    }
}
```
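The nested `function` wrapper shown above matches the OpenAI-style layout, whereas the Anthropic Messages API expects `name`, `description`, and `input_schema` at the top level of each tool. A minimal conversion sketch follows; `to_anthropic_tool` is a hypothetical helper, and whether the project's LLM client does this conversion itself is not stated here.

```python
def to_anthropic_tool(schema: dict) -> dict:
    """Convert an OpenAI-style function schema to Anthropic's tool layout."""
    function = schema["function"]
    return {
        "name": function["name"],
        "description": function["description"],
        # Anthropic calls the JSON Schema for arguments "input_schema".
        "input_schema": function["parameters"],
    }


example = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file and return its contents",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}
print(to_anthropic_tool(example)["input_schema"]["required"])  # ['path']
```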
### Path Handling (Critical for Coder)
- **Planner must provide FULL ABSOLUTE PATHS** (e.g., `/tmp/codepilot_repos/flask_abc123/examples/app.py`)
- **Coder uses paths EXACTLY as written** in the plan
- Repository path is injected in Chainlit context (see `chainlit_app.py:661-672`)
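Because the Coder uses paths verbatim, a cheap guard is to validate plan paths before handing them over. The sketch below is an assumption about how that could be done, not existing project code; `check_plan_paths` is a hypothetical name, and it does not resolve `..` segments or symlinks.

```python
from pathlib import PurePosixPath


def check_plan_paths(paths, repo_root):
    """Return a list of problems found in the plan's file paths."""
    root = PurePosixPath(repo_root)
    problems = []
    for raw in paths:
        path = PurePosixPath(raw)
        if not path.is_absolute():
            problems.append(f"{raw}: not an absolute path")
        elif not path.is_relative_to(root):
            problems.append(f"{raw}: outside repository root {repo_root}")
    return problems
```

An empty result means every path is absolute and sits under the cloned repository, for example `/tmp/codepilot_repos/flask_abc123/examples/app.py` under `/tmp/codepilot_repos/flask_abc123`.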
### File Operations
- `write_file` auto-creates parent directories
- `run_command` has 30-second timeout
- All tool functions return formatted strings (success messages or errors)
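The three behaviours above can be sketched with the standard library. This is an illustration only: it runs commands with a local `subprocess` call, whereas the project executes code inside an E2B sandbox, and the exact message formats are assumptions.

```python
import subprocess
from pathlib import Path


def write_file(path: str, content: str) -> str:
    """Write content to path, creating parent directories as needed."""
    try:
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        return f"Success: wrote {len(content)} characters to {path}"
    except OSError as exc:
        return f"Error: could not write {path}: {exc}"


def run_command(command: list[str], timeout: int = 30) -> str:
    """Run a command and return a formatted string, never raising."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"Error: command timed out after {timeout} seconds"
    except OSError as exc:
        return f"Error: could not run command: {exc}"
    return f"Exit code {result.returncode}\n{result.stdout}{result.stderr}"
```

Returning strings instead of raising keeps every tool result in a form the agent can read as its next observation.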
### Version Tracking
Files include version constants for debugging hot-reload issues:
- `orchestrator.py:12` - `ORCHESTRATOR_VERSION`
- `chainlit_app.py:25-26` - `APP_VERSION`, `BUILD_ID`
### Conversation Management
Agents use `ConversationManager` (`codepilot/agents/conversation.py`) to maintain message history in OpenAI/Anthropic format. This handles:
- System/user/assistant messages
- Tool calls and tool results
- Proper formatting for both Claude and OpenAI APIs
## Critical Files
- `codepilot/agents/orchestrator.py` - Task classification and multi-agent state machine
- `codepilot/agents/planner_agent.py` - Pure LLM planning (no tools) + clarification questions
- `codepilot/agents/coder_agent.py` - Code implementation (no search tools)
- `codepilot/agents/explorer_agent.py` - Codebase exploration (search tools only)
- `codepilot/context/hybrid_retriever.py` - RRF fusion algorithm
- `codepilot/tools/registry.py` - Tool schemas and function mappings
- `chainlit_app.py` - Interactive UI with GitHub repo cloning and progress visualization
- `requirements.txt` - Python dependencies
## Project Structure
```
codepilot/
├── llm/          # LLM client wrappers (Claude, OpenAI)
├── agents/       # Multi-agent system (Orchestrator, Planner, Coder, Reviewer, Explorer)
├── tools/        # Tool implementations (file ops, context search, GitHub)
├── context/      # Hybrid retrieval (BM25, embeddings, parser, indexer)
└── sandbox/      # E2B sandbox integration
chainlit_app.py   # Main UI application
```