CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
CodePilot - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests autonomously.
Tech Stack: Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph, Chainlit UI Current Phase: Phase 5 Complete (Chainlit UI with multi-agent visualization)
Architecture
Multi-Agent Workflow System
CodePilot uses a dual-mode orchestrator that routes tasks to different workflows:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ORCHESTRATOR β
β (Task Classification) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββ΄βββββββββββββββββ
β β
"explore" "code"
β β
v v
ββββββββββββββββββ ββββββββββββββββββββββββββββββββ
β ExplorerAgent β β Full Multi-Agent Pipeline β
β (Direct) β β Explorer β Clarify β Plan β
ββββββββββββββββββ β β β
β Coder β· Reviewer β
β (iterative) β
ββββββββββββββββββββββββββββββββ
Task Classification Logic (see orchestrator.py:92-201):
- Explore tasks: Questions starting with "find", "where", "what", "how", "explain" β Uses ExplorerAgent only
- Code tasks: Commands starting with "add", "create", "implement", "fix" β Full pipeline
- Short queries (<100 chars) default to explore; long queries default to code
Full Pipeline Flow (code tasks):
- Explorer - Gathers codebase context using token-efficient tools
- Clarifier - Planner generates questions, pauses for user answers (v3.3+)
- Planner - Creates implementation plan (NO tools, pure LLM reasoning)
- Coder - Implements code, tests in sandbox (NO search, uses Explorer's context)
- Reviewer - Reviews code, approves or sends back to Coder with feedback
Context Engineering (Hybrid Retrieval)
The core differentiator is Reciprocal Rank Fusion (RRF) combining two search methods:
Query β ββ BM25 (keyword) βββββββ
β β
ββ Embeddings (semantic)β€ β RRF Fusion β Top K Results
β (sentence-transformers)
βββββββββββββββββββββββββ
Implementation: codepilot/context/hybrid_retriever.py
- BM25: Exact matches (function names, variable names)
- Embeddings: Semantic matches (related concepts)
- RRF formula:
score = Ξ£(weight_i / (k + rank_i))where k=60 - Default weights: 50% BM25, 50% embeddings
Token-Efficient Tools
Critical for cost management - agents should prefer:
get_file_outline(path)- Shows class/function signatures (~50 tokens vs ~2000 for full file)get_code_chunk(path, name)- Extracts specific function/class by namesearch_repository(query)- Hybrid search (use BEFORE reading files)
Only use read_file when you need complete file contents.
Agent Tool Access (v3.0+ separation)
Each agent has restricted tool access to prevent inefficiency:
- ExplorerAgent:
search_repository,get_file_outline,get_code_chunk,search_code,list_files - PlannerAgent: NO TOOLS (pure LLM reasoning, receives exploration context)
- CoderAgent:
write_file,get_code_chunk,read_file(NO search tools) - ReviewerAgent:
get_file_outline,get_code_chunk,read_file
Key insight: v3.0 removed duplicate searching. Explorer searches once, all agents reuse that context.
Development Commands
Setup:
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
Verify installation:
python test_setup.py # Checks API keys are loaded
Run Chainlit UI (Primary interface):
chainlit run chainlit_app.py
# Opens at http://localhost:8000
# Ctrl+C to stop, then pkill -f chainlit to clean up background processes
Test individual components:
# Context Engineering (Phase 2)
python test_context.py
# Multi-Agent Workflow (Phase 3)
python test_multi_agent.py
# E2B Sandbox (Phase 4)
python test_sandbox.py
python test_workflow_with_sandbox.py
Environment variables (create .env file):
ANTHROPIC_API_KEY=sk-ant-...
E2B_API_KEY=e2b_...
Current Implementation Status
β COMPLETED (Phases 1-5):
- Phase 1: LLM client, tool registry, base agent, core tools
- Phase 2: Hybrid retrieval (BM25 + embeddings), AST-aware parsing, codebase indexing
- Phase 3: Multi-agent architecture (Explorer, Planner, Coder, Reviewer, Orchestrator)
- Phase 4: E2B sandbox integration for isolated code execution
- Phase 5: Chainlit UI with real-time agent progress visualization
π§ NEXT PHASES:
- Phase 6 (Weeks 17-18): GitHub Integration - webhooks, automated PR creation
- Phase 7 (Weeks 19-21): Evals & Benchmarks - SWE-bench evaluation
- Phase 8 (Weeks 22-24): Production Hardening - error handling, monitoring, deployment
See devon-project-plan.md for complete 24-week roadmap.
Key Design Principles
- Context Engineering is the Differentiator - Not UI/UX, the hybrid retrieval and AST-aware chunking
- ReAct Pattern - All agents use: Reason β Act (with tools) β Observe β Repeat
- AST-Aware Processing - Parse code structurally using tree-sitter, not as text
- Sandboxed Execution - All code runs in E2B containers, never on host machine
- Single-Search Architecture - Explorer searches once, all downstream agents reuse context (v3.0+)
- Clarification Before Action - Planner asks questions before creating plan (v3.3+)
Important Implementation Details
Tool Schema Format
All tools follow Claude/Anthropic function calling format:
{
"type": "function",
"function": {
"name": "tool_name",
"description": "Clear description for LLM",
"parameters": {
"type": "object",
"properties": {...},
"required": [...]
}
}
}
Path Handling (Critical for Coder)
- Planner must provide FULL ABSOLUTE PATHS (e.g.,
/tmp/codepilot_repos/flask_abc123/examples/app.py) - Coder uses paths EXACTLY as written in the plan
- Repository path is injected in Chainlit context (see
chainlit_app.py:661-672)
File Operations
write_fileauto-creates parent directoriesrun_commandhas 30-second timeout- All tool functions return formatted strings (success messages or errors)
Version Tracking
Files include version constants for debugging hot-reload issues:
orchestrator.py:12-ORCHESTRATOR_VERSIONchainlit_app.py:25-26-APP_VERSION,BUILD_ID
Conversation Management
Agents use ConversationManager (codepilot/agents/conversation.py) to maintain message history in OpenAI/Anthropic format. This handles:
- System/user/assistant messages
- Tool calls and tool results
- Proper formatting for both Claude and OpenAI APIs
Critical Files
codepilot/agents/orchestrator.py- Task classification and multi-agent state machinecodepilot/agents/planner_agent.py- Pure LLM planning (no tools) + clarification questionscodepilot/agents/coder_agent.py- Code implementation (no search tools)codepilot/agents/explorer_agent.py- Codebase exploration (search tools only)codepilot/context/hybrid_retriever.py- RRF fusion algorithmcodepilot/tools/registry.py- Tool schemas and function mappingschainlit_app.py- Interactive UI with GitHub repo cloning and progress visualizationrequirements.txt- Python dependencies
Project Structure
codepilot/
βββ llm/ # LLM client wrappers (Claude, OpenAI)
βββ agents/ # Multi-agent system (Orchestrator, Planner, Coder, Reviewer, Explorer)
βββ tools/ # Tool implementations (file ops, context search, GitHub)
βββ context/ # Hybrid retrieval (BM25, embeddings, parser, indexer)
βββ sandbox/ # E2B sandbox integration
chainlit_app.py # Main UI application