codepilot / CLAUDE.md
ayushm98's picture
Improve UI: cleaner welcome, better progress display, simplified results
6f39ef4

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

CodePilot - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests autonomously.

Tech Stack: Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph, Chainlit UI Current Phase: Phase 5 Complete (Chainlit UI with multi-agent visualization)

Architecture

Multi-Agent Workflow System

CodePilot uses a dual-mode orchestrator that routes tasks to different workflows:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    ORCHESTRATOR                         β”‚
β”‚              (Task Classification)                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚                                β”‚
    "explore"                         "code"
         β”‚                                β”‚
         v                                v
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ ExplorerAgent  β”‚         β”‚ Full Multi-Agent Pipeline    β”‚
β”‚   (Direct)     β”‚         β”‚ Explorer β†’ Clarify β†’ Plan    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚         ↓                    β”‚
                           β”‚   Coder ⟷ Reviewer           β”‚
                           β”‚  (iterative)                 β”‚
                           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Task Classification Logic (see orchestrator.py:92-201):

  • Explore tasks: Questions starting with "find", "where", "what", "how", "explain" β†’ Uses ExplorerAgent only
  • Code tasks: Commands starting with "add", "create", "implement", "fix" β†’ Full pipeline
  • Short queries (<100 chars) default to explore; long queries default to code

Full Pipeline Flow (code tasks):

  1. Explorer - Gathers codebase context using token-efficient tools
  2. Clarifier - Planner generates questions, pauses for user answers (v3.3+)
  3. Planner - Creates implementation plan (NO tools, pure LLM reasoning)
  4. Coder - Implements code, tests in sandbox (NO search, uses Explorer's context)
  5. Reviewer - Reviews code, approves or sends back to Coder with feedback

Context Engineering (Hybrid Retrieval)

The core differentiator is Reciprocal Rank Fusion (RRF) combining two search methods:

Query β†’ β”Œβ”€ BM25 (keyword) ──────┐
        β”‚                       β”‚
        β”œβ”€ Embeddings (semantic)─ β†’ RRF Fusion β†’ Top K Results
        β”‚   (sentence-transformers)
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implementation: codepilot/context/hybrid_retriever.py

  • BM25: Exact matches (function names, variable names)
  • Embeddings: Semantic matches (related concepts)
  • RRF formula: score = Ξ£(weight_i / (k + rank_i)) where k=60
  • Default weights: 50% BM25, 50% embeddings

Token-Efficient Tools

Critical for cost management - agents should prefer:

  1. get_file_outline(path) - Shows class/function signatures (~50 tokens vs ~2000 for full file)
  2. get_code_chunk(path, name) - Extracts specific function/class by name
  3. search_repository(query) - Hybrid search (use BEFORE reading files)

Only use read_file when you need complete file contents.

Agent Tool Access (v3.0+ separation)

Each agent has restricted tool access to prevent inefficiency:

  • ExplorerAgent: search_repository, get_file_outline, get_code_chunk, search_code, list_files
  • PlannerAgent: NO TOOLS (pure LLM reasoning, receives exploration context)
  • CoderAgent: write_file, get_code_chunk, read_file (NO search tools)
  • ReviewerAgent: get_file_outline, get_code_chunk, read_file

Key insight: v3.0 removed duplicate searching. Explorer searches once, all agents reuse that context.

Development Commands

Setup:

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

Verify installation:

python test_setup.py  # Checks API keys are loaded

Run Chainlit UI (Primary interface):

chainlit run chainlit_app.py
# Opens at http://localhost:8000
# Ctrl+C to stop, then pkill -f chainlit to clean up background processes

Test individual components:

# Context Engineering (Phase 2)
python test_context.py

# Multi-Agent Workflow (Phase 3)
python test_multi_agent.py

# E2B Sandbox (Phase 4)
python test_sandbox.py
python test_workflow_with_sandbox.py

Environment variables (create .env file):

ANTHROPIC_API_KEY=sk-ant-...
E2B_API_KEY=e2b_...

Current Implementation Status

βœ… COMPLETED (Phases 1-5):

  • Phase 1: LLM client, tool registry, base agent, core tools
  • Phase 2: Hybrid retrieval (BM25 + embeddings), AST-aware parsing, codebase indexing
  • Phase 3: Multi-agent architecture (Explorer, Planner, Coder, Reviewer, Orchestrator)
  • Phase 4: E2B sandbox integration for isolated code execution
  • Phase 5: Chainlit UI with real-time agent progress visualization

🚧 NEXT PHASES:

  • Phase 6 (Weeks 17-18): GitHub Integration - webhooks, automated PR creation
  • Phase 7 (Weeks 19-21): Evals & Benchmarks - SWE-bench evaluation
  • Phase 8 (Weeks 22-24): Production Hardening - error handling, monitoring, deployment

See devon-project-plan.md for complete 24-week roadmap.

Key Design Principles

  1. Context Engineering is the Differentiator - Not UI/UX, the hybrid retrieval and AST-aware chunking
  2. ReAct Pattern - All agents use: Reason β†’ Act (with tools) β†’ Observe β†’ Repeat
  3. AST-Aware Processing - Parse code structurally using tree-sitter, not as text
  4. Sandboxed Execution - All code runs in E2B containers, never on host machine
  5. Single-Search Architecture - Explorer searches once, all downstream agents reuse context (v3.0+)
  6. Clarification Before Action - Planner asks questions before creating plan (v3.3+)

Important Implementation Details

Tool Schema Format

All tools follow Claude/Anthropic function calling format:

{
    "type": "function",
    "function": {
        "name": "tool_name",
        "description": "Clear description for LLM",
        "parameters": {
            "type": "object",
            "properties": {...},
            "required": [...]
        }
    }
}

Path Handling (Critical for Coder)

  • Planner must provide FULL ABSOLUTE PATHS (e.g., /tmp/codepilot_repos/flask_abc123/examples/app.py)
  • Coder uses paths EXACTLY as written in the plan
  • Repository path is injected in Chainlit context (see chainlit_app.py:661-672)

File Operations

  • write_file auto-creates parent directories
  • run_command has 30-second timeout
  • All tool functions return formatted strings (success messages or errors)

Version Tracking

Files include version constants for debugging hot-reload issues:

  • orchestrator.py:12 - ORCHESTRATOR_VERSION
  • chainlit_app.py:25-26 - APP_VERSION, BUILD_ID

Conversation Management

Agents use ConversationManager (codepilot/agents/conversation.py) to maintain message history in OpenAI/Anthropic format. This handles:

  • System/user/assistant messages
  • Tool calls and tool results
  • Proper formatting for both Claude and OpenAI APIs

Critical Files

  • codepilot/agents/orchestrator.py - Task classification and multi-agent state machine
  • codepilot/agents/planner_agent.py - Pure LLM planning (no tools) + clarification questions
  • codepilot/agents/coder_agent.py - Code implementation (no search tools)
  • codepilot/agents/explorer_agent.py - Codebase exploration (search tools only)
  • codepilot/context/hybrid_retriever.py - RRF fusion algorithm
  • codepilot/tools/registry.py - Tool schemas and function mappings
  • chainlit_app.py - Interactive UI with GitHub repo cloning and progress visualization
  • requirements.txt - Python dependencies

Project Structure

codepilot/
β”œβ”€β”€ llm/               # LLM client wrappers (Claude, OpenAI)
β”œβ”€β”€ agents/            # Multi-agent system (Orchestrator, Planner, Coder, Reviewer, Explorer)
β”œβ”€β”€ tools/             # Tool implementations (file ops, context search, GitHub)
β”œβ”€β”€ context/           # Hybrid retrieval (BM25, embeddings, parser, indexer)
└── sandbox/           # E2B sandbox integration
chainlit_app.py        # Main UI application