codepilot / CLAUDE.md
ayushm98's picture
Improve UI: cleaner welcome, better progress display, simplified results
6f39ef4
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
**CodePilot** - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests autonomously.
**Tech Stack:** Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph, Chainlit UI
**Current Phase:** Phase 5 Complete (Chainlit UI with multi-agent visualization)
## Architecture
### Multi-Agent Workflow System
CodePilot uses a **dual-mode orchestrator** that routes tasks to different workflows:
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ ORCHESTRATOR β”‚
β”‚ (Task Classification) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ β”‚
"explore" "code"
β”‚ β”‚
v v
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ ExplorerAgent β”‚ β”‚ Full Multi-Agent Pipeline β”‚
β”‚ (Direct) β”‚ β”‚ Explorer β†’ Clarify β†’ Plan β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ ↓ β”‚
β”‚ Coder ⟷ Reviewer β”‚
β”‚ (iterative) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
**Task Classification Logic** (see `orchestrator.py:92-201`):
- **Explore tasks**: Questions starting with "find", "where", "what", "how", "explain" β†’ Uses ExplorerAgent only
- **Code tasks**: Commands starting with "add", "create", "implement", "fix" β†’ Full pipeline
- Short queries (<100 chars) default to explore; long queries default to code
**Full Pipeline Flow** (code tasks):
1. **Explorer** - Gathers codebase context using token-efficient tools
2. **Clarifier** - Planner generates questions, pauses for user answers (v3.3+)
3. **Planner** - Creates implementation plan (NO tools, pure LLM reasoning)
4. **Coder** - Implements code, tests in sandbox (NO search, uses Explorer's context)
5. **Reviewer** - Reviews code, approves or sends back to Coder with feedback
### Context Engineering (Hybrid Retrieval)
The core differentiator is **Reciprocal Rank Fusion (RRF)** combining two search methods:
```
Query β†’ β”Œβ”€ BM25 (keyword) ──────┐
β”‚ β”‚
β”œβ”€ Embeddings (semantic)─ β†’ RRF Fusion β†’ Top K Results
β”‚ (sentence-transformers)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
**Implementation**: `codepilot/context/hybrid_retriever.py`
- BM25: Exact matches (function names, variable names)
- Embeddings: Semantic matches (related concepts)
- RRF formula: `score = Ξ£(weight_i / (k + rank_i))` where k=60
- Default weights: 50% BM25, 50% embeddings
### Token-Efficient Tools
**Critical for cost management** - agents should prefer:
1. `get_file_outline(path)` - Shows class/function signatures (~50 tokens vs ~2000 for full file)
2. `get_code_chunk(path, name)` - Extracts specific function/class by name
3. `search_repository(query)` - Hybrid search (use BEFORE reading files)
Only use `read_file` when you need complete file contents.
### Agent Tool Access (v3.0+ separation)
Each agent has **restricted tool access** to prevent inefficiency:
- **ExplorerAgent**: `search_repository`, `get_file_outline`, `get_code_chunk`, `search_code`, `list_files`
- **PlannerAgent**: **NO TOOLS** (pure LLM reasoning, receives exploration context)
- **CoderAgent**: `write_file`, `get_code_chunk`, `read_file` (NO search tools)
- **ReviewerAgent**: `get_file_outline`, `get_code_chunk`, `read_file`
**Key insight**: v3.0 removed duplicate searching. Explorer searches once, all agents reuse that context.
## Development Commands
**Setup:**
```bash
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
```
**Verify installation:**
```bash
python test_setup.py # Checks API keys are loaded
```
**Run Chainlit UI (Primary interface):**
```bash
chainlit run chainlit_app.py
# Opens at http://localhost:8000
# Ctrl+C to stop, then pkill -f chainlit to clean up background processes
```
**Test individual components:**
```bash
# Context Engineering (Phase 2)
python test_context.py
# Multi-Agent Workflow (Phase 3)
python test_multi_agent.py
# E2B Sandbox (Phase 4)
python test_sandbox.py
python test_workflow_with_sandbox.py
```
**Environment variables** (create `.env` file):
```
ANTHROPIC_API_KEY=sk-ant-...
E2B_API_KEY=e2b_...
```
## Current Implementation Status
**βœ… COMPLETED (Phases 1-5):**
- Phase 1: LLM client, tool registry, base agent, core tools
- Phase 2: Hybrid retrieval (BM25 + embeddings), AST-aware parsing, codebase indexing
- Phase 3: Multi-agent architecture (Explorer, Planner, Coder, Reviewer, Orchestrator)
- Phase 4: E2B sandbox integration for isolated code execution
- Phase 5: Chainlit UI with real-time agent progress visualization
**🚧 NEXT PHASES:**
- Phase 6 (Weeks 17-18): GitHub Integration - webhooks, automated PR creation
- Phase 7 (Weeks 19-21): Evals & Benchmarks - SWE-bench evaluation
- Phase 8 (Weeks 22-24): Production Hardening - error handling, monitoring, deployment
See `devon-project-plan.md` for complete 24-week roadmap.
## Key Design Principles
1. **Context Engineering is the Differentiator** - Not UI/UX, the hybrid retrieval and AST-aware chunking
2. **ReAct Pattern** - All agents use: Reason β†’ Act (with tools) β†’ Observe β†’ Repeat
3. **AST-Aware Processing** - Parse code structurally using tree-sitter, not as text
4. **Sandboxed Execution** - All code runs in E2B containers, never on host machine
5. **Single-Search Architecture** - Explorer searches once, all downstream agents reuse context (v3.0+)
6. **Clarification Before Action** - Planner asks questions before creating plan (v3.3+)
## Important Implementation Details
### Tool Schema Format
All tools follow Claude/Anthropic function calling format:
```python
{
"type": "function",
"function": {
"name": "tool_name",
"description": "Clear description for LLM",
"parameters": {
"type": "object",
"properties": {...},
"required": [...]
}
}
}
```
### Path Handling (Critical for Coder)
- **Planner must provide FULL ABSOLUTE PATHS** (e.g., `/tmp/codepilot_repos/flask_abc123/examples/app.py`)
- **Coder uses paths EXACTLY as written** in the plan
- Repository path is injected in Chainlit context (see `chainlit_app.py:661-672`)
### File Operations
- `write_file` auto-creates parent directories
- `run_command` has 30-second timeout
- All tool functions return formatted strings (success messages or errors)
### Version Tracking
Files include version constants for debugging hot-reload issues:
- `orchestrator.py:12` - `ORCHESTRATOR_VERSION`
- `chainlit_app.py:25-26` - `APP_VERSION`, `BUILD_ID`
### Conversation Management
Agents use `ConversationManager` (`codepilot/agents/conversation.py`) to maintain message history in OpenAI/Anthropic format. This handles:
- System/user/assistant messages
- Tool calls and tool results
- Proper formatting for both Claude and OpenAI APIs
## Critical Files
- `codepilot/agents/orchestrator.py` - Task classification and multi-agent state machine
- `codepilot/agents/planner_agent.py` - Pure LLM planning (no tools) + clarification questions
- `codepilot/agents/coder_agent.py` - Code implementation (no search tools)
- `codepilot/agents/explorer_agent.py` - Codebase exploration (search tools only)
- `codepilot/context/hybrid_retriever.py` - RRF fusion algorithm
- `codepilot/tools/registry.py` - Tool schemas and function mappings
- `chainlit_app.py` - Interactive UI with GitHub repo cloning and progress visualization
- `requirements.txt` - Python dependencies
## Project Structure
```
codepilot/
β”œβ”€β”€ llm/ # LLM client wrappers (Claude, OpenAI)
β”œβ”€β”€ agents/ # Multi-agent system (Orchestrator, Planner, Coder, Reviewer, Explorer)
β”œβ”€β”€ tools/ # Tool implementations (file ops, context search, GitHub)
β”œβ”€β”€ context/ # Hybrid retrieval (BM25, embeddings, parser, indexer)
└── sandbox/ # E2B sandbox integration
chainlit_app.py # Main UI application
```