# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**CodePilot** - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests.

**Tech Stack:** Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph, Chainlit UI

**Current Phase:** Phase 5 Complete (Chainlit UI with multi-agent visualization)

## Architecture

### Multi-Agent Workflow System

CodePilot uses a **dual-mode orchestrator** that routes tasks to different workflows:

```
┌─────────────────────────────────────────────────────────┐
│                    ORCHESTRATOR                         │
│              (Task Classification)                      │
└─────────────────────────────────────────────────────────┘
                         │
         ┌───────────────┴────────────────┐
         │                                │
    "explore"                         "code"
         │                                │
         v                                v
┌────────────────┐         ┌──────────────────────────────┐
│ ExplorerAgent  │         │ Full Multi-Agent Pipeline    │
│   (Direct)     │         │ Explorer → Clarify → Plan    │
└────────────────┘         │         ↓                    │
                           │   Coder ⟷ Reviewer           │
                           │  (iterative)                 │
                           └──────────────────────────────┘
```

**Task Classification Logic** (see `orchestrator.py:92-201`):
- **Explore tasks**: Questions starting with "find", "where", "what", "how", "explain" → Uses ExplorerAgent only
- **Code tasks**: Commands starting with "add", "create", "implement", "fix" → Full pipeline
- Short queries (<100 chars) default to explore; long queries default to code
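
The routing rules above can be sketched as a small function. This is a minimal illustration, not the code in `orchestrator.py`: the name `classify_task`, the constants, and the choice to check the first word only are all assumptions.

```python
# Hypothetical sketch of the routing rules; names are illustrative only.
EXPLORE_PREFIXES = ("find", "where", "what", "how", "explain")
CODE_PREFIXES = ("add", "create", "implement", "fix")
SHORT_QUERY_LIMIT = 100  # characters, taken from the rule above


def classify_task(query: str) -> str:
    """Return "explore" or "code" for a user query."""
    text = query.strip()
    words = text.lower().split()
    first_word = words[0] if words else ""
    if first_word in EXPLORE_PREFIXES:
        return "explore"
    if first_word in CODE_PREFIXES:
        return "code"
    # No keyword matched: fall back to the length heuristic.
    return "explore" if len(text) < SHORT_QUERY_LIMIT else "code"
```

Under these assumptions, `classify_task("Where is the router defined?")` routes to the explore workflow and `classify_task("Add a health check endpoint")` routes to the full pipeline.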

**Full Pipeline Flow** (code tasks):
1. **Explorer** - Gathers codebase context using token-efficient tools
2. **Clarifier** - Planner generates questions, pauses for user answers (v3.3+)
3. **Planner** - Creates implementation plan (NO tools, pure LLM reasoning)
4. **Coder** - Implements code, tests in sandbox (NO search, uses Explorer's context)
5. **Reviewer** - Reviews code, approves or sends back to Coder with feedback
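
Steps 4 and 5 form a loop, which might look roughly like the sketch below. The `Review` dataclass, the callable signatures, and the `max_iterations` cap are assumptions made for illustration; the source does not say whether the real orchestrator limits the number of rounds.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    """Hypothetical reviewer verdict."""
    approved: bool
    feedback: str = ""


def run_code_review_loop(
    plan: str,
    coder: Callable[[str, str], str],    # (plan, feedback) -> code
    reviewer: Callable[[str], Review],   # code -> verdict
    max_iterations: int = 3,             # assumed cap, not from the project
) -> tuple[str, bool]:
    """Alternate Coder and Reviewer until approval or the cap is reached."""
    feedback = ""
    code = ""
    for _ in range(max_iterations):
        code = coder(plan, feedback)
        review = reviewer(code)
        if review.approved:
            return code, True
        # Rejected: pass the reviewer's feedback into the next Coder round.
        feedback = review.feedback
    return code, False
```

The second element of the return value tells the caller whether the last version was approved or whether the loop simply ran out of rounds.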

### Context Engineering (Hybrid Retrieval)

The core differentiator is **Reciprocal Rank Fusion (RRF)** combining two search methods:

```
Query → ┌─ BM25 (keyword) ──────┐
        │                       │
        ├─ Embeddings (semantic)┤ → RRF Fusion → Top K Results
        │   (sentence-transformers)
        └───────────────────────┘
```

**Implementation**: `codepilot/context/hybrid_retriever.py`
- BM25: Exact matches (function names, variable names)
- Embeddings: Semantic matches (related concepts)
- RRF formula: `score = Σ(weight_i / (k + rank_i))` where k=60
- Default weights: 50% BM25, 50% embeddings
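
To make the formula concrete, here is a minimal sketch of weighted RRF over two ranked lists. The function name `rrf_fuse`, the dict-based inputs, and the 1-based ranks are assumptions; the actual interface in `hybrid_retriever.py` may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights, k=60, top_k=5):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    rankings: dict mapping retriever name -> list of doc IDs, best first
    weights:  dict mapping retriever name -> weight
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        # Ranks start at 1 here (an assumption), so the top hit adds weight / (k + 1).
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weights[name] / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_k]


rankings = {
    "bm25": ["a.py", "b.py", "c.py"],
    "embeddings": ["b.py", "d.py", "a.py"],
}
weights = {"bm25": 0.5, "embeddings": 0.5}
fused = rrf_fuse(rankings, weights)
```

In this example `b.py` ranks first because both retrievers place it near the top, while `c.py` and `d.py` each appear in only one list and fall to the bottom.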

### Token-Efficient Tools

**Critical for cost management** - agents should prefer:
1. `get_file_outline(path)` - Shows class/function signatures (~50 tokens vs ~2000 for full file)
2. `get_code_chunk(path, name)` - Extracts specific function/class by name
3. `search_repository(query)` - Hybrid search (use BEFORE reading files)

Only use `read_file` when you need complete file contents.
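
To show why an outline is so much cheaper than a full read, the sketch below extracts only class and function signatures from Python source. It uses the standard-library `ast` module as a stand-in; the project itself parses with tree-sitter, and the name `outline_source` and its output format are assumptions.

```python
import ast


def outline_source(source: str) -> list[str]:
    """List class and function signatures found in Python source text."""
    entries = []

    def visit(node, depth):
        for child in node.body:
            if isinstance(child, ast.ClassDef):
                entries.append("  " * depth + f"class {child.name}")
                visit(child, depth + 1)  # descend to pick up methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Positional parameter names only; bodies are skipped entirely.
                args = ", ".join(a.arg for a in child.args.args)
                entries.append("  " * depth + f"def {child.name}({args})")

    visit(ast.parse(source), 0)
    return entries
```

Function bodies never reach the output, which is where the token savings come from.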

### Agent Tool Access (v3.0+ separation)

Each agent has **restricted tool access** to prevent inefficiency:

- **ExplorerAgent**: `search_repository`, `get_file_outline`, `get_code_chunk`, `search_code`, `list_files`
- **PlannerAgent**: **NO TOOLS** (pure LLM reasoning, receives exploration context)
- **CoderAgent**: `write_file`, `get_code_chunk`, `read_file` (NO search tools)
- **ReviewerAgent**: `get_file_outline`, `get_code_chunk`, `read_file`

**Key insight**: v3.0 removed duplicate searching. Explorer searches once, all agents reuse that context.
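
One way to express this separation is an allow-list per agent. The sketch below is illustrative only: the tool names come from the list above, but `AGENT_TOOLS` and `is_tool_allowed` are hypothetical and do not describe how `registry.py` enforces access.

```python
# Hypothetical allow-list; tool names are copied from the list above.
AGENT_TOOLS = {
    "ExplorerAgent": {
        "search_repository", "get_file_outline", "get_code_chunk",
        "search_code", "list_files",
    },
    "PlannerAgent": set(),  # no tools: pure LLM reasoning
    "CoderAgent": {"write_file", "get_code_chunk", "read_file"},
    "ReviewerAgent": {"get_file_outline", "get_code_chunk", "read_file"},
}


def is_tool_allowed(agent: str, tool: str) -> bool:
    """Return True if the named agent may call the named tool."""
    return tool in AGENT_TOOLS.get(agent, set())
```

With this table, `is_tool_allowed("CoderAgent", "search_repository")` is `False`, which matches the rule that the Coder relies on the Explorer's context instead of searching again.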

## Development Commands

**Setup:**
```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

**Verify installation:**
```bash
python test_setup.py  # Checks API keys are loaded
```

**Run Chainlit UI (Primary interface):**
```bash
chainlit run chainlit_app.py
# Opens at http://localhost:8000
# Ctrl+C to stop, then pkill -f chainlit to clean up background processes
```

**Test individual components:**
```bash
# Context Engineering (Phase 2)
python test_context.py

# Multi-Agent Workflow (Phase 3)
python test_multi_agent.py

# E2B Sandbox (Phase 4)
python test_sandbox.py
python test_workflow_with_sandbox.py
```

**Environment variables** (create `.env` file):
```
ANTHROPIC_API_KEY=sk-ant-...
E2B_API_KEY=e2b_...
```
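
A check along the lines of what `test_setup.py` is described as doing could look like the sketch below. It is an assumption about the approach, not a copy of that script; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the sketch reads the process environment rather than parsing `.env` itself.

```python
import os

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "E2B_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are loaded.")
```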

## Current Implementation Status

**✅ COMPLETED (Phases 1-5):**
- Phase 1: LLM client, tool registry, base agent, core tools
- Phase 2: Hybrid retrieval (BM25 + embeddings), AST-aware parsing, codebase indexing
- Phase 3: Multi-agent architecture (Explorer, Planner, Coder, Reviewer, Orchestrator)
- Phase 4: E2B sandbox integration for isolated code execution
- Phase 5: Chainlit UI with real-time agent progress visualization

**🚧 NEXT PHASES:**
- Phase 6 (Weeks 17-18): GitHub Integration - webhooks, automated PR creation
- Phase 7 (Weeks 19-21): Evals & Benchmarks - SWE-bench evaluation
- Phase 8 (Weeks 22-24): Production Hardening - error handling, monitoring, deployment

See `devon-project-plan.md` for complete 24-week roadmap.

## Key Design Principles

1. **Context Engineering is the Differentiator** - The value lies in hybrid retrieval and AST-aware chunking, not in UI/UX
2. **ReAct Pattern** - All agents use: Reason → Act (with tools) → Observe → Repeat
3. **AST-Aware Processing** - Parse code structurally using tree-sitter, not as text
4. **Sandboxed Execution** - All code runs in E2B containers, never on host machine
5. **Single-Search Architecture** - Explorer searches once, all downstream agents reuse context (v3.0+)
6. **Clarification Before Action** - Planner asks questions before creating plan (v3.3+)

## Important Implementation Details

### Tool Schema Format
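All tools are declared in the nested function-calling schema below. Note that this `type`/`function`/`parameters` layout is the OpenAI-style format; Anthropic's native tool format uses top-level `name`, `description`, and `input_schema` keys instead.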
```python
{
    "type": "function",
    "function": {
        "name": "tool_name",
        "description": "Clear description for LLM",
        "parameters": {
            "type": "object",
            "properties": {...},
            "required": [...]
        }
    }
}
```
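
The schema above nests the definition under `function` and names the argument schema `parameters`, whereas Anthropic's Messages API expects tools with top-level `name`, `description`, and `input_schema` keys. A conversion between the two is a few lines. The helper name `to_anthropic_tool` is hypothetical, and whether or where the project performs such a conversion is an assumption.

```python
def to_anthropic_tool(tool: dict) -> dict:
    """Convert a nested function-style tool schema to Anthropic's tool format."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn["description"],
        # The JSON Schema for arguments is reused unchanged under a new key.
        "input_schema": fn["parameters"],
    }


example_tool = {
    "type": "function",
    "function": {
        "name": "get_file_outline",
        "description": "Show class and function signatures for a file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}
anthropic_tool = to_anthropic_tool(example_tool)
```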

### Path Handling (Critical for Coder)
- **Planner must provide FULL ABSOLUTE PATHS** (e.g., `/tmp/codepilot_repos/flask_abc123/examples/app.py`)
- **Coder uses paths EXACTLY as written** in the plan
- Repository path is injected in Chainlit context (see `chainlit_app.py:661-672`)
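
Because the Coder uses paths verbatim, a plan could be checked before it is handed over. The sketch below is one possible guard, not something the source describes: `validate_plan_paths` is a hypothetical helper, and it compares paths lexically without resolving `..` segments or symlinks.

```python
from pathlib import PurePosixPath


def validate_plan_paths(paths, repo_root):
    """Return the plan paths that are not absolute or fall outside repo_root."""
    root = PurePosixPath(repo_root)
    rejected = []
    for raw in paths:
        path = PurePosixPath(raw)
        # Lexical check only: ".." segments are not resolved here.
        if not path.is_absolute() or not path.is_relative_to(root):
            rejected.append(raw)
    return rejected
```

An empty result means every path in the plan is absolute and sits under the cloned repository.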

### File Operations
- `write_file` auto-creates parent directories
- `run_command` has a 30-second timeout
- All tool functions return formatted strings (success messages or errors)
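
The three behaviours above can be sketched as follows. This is a local stand-in under stated assumptions: the signatures and message wording are hypothetical, and `subprocess` is used only to illustrate the timeout handling, whereas the project runs commands inside an E2B sandbox rather than on the host.

```python
import subprocess
from pathlib import Path


def write_file(path: str, content: str) -> str:
    """Write content to path, creating parent directories; return a status string."""
    try:
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        return f"Success: wrote {len(content)} characters to {path}"
    except OSError as exc:
        return f"Error: could not write {path}: {exc}"


def run_command(command: list[str], timeout: int = 30) -> str:
    """Run a command with a timeout; return its output or an error string."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"Error: command timed out after {timeout} seconds"
    except OSError as exc:
        return f"Error: could not start command: {exc}"
    if result.returncode != 0:
        return f"Error (exit {result.returncode}): {result.stderr.strip()}"
    return result.stdout
```

Returning strings instead of raising lets the agent read a failure as an ordinary observation in its next reasoning step.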

### Version Tracking
Files include version constants for debugging hot-reload issues:
- `orchestrator.py:12` - `ORCHESTRATOR_VERSION`
- `chainlit_app.py:25-26` - `APP_VERSION`, `BUILD_ID`

### Conversation Management
Agents use `ConversationManager` (`codepilot/agents/conversation.py`) to maintain message history in OpenAI/Anthropic format. This handles:
- System/user/assistant messages
- Tool calls and tool results
- Proper formatting for both Claude and OpenAI APIs

## Critical Files

- `codepilot/agents/orchestrator.py` - Task classification and multi-agent state machine
- `codepilot/agents/planner_agent.py` - Pure LLM planning (no tools) + clarification questions
- `codepilot/agents/coder_agent.py` - Code implementation (no search tools)
- `codepilot/agents/explorer_agent.py` - Codebase exploration (search tools only)
- `codepilot/context/hybrid_retriever.py` - RRF fusion algorithm
- `codepilot/tools/registry.py` - Tool schemas and function mappings
- `chainlit_app.py` - Interactive UI with GitHub repo cloning and progress visualization
- `requirements.txt` - Python dependencies

## Project Structure

```
codepilot/
├── llm/               # LLM client wrappers (Claude, OpenAI)
├── agents/            # Multi-agent system (Orchestrator, Planner, Coder, Reviewer, Explorer)
├── tools/             # Tool implementations (file ops, context search, GitHub)
├── context/           # Hybrid retrieval (BM25, embeddings, parser, indexer)
└── sandbox/           # E2B sandbox integration
chainlit_app.py        # Main UI application
```