Spaces:

ayushm98
/

codepilot

Running

App Files Files Community

codepilot / CLAUDE.md

ayushm98

Improve UI: cleaner welcome, better progress display, simplified results

6f39ef4 1 day ago

preview code

raw

history blame contribute delete

9.13 kB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

CodePilot - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests autonomously.

Tech Stack: Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph, Chainlit UI Current Phase: Phase 5 Complete (Chainlit UI with multi-agent visualization)

Architecture

Multi-Agent Workflow System

CodePilot uses a dual-mode orchestrator that routes tasks to different workflows:

┌─────────────────────────────────────────────────────────┐
│                    ORCHESTRATOR                         │
│              (Task Classification)                      │
└─────────────────────────────────────────────────────────┘
                         │
         ┌───────────────┴────────────────┐
         │                                │
    "explore"                         "code"
         │                                │
         v                                v
┌────────────────┐         ┌──────────────────────────────┐
│ ExplorerAgent  │         │ Full Multi-Agent Pipeline    │
│   (Direct)     │         │ Explorer → Clarify → Plan    │
└────────────────┘         │         ↓                    │
                           │   Coder ⟷ Reviewer           │
                           │  (iterative)                 │
                           └──────────────────────────────┘

Task Classification Logic (see orchestrator.py:92-201):

Explore tasks: Questions starting with "find", "where", "what", "how", "explain" → Uses ExplorerAgent only
Code tasks: Commands starting with "add", "create", "implement", "fix" → Full pipeline
Short queries (<100 chars) default to explore; long queries default to code

Full Pipeline Flow (code tasks):

Explorer - Gathers codebase context using token-efficient tools
Clarifier - Planner generates questions, pauses for user answers (v3.3+)
Planner - Creates implementation plan (NO tools, pure LLM reasoning)
Coder - Implements code, tests in sandbox (NO search, uses Explorer's context)
Reviewer - Reviews code, approves or sends back to Coder with feedback

Context Engineering (Hybrid Retrieval)

The core differentiator is Reciprocal Rank Fusion (RRF) combining two search methods:

Query → ┌─ BM25 (keyword) ──────┐
        │                       │
        ├─ Embeddings (semantic)┤ → RRF Fusion → Top K Results
        │   (sentence-transformers)
        └───────────────────────┘

Implementation: codepilot/context/hybrid_retriever.py

BM25: Exact matches (function names, variable names)
Embeddings: Semantic matches (related concepts)
RRF formula: score = Σ(weight_i / (k + rank_i)) where k=60
Default weights: 50% BM25, 50% embeddings

Token-Efficient Tools

Critical for cost management - agents should prefer:

get_file_outline(path) - Shows class/function signatures (~50 tokens vs ~2000 for full file)
get_code_chunk(path, name) - Extracts specific function/class by name
search_repository(query) - Hybrid search (use BEFORE reading files)

Only use read_file when you need complete file contents.

Agent Tool Access (v3.0+ separation)

Each agent has restricted tool access to prevent inefficiency:

ExplorerAgent: search_repository, get_file_outline, get_code_chunk, search_code, list_files
PlannerAgent: NO TOOLS (pure LLM reasoning, receives exploration context)
CoderAgent: write_file, get_code_chunk, read_file (NO search tools)
ReviewerAgent: get_file_outline, get_code_chunk, read_file

Key insight: v3.0 removed duplicate searching. Explorer searches once, all agents reuse that context.

Development Commands

Setup:

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

Verify installation:

python test_setup.py  # Checks API keys are loaded

Run Chainlit UI (Primary interface):

chainlit run chainlit_app.py
# Opens at http://localhost:8000
# Ctrl+C to stop, then pkill -f chainlit to clean up background processes

Test individual components:

# Context Engineering (Phase 2)
python test_context.py

# Multi-Agent Workflow (Phase 3)
python test_multi_agent.py

# E2B Sandbox (Phase 4)
python test_sandbox.py
python test_workflow_with_sandbox.py

Environment variables (create .env file):

ANTHROPIC_API_KEY=sk-ant-...
E2B_API_KEY=e2b_...

Current Implementation Status

✅ COMPLETED (Phases 1-5):

Phase 1: LLM client, tool registry, base agent, core tools
Phase 2: Hybrid retrieval (BM25 + embeddings), AST-aware parsing, codebase indexing
Phase 3: Multi-agent architecture (Explorer, Planner, Coder, Reviewer, Orchestrator)
Phase 4: E2B sandbox integration for isolated code execution
Phase 5: Chainlit UI with real-time agent progress visualization

🚧 NEXT PHASES:

Phase 6 (Weeks 17-18): GitHub Integration - webhooks, automated PR creation
Phase 7 (Weeks 19-21): Evals & Benchmarks - SWE-bench evaluation
Phase 8 (Weeks 22-24): Production Hardening - error handling, monitoring, deployment

See devon-project-plan.md for complete 24-week roadmap.

Key Design Principles

Context Engineering is the Differentiator - Not UI/UX, the hybrid retrieval and AST-aware chunking
ReAct Pattern - All agents use: Reason → Act (with tools) → Observe → Repeat
AST-Aware Processing - Parse code structurally using tree-sitter, not as text
Sandboxed Execution - All code runs in E2B containers, never on host machine
Single-Search Architecture - Explorer searches once, all downstream agents reuse context (v3.0+)
Clarification Before Action - Planner asks questions before creating plan (v3.3+)

Important Implementation Details

Tool Schema Format

All tools follow Claude/Anthropic function calling format:

{
    "type": "function",
    "function": {
        "name": "tool_name",
        "description": "Clear description for LLM",
        "parameters": {
            "type": "object",
            "properties": {...},
            "required": [...]
        }
    }
}

Path Handling (Critical for Coder)

Planner must provide FULL ABSOLUTE PATHS (e.g., /tmp/codepilot_repos/flask_abc123/examples/app.py)
Coder uses paths EXACTLY as written in the plan
Repository path is injected in Chainlit context (see chainlit_app.py:661-672)

File Operations

write_file auto-creates parent directories
run_command has 30-second timeout
All tool functions return formatted strings (success messages or errors)

Version Tracking

Files include version constants for debugging hot-reload issues:

orchestrator.py:12 - ORCHESTRATOR_VERSION
chainlit_app.py:25-26 - APP_VERSION, BUILD_ID

Conversation Management

Agents use ConversationManager (codepilot/agents/conversation.py) to maintain message history in OpenAI/Anthropic format. This handles:

System/user/assistant messages
Tool calls and tool results
Proper formatting for both Claude and OpenAI APIs

Critical Files

codepilot/agents/orchestrator.py - Task classification and multi-agent state machine
codepilot/agents/planner_agent.py - Pure LLM planning (no tools) + clarification questions
codepilot/agents/coder_agent.py - Code implementation (no search tools)
codepilot/agents/explorer_agent.py - Codebase exploration (search tools only)
codepilot/context/hybrid_retriever.py - RRF fusion algorithm
codepilot/tools/registry.py - Tool schemas and function mappings
chainlit_app.py - Interactive UI with GitHub repo cloning and progress visualization
requirements.txt - Python dependencies

Project Structure

codepilot/
├── llm/               # LLM client wrappers (Claude, OpenAI)
├── agents/            # Multi-agent system (Orchestrator, Planner, Coder, Reviewer, Explorer)
├── tools/             # Tool implementations (file ops, context search, GitHub)
├── context/           # Hybrid retrieval (BM25, embeddings, parser, indexer)
└── sandbox/           # E2B sandbox integration
chainlit_app.py        # Main UI application