Spaces:

ayushm98
/

codepilot

Running

App Files Files Community

codepilot / CLAUDE.md

ayushm98

Improve UI: cleaner welcome, better progress display, simplified results

6f39ef4 2 days ago

preview code

raw

history blame contribute delete

9.13 kB

	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## Project Overview

	CodePilot - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests autonomously.

	Tech Stack: Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph, Chainlit UI
	Current Phase: Phase 5 Complete (Chainlit UI with multi-agent visualization)

	## Architecture

	### Multi-Agent Workflow System

	CodePilot uses a dual-mode orchestrator that routes tasks to different workflows:

	```
	┌─────────────────────────────────────────────────────────┐
	│ ORCHESTRATOR │
	│ (Task Classification) │
	└─────────────────────────────────────────────────────────┘
	│
	┌───────────────┴────────────────┐
	│ │
	"explore" "code"
	│ │
	v v
	┌────────────────┐ ┌──────────────────────────────┐
	│ ExplorerAgent │ │ Full Multi-Agent Pipeline │
	│ (Direct) │ │ Explorer → Clarify → Plan │
	└────────────────┘ │ ↓ │
	│ Coder ⟷ Reviewer │
	│ (iterative) │
	└──────────────────────────────┘
	```

	Task Classification Logic (see `orchestrator.py:92-201`):
	- Explore tasks: Questions starting with "find", "where", "what", "how", "explain" → Uses ExplorerAgent only
	- Code tasks: Commands starting with "add", "create", "implement", "fix" → Full pipeline
	- Short queries (<100 chars) default to explore; long queries default to code

	Full Pipeline Flow (code tasks):
	1. Explorer - Gathers codebase context using token-efficient tools
	2. Clarifier - Planner generates questions, pauses for user answers (v3.3+)
	3. Planner - Creates implementation plan (NO tools, pure LLM reasoning)
	4. Coder - Implements code, tests in sandbox (NO search, uses Explorer's context)
	5. Reviewer - Reviews code, approves or sends back to Coder with feedback

	### Context Engineering (Hybrid Retrieval)

	The core differentiator is Reciprocal Rank Fusion (RRF) combining two search methods:

	```
	Query → ┌─ BM25 (keyword) ──────┐
	│ │
	├─ Embeddings (semantic)┤ → RRF Fusion → Top K Results
	│ (sentence-transformers)
	└───────────────────────┘
	```

	Implementation: `codepilot/context/hybrid_retriever.py`
	- BM25: Exact matches (function names, variable names)
	- Embeddings: Semantic matches (related concepts)
	- RRF formula: `score = Σ(weight_i / (k + rank_i))` where k=60
	- Default weights: 50% BM25, 50% embeddings

	### Token-Efficient Tools

	Critical for cost management - agents should prefer:
	1. `get_file_outline(path)` - Shows class/function signatures (~50 tokens vs ~2000 for full file)
	2. `get_code_chunk(path, name)` - Extracts specific function/class by name
	3. `search_repository(query)` - Hybrid search (use BEFORE reading files)

	Only use `read_file` when you need complete file contents.

	### Agent Tool Access (v3.0+ separation)

	Each agent has restricted tool access to prevent inefficiency:

	- ExplorerAgent: `search_repository`, `get_file_outline`, `get_code_chunk`, `search_code`, `list_files`
	- PlannerAgent: NO TOOLS (pure LLM reasoning, receives exploration context)
	- CoderAgent: `write_file`, `get_code_chunk`, `read_file` (NO search tools)
	- ReviewerAgent: `get_file_outline`, `get_code_chunk`, `read_file`

	Key insight: v3.0 removed duplicate searching. Explorer searches once, all agents reuse that context.

	## Development Commands

	Setup:
	```bash
	python -m venv venv
	source venv/bin/activate # Windows: venv\Scripts\activate
	pip install -r requirements.txt
	```

	Verify installation:
	```bash
	python test_setup.py # Checks API keys are loaded
	```

	Run Chainlit UI (Primary interface):
	```bash
	chainlit run chainlit_app.py
	# Opens at http://localhost:8000
	# Ctrl+C to stop, then pkill -f chainlit to clean up background processes
	```

	Test individual components:
	```bash
	# Context Engineering (Phase 2)
	python test_context.py

	# Multi-Agent Workflow (Phase 3)
	python test_multi_agent.py

	# E2B Sandbox (Phase 4)
	python test_sandbox.py
	python test_workflow_with_sandbox.py
	```

	Environment variables (create `.env` file):
	```
	ANTHROPIC_API_KEY=sk-ant-...
	E2B_API_KEY=e2b_...
	```

	## Current Implementation Status

	✅ COMPLETED (Phases 1-5):
	- Phase 1: LLM client, tool registry, base agent, core tools
	- Phase 2: Hybrid retrieval (BM25 + embeddings), AST-aware parsing, codebase indexing
	- Phase 3: Multi-agent architecture (Explorer, Planner, Coder, Reviewer, Orchestrator)
	- Phase 4: E2B sandbox integration for isolated code execution
	- Phase 5: Chainlit UI with real-time agent progress visualization

	🚧 NEXT PHASES:
	- Phase 6 (Weeks 17-18): GitHub Integration - webhooks, automated PR creation
	- Phase 7 (Weeks 19-21): Evals & Benchmarks - SWE-bench evaluation
	- Phase 8 (Weeks 22-24): Production Hardening - error handling, monitoring, deployment

	See `devon-project-plan.md` for complete 24-week roadmap.

	## Key Design Principles

	1. Context Engineering is the Differentiator - Not UI/UX, the hybrid retrieval and AST-aware chunking
	2. ReAct Pattern - All agents use: Reason → Act (with tools) → Observe → Repeat
	3. AST-Aware Processing - Parse code structurally using tree-sitter, not as text
	4. Sandboxed Execution - All code runs in E2B containers, never on host machine
	5. Single-Search Architecture - Explorer searches once, all downstream agents reuse context (v3.0+)
	6. Clarification Before Action - Planner asks questions before creating plan (v3.3+)

	## Important Implementation Details

	### Tool Schema Format
	All tools follow Claude/Anthropic function calling format:
	```python
	{
	"type": "function",
	"function": {
	"name": "tool_name",
	"description": "Clear description for LLM",
	"parameters": {
	"type": "object",
	"properties": {...},
	"required": [...]
	}
	}
	}
	```

	### Path Handling (Critical for Coder)
	- Planner must provide FULL ABSOLUTE PATHS (e.g., `/tmp/codepilot_repos/flask_abc123/examples/app.py`)
	- Coder uses paths EXACTLY as written in the plan
	- Repository path is injected in Chainlit context (see `chainlit_app.py:661-672`)

	### File Operations
	- `write_file` auto-creates parent directories
	- `run_command` has 30-second timeout
	- All tool functions return formatted strings (success messages or errors)

	### Version Tracking
	Files include version constants for debugging hot-reload issues:
	- `orchestrator.py:12` - `ORCHESTRATOR_VERSION`
	- `chainlit_app.py:25-26` - `APP_VERSION`, `BUILD_ID`

	### Conversation Management
	Agents use `ConversationManager` (`codepilot/agents/conversation.py`) to maintain message history in OpenAI/Anthropic format. This handles:
	- System/user/assistant messages
	- Tool calls and tool results
	- Proper formatting for both Claude and OpenAI APIs

	## Critical Files

	- `codepilot/agents/orchestrator.py` - Task classification and multi-agent state machine
	- `codepilot/agents/planner_agent.py` - Pure LLM planning (no tools) + clarification questions
	- `codepilot/agents/coder_agent.py` - Code implementation (no search tools)
	- `codepilot/agents/explorer_agent.py` - Codebase exploration (search tools only)
	- `codepilot/context/hybrid_retriever.py` - RRF fusion algorithm
	- `codepilot/tools/registry.py` - Tool schemas and function mappings
	- `chainlit_app.py` - Interactive UI with GitHub repo cloning and progress visualization
	- `requirements.txt` - Python dependencies

	## Project Structure

	```
	codepilot/
	├── llm/ # LLM client wrappers (Claude, OpenAI)
	├── agents/ # Multi-agent system (Orchestrator, Planner, Coder, Reviewer, Explorer)
	├── tools/ # Tool implementations (file ops, context search, GitHub)
	├── context/ # Hybrid retrieval (BM25, embeddings, parser, indexer)
	└── sandbox/ # E2B sandbox integration
	chainlit_app.py # Main UI application
	```