v3.2: Complete multi-agent workflow with Explorer-first architecture
Features:
- ExplorerAgent for token-efficient codebase exploration
- Task classifier (explore vs code queries)
- Claude Sonnet 4.5 integration (200K context)
- Token-efficient tools (get_file_outline, get_code_chunk)
- Optimized workflow: Explorer → Planner → Coder → Reviewer
- Version tracking for debugging deployments
Architecture:
- Explorer does all searching (BM25 + embeddings)
- Planner creates plans (no tools, pure LLM)
- Coder implements (no search, uses exploration context)
- Reviewer validates (read-only tools)
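The Explorer-first hand-off described above can be sketched as a minimal pipeline. This is an illustrative sketch only: the dataclass, function, and lambda agents below are hypothetical stand-ins, not the actual classes in `codepilot/agents/`.

```python
# Hypothetical sketch of the Explorer -> Planner -> Coder -> Reviewer hand-off.
# Names and signatures are illustrative, not the real codepilot API.
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    task: str
    exploration: str = ""   # findings from the Explorer (search results, outlines)
    plan: str = ""          # Planner output (pure LLM, no tools)
    code_changes: dict = field(default_factory=dict)
    approved: bool = False


def run_pipeline(state: WorkflowState, explorer, planner, coder, reviewer) -> WorkflowState:
    # Explorer does all searching up front, so later agents never re-search
    state.exploration = explorer(state.task)
    state.plan = planner(state.task, state.exploration)
    state.code_changes = coder(state.plan, state.exploration)
    state.approved = reviewer(state.code_changes)
    return state


# Stub agents just to show the data flow between stages
result = run_pipeline(
    WorkflowState(task="add error handling"),
    explorer=lambda t: "relevant files: app.py",
    planner=lambda t, e: "1. wrap handlers in try/except",
    coder=lambda p, e: {"app.py": "..."},
    reviewer=lambda c: True,
)
```

The point of the split is that exploration output is computed once and threaded through, rather than each agent paying the token cost of its own searches.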
- .gitignore +36 -0
- CLAUDE.md +171 -0
- Dockerfile +2 -1
- README.md +2 -2
- chainlit_app.py +195 -91
- codepilot/agents/__init__.py +24 -0
- codepilot/agents/base_agent.py +14 -7
- codepilot/agents/coder_agent.py +101 -69
- codepilot/agents/conversation.py +23 -4
- codepilot/agents/explorer_agent.py +168 -0
- codepilot/agents/orchestrator.py +224 -29
- codepilot/agents/planner_agent.py +87 -116
- codepilot/agents/reviewer_agent.py +41 -37
- codepilot/context/indexer.py +5 -1
- codepilot/llm/claude_client.py +235 -0
- codepilot/llm/client.py +1 -1
- codepilot/tools/context_tools.py +19 -1
- codepilot/tools/file_tools.py +18 -3
- codepilot/tools/github_tools.py +211 -0
- codepilot/tools/registry.py +172 -17
- requirements.txt +9 -3
.gitignore
ADDED
@@ -0,0 +1,36 @@
+# Environment
+.env
+.env.local
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+venv/
+env/
+.venv/
+
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Project specific
+.codepilot_cache/
+.chainlit/
+
+# Claude Code
+.claude/
+
+# Test files
+manual_tests/
+
+# Logs
+*.log
CLAUDE.md
ADDED
@@ -0,0 +1,171 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+**CodePilot** - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests autonomously.
+
+**Tech Stack:** Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph
+**Development Timeline:** 24-week phased implementation (currently in Phase 5: Chainlit UI - COMPLETE)
+
+## Architecture
+
+This project follows a layered architecture (planned - see devon-project-plan.md for full roadmap):
+
+```
+Multi-Agent System (Planner, Coder, Reviewer)
+    ↓
+Context Engine (Hybrid Retrieval, AST-aware chunking)
+    ↓
+Tool Layer (read_file, write_file, run_command, search_code)
+    ↓
+E2B Sandbox (Isolated code execution)
+```
+
+## Current Implementation Status
+
+**✅ COMPLETED PHASES:**
+
+**Phase 1: Foundation (Weeks 1-3)**
+- ✅ LLM client wrapper (`codepilot/llm/claude_client.py`) - Claude API with tool calling
+- ✅ Tool registry (`codepilot/tools/registry.py`) - Function calling infrastructure
+- ✅ Base agent (`codepilot/agents/base_agent.py`) - Core ReAct loop
+- ✅ Core tools: `read_file`, `write_file`, `run_command`, `search_codebase`, `list_files`
+
+**Phase 2: Context Engineering (Weeks 4-8)**
+- ✅ BM25 keyword search (`codepilot/context/bm25_search.py`)
+- ✅ Dense embeddings (`codepilot/context/embeddings.py`) - sentence-transformers
+- ✅ Hybrid retrieval (`codepilot/context/retrieval.py`) - Combined BM25 + semantic search
+- ✅ Code parser (`codepilot/context/parser.py`) - AST-aware chunking
+- ✅ Codebase indexer (`codepilot/context/indexer.py`) - Full codebase indexing
+- ✅ Context selector (`codepilot/context/selector.py`) - Smart context selection
+- ✅ Context tools: `index_codebase`, `search_codebase`, `get_relevant_context`
+
+**Phase 3: Multi-Agent Architecture (Weeks 9-12)**
+- ✅ Planner agent (`codepilot/agents/planner_agent.py`) - Creates implementation plans
+- ✅ Coder agent (`codepilot/agents/coder_agent.py`) - Writes and tests code
+- ✅ Reviewer agent (`codepilot/agents/reviewer_agent.py`) - Code review and approval
+- ✅ Orchestrator (`codepilot/agents/orchestrator.py`) - State machine coordination
+
+**Phase 4: E2B Sandbox Integration (Weeks 13-14)**
+- ✅ E2B sandbox manager (`codepilot/sandbox/e2b_sandbox.py`) - Isolated execution
+- ✅ Sandbox tools (`codepilot/sandbox/sandbox_tools.py`) - upload, execute, run commands
+- ✅ Integration with Coder agent - Automatic sandbox testing workflow
+
+**Phase 5: Chainlit UI (Weeks 15-16)**
+- ✅ Chainlit application (`chainlit_app.py`) - Interactive chat interface
+- ✅ Real-time workflow visualization with Chainlit Steps
+- ✅ Detailed agent progress tracking (Planner → Coder → Reviewer)
+- ✅ Code preview and test results display
+- ✅ User guide (`CHAINLIT_GUIDE.md`)
+
+**NEXT PHASES:**
+
+**Phase 6: GitHub Integration (Weeks 17-18)** - Not started
+- GitHub webhooks for issue tracking
+- Automated PR creation
+- Branch management
+
+**Phase 7: Evals & Benchmarks (Weeks 19-21)** - Not started
+- SWE-bench evaluation
+- Custom test suite
+
+**Phase 8: Production Hardening (Weeks 22-24)** - Not started
+- Error handling and retries
+- Logging and monitoring
+- Deployment configuration
+
+## Development Commands
+
+**Setup:**
+```bash
+python -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+pip install -r requirements.txt
+```
+
+**Verify setup:**
+```bash
+python test_setup.py  # Checks that API keys are loaded correctly
+```
+
+**Run Chainlit UI (Phase 5):**
+```bash
+chainlit run chainlit_app.py
+# Opens at http://localhost:8000
+# See CHAINLIT_GUIDE.md for full usage guide
+```
+
+**Test individual phases:**
+```bash
+# Phase 2: Context Engineering
+python test_context.py
+
+# Phase 3: Multi-Agent Workflow
+python test_multi_agent.py
+
+# Phase 4: E2B Sandbox
+python test_sandbox.py
+python test_workflow_with_sandbox.py
+```
+
+**Environment variables required in .env:**
+```
+ANTHROPIC_API_KEY=sk-ant-...
+E2B_API_KEY=e2b_...
+```
+
+## Project Phases (from devon-project-plan.md)
+
+1. **Phase 1 (Weeks 1-3):** Foundation - Basic agent loop, tool calling, LLM abstraction
+2. **Phase 2 (Weeks 4-8):** Context Engineering - Hybrid retrieval (BM25 + dense), AST-aware chunking
+3. **Phase 3 (Weeks 9-12):** Multi-Agent Architecture - Orchestrator with specialized agents
+4. **Phase 4 (Weeks 13-14):** E2B Sandbox Integration
+5. **Phase 5 (Weeks 15-16):** Chainlit UI
+6. **Phase 6 (Weeks 17-18):** GitHub Integration (webhooks, PRs)
+7. **Phase 7 (Weeks 19-21):** Evals & Benchmarks (SWE-bench)
+8. **Phase 8 (Weeks 22-24):** Production Hardening
+
+## Key Design Principles
+
+**From the project plan:**
+- **Focus on Context Engineering:** This is the differentiator, not UI/UX
+- **ReAct Pattern:** Reason about what to do, Act with tools, observe results, repeat
+- **AST-Aware Processing:** Parse code structurally, not as text (tree-sitter for multi-language support)
+- **Hybrid Retrieval:** Combine BM25 (exact matches) + dense embeddings (semantic search)
+- **Sandboxed Execution:** All code runs in E2B containers, never on host
+- **Multi-Agent Orchestration:** Specialized agents (Planner, Coder, Reviewer) coordinated by orchestrator
+
+## Tool Schema Format
+
+Tools follow Claude/Anthropic function calling format:
+```python
+{
+    "type": "function",
+    "function": {
+        "name": "tool_name",
+        "description": "Clear description for LLM to understand when to use",
+        "parameters": {
+            "type": "object",
+            "properties": {...},
+            "required": [...]
+        }
+    }
+}
+```
+
+## Implementation Notes
+
+- All tool functions return formatted strings (success messages or errors)
+- `write_file` auto-creates parent directories if needed
+- `run_command` has 30-second timeout to prevent hanging
+- Error handling uses specific exceptions (FileNotFoundError, PermissionError) before generic fallback
+
+## Important Files
+
+- `devon-project-plan.md` - Complete 24-week implementation roadmap with architectural details
+- `codepilot/llm/claude_client.py` - Claude API wrapper with tool calling
+- `codepilot/agents/orchestrator.py` - Multi-agent state machine
+- `requirements.txt` - Python dependencies (anthropic, e2b-code-interpreter, langchain, langgraph)
+- `.env` - API keys (not committed, in .gitignore)
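The "Hybrid Retrieval" principle listed in CLAUDE.md (BM25 exact matching combined with dense embeddings) can be sketched as a simple score fusion. This is an illustrative sketch under assumed interfaces, not the actual `codepilot/context/retrieval.py` implementation; the `hybrid_scores` name and the min-max normalization scheme are hypothetical.

```python
# Hypothetical hybrid-retrieval score fusion (not the real retrieval.py).
# Combines min-max normalized BM25 scores with normalized embedding scores.
def hybrid_scores(bm25: dict, dense: dict, alpha: float = 0.5) -> dict:
    """Weighted sum of two normalized {doc_id: score} dicts."""
    def norm(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on uniform scores
        return {k: (v - lo) / span for k, v in scores.items()}

    b, d = norm(bm25), norm(dense)
    docs = set(b) | set(d)  # a doc may appear in only one retriever
    return {doc: alpha * b.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0) for doc in docs}


# Example: fuse keyword hits with semantic hits, then rank
ranked = sorted(
    hybrid_scores({"a.py": 2.0, "b.py": 1.0}, {"b.py": 0.9, "c.py": 0.4}).items(),
    key=lambda kv: kv[1], reverse=True,
)
```

The `alpha` weight trades off exact keyword matches against semantic similarity; 0.5 gives both retrievers equal say.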
Dockerfile
CHANGED
@@ -1,4 +1,5 @@
 # HuggingFace Spaces Dockerfile for CodePilot
+# BUILD_VERSION: 7 (v3.2 coder no list_files)
 FROM python:3.11-slim
 
 # Set working directory
@@ -19,7 +20,7 @@ ENV HOME=/home/user \
 WORKDIR $HOME/app
 
 # Copy requirements first (for better caching)
-COPY --chown=user requirements
+COPY --chown=user requirements.txt ./requirements.txt
 
 # Install Python dependencies
 RUN pip install --no-cache-dir --upgrade pip && \
README.md
CHANGED
@@ -56,7 +56,7 @@ User Request
 ## Tech Stack
 
 - **Python** - Core language
-- **
+- **Claude Sonnet 4.5** - LLM for agent reasoning (Anthropic API)
 - **LangChain/LangGraph** - Agent orchestration
 - **E2B** - Sandboxed code execution
 - **Chainlit** - Chat UI
@@ -65,7 +65,7 @@ User Request
 
 | Variable | Description |
 |----------|-------------|
-| `
+| `ANTHROPIC_API_KEY` | Your Anthropic API key |
 | `E2B_API_KEY` | Your E2B sandbox API key |
 
 ## License
chainlit_app.py
CHANGED
|
@@ -17,16 +17,34 @@ from contextlib import redirect_stdout, redirect_stderr
|
|
| 17 |
import asyncio
|
| 18 |
from concurrent.futures import ThreadPoolExecutor
|
| 19 |
|
| 20 |
-
#
|
| 21 |
-
#
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 30 |
|
| 31 |
|
| 32 |
# Authentication disabled for now - uncomment to enable password protection
|
|
@@ -56,54 +74,39 @@ async def start():
|
|
| 56 |
print("[CHAINLIT] on_chat_start triggered") # Debug log
|
| 57 |
|
| 58 |
await cl.Message(
|
| 59 |
-
content="#
|
|
|
|
| 60 |
"I can help you write code, fix bugs, and implement features!\n\n"
|
| 61 |
-
"**How
|
| 62 |
-
"1.
|
| 63 |
-
"2.
|
| 64 |
-
"3.
|
| 65 |
-
"**
|
| 66 |
-
"
|
| 67 |
-
"
|
| 68 |
-
"- Create tests and verify code works\n"
|
| 69 |
-
"- Search and understand your codebase\n\n"
|
| 70 |
-
"**Ready!** What would you like me to build?"
|
| 71 |
).send()
|
| 72 |
|
| 73 |
print("[CHAINLIT] Welcome message sent") # Debug log
|
| 74 |
|
| 75 |
-
#
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
await cl.Message(content="ℹ️ Running in cloud mode - codebase indexing disabled").send()
|
| 79 |
-
cl.user_session.set("orchestrator", Orchestrator(max_iterations=3))
|
| 80 |
-
cl.user_session.set("ready", True)
|
| 81 |
-
print("[CHAINLIT] Orchestrator created, ready=True")
|
| 82 |
-
return
|
| 83 |
-
|
| 84 |
-
# Index codebase in background (only in local development)
|
| 85 |
-
index_msg = await cl.Message(content="🔍 Indexing codebase...").send()
|
| 86 |
-
|
| 87 |
-
try:
|
| 88 |
-
# Get project root
|
| 89 |
-
project_root = os.path.dirname(os.path.abspath(__file__))
|
| 90 |
-
index_result = index_codebase(project_root)
|
| 91 |
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
|
|
|
|
|
|
| 95 |
|
| 96 |
-
# Store orchestrator in session (reduced iterations to save API credits)
|
| 97 |
-
cl.user_session.set("orchestrator", Orchestrator(max_iterations=3))
|
| 98 |
-
cl.user_session.set("ready", True)
|
| 99 |
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
|
|
|
| 107 |
|
| 108 |
|
| 109 |
@cl.on_message
|
|
@@ -112,12 +115,112 @@ async def main(message: cl.Message):
|
|
| 112 |
|
| 113 |
# Check if ready
|
| 114 |
if not cl.user_session.get("ready"):
|
| 115 |
-
await cl.Message(content="
|
| 116 |
return
|
| 117 |
|
| 118 |
# Get orchestrator
|
| 119 |
orchestrator: Orchestrator = cl.user_session.get("orchestrator")
|
| 120 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 121 |
# Create a message for streaming logs
|
| 122 |
log_msg = cl.Message(content="")
|
| 123 |
await log_msg.send()
|
|
@@ -130,10 +233,10 @@ async def main(message: cl.Message):
|
|
| 130 |
"""Run orchestrator in thread and capture output."""
|
| 131 |
try:
|
| 132 |
with redirect_stdout(captured_output), redirect_stderr(captured_output):
|
| 133 |
-
return orchestrator.run(
|
| 134 |
except Exception as e:
|
| 135 |
# Capture any exceptions from orchestrator
|
| 136 |
-
print(f"
|
| 137 |
import traceback
|
| 138 |
traceback.print_exc()
|
| 139 |
raise
|
|
@@ -165,10 +268,10 @@ async def main(message: cl.Message):
|
|
| 165 |
filtered_lines = []
|
| 166 |
for line in accumulated_logs.split('\n'):
|
| 167 |
# Extract token usage before filtering (only count each line once!)
|
| 168 |
-
if '
|
| 169 |
seen_token_lines.add(line) # Mark as counted
|
| 170 |
try:
|
| 171 |
-
# Parse: "
|
| 172 |
parts = line.split('Tokens:')[1].strip()
|
| 173 |
prompt = int(parts.split('prompt')[0].strip())
|
| 174 |
completion = int(parts.split('+')[1].split('completion')[0].strip())
|
|
@@ -179,24 +282,25 @@ async def main(message: cl.Message):
|
|
| 179 |
pass
|
| 180 |
|
| 181 |
# Skip token counts, progress bars, and verbose details
|
| 182 |
-
if any(skip in line for skip in ['
|
| 183 |
continue
|
| 184 |
# Keep important lines
|
| 185 |
if any(keep in line for keep in [
|
| 186 |
-
'[ORCHESTRATOR]', '[PLANNER]', '[CODER]', '[REVIEWER]',
|
| 187 |
-
'Calling tool:', '
|
|
|
|
| 188 |
]):
|
| 189 |
filtered_lines.append(line)
|
| 190 |
|
| 191 |
filtered_output = '\n'.join(filtered_lines)
|
| 192 |
|
| 193 |
-
# Calculate cost (
|
| 194 |
-
input_cost = (total_prompt_tokens /
|
| 195 |
-
output_cost = (total_completion_tokens /
|
| 196 |
total_cost = input_cost + output_cost
|
| 197 |
|
| 198 |
# Add usage summary to logs
|
| 199 |
-
usage_summary = f"\n\
|
| 200 |
usage_summary += f" Input: {total_prompt_tokens:,} tokens (${input_cost:.4f})\n"
|
| 201 |
usage_summary += f" Output: {total_completion_tokens:,} tokens (${output_cost:.4f})\n"
|
| 202 |
usage_summary += f" Total: {total_tokens:,} tokens (${total_cost:.4f})"
|
|
@@ -212,42 +316,42 @@ async def main(message: cl.Message):
|
|
| 212 |
final_logs = captured_output.getvalue()
|
| 213 |
|
| 214 |
# Update with final logs
|
| 215 |
-
log_msg.content = f"##
|
| 216 |
await log_msg.update()
|
| 217 |
|
| 218 |
# Send results summary
|
| 219 |
summary_lines = []
|
| 220 |
|
| 221 |
if result.get('plan'):
|
| 222 |
-
summary_lines.append("##
|
| 223 |
-
summary_lines.append(f"
|
| 224 |
|
| 225 |
if result.get('code_changes'):
|
| 226 |
-
summary_lines.append("##
|
| 227 |
-
summary_lines.append(f"
|
| 228 |
for file_path in result['code_changes'].keys():
|
| 229 |
summary_lines.append(f" - {file_path}")
|
| 230 |
summary_lines.append("")
|
| 231 |
|
| 232 |
if result.get('review_feedback'):
|
| 233 |
-
summary_lines.append("##
|
| 234 |
if result.get('success'):
|
| 235 |
-
summary_lines.append("
|
| 236 |
else:
|
| 237 |
-
summary_lines.append("
|
| 238 |
summary_lines.append("")
|
| 239 |
|
| 240 |
-
summary_lines.append("##
|
| 241 |
if result.get('success'):
|
| 242 |
-
summary_lines.append(f"
|
| 243 |
else:
|
| 244 |
-
summary_lines.append(f"
|
| 245 |
|
| 246 |
-
# Add final cost summary
|
| 247 |
-
summary_lines.append("\n##
|
| 248 |
summary_lines.append(f"**Total Tokens:** {total_tokens:,}")
|
| 249 |
-
summary_lines.append(f"- Input: {total_prompt_tokens:,} tokens (${(total_prompt_tokens/
|
| 250 |
-
summary_lines.append(f"- Output: {total_completion_tokens:,} tokens (${(total_completion_tokens/
|
| 251 |
summary_lines.append(f"\n**Estimated Cost:** ${total_cost:.4f}")
|
| 252 |
|
| 253 |
await cl.Message(content="\n".join(summary_lines)).send()
|
|
@@ -258,9 +362,9 @@ async def main(message: cl.Message):
|
|
| 258 |
error_type = type(e).__name__
|
| 259 |
|
| 260 |
if "rate_limit" in error_message.lower() or "429" in error_message:
|
| 261 |
-
user_message = f"""##
|
| 262 |
|
| 263 |
-
|
| 264 |
|
| 265 |
**What to do:**
|
| 266 |
- Wait a few minutes and try again
|
|
@@ -272,15 +376,15 @@ OpenAI API rate limit exceeded. This happens when too many requests are made in
|
|
| 272 |
{error_message}
|
| 273 |
```
|
| 274 |
"""
|
| 275 |
-
elif "insufficient_quota" in error_message.lower():
|
| 276 |
-
user_message = f"""##
|
| 277 |
|
| 278 |
-
Your
|
| 279 |
|
| 280 |
**What to do:**
|
| 281 |
-
- Add credits to your
|
| 282 |
-
- Check your usage at https://
|
| 283 |
-
- Current model:
|
| 284 |
|
| 285 |
**Error details:**
|
| 286 |
```
|
|
@@ -288,13 +392,13 @@ Your OpenAI API credits have been exhausted.
|
|
| 288 |
```
|
| 289 |
"""
|
| 290 |
elif "api_key" in error_message.lower() or "authentication" in error_message.lower():
|
| 291 |
-
user_message = f"""##
|
| 292 |
|
| 293 |
-
There's an issue with your
|
| 294 |
|
| 295 |
**What to do:**
|
| 296 |
-
- Verify your
|
| 297 |
-
- Check that the key is valid at https://
|
| 298 |
- Restart the application after updating .env
|
| 299 |
|
| 300 |
**Error details:**
|
|
@@ -303,7 +407,7 @@ There's an issue with your OpenAI API key.
|
|
| 303 |
```
|
| 304 |
"""
|
| 305 |
elif "timeout" in error_message.lower():
|
| 306 |
-
user_message = f"""##
|
| 307 |
|
| 308 |
The operation took too long and timed out.
|
| 309 |
|
|
@@ -319,7 +423,7 @@ The operation took too long and timed out.
|
|
| 319 |
"""
|
| 320 |
else:
|
| 321 |
# Generic error with helpful context
|
| 322 |
-
user_message = f"""##
|
| 323 |
|
| 324 |
An unexpected error occurred during execution.
|
| 325 |
|
|
|
|
| 17 |
import asyncio
|
| 18 |
from concurrent.futures import ThreadPoolExecutor
|
| 19 |
|
| 20 |
+
# ============================================================
|
| 21 |
+
# STARTUP VERSION CHECK - Change this to detect if rebuild worked
|
| 22 |
+
# ============================================================
|
| 23 |
+
APP_VERSION = "3.2.0-coder-no-list"
|
| 24 |
+
BUILD_ID = "2024-12-19-v6"
|
| 25 |
+
print("=" * 60)
|
| 26 |
+
print(f"[STARTUP] CodePilot Chainlit App")
|
| 27 |
+
print(f"[STARTUP] APP_VERSION: {APP_VERSION}")
|
| 28 |
+
print(f"[STARTUP] BUILD_ID: {BUILD_ID}")
|
| 29 |
+
print("=" * 60)
|
| 30 |
+
# ============================================================
|
| 31 |
+
|
| 32 |
+
# Import full context tools (embeddings + BM25) - requires 16GB+ RAM
|
| 33 |
+
from codepilot.tools.context_tools import index_codebase
|
| 34 |
+
|
| 35 |
+
# Import orchestrator
|
| 36 |
+
from codepilot.agents.orchestrator import Orchestrator, ORCHESTRATOR_VERSION
|
| 37 |
+
|
| 38 |
+
# Print orchestrator version for debugging
|
| 39 |
+
print(f"[STARTUP] ORCHESTRATOR_VERSION: {ORCHESTRATOR_VERSION}")
|
| 40 |
+
|
| 41 |
+
# Import GitHub tools for repo cloning
|
| 42 |
+
from codepilot.tools.github_tools import (
|
| 43 |
+
extract_github_url,
|
| 44 |
+
clone_repository,
|
| 45 |
+
get_repo_info,
|
| 46 |
+
cleanup_repository
|
| 47 |
+
)
|
| 48 |
|
| 49 |
|
| 50 |
# Authentication disabled for now - uncomment to enable password protection
|
|
|
|
| 74 |
print("[CHAINLIT] on_chat_start triggered") # Debug log
|
| 75 |
|
| 76 |
await cl.Message(
|
| 77 |
+
content=f"# CodePilot - Autonomous AI Coding Agent\n\n"
|
| 78 |
+
f"**Version:** `{APP_VERSION}` | **Build:** `{BUILD_ID}`\n\n"
|
| 79 |
"I can help you write code, fix bugs, and implement features!\n\n"
|
| 80 |
+
"**How to use:**\n"
|
| 81 |
+
"1. Paste a **public GitHub URL** and I'll clone and analyze it\n"
|
| 82 |
+
"2. Tell me what you want to build or fix\n"
|
| 83 |
+
"3. Watch my agents (Planner > Coder > Reviewer) work!\n\n"
|
| 84 |
+
"**Example:**\n"
|
| 85 |
+
"```\nAnalyze https://github.com/user/repo and add error handling to the API endpoints\n```\n\n"
|
| 86 |
+
"**Ready!** Paste a GitHub URL or describe your task."
|
|
|
|
|
|
|
|
|
|
| 87 |
).send()
|
| 88 |
|
| 89 |
print("[CHAINLIT] Welcome message sent") # Debug log
|
| 90 |
|
| 91 |
+
# Initialize session variables
|
| 92 |
+
cl.user_session.set("repo_path", None)
|
| 93 |
+
cl.user_session.set("repo_info", None)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 94 |
|
| 95 |
+
# Skip self-indexing - agents will only work with cloned GitHub repos
|
| 96 |
+
# Create orchestrator and mark as ready
|
| 97 |
+
cl.user_session.set("orchestrator", Orchestrator(max_iterations=3))
|
| 98 |
+
cl.user_session.set("ready", True)
|
| 99 |
+
print("[CHAINLIT] Orchestrator created, ready for GitHub repos")
|
| 100 |
|
|
|
|
|
|
|
|
|
|
| 101 |
|
| 102 |
+
@cl.on_chat_end
|
| 103 |
+
async def end():
|
| 104 |
+
"""Cleanup when chat ends."""
|
| 105 |
+
# Clean up any cloned repositories
|
| 106 |
+
repo_path = cl.user_session.get("repo_path")
|
| 107 |
+
if repo_path:
|
| 108 |
+
print(f"[CHAINLIT] Cleaning up repo: {repo_path}")
|
| 109 |
+
cleanup_repository(repo_path)
|
| 110 |
|
| 111 |
|
| 112 |
@cl.on_message
|
|
|
|
| 115 |
|
| 116 |
# Check if ready
|
| 117 |
if not cl.user_session.get("ready"):
|
| 118 |
+
await cl.Message(content="System is still initializing, please wait...").send()
|
| 119 |
return
|
| 120 |
|
| 121 |
# Get orchestrator
|
| 122 |
orchestrator: Orchestrator = cl.user_session.get("orchestrator")
|
| 123 |
|
| 124 |
+
# Check for GitHub URL in message
|
| 125 |
+
github_url = extract_github_url(message.content)
|
| 126 |
+
task_context = ""
|
| 127 |
+
|
| 128 |
+
if github_url:
|
| 129 |
+
# Clone the repository
|
| 130 |
+
clone_msg = await cl.Message(content=f"Cloning repository: `{github_url}`...").send()
|
| 131 |
+
|
| 132 |
+
success, result, repo_name = clone_repository(github_url)
|
| 133 |
+
|
| 134 |
+
if success:
|
| 135 |
+
repo_path = result
|
| 136 |
+
repo_info = get_repo_info(repo_path)
|
| 137 |
+
|
| 138 |
+
# Store in session
|
| 139 |
+
cl.user_session.set("repo_path", repo_path)
|
| 140 |
+
cl.user_session.set("repo_info", repo_info)
|
| 141 |
+
|
| 142 |
+
# Index the repository for search (full BM25 + embeddings)
|
| 143 |
+
try:
|
| 144 |
+
index_result = index_codebase(repo_path)
|
| 145 |
+
print(f"[CHAINLIT] Repository indexed: {index_result}")
|
| 146 |
+
except Exception as e:
|
| 147 |
+
print(f"[CHAINLIT] Indexing failed (non-critical): {e}")
|
| 148 |
+
|
| 149 |
+
# Create context for the task (limited to avoid token overflow)
|
| 150 |
+
languages = ", ".join(repo_info["languages"][:5]) if repo_info["languages"] else "Unknown"
|
| 151 |
+
# Only include first 20 files to keep context small
|
| 152 |
+
sample_files = repo_info["files"][:20] if repo_info["files"] else []
|
| 153 |
+
files_preview = "\n".join(f" - {f}" for f in sample_files)
|
| 154 |
+
if len(repo_info["files"]) > 20:
|
| 155 |
+
files_preview += f"\n ... and {len(repo_info['files']) - 20} more files"
|
| 156 |
+
|
| 157 |
+
task_context = f"""
|
| 158 |
+
[REPOSITORY CONTEXT]
|
| 159 |
+
Repository: {repo_name}
|
| 160 |
+
Path: {repo_path}
|
| 161 |
+
Total Files: {repo_info['total_files']}
|
| 162 |
+
Languages: {languages}
|
| 163 |
+
Sample Files:
|
| 164 |
+
{files_preview}
|
| 165 |
+
|
| 166 |
+
AVAILABLE TOOLS:
|
| 167 |
+
- search_repository: Search this cloned repository using BM25 keyword matching (use this to find functions, classes, or code patterns in the Flask repo)
|
| 168 |
+
- read_file: Read a specific file (use full path: {repo_path}/filename.py)
|
| 169 |
+
- search_code: Grep for exact pattern matches in the repository
|
| 170 |
+
"""
|
| 171 |
+
# Update clone message
|
| 172 |
+
clone_msg.content = f"**Repository cloned successfully!**\n\n" \
|
| 173 |
+
f"- **Name:** {repo_name}\n" \
|
| 174 |
+
f"- **Files:** {repo_info['total_files']}\n" \
|
| 175 |
+
f"- **Languages:** {languages}\n" \
|
| 176 |
+
f"- **Path:** `{repo_path}`"
|
| 177 |
+
await clone_msg.update()
|
| 178 |
+
|
| 179 |
+
else:
|
| 180 |
+
# Clone failed
|
| 181 |
+
clone_msg.content = f"**Failed to clone repository**\n\n{result}\n\n" \
|
| 182 |
+
f"Make sure the repository is public and the URL is correct."
|
| 183 |
+
await clone_msg.update()
|
| 184 |
+
return
|
| 185 |
+
|
| 186 |
+
# Check if we have a repo from previous message
|
| 187 |
+
elif cl.user_session.get("repo_path"):
|
| 188 |
+
repo_path = cl.user_session.get("repo_path")
|
| 189 |
+
repo_info = cl.user_session.get("repo_info")
|
| 190 |
+
if repo_info:
|
| 191 |
+
languages = ", ".join(repo_info["languages"][:5]) if repo_info["languages"] else "Unknown"
|
| 192 |
+
task_context = f"""
|
| 193 |
+
[REPOSITORY CONTEXT]
|
| 194 |
+
Repository: {repo_info['name']}
|
| 195 |
+
Path: {repo_path}
|
| 196 |
+
Total Files: {repo_info['total_files']}
|
| 197 |
+
Languages: {languages}
|
| 198 |
+
|
| 199 |
+
AVAILABLE TOOLS:
|
| 200 |
+
- search_repository: Search this cloned repository using BM25 keyword matching (use this to find functions, classes, or code patterns in the Flask repo)
|
| 201 |
+
- read_file: Read a specific file (use full path: {repo_path}/filename.py)
|
| 202 |
+
- search_code: Grep for exact pattern matches in the repository
|
| 203 |
+
"""
|
| 204 |
+
|
| 205 |
+
# Prepare the full task with context
|
| 206 |
+
# Remove the GitHub URL from the message to get just the user's query
|
| 207 |
+
user_query = message.content
|
| 208 |
+
print(f"[DEBUG] Original message.content: '{message.content}'")
|
| 209 |
+
print(f"[DEBUG] GitHub URL found: '{github_url}'")
|
| 210 |
+
|
| 211 |
+
if github_url:
|
| 212 |
+
# Remove the URL from the message to get the actual task
|
| 213 |
+
import re
|
| 214 |
+
user_query = re.sub(r'https?://github\.com/[^\s]+', '', user_query).strip()
|
| 215 |
+
print(f"[DEBUG] After URL removal: '{user_query}'")
|
| 216 |
+
|
| 217 |
+
full_task = task_context + "\n\n" + user_query if task_context else user_query
|
| 218 |
+
|
| 219 |
+
print(f"[DEBUG] task_context exists: {bool(task_context)}")
|
| 220 |
+
print(f"[DEBUG] task_context length: {len(task_context) if task_context else 0}")
|
| 221 |
+
print(f"[DEBUG] Final user_query: '{user_query}'")
|
| 222 |
+
print(f"[DEBUG] Full task (first 500 chars): '{full_task[:500]}...'")
|
| 223 |
+
|
| 224 |
# Create a message for streaming logs
|
| 225 |
log_msg = cl.Message(content="")
|
| 226 |
await log_msg.send()
|
|
|
|
| 233 |
"""Run orchestrator in thread and capture output."""
|
| 234 |
try:
|
| 235 |
with redirect_stdout(captured_output), redirect_stderr(captured_output):
|
| 236 |
+
return orchestrator.run(full_task)
|
| 237 |
except Exception as e:
|
| 238 |
# Capture any exceptions from orchestrator
|
| 239 |
+
print(f"Error in orchestrator: {str(e)}")
|
| 240 |
import traceback
|
| 241 |
traceback.print_exc()
|
| 242 |
raise
|
|
|
|
| 268 |
filtered_lines = []
|
| 269 |
for line in accumulated_logs.split('\n'):
|
| 270 |
# Extract token usage before filtering (only count each line once!)
|
| 271 |
+
if 'Tokens:' in line and line not in seen_token_lines:
|
| 272 |
seen_token_lines.add(line) # Mark as counted
|
| 273 |
try:
|
| 274 |
+
# Parse: "Tokens: 505 prompt + 20 completion = 525 total"
|
| 275 |
parts = line.split('Tokens:')[1].strip()
|
| 276 |
prompt = int(parts.split('prompt')[0].strip())
|
| 277 |
completion = int(parts.split('+')[1].split('completion')[0].strip())
|
|
|
|
                 pass

         # Skip token counts, progress bars, and verbose details
+        if any(skip in line for skip in ['Tokens:', 'Batches:', '|##', 'it/s]']):
             continue
         # Keep important lines
         if any(keep in line for keep in [
+            '[CLASSIFIER]', '[ORCHESTRATOR]', '[PLANNER]', '[CODER]', '[REVIEWER]',
+            '[EXPLORER]', 'Calling tool:', 'Tool', 'Transitioning', 'APPROVED', 'REJECTED',
+            '[GITHUB]', 'Cloning', 'Repository'
         ]):
             filtered_lines.append(line)

     filtered_output = '\n'.join(filtered_lines)

+    # Calculate cost (Claude Sonnet 4.5 pricing: $3/1M input, $15/1M output)
+    input_cost = (total_prompt_tokens / 1000000) * 3.0
+    output_cost = (total_completion_tokens / 1000000) * 15.0
     total_cost = input_cost + output_cost

     # Add usage summary to logs
+    usage_summary = f"\n\nCREDITS USED:\n"
     usage_summary += f"  Input: {total_prompt_tokens:,} tokens (${input_cost:.4f})\n"
     usage_summary += f"  Output: {total_completion_tokens:,} tokens (${output_cost:.4f})\n"
     usage_summary += f"  Total: {total_tokens:,} tokens (${total_cost:.4f})"
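The per-million arithmetic can be factored into one helper. A sketch using the same Sonnet 4.5 list prices; the token counts in the example are made up:

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_per_m=3.0, output_per_m=15.0):
    # $3 per 1M input tokens, $15 per 1M output tokens (Claude Sonnet 4.5)
    return (prompt_tokens / 1_000_000 * input_per_m
            + completion_tokens / 1_000_000 * output_per_m)

print(f"${estimate_cost(50_000, 4_000):.4f}")  # $0.2100
```

This is also where the "~$0.20 per task" estimate quoted in the error messages below comes from: a task on the order of tens of thousands of input tokens and a few thousand output tokens.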
     final_logs = captured_output.getvalue()

     # Update with final logs
+    log_msg.content = f"## Execution Log\n```\n{final_logs}\n```"
     await log_msg.update()

     # Send results summary
     summary_lines = []

     if result.get('plan'):
+        summary_lines.append("## Planner")
+        summary_lines.append(f"Plan created ({len(result['plan'])} chars)\n")

     if result.get('code_changes'):
+        summary_lines.append("## Coder")
+        summary_lines.append(f"Created {len(result['code_changes'])} file(s):")
         for file_path in result['code_changes'].keys():
             summary_lines.append(f"  - {file_path}")
         summary_lines.append("")

     if result.get('review_feedback'):
+        summary_lines.append("## Reviewer")
         if result.get('success'):
+            summary_lines.append("Code approved")
         else:
+            summary_lines.append("Needs revision")
         summary_lines.append("")

+    summary_lines.append("## Result")
     if result.get('success'):
+        summary_lines.append(f"**Success** (Iterations: {result.get('iterations', 'N/A')})")
     else:
+        summary_lines.append(f"**Incomplete** (Iterations: {result.get('iterations', 'N/A')})")

+    # Add final cost summary (Claude Sonnet 4.5 pricing: $3/1M input, $15/1M output)
+    summary_lines.append("\n## API Credits Used (Claude Sonnet 4.5)")
     summary_lines.append(f"**Total Tokens:** {total_tokens:,}")
+    summary_lines.append(f"- Input: {total_prompt_tokens:,} tokens (${(total_prompt_tokens/1000000)*3.0:.4f})")
+    summary_lines.append(f"- Output: {total_completion_tokens:,} tokens (${(total_completion_tokens/1000000)*15.0:.4f})")
     summary_lines.append(f"\n**Estimated Cost:** ${total_cost:.4f}")

     await cl.Message(content="\n".join(summary_lines)).send()
     error_type = type(e).__name__

     if "rate_limit" in error_message.lower() or "429" in error_message:
+        user_message = f"""## Rate Limit Reached

+Claude API rate limit exceeded. This happens when too many requests are made in a short time.

 **What to do:**
 - Wait a few minutes and try again
 ...
 {error_message}
 ```
 """
+    elif "insufficient_quota" in error_message.lower() or "credit" in error_message.lower():
+        user_message = f"""## API Credits Exhausted

+Your Anthropic API credits have been exhausted.

 **What to do:**
+- Add credits to your Anthropic account at https://console.anthropic.com/settings/billing
+- Check your usage at https://console.anthropic.com/settings/usage
+- Current model: Claude Sonnet 4.5 (~$0.20 per task)

 **Error details:**
 ```
 ...
 ```
 """
     elif "api_key" in error_message.lower() or "authentication" in error_message.lower():
+        user_message = f"""## API Key Error

+There's an issue with your Anthropic API key.

 **What to do:**
+- Verify your ANTHROPIC_API_KEY in .env file
+- Check that the key is valid at https://console.anthropic.com/settings/keys
 - Restart the application after updating .env

 **Error details:**
 ...
 ```
 """
     elif "timeout" in error_message.lower():
+        user_message = f"""## Request Timeout

 The operation took too long and timed out.

 ...
 """
     else:
         # Generic error with helpful context
+        user_message = f"""## Error Occurred

 An unexpected error occurred during execution.
codepilot/agents/__init__.py
CHANGED
@@ -0,0 +1,24 @@
+"""
+CodePilot Agents Module
+
+This module contains all agent implementations:
+- ExplorerAgent: Lightweight agent for search/exploration queries
+- PlannerAgent: Creates implementation plans
+- CoderAgent: Implements code based on plans
+- ReviewerAgent: Reviews code for quality
+- Orchestrator: Routes tasks and manages multi-agent workflow
+"""
+
+from codepilot.agents.explorer_agent import ExplorerAgent
+from codepilot.agents.planner_agent import PlannerAgent
+from codepilot.agents.coder_agent import CoderAgent
+from codepilot.agents.reviewer_agent import ReviewerAgent
+from codepilot.agents.orchestrator import Orchestrator
+
+__all__ = [
+    "ExplorerAgent",
+    "PlannerAgent",
+    "CoderAgent",
+    "ReviewerAgent",
+    "Orchestrator"
+]
codepilot/agents/base_agent.py
CHANGED
@@ -12,18 +12,22 @@ from codepilot.tools.registry import get_tools, get_tool_function
 class Agent:
     """Main agent that executes tasks using LLM and tools"""

+    def __init__(self, model: str = "claude-sonnet-4-5-20250929", max_iterations: int = 10):
         """
         Initialize the agent

         Args:
+            model: LLM model to use (default: Claude Sonnet 4.5)
             max_iterations: Maximum number of LLM calls to prevent infinite loops
         """
         print("🚀 Initializing Agent...")

+        # Initialize components - use Claude by default
+        from codepilot.llm.claude_client import ClaudeClient
+        if "claude" in model.lower():
+            self.client = ClaudeClient(model=model)
+        else:
+            self.client = OpenAIClient(model=model)
         self.conversation = ConversationManager()
         self.tools = get_tools()
         self.max_iterations = max_iterations

@@ -52,7 +56,7 @@ class Agent:
         for iteration in range(1, self.max_iterations + 1):
             print(f"\n--- Iteration {iteration}/{self.max_iterations} ---")

+            # Call LLM with current conversation and tools
             response = self.client.chat(
                 messages=self.conversation.get_messages(),
                 tools=self.tools

@@ -87,7 +91,10 @@ class Agent:
             for tool_call in tool_calls:
                 self._execute_tool_call(tool_call)

+            # Trim conversation to prevent context overflow (optimized for Claude's 200K context)
+            self.conversation.trim_messages(keep_recent=8)
+
+            # Continue loop - send results back to LLM
             continue

         else:

@@ -106,7 +113,7 @@ class Agent:
         Execute a single tool call

         Args:
+            tool_call: Tool call object from LLM response
         """
         tool_id = tool_call.id
         tool_name = tool_call.function.name
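The same model-name dispatch recurs in every agent below: route to the Claude client when the model string contains "claude", otherwise fall back to OpenAI. The rule in isolation (standalone sketch; the real code constructs client objects, stubbed here as strings):

```python
def pick_client(model: str) -> str:
    # "claude" anywhere in the (lowercased) model name selects the Claude client.
    return "ClaudeClient" if "claude" in model.lower() else "OpenAIClient"

print(pick_client("claude-sonnet-4-5-20250929"))  # ClaudeClient
print(pick_client("gpt-4o-mini"))                 # OpenAIClient
```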
codepilot/agents/coder_agent.py
CHANGED
@@ -1,105 +1,128 @@
 """
+Coder Agent - Implements code based on plans (v3.0)

 The Coder's job:
 1. Read the plan from Planner
+2. Use exploration context (already searched by Explorer)
 3. Write code changes to implement the plan
+4. Test in sandbox

+v3.0 Changes:
+- Removed search tools (Explorer already searched)
+- Receives exploration_context from orchestrator
+- Focused only on reading/writing/testing
 """

 from codepilot.llm.client import OpenAIClient
+from codepilot.llm.claude_client import ClaudeClient
 from codepilot.tools.registry import get_tools, get_tool_function
 from codepilot.agents.conversation import ConversationManager
+from typing import Dict, Any, Optional
 import json


+# Coder's specialized system prompt (v3.2 - no search, no list_files, uses exploration context)
 CODER_SYSTEM_PROMPT = """You are an expert software engineer and implementation specialist.

+Your ONLY job is to write code that implements the given plan. You do NOT explore or search.

+=== CRITICAL: USE THE PROVIDED CONTEXT ===
+The Explorer agent has ALREADY searched the codebase for you. All file paths and code patterns are in the EXPLORATION RESULTS below.

+DO NOT:
+- Navigate directories (no list_files)
+- Search for files (no searching)
+- Explore the codebase
+
+DO:
+- Use the exact file paths from exploration results
+- Start writing code immediately
+- Follow the plan step by step

+=== WORKFLOW ===
+1. Read the exploration results - they contain all file paths you need
+2. If modifying existing code: use get_code_chunk to read the specific function
+3. Write your changes with write_file using paths from exploration
+4. Test in sandbox if needed

+=== TOOLS ===
+- get_file_outline: See file structure (use if unsure about a file)
+- get_code_chunk: Read ONE specific function/class
+- read_file: Read entire file (only when rewriting whole file)
 - write_file: Create or modify files
+- upload_to_sandbox: Upload files for testing
+- run_command_in_sandbox: Run tests in sandbox
+- execute_in_sandbox: Execute Python snippets
+
+=== SANDBOX WORKFLOW ===
+When testing in sandbox:
+1. Upload with RELATIVE path: upload_to_sandbox(path="file.py", content=code)
+2. Run with RELATIVE path: run_command_in_sandbox(command="python file.py")
+3. The sandbox CANNOT access /tmp/codepilot_repos/ - use simple filenames!
+
+Your code should be:
+- Clean (follow existing code style)
+- Minimal (only change what's necessary)
+- Follow the plan exactly
+
+START CODING IMMEDIATELY - do not explore!
 """


 class CoderAgent:
     """
+    Coder Agent - Implements code based on plans (v3.0).

     This agent is specialized for coding. It has:
+    - NO search tools (Explorer already searched)
+    - Receives exploration_context
+    - Write access + sandbox execution
     """

+    def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
         """
         Initialize Coder agent.

         Args:
+            model: LLM model to use (default: Claude Sonnet 4.5)
         """
+        # Use Claude client for Claude models, OpenAI client as fallback
+        if "claude" in model.lower():
+            self.client = ClaudeClient(model=model)
+        else:
+            self.client = OpenAIClient(model=model)
+
         self.conversation = ConversationManager()

+        # v3.2: Removed list_files - Explorer provides all paths needed
+        # Coder only needs: read, write, and sandbox tools
         self.allowed_tools = [
+            "get_file_outline",        # Get file structure without full code
+            "get_code_chunk",          # Extract specific function/class by name
+            "read_file",               # Full file contents (use sparingly)
+            "write_file",              # Create or modify files
+            # "list_files" REMOVED - use exploration context instead
+            "upload_to_sandbox",       # Upload files for testing
+            "run_command_in_sandbox",  # Run tests in sandbox
+            "execute_in_sandbox"       # Execute Python snippets
         ]

+    def run(
+        self,
+        plan: str,
+        task: str,
+        exploration_context: Optional[str] = None,
+        review_feedback: Optional[str] = None
+    ) -> Dict[str, str]:
         """
         Implement the given plan.

+        v3.0: Now receives exploration_context so it doesn't need to search.
+
         Args:
             plan: Implementation plan from Planner
             task: Original task description (for context)
+            exploration_context: Context gathered by Explorer agent
             review_feedback: Optional feedback from Reviewer if code was rejected

         Returns:

@@ -111,24 +134,34 @@ class CoderAgent:
         # Add system prompt
         self.conversation.add_message("system", CODER_SYSTEM_PROMPT)

+        # Build user prompt with exploration context, task, plan
+        user_prompt = f"""=== ORIGINAL TASK ===
+{task}
+
+"""
+        # Add exploration context if available
+        if exploration_context:
+            user_prompt += f"""=== EXPLORATION RESULTS (from Explorer agent) ===
+{exploration_context}
+
+"""

+        user_prompt += f"""=== IMPLEMENTATION PLAN (from Planner agent) ===
+{plan}
+
+"""

         # If this is a rework (Reviewer rejected the code), include feedback
         if review_feedback:
+            user_prompt += f"""=== REVIEWER FEEDBACK (CODE WAS REJECTED) ===
 {review_feedback}

 Please fix the issues mentioned by the Reviewer and resubmit the code."""
         else:
+            user_prompt += """Please implement this plan step by step.
+Use the exploration results to understand the codebase structure.
+Write clean, well-structured code that follows the plan.
+Test your code in the sandbox before finishing."""

         self.conversation.add_message("user", user_prompt)

@@ -142,8 +175,8 @@ Please implement this plan step by step. Write clean, well-structured code that
         # Track which files were modified
         modified_files = {}

+        # Run coding loop
+        max_iterations = 15
         for iteration in range(max_iterations):
             # Call LLM
             response = self.client.chat(

@@ -163,7 +196,6 @@ Please implement this plan step by step. Write clean, well-structured code that
             # Check if done
             if finish_reason == "stop":
                 print(f"[CODER] Finished implementation")
                 return modified_files
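The Coder's `run` assembles its user prompt from up to three delimited sections: the task, the Explorer's findings, and the Planner's plan. A standalone sketch of that layout (not the actual method; section markers as in the diff, example arguments made up):

```python
def build_user_prompt(task, plan, exploration_context=None):
    # Task first, optional exploration results, then the implementation plan.
    prompt = f"=== ORIGINAL TASK ===\n{task}\n\n"
    if exploration_context:
        prompt += f"=== EXPLORATION RESULTS (from Explorer agent) ===\n{exploration_context}\n\n"
    prompt += f"=== IMPLEMENTATION PLAN (from Planner agent) ===\n{plan}\n\n"
    return prompt

p = build_user_prompt("add /health endpoint", "1. edit app.py")
print("EXPLORATION" in p)  # False
```

Keeping the exploration section optional means the same Coder works in flows where the classifier skipped the Explorer entirely.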
codepilot/agents/conversation.py
CHANGED
@@ -1,6 +1,6 @@
 """
 Conversation Manager
+Handles conversation history in standard LLM message format
 """

 from typing import List, Dict, Any

@@ -58,13 +58,13 @@ class ConversationManager:
         Add an assistant message with tool calls

         Args:
+            tool_calls: List of tool call objects from LLM response
         """
         # Extract tool call info for logging
         tool_names = [tc.function.name for tc in tool_calls]
         print(f"🔧 Assistant calling tools: {tool_names}")

+        # Standard tool call format (converted by client for Claude)
         self.messages.append({
             "role": "assistant",
             "content": None,  # No text content when making tool calls

@@ -86,7 +86,7 @@ class ConversationManager:
         Add a tool execution result to the conversation

         Args:
+            tool_call_id: The ID of the tool call (from LLM)
             tool_name: Name of the tool that was executed
             result: The result string from the tool
         """

@@ -109,6 +109,25 @@ class ConversationManager:
         """
         return self.messages

+    def trim_messages(self, keep_recent: int = 10):
+        """
+        Trim conversation history to prevent context overflow.
+        Keeps the system message and the most recent N messages.
+
+        Args:
+            keep_recent: Number of recent messages to keep (default: 10)
+        """
+        if len(self.messages) <= keep_recent + 1:
+            return  # No need to trim
+
+        # Keep system message (first) + recent messages
+        system_msg = [self.messages[0]] if self.messages and self.messages[0].get("role") == "system" else []
+        recent_msgs = self.messages[-keep_recent:]
+
+        old_count = len(self.messages)
+        self.messages = system_msg + recent_msgs
+        print(f"✂️ Trimmed conversation: {old_count} → {len(self.messages)} messages")
+
     def clear(self):
         """Clear all messages from history"""
         self.messages = []
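The trimming policy keeps the system message plus the N most recent messages. A standalone re-implementation to show the behavior (not the class method itself). One caveat of this naive window: a surviving `tool` message can reference a `tool_call_id` whose assistant message was trimmed away, which some chat APIs reject, so trimming on role boundaries is safer in general.

```python
def trim(messages, keep_recent=10):
    # Keep the system message (if it is first) plus the keep_recent newest messages.
    if len(messages) <= keep_recent + 1:
        return messages
    head = messages[:1] if messages and messages[0]["role"] == "system" else []
    return head + messages[-keep_recent:]

msgs = [{"role": "system", "content": "s"}] + [
    {"role": "user", "content": str(i)} for i in range(20)
]
print(len(trim(msgs, keep_recent=8)))  # 9: system + 8 most recent
```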
codepilot/agents/explorer_agent.py
ADDED
@@ -0,0 +1,168 @@
+"""
+Explorer Agent - Lightweight agent for search and exploration queries
+
+The Explorer's job:
+1. Search the codebase to find relevant code
+2. Answer questions about the codebase
+3. Explain how code works
+
+This agent is used for queries like:
+- "Find the Flask class"
+- "Where is the login function?"
+- "Explain how routing works"
+
+It does NOT write code - just explores and explains.
+"""
+
+from codepilot.llm.client import OpenAIClient
+from codepilot.llm.claude_client import ClaudeClient
+from codepilot.tools.registry import get_tools, get_tool_function
+from codepilot.agents.conversation import ConversationManager
+import json
+
+
+# Explorer's specialized system prompt - optimized for token efficiency
+EXPLORER_SYSTEM_PROMPT = """You are a code exploration expert.
+
+Your job is to search codebases and answer questions about code.
+You do NOT write code or create plans - just find and explain.
+
+=== TOKEN-EFFICIENT WORKFLOW ===
+1. Use search_code or search_repository to find relevant files
+2. Use get_file_outline to see file structure (~50 tokens, NOT full code)
+3. Use get_code_chunk to read ONLY the specific function/class you need
+4. Provide a clear, concise answer
+
+NEVER use read_file - it wastes tokens by reading entire files!
+
+=== TOOLS ===
+- get_file_outline: See file structure WITHOUT code - USE THIS!
+- get_code_chunk: Read ONE specific function/class - USE THIS!
+- search_code: Grep for exact patterns (e.g., "^class Flask")
+- search_repository: Semantic search (BM25 + embeddings)
+- list_files: List directory contents
+
+=== RESPONSE FORMAT ===
+After finding the answer, respond with:
+1. What you found (file path, line numbers)
+2. Brief explanation of how it works
+3. Key code snippets if relevant
+
+Be concise. Answer the question directly.
+"""
+
+
+class ExplorerAgent:
+    """
+    Explorer Agent - Lightweight agent for search/exploration queries.
+
+    This agent is specialized for exploration. It has:
+    - Minimal system prompt (token-efficient)
+    - Read-only tools (no write access)
+    - Fewer iterations (max 5)
+    - No read_file (forces use of efficient tools)
+    """
+
+    def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
+        """
+        Initialize Explorer agent.
+
+        Args:
+            model: LLM model to use (default: Claude Sonnet 4.5)
+        """
+        # Use Claude client for Claude models, OpenAI client as fallback
+        if "claude" in model.lower():
+            self.client = ClaudeClient(model=model)
+        else:
+            self.client = OpenAIClient(model=model)
+
+        self.conversation = ConversationManager()
+
+        # Explorer only gets token-efficient read-only tools
+        # Intentionally excludes read_file to force efficient tool usage
+        self.allowed_tools = [
+            "get_file_outline",   # File structure without code
+            "get_code_chunk",     # Specific function/class only
+            "search_code",        # Grep pattern matching
+            "search_repository",  # Semantic search
+            "list_files"          # Directory listing
+        ]
+
+    def run(self, query: str) -> str:
+        """
+        Explore the codebase to answer a query.
+
+        Args:
+            query: User's question (e.g., "Find the Flask class")
+
+        Returns:
+            Answer as a string
+        """
+        # Reset conversation
+        self.conversation = ConversationManager()
+
+        # Add system prompt
+        self.conversation.add_message("system", EXPLORER_SYSTEM_PROMPT)
+
+        # Add user query
+        self.conversation.add_message("user", query)
+
+        # Get only the tools this agent is allowed to use
+        all_tools = get_tools()
+        explorer_tools = [
+            tool for tool in all_tools
+            if tool['function']['name'] in self.allowed_tools
+        ]
+
+        # Run exploration loop (fewer iterations than other agents)
+        max_iterations = 5
+        for iteration in range(max_iterations):
+            # Call LLM
+            response = self.client.chat(
+                messages=self.conversation.get_messages(),
+                tools=explorer_tools
+            )
+
+            finish_reason = response.choices[0].finish_reason
+            message = response.choices[0].message
+
+            # Add assistant response to conversation
+            self.conversation.add_message(
+                role="assistant",
+                content=message.content,
+                tool_calls=message.tool_calls
+            )
+
+            # Check if done
+            if finish_reason == "stop":
+                # Agent finished exploring
+                return message.content
+
+            # Execute tool calls
+            if finish_reason == "tool_calls":
+                for tool_call in message.tool_calls:
+                    tool_name = tool_call.function.name
+                    tool_args = json.loads(tool_call.function.arguments)

+                    print(f"[EXPLORER] Calling tool: {tool_name}({tool_args})")
+
+                    # Execute tool
+                    tool_func = get_tool_function(tool_name)
+                    if tool_func:
+                        result = tool_func(**tool_args)
+                    else:
+                        result = f"Error: Tool {tool_name} not found"
+
+                    # Add tool result to conversation
+                    self.conversation.add_tool_result(
+                        tool_call_id=tool_call.id,
+                        tool_name=tool_name,
+                        result=str(result)
+                    )
+
+        # If we hit max iterations, return what we have
+        return "I found some information but couldn't complete the search. Please try a more specific query."
+
+    def get_tool_access(self) -> list:
+        """Return list of tools this agent can access."""
+        return self.allowed_tools
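Each agent derives its tool list by filtering the shared registry against its allow-list, exactly as `run` does above. The filter in isolation, with tool schemas abbreviated and names illustrative:

```python
# Abbreviated OpenAI-style tool schemas; only the names matter for filtering.
all_tools = [
    {"function": {"name": "read_file"}},
    {"function": {"name": "get_file_outline"}},
    {"function": {"name": "write_file"}},
]
allowed = {"get_file_outline", "get_code_chunk"}

# Same comprehension shape as in ExplorerAgent.run
explorer_tools = [t for t in all_tools if t["function"]["name"] in allowed]
print([t["function"]["name"] for t in explorer_tools])  # ['get_file_outline']
```

Because `read_file` is simply absent from the allow-list, the model physically cannot call it, which is a stronger guarantee than the prompt-level "NEVER use read_file" instruction alone.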
codepilot/agents/orchestrator.py
CHANGED
@@ -8,16 +8,22 @@ The orchestrator is the "brain" that:
 4. Handles the overall task flow
 """

 from enum import Enum
 from typing import Dict, Any, Optional
 from dataclasses import dataclass
 from codepilot.agents.planner_agent import PlannerAgent
 from codepilot.agents.coder_agent import CoderAgent
 from codepilot.agents.reviewer_agent import ReviewerAgent


 class AgentState(Enum):
     """Possible states in the multi-agent workflow"""
     PLANNING = "planning"
     CODING = "coding"
     REVIEWING = "reviewing"

@@ -33,7 +39,8 @@ class TaskContext:
     Think of this as a clipboard that agents write to and read from.
     """
     task_description: str  # Original task from user
     code_changes: Optional[Dict[str, str]] = None  # Created by Coder
     review_feedback: Optional[str] = None  # Created by Reviewer
     error_message: Optional[str] = None  # Set if something fails

@@ -48,14 +55,16 @@ class Orchestrator:
     """
     Orchestrator manages the multi-agent workflow.

-    Flow:
-    1. Start in
-    2. Call
-    3. Transition to
-    4. Call
-    5. Transition to
-    6. Call
-    7.
        If rejected → back to CODING (loop)
     """

@@ -71,24 +80,178 @@ class Orchestrator:
         self.max_iterations = max_iterations
         self.context = None

-        # Create agent instances
-        self.
-        self.
-        self.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 78 |
|
| 79 |
def run(self, task: str) -> Dict[str, Any]:
|
| 80 |
"""
|
| 81 |
Run the multi-agent workflow for a task.
|
| 82 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 83 |
Args:
|
| 84 |
task: User's task description (e.g., "Add a login feature")
|
| 85 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 86 |
Returns:
|
| 87 |
Result dict with status, changes, and messages
|
| 88 |
"""
|
| 89 |
# Initialize context
|
| 90 |
self.context = TaskContext(task_description=task)
|
| 91 |
-
self.state = AgentState.
|
| 92 |
|
| 93 |
# Main state machine loop
|
| 94 |
while self.state not in [AgentState.COMPLETE, AgentState.FAILED]:
|
|
@@ -99,7 +262,10 @@ class Orchestrator:
|
|
| 99 |
break
|
| 100 |
|
| 101 |
# Execute current state
|
| 102 |
-
if self.state == AgentState.
|
|
|
|
|
|
|
|
|
|
| 103 |
self._execute_planning()
|
| 104 |
|
| 105 |
elif self.state == AgentState.CODING:
|
|
@@ -113,22 +279,49 @@ class Orchestrator:
|
|
| 113 |
# Return final result
|
| 114 |
return self._build_result()
|
| 115 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 116 |
def _execute_planning(self):
|
| 117 |
"""
|
| 118 |
Execute planning state: call Planner agent.
|
| 119 |
|
| 120 |
-
Planner's job:
|
| 121 |
-
-
|
| 122 |
-
-
|
| 123 |
-
-
|
| 124 |
|
| 125 |
Transition: Always go to CODING next
|
| 126 |
"""
|
| 127 |
print(f"\n[ORCHESTRATOR] State: PLANNING")
|
| 128 |
-
print(f"[ORCHESTRATOR]
|
| 129 |
|
| 130 |
-
# Call the
|
| 131 |
-
self.context.plan = self.planner.run(
|
|
|
|
|
|
|
|
|
|
| 132 |
|
| 133 |
# Transition to coding
|
| 134 |
self.state = AgentState.CODING
|
|
@@ -138,10 +331,11 @@ class Orchestrator:
|
|
| 138 |
"""
|
| 139 |
Execute coding state: call Coder agent.
|
| 140 |
|
| 141 |
-
Coder's job:
|
| 142 |
-
-
|
| 143 |
-
- Read
|
| 144 |
-
-
|
|
|
|
| 145 |
|
| 146 |
Transition: Always go to REVIEWING next
|
| 147 |
"""
|
|
@@ -149,14 +343,15 @@ class Orchestrator:
|
|
| 149 |
|
| 150 |
# Check if this is a rework (Reviewer rejected previous code)
|
| 151 |
if self.context.review_feedback:
|
| 152 |
-
print(f"[ORCHESTRATOR] Passing plan + REVIEWER FEEDBACK to Coder
|
| 153 |
else:
|
| 154 |
-
print(f"[ORCHESTRATOR] Passing plan to Coder
|
| 155 |
|
| 156 |
-
# Call the
|
| 157 |
self.context.code_changes = self.coder.run(
|
| 158 |
plan=self.context.plan,
|
| 159 |
task=self.context.task_description,
|
|
|
|
| 160 |
review_feedback=self.context.review_feedback
|
| 161 |
)
|
| 162 |
|
|
|
|
| 8 |
4. Handles the overall task flow
|
| 9 |
"""
|
| 10 |
|
| 11 |
+
# VERSION CHECK - If you see this, new code is running!
|
| 12 |
+
ORCHESTRATOR_VERSION = "3.2.0-coder-no-list"
|
| 13 |
+
print(f"[ORCHESTRATOR] ========== LOADING VERSION {ORCHESTRATOR_VERSION} ==========")
|
| 14 |
+
|
| 15 |
from enum import Enum
|
| 16 |
from typing import Dict, Any, Optional
|
| 17 |
from dataclasses import dataclass
|
| 18 |
from codepilot.agents.planner_agent import PlannerAgent
|
| 19 |
from codepilot.agents.coder_agent import CoderAgent
|
| 20 |
from codepilot.agents.reviewer_agent import ReviewerAgent
|
| 21 |
+
from codepilot.agents.explorer_agent import ExplorerAgent
|
| 22 |
|
| 23 |
|
| 24 |
class AgentState(Enum):
|
| 25 |
"""Possible states in the multi-agent workflow"""
|
| 26 |
+
EXPLORING = "exploring" # NEW - Explorer gathers context first
|
| 27 |
PLANNING = "planning"
|
| 28 |
CODING = "coding"
|
| 29 |
REVIEWING = "reviewing"
|
|
|
|
| 39 |
Think of this as a clipboard that agents write to and read from.
|
| 40 |
"""
|
| 41 |
task_description: str # Original task from user
|
| 42 |
+
exploration_context: Optional[str] = None # NEW - Created by Explorer
|
| 43 |
+
plan: Optional[str] = None # Created by Planner (uses exploration_context)
|
| 44 |
code_changes: Optional[Dict[str, str]] = None # Created by Coder
|
| 45 |
review_feedback: Optional[str] = None # Created by Reviewer
|
| 46 |
error_message: Optional[str] = None # Set if something fails
|
|
|
|
| 55 |
"""
|
| 56 |
Orchestrator manages the multi-agent workflow.
|
| 57 |
|
| 58 |
+
Flow (v3.0 - Explorer First):
|
| 59 |
+
1. Start in EXPLORING state
|
| 60 |
+
2. Call Explorer agent → gather codebase context (token-efficient)
|
| 61 |
+
3. Transition to PLANNING state
|
| 62 |
+
4. Call Planner agent (no tools, pure LLM) → get plan based on exploration
|
| 63 |
+
5. Transition to CODING state
|
| 64 |
+
6. Call Coder agent → get code
|
| 65 |
+
7. Transition to REVIEWING state
|
| 66 |
+
8. Call Reviewer agent → get feedback
|
| 67 |
+
9. If approved → COMPLETE
|
| 68 |
If rejected → back to CODING (loop)
|
| 69 |
"""
|
| 70 |
|
|
|
|
| 80 |
self.max_iterations = max_iterations
|
| 81 |
self.context = None
|
| 82 |
|
| 83 |
+
# Create agent instances (using Claude Sonnet 4.5 - LATEST best coding model, 200K context)
|
| 84 |
+
self.explorer = ExplorerAgent(model="claude-sonnet-4-5-20250929") # Lightweight for exploration
|
| 85 |
+
self.planner = PlannerAgent(model="claude-sonnet-4-5-20250929")
|
| 86 |
+
self.coder = CoderAgent(model="claude-sonnet-4-5-20250929")
|
| 87 |
+
self.reviewer = ReviewerAgent(model="claude-sonnet-4-5-20250929")
|
| 88 |
+
|
| 89 |
+
def classify_task(self, task: str) -> str:
|
| 90 |
+
"""
|
| 91 |
+
Classify task as 'explore' or 'code'.
|
| 92 |
+
|
| 93 |
+
Exploration tasks: find, search, explain, what is, where is
|
| 94 |
+
Code tasks: add, create, implement, fix, modify
|
| 95 |
+
|
| 96 |
+
Args:
|
| 97 |
+
task: User's task description (may include context prefix)
|
| 98 |
+
|
| 99 |
+
Returns:
|
| 100 |
+
'explore' or 'code'
|
| 101 |
+
"""
|
| 102 |
+
print(f"[CLASSIFIER] ########## CLASSIFIER v2.0 START ##########")
|
| 103 |
+
print(f"[CLASSIFIER] Raw task length: {len(task)} chars")
|
| 104 |
+
print(f"[CLASSIFIER] Has [REPOSITORY CONTEXT]: {'[REPOSITORY CONTEXT]' in task}")
|
| 105 |
+
|
| 106 |
+
# Extract just the user's query (after any context sections)
|
| 107 |
+
task_to_check = task
|
| 108 |
+
|
| 109 |
+
# If task has repository context, extract just the user query
|
| 110 |
+
if "[REPOSITORY CONTEXT]" in task:
|
| 111 |
+
print(f"[CLASSIFIER] Extracting user query from context...")
|
| 112 |
+
# Split by double newline and take the last non-empty part
|
| 113 |
+
parts = task.split("\n\n")
|
| 114 |
+
print(f"[CLASSIFIER] Found {len(parts)} parts after splitting")
|
| 115 |
+
|
| 116 |
+
# Get the last substantial part (user's actual query)
|
| 117 |
+
for i, part in enumerate(reversed(parts)):
|
| 118 |
+
part = part.strip()
|
| 119 |
+
print(f"[CLASSIFIER] Checking part {i}: '{part[:50]}...' (len={len(part)})")
|
| 120 |
+
if part and not part.startswith("[") and not part.startswith("AVAILABLE"):
|
| 121 |
+
task_to_check = part
|
| 122 |
+
print(f"[CLASSIFIER] Selected user query: '{part[:80]}...'")
|
| 123 |
+
break
|
| 124 |
+
else:
|
| 125 |
+
print(f"[CLASSIFIER] No context prefix, using raw task")
|
| 126 |
+
|
| 127 |
+
task_lower = task_to_check.lower().strip()
|
| 128 |
+
|
| 129 |
+
# Get just the first few words to determine intent
|
| 130 |
+
first_words = task_lower.split()[:5]
|
| 131 |
+
first_part = ' '.join(first_words)
|
| 132 |
+
|
| 133 |
+
print(f"[CLASSIFIER] Final query: '{task_to_check[:100]}'")
|
| 134 |
+
print(f"[CLASSIFIER] First 5 words: '{first_part}'")
|
| 135 |
+
|
| 136 |
+
# EXPLORE patterns - check these FIRST (questions about code)
|
| 137 |
+
# These indicate the user wants to understand/find something, not change it
|
| 138 |
+
explore_starters = [
|
| 139 |
+
"find", "search", "where", "what", "how", "why",
|
| 140 |
+
"explain", "show", "describe", "look", "locate",
|
| 141 |
+
"understand", "tell", "list", "which", "does", "is there",
|
| 142 |
+
"can you find", "can you show", "can you explain",
|
| 143 |
+
"i want to know", "i want to understand", "i want to find",
|
| 144 |
+
"help me find", "help me understand"
|
| 145 |
+
]
|
| 146 |
+
|
| 147 |
+
# Check if query STARTS with an explore pattern
|
| 148 |
+
for pattern in explore_starters:
|
| 149 |
+
if task_lower.startswith(pattern) or first_part.startswith(pattern):
|
| 150 |
+
print(f"[CLASSIFIER] >>>>>> RESULT: EXPLORE (starts with '{pattern}') <<<<<<")
|
| 151 |
+
return "explore"
|
| 152 |
+
|
| 153 |
+
# Also check for question words anywhere in short queries
|
| 154 |
+
if len(task_lower) < 150: # Short queries are usually questions
|
| 155 |
+
question_indicators = ["where is", "what is", "how does", "how do", "how is",
|
| 156 |
+
"what does", "which file", "which function", "which class",
|
| 157 |
+
"is there", "are there", "can you find", "can you show"]
|
| 158 |
+
for indicator in question_indicators:
|
| 159 |
+
if indicator in task_lower:
|
| 160 |
+
print(f"[CLASSIFIER] >>>>>> RESULT: EXPLORE (contains '{indicator}') <<<<<<")
|
| 161 |
+
return "explore"
|
| 162 |
+
|
| 163 |
+
# CODE patterns - these indicate the user wants to modify/create something
|
| 164 |
+
# Use word boundaries to avoid false matches (e.g., "implemented" shouldn't match "implement")
|
| 165 |
+
code_starters = [
|
| 166 |
+
"add", "create", "implement", "fix", "modify", "change",
|
| 167 |
+
"update", "refactor", "write", "build", "delete", "remove",
|
| 168 |
+
"make", "develop", "insert", "append", "edit", "replace"
|
| 169 |
+
]
|
| 170 |
+
|
| 171 |
+
# Check if query STARTS with a code action word
|
| 172 |
+
for pattern in code_starters:
|
| 173 |
+
if task_lower.startswith(pattern + " ") or task_lower.startswith(pattern + "\n"):
|
| 174 |
+
print(f"[CLASSIFIER] >>>>>> RESULT: CODE (starts with '{pattern}') <<<<<<")
|
| 175 |
+
return "code"
|
| 176 |
+
|
| 177 |
+
# Check for action phrases that indicate coding intent
|
| 178 |
+
code_phrases = [
|
| 179 |
+
"i want to add", "i want to create", "i want to implement",
|
| 180 |
+
"i want to fix", "i want to modify", "i want to change",
|
| 181 |
+
"i need to add", "i need to create", "i need to implement",
|
| 182 |
+
"please add", "please create", "please implement", "please fix",
|
| 183 |
+
"can you add", "can you create", "can you implement", "can you fix"
|
| 184 |
+
]
|
| 185 |
+
|
| 186 |
+
for phrase in code_phrases:
|
| 187 |
+
if phrase in task_lower:
|
| 188 |
+
print(f"[CLASSIFIER] >>>>>> RESULT: CODE (contains '{phrase}') <<<<<<")
|
| 189 |
+
return "code"
|
| 190 |
+
|
| 191 |
+
# Default: short queries without action words are likely exploration
|
| 192 |
+
if len(task_lower) < 100:
|
| 193 |
+
print(f"[CLASSIFIER] >>>>>> RESULT: EXPLORE (short query default) <<<<<<")
|
| 194 |
+
return "explore"
|
| 195 |
+
|
| 196 |
+
# Longer queries default to code (probably detailed requirements)
|
| 197 |
+
print(f"[CLASSIFIER] >>>>>> RESULT: CODE (long query default) <<<<<<")
|
| 198 |
+
return "code"
|
| 199 |
|
| 200 |
def run(self, task: str) -> Dict[str, Any]:
|
| 201 |
"""
|
| 202 |
Run the multi-agent workflow for a task.
|
| 203 |
|
| 204 |
+
First classifies the task:
|
| 205 |
+
- 'explore' → Uses lightweight ExplorerAgent only
|
| 206 |
+
- 'code' → Uses full Planner → Coder → Reviewer pipeline
|
| 207 |
+
|
| 208 |
Args:
|
| 209 |
task: User's task description (e.g., "Add a login feature")
|
| 210 |
|
| 211 |
+
Returns:
|
| 212 |
+
Result dict with status, changes, and messages
|
| 213 |
+
"""
|
| 214 |
+
# Classify the task first
|
| 215 |
+
task_type = self.classify_task(task)
|
| 216 |
+
|
| 217 |
+
if task_type == "explore":
|
| 218 |
+
# Use lightweight Explorer agent for search/explain queries
|
| 219 |
+
print(f"\n[ORCHESTRATOR] Task type: EXPLORE (using Explorer agent)")
|
| 220 |
+
print(f"[ORCHESTRATOR] Query: {task}")
|
| 221 |
+
|
| 222 |
+
answer = self.explorer.run(task)
|
| 223 |
+
|
| 224 |
+
return {
|
| 225 |
+
'status': 'complete',
|
| 226 |
+
'success': True,
|
| 227 |
+
'task': task,
|
| 228 |
+
'plan': answer, # Explorer's answer goes in plan field
|
| 229 |
+
'code_changes': None,
|
| 230 |
+
'review_feedback': None,
|
| 231 |
+
'error': None,
|
| 232 |
+
'iterations': 1
|
| 233 |
+
}
|
| 234 |
+
|
| 235 |
+
# Full workflow for code tasks: Planner → Coder → Reviewer
|
| 236 |
+
print(f"\n[ORCHESTRATOR] Task type: CODE (using full workflow)")
|
| 237 |
+
return self._run_full_workflow(task)
|
| 238 |
+
|
| 239 |
+
def _run_full_workflow(self, task: str) -> Dict[str, Any]:
|
| 240 |
+
"""
|
| 241 |
+
Run the full Explorer → Planner → Coder → Reviewer workflow.
|
| 242 |
+
|
| 243 |
+
v3.0: Now starts with Explorer to gather context efficiently,
|
| 244 |
+
then Planner creates plan based on exploration (no tools).
|
| 245 |
+
|
| 246 |
+
Args:
|
| 247 |
+
task: User's task description
|
| 248 |
+
|
| 249 |
Returns:
|
| 250 |
Result dict with status, changes, and messages
|
| 251 |
"""
|
| 252 |
# Initialize context
|
| 253 |
self.context = TaskContext(task_description=task)
|
| 254 |
+
self.state = AgentState.EXPLORING # v3.0: Start with EXPLORING
|
| 255 |
|
| 256 |
# Main state machine loop
|
| 257 |
while self.state not in [AgentState.COMPLETE, AgentState.FAILED]:
|
|
|
|
| 262 |
break
|
| 263 |
|
| 264 |
# Execute current state
|
| 265 |
+
if self.state == AgentState.EXPLORING:
|
| 266 |
+
self._execute_exploring() # NEW - Explorer first
|
| 267 |
+
|
| 268 |
+
elif self.state == AgentState.PLANNING:
|
| 269 |
self._execute_planning()
|
| 270 |
|
| 271 |
elif self.state == AgentState.CODING:
|
|
|
|
| 279 |
# Return final result
|
| 280 |
return self._build_result()
|
| 281 |
|
| 282 |
+
def _execute_exploring(self):
|
| 283 |
+
"""
|
| 284 |
+
Execute exploring state: call Explorer agent to gather context.
|
| 285 |
+
|
| 286 |
+
Explorer's job (v3.0):
|
| 287 |
+
- Search codebase efficiently using token-optimized tools
|
| 288 |
+
- Find relevant files, functions, and patterns
|
| 289 |
+
- Return context summary for Planner to use
|
| 290 |
+
|
| 291 |
+
Transition: Always go to PLANNING next
|
| 292 |
+
"""
|
| 293 |
+
print(f"\n[ORCHESTRATOR] State: EXPLORING")
|
| 294 |
+
print(f"[ORCHESTRATOR] Running Explorer to gather codebase context...")
|
| 295 |
+
|
| 296 |
+
# Run Explorer to gather context (uses token-efficient tools)
|
| 297 |
+
exploration_result = self.explorer.run(self.context.task_description)
|
| 298 |
+
|
| 299 |
+
# Store exploration context for Planner to use
|
| 300 |
+
self.context.exploration_context = exploration_result
|
| 301 |
+
|
| 302 |
+
# Transition to planning
|
| 303 |
+
self.state = AgentState.PLANNING
|
| 304 |
+
print(f"[ORCHESTRATOR] Exploration complete. Transitioning to PLANNING")
|
| 305 |
+
|
| 306 |
def _execute_planning(self):
|
| 307 |
"""
|
| 308 |
Execute planning state: call Planner agent.
|
| 309 |
|
| 310 |
+
Planner's job (v3.0):
|
| 311 |
+
- Receive exploration context from Explorer
|
| 312 |
+
- Create step-by-step plan based on exploration (NO TOOLS)
|
| 313 |
+
- Pure LLM reasoning - no searching
|
| 314 |
|
| 315 |
Transition: Always go to CODING next
|
| 316 |
"""
|
| 317 |
print(f"\n[ORCHESTRATOR] State: PLANNING")
|
| 318 |
+
print(f"[ORCHESTRATOR] Using exploration context to create plan (no tools)...")
|
| 319 |
|
| 320 |
+
# Call the Planner with exploration context (v3.0: Planner has no tools)
|
| 321 |
+
self.context.plan = self.planner.run(
|
| 322 |
+
task=self.context.task_description,
|
| 323 |
+
exploration_context=self.context.exploration_context
|
| 324 |
+
)
|
| 325 |
|
| 326 |
# Transition to coding
|
| 327 |
self.state = AgentState.CODING
|
|
|
|
| 331 |
"""
|
| 332 |
Execute coding state: call Coder agent.
|
| 333 |
|
| 334 |
+
Coder's job (v3.0):
|
| 335 |
+
- Receive exploration context and plan
|
| 336 |
+
- Read/write files to implement the plan
|
| 337 |
+
- Test in sandbox
|
| 338 |
+
- NO searching (Explorer already did that)
|
| 339 |
|
| 340 |
Transition: Always go to REVIEWING next
|
| 341 |
"""
|
|
|
|
| 343 |
|
| 344 |
# Check if this is a rework (Reviewer rejected previous code)
|
| 345 |
if self.context.review_feedback:
|
| 346 |
+
print(f"[ORCHESTRATOR] Passing exploration + plan + REVIEWER FEEDBACK to Coder...")
|
| 347 |
else:
|
| 348 |
+
print(f"[ORCHESTRATOR] Passing exploration context + plan to Coder (no search needed)...")
|
| 349 |
|
| 350 |
+
# Call the Coder with exploration context (v3.0: Coder doesn't search)
|
| 351 |
self.context.code_changes = self.coder.run(
|
| 352 |
plan=self.context.plan,
|
| 353 |
task=self.context.task_description,
|
| 354 |
+
exploration_context=self.context.exploration_context,
|
| 355 |
review_feedback=self.context.review_feedback
|
| 356 |
)
|
| 357 |
|
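The ordering of the classifier's checks is what makes the heuristic work: explore starters win over code starters, and length breaks the tie when neither matches. A condensed, stand-alone sketch of that ordering (logging and the [REPOSITORY CONTEXT] stripping omitted; the word lists are abbreviated, not the full ones from the diff):

```python
# Abbreviated word lists; the real classifier checks many more patterns.
EXPLORE_STARTERS = ("find", "search", "where", "what", "how", "why",
                    "explain", "show", "describe", "list", "which")
CODE_STARTERS = ("add", "create", "implement", "fix", "modify", "change",
                 "update", "refactor", "write", "build", "delete", "remove")

def classify(task: str) -> str:
    """Mirror the check order: explore starters, then code starters, then length."""
    t = task.lower().strip()
    if t.startswith(EXPLORE_STARTERS):                      # questions win first
        return "explore"
    if any(t.startswith(w + " ") for w in CODE_STARTERS):   # trailing space avoids 'implemented'
        return "code"
    # Short queries without an action word default to exploration
    return "explore" if len(t) < 100 else "code"

print(classify("Where is authentication handled?"))  # explore
print(classify("Add a logout endpoint"))             # code
print(classify("implemented features overview"))     # explore (short-query default)
```

Note the `w + " "` guard only applies to the code starters, matching the diff: a query like "whereabouts of config" still classifies as explore because the explore check uses a bare prefix match.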
codepilot/agents/planner_agent.py
CHANGED

@@ -1,157 +1,128 @@
 """
-Planner Agent - Creates implementation plans
+Planner Agent - Creates implementation plans (v3.0 - Pure LLM, No Tools)
 
 The Planner's job:
+1. Receive exploration context from Explorer agent
+2. Create a detailed, step-by-step implementation plan
+3. NO searching - Explorer already did that
+
+v3.0 Changes:
+- Removed all tools (pure LLM reasoning)
+- Receives exploration_context from Explorer
+- Single LLM call instead of tool loop
+- ~90% token reduction vs v2.0
 """
 
 from codepilot.llm.client import OpenAIClient
+from codepilot.llm.claude_client import ClaudeClient
 from codepilot.agents.conversation import ConversationManager
+from typing import Optional
-import json
 
 
+# Planner's system prompt (v3.0 - no tools, just planning)
 PLANNER_SYSTEM_PROMPT = """You are a senior software architect and planning expert.
 
-2. Identify which files need to be modified or created
-3. Break down the task into clear, specific steps
-4. Consider dependencies and potential risks
-
-- search_codebase: Search for existing code (use this first!)
-- read_file: Read specific files to understand them
-- list_files: Explore directory structure
-
-You do NOT have write_file or run_command - you only plan, never execute.
+Your ONLY job is to create detailed implementation plans based on the exploration context provided.
+
+You do NOT have any tools. The Explorer agent has already searched the codebase for you.
+Use the EXPLORATION RESULTS to understand the codebase structure and create your plan.
+
+=== YOUR PLAN SHOULD INCLUDE ===
+1. OVERVIEW: Brief summary of what needs to be done
+2. FILES TO MODIFY: List each file with specific changes needed
+3. IMPLEMENTATION STEPS: Ordered steps with exact details:
+   - File path
+   - Function/class to modify or create
+   - What code to add/change
+   - Line numbers if provided in exploration
+4. TESTING: How to verify the changes work
+
+=== PLAN QUALITY REQUIREMENTS ===
+- Be SPECIFIC: Include exact file names, function names, line numbers
+- Be ORDERED: Steps should build on each other logically
+- Be COMPLETE: Cover all aspects of the task
+- Be CONCISE: Don't repeat information from exploration
+
+You do NOT write code - just create the plan for the Coder agent to follow.
 """
 
 
 class PlannerAgent:
     """
-    Planner Agent - Creates implementation plans.
+    Planner Agent - Creates implementation plans (v3.0).
 
     This agent is specialized for planning. It has:
+    - NO tools (pure LLM reasoning)
+    - Receives exploration context from Explorer
+    - Single LLM call (no iteration loop)
+    - Maximum token efficiency
     """
 
+    def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
         """
         Initialize Planner agent.
 
         Args:
-            model: LLM model to use
+            model: LLM model to use (default: Claude Sonnet 4.5)
         """
+        # Use Claude client for Claude models, OpenAI client as fallback
+        if "claude" in model.lower():
+            self.client = ClaudeClient(model=model)
+        else:
+            self.client = OpenAIClient(model=model)
-        self.allowed_tools = [
-            "search_codebase",
-            "read_file",
-            "list_files"
-        ]
 
-    def run(self, task: str) -> str:
+    def run(self, task: str, exploration_context: Optional[str] = None) -> str:
         """
-        Create a plan for the given task.
+        Create a plan for the given task using exploration context.
+
+        v3.0: No tools - pure LLM reasoning based on Explorer's findings.
 
         Args:
             task: Task description (e.g., "Add login feature")
+            exploration_context: Context gathered by Explorer agent
 
         Returns:
            Detailed implementation plan as a string
         """
-        # Check if done
-        if finish_reason == "stop":
-            # Agent finished planning
-            return message.content
-
-        # Execute tool calls
-        if finish_reason == "tool_calls":
-            for tool_call in message.tool_calls:
-                tool_name = tool_call.function.name
-                tool_args = json.loads(tool_call.function.arguments)
-
-                print(f"[PLANNER] Calling tool: {tool_name}({tool_args})")
-
-                # Execute tool
-                tool_func = get_tool_function(tool_name)
-                if tool_func:
-                    result = tool_func(**tool_args)
-                else:
-                    result = f"Error: Tool {tool_name} not found"
-
-                # Add tool result to conversation
-                self.conversation.add_tool_result(
-                    tool_call_id=tool_call.id,
-                    tool_name=tool_name,
-                    result=str(result)
-                )
-
-        # If we hit max iterations, return what we have
-        return "Error: Planner exceeded max iterations"
+        print(f"[PLANNER] Creating plan based on exploration context (no tools)")
+
+        # Build the prompt with exploration context
+        if exploration_context:
+            user_prompt = f"""=== EXPLORATION RESULTS ===
+{exploration_context}
+
+=== TASK ===
+{task}
+
+Based on the exploration results above, create a detailed implementation plan.
+Include specific file paths, function names, and step-by-step instructions for the Coder agent.
+"""
+        else:
+            # Fallback if no exploration context (shouldn't happen in v3.0)
+            user_prompt = f"""=== TASK ===
+{task}
+
+Create a detailed implementation plan for this task.
+Note: No exploration context was provided, so make reasonable assumptions about the codebase structure.
+"""
+
+        # Create conversation with system prompt and user message
+        conversation = ConversationManager()
+        conversation.add_message("system", PLANNER_SYSTEM_PROMPT)
+        conversation.add_message("user", user_prompt)
+
+        # Single LLM call - no tools, no iteration loop
+        response = self.client.chat(
+            messages=conversation.get_messages(),
+            tools=None,  # NO TOOLS - pure reasoning
+            max_tokens=2000  # Enough for a detailed plan
+        )
+
+        plan = response.choices[0].message.content
+        print(f"[PLANNER] Plan created successfully")
+
+        return plan
 
     def get_tool_access(self) -> list:
         """Return list of tools this agent can access."""
-        return self.allowed_tools
+        return []  # v3.0: No tools
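The Planner's prompt layout (exploration results first, task second, instruction last) can be sketched as a stand-alone helper; `build_planner_prompt` is an illustrative name, not a function from the diff:

```python
from typing import Optional

def build_planner_prompt(task: str, exploration_context: Optional[str]) -> str:
    """Assemble the Planner's two-section user prompt the way run() does."""
    if exploration_context:
        return (
            f"=== EXPLORATION RESULTS ===\n{exploration_context}\n\n"
            f"=== TASK ===\n{task}\n\n"
            "Based on the exploration results above, create a detailed implementation plan.\n"
        )
    # Mirrors the no-context fallback branch
    return (
        f"=== TASK ===\n{task}\n\n"
        "Create a detailed implementation plan for this task.\n"
    )

prompt = build_planner_prompt("Add login", "auth.py defines User.check_password()")
print(prompt.splitlines()[0])  # === EXPLORATION RESULTS ===
```

Putting the exploration block before the task keeps the instruction adjacent to what the model must act on, and makes the fallback path a strict subset of the full prompt.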
codepilot/agents/reviewer_agent.py
CHANGED

@@ -13,6 +13,7 @@ Tools it has access to:
 """
 
 from codepilot.llm.client import OpenAIClient
+from codepilot.llm.claude_client import ClaudeClient
 from codepilot.tools.registry import get_tools, get_tool_function
 from codepilot.agents.conversation import ConversationManager
 from typing import Dict, Any, Tuple

@@ -24,41 +25,37 @@ REVIEWER_SYSTEM_PROMPT = """You are a senior code reviewer and quality assurance
 
 Your ONLY job is to review code changes and provide feedback. You do NOT write code yourself.
 
+=== CRITICAL: TOKEN-EFFICIENT FILE READING ===
+1. NEVER use read_file as your first choice!
+2. ALWAYS use get_file_outline FIRST to see file structure (~50 tokens vs ~2000 tokens)
+3. THEN use get_code_chunk to read ONLY the specific function/class you need to review
+4. ONLY use read_file if you absolutely need the ENTIRE file (rare!)
+
+CORRECT workflow:
+  get_file_outline("file.py") → See structure
+  get_code_chunk("file.py", "my_func") → Review just that function
+
+WRONG workflow:
+  read_file("file.py") → Wastes 2000+ tokens!
+
+=== REVIEW WORKFLOW ===
 When given code changes:
-   - Missing error handling
-   - Poor naming or unclear code
-   - Code that doesn't match the plan
-3. Decide: APPROVE or REJECT
-4. If rejecting, provide specific, actionable feedback
-
-Your review should be:
-- Thorough (check all aspects of the code)
-- Specific (point to exact issues with line numbers if possible)
-- Constructive (explain WHY something is wrong and HOW to fix it)
-- Fair (don't reject for minor style issues)
+1. The code changes are already provided in the prompt - review those first
+2. If you need more context, use get_file_outline then get_code_chunk
+3. Check for: bugs, security issues, missing error handling, plan compliance
+4. Decide: APPROVE or REJECT
 
 DECISION CRITERIA:
-✅ APPROVE if:
-- Doesn't implement the plan
-- Missing critical error handling
-- Code is unclear or confusing
-
-Tools available to you:
-- read_file: Read files to understand full context
-- search_codebase: Check for similar patterns in the codebase
+✅ APPROVE if: Code works, no security issues, follows plan, has error handling
+❌ REJECT if: Has bugs, security issues, doesn't follow plan, unclear code
+
+=== TOOLS ===
+- get_file_outline: Get file structure WITHOUT code - USE THIS FIRST!
+- get_code_chunk: Extract ONE specific function/class - USE THIS SECOND!
+- read_file: Read ENTIRE file - AVOID THIS!
+- search_repository: Find similar patterns
+
+End your review with: "DECISION: APPROVE" or "DECISION: REJECT"
 
 You do NOT have write_file - you only review, never modify code.
 """

@@ -74,20 +71,27 @@ class ReviewerAgent:
     - Single responsibility (review only)
     """
 
+    def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
         """
         Initialize Reviewer agent.
 
         Args:
-            model: LLM model to use
+            model: LLM model to use (default: Claude Sonnet 4.5)
         """
+        # Use Claude client for Claude models, OpenAI client as fallback
+        if "claude" in model.lower():
+            self.client = ClaudeClient(model=model)
+        else:
+            self.client = OpenAIClient(model=model)
+
         self.conversation = ConversationManager()
 
         # Reviewer only gets read-only tools
         self.allowed_tools = [
+            "get_file_outline",  # Get file structure without full code (token-efficient!)
|
| 92 |
+
"get_code_chunk", # Extract specific function/class by name
|
| 93 |
+
"read_file", # Full file contents (use sparingly)
|
| 94 |
+
"search_repository"
|
| 95 |
]
|
| 96 |
|
| 97 |
def run(self, code_changes: Dict[str, str], plan: str, task: str) -> Tuple[bool, str]:
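The prompt above requires the reviewer to close with a literal DECISION marker. A minimal sketch of how a caller could parse that marker — `parse_review_decision` is a hypothetical helper for illustration, not part of this commit; the actual handling lives in orchestrator.py and is not shown in this diff:

```python
def parse_review_decision(review_text: str) -> bool:
    """Return True if the review ends with DECISION: APPROVE, False otherwise."""
    # Scan from the end so an early mention of "DECISION:" in the body doesn't win.
    for line in reversed(review_text.strip().splitlines()):
        if "DECISION: APPROVE" in line:
            return True
        if "DECISION: REJECT" in line:
            return False
    # No explicit decision marker: treat as a rejection to stay safe.
    return False
```

Scanning in reverse matches the prompt's "End your review with" contract, and defaulting to rejection keeps a malformed review from slipping through.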
|
codepilot/context/indexer.py
CHANGED

@@ -48,11 +48,15 @@ class CodebaseIndexer:
         # Skip unwanted directories (modify dirs in-place)
         dirs[:] = [d for d in dirs if d not in [
             '__pycache__', 'venv', 'node_modules', '.git',
-            '.pytest_cache', '.mypy_cache'
+            '.pytest_cache', '.mypy_cache', 'tests', 'test'
         ]]

         # Process each file
         for file in files:
+            # Skip test files
+            if file.startswith('test_') or file.endswith('_test.py'):
+                continue
+
             # Check if file has matching extension
             if any(file.endswith(ext) for ext in file_extensions):
                 file_path = os.path.join(root, file)
codepilot/llm/claude_client.py
ADDED

@@ -0,0 +1,235 @@

"""
Claude Client Wrapper
Handles all communication with Anthropic's Claude API
"""

import os
import json
from dotenv import load_dotenv
from anthropic import Anthropic
from typing import List, Dict, Optional

load_dotenv()


class ClaudeClient:
    """Wrapper for Anthropic Claude API calls"""

    def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
        """
        Initialize Claude client

        Args:
            model: Claude model to use (default: claude-sonnet-4-5-20250929)
        """
        self.api_key = os.getenv('ANTHROPIC_API_KEY')

        if not self.api_key:
            raise ValueError("ANTHROPIC_API_KEY not found in environment variables")

        self.client = Anthropic(api_key=self.api_key)
        self.model = model

        print(f"✅ Claude Client initialized with model: {self.model}")

    def chat(
        self,
        messages: List[Dict[str, str]],
        tools: Optional[List[Dict]] = None,
        temperature: float = 0.7,
        max_tokens: int = 1000
    ):
        """
        Send a chat completion request to Claude

        Args:
            messages: List of message dicts with 'role' and 'content'
            tools: Optional list of tool definitions for function calling
            temperature: Randomness (0-1, lower = more focused)
            max_tokens: Maximum tokens in response

        Returns:
            Response object compatible with OpenAI format
        """
        try:
            # Separate system message from conversation and convert tool messages
            system_message = ""
            conversation_messages = []
            pending_tool_results = []

            for msg in messages:
                if msg.get("role") == "system":
                    system_message = msg.get("content", "")
                elif msg.get("role") == "tool":
                    # Collect tool results to group them
                    pending_tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": msg.get("tool_call_id"),
                        "content": msg.get("content", "")
                    })
                elif msg.get("role") == "assistant" and msg.get("tool_calls"):
                    # Convert OpenAI tool_calls to Claude tool_use format
                    # Flush pending tool results first
                    if pending_tool_results:
                        conversation_messages.append({
                            "role": "user",
                            "content": pending_tool_results
                        })
                        pending_tool_results = []

                    # Convert tool_calls to content blocks with tool_use
                    content_blocks = []
                    if msg.get("content"):
                        content_blocks.append({"type": "text", "text": msg.get("content")})

                    for tc in msg.get("tool_calls", []):
                        # Handle both dict and object formats
                        if isinstance(tc, dict):
                            tool_id = tc.get("id")
                            func = tc.get("function", {})
                            func_name = func.get("name")
                            func_args = func.get("arguments", "{}")
                        else:
                            tool_id = tc.id
                            func_name = tc.function.name
                            func_args = tc.function.arguments

                        content_blocks.append({
                            "type": "tool_use",
                            "id": tool_id,
                            "name": func_name,
                            "input": json.loads(func_args) if isinstance(func_args, str) else func_args
                        })

                    conversation_messages.append({
                        "role": "assistant",
                        "content": content_blocks
                    })
                else:
                    # Flush pending tool results before adding non-tool message
                    if pending_tool_results:
                        conversation_messages.append({
                            "role": "user",
                            "content": pending_tool_results
                        })
                        pending_tool_results = []
                    conversation_messages.append(msg)

            # Flush any remaining tool results
            if pending_tool_results:
                conversation_messages.append({
                    "role": "user",
                    "content": pending_tool_results
                })

            # Build request parameters
            request_params = {
                "model": self.model,
                "messages": conversation_messages,
                "temperature": temperature,
                "max_tokens": max_tokens
            }

            # Add system message if present
            if system_message:
                request_params["system"] = system_message

            # Add tools if provided (convert from OpenAI format to Claude format)
            if tools:
                claude_tools = self._convert_tools_to_claude_format(tools)
                request_params["tools"] = claude_tools

            # Make API call
            response = self.client.messages.create(**request_params)

            # Convert Claude response to OpenAI-compatible format
            openai_compatible_response = self._convert_to_openai_format(response)

            # Print token usage for cost tracking
            usage = response.usage
            print(f"📊 Tokens: {usage.input_tokens} prompt + {usage.output_tokens} completion = {usage.input_tokens + usage.output_tokens} total")

            return openai_compatible_response

        except Exception as e:
            print(f"❌ Claude API Error: {e}")
            raise

    def _convert_tools_to_claude_format(self, openai_tools: List[Dict]) -> List[Dict]:
        """Convert OpenAI tool format to Claude tool format"""
        claude_tools = []

        for tool in openai_tools:
            if tool.get("type") == "function":
                func = tool.get("function", {})
                claude_tool = {
                    "name": func.get("name"),
                    "description": func.get("description"),
                    "input_schema": func.get("parameters", {})
                }
                claude_tools.append(claude_tool)

        return claude_tools

    def _convert_to_openai_format(self, claude_response):
        """Convert Claude response to OpenAI-compatible format"""

        # Create simple dict-based objects that provide attribute access
        class DictObject(dict):
            """Object that behaves like both dict and object (JSON serializable)"""
            def __init__(self, **kwargs):
                super().__init__(kwargs)
                self.__dict__ = self

        # Extract content and tool calls from Claude response
        content_parts = []
        tool_calls = []

        for block in claude_response.content:
            if block.type == "text":
                content_parts.append(block.text)
            elif block.type == "tool_use":
                # Convert to OpenAI tool call format
                tool_call = DictObject(
                    id=block.id,
                    type="function",
                    function=DictObject(
                        name=block.name,
                        arguments=json.dumps(block.input)
                    )
                )
                tool_calls.append(tool_call)

        # Determine finish reason
        finish_reason = "stop"
        if claude_response.stop_reason == "tool_use":
            finish_reason = "tool_calls"
        elif claude_response.stop_reason == "max_tokens":
            finish_reason = "length"

        # Build message
        message = DictObject(
            role="assistant",
            content="\n".join(content_parts) if content_parts else None,
            tool_calls=tool_calls if tool_calls else None
        )

        # Build choice
        choice = DictObject(
            message=message,
            finish_reason=finish_reason
        )

        # Build usage
        usage = DictObject(
            prompt_tokens=claude_response.usage.input_tokens,
            completion_tokens=claude_response.usage.output_tokens,
            total_tokens=claude_response.usage.input_tokens + claude_response.usage.output_tokens
        )

        # Build response
        return DictObject(
            choices=[choice],
            usage=usage
        )
codepilot/llm/client.py
CHANGED

@@ -36,7 +36,7 @@ class OpenAIClient:
         messages: List[Dict[str, str]],
         tools: Optional[List[Dict]] = None,
         temperature: float = 0.7,
-        max_tokens: int =
+        max_tokens: int = 800
     ) -> openai.types.chat.ChatCompletion:
         """
         Send a chat completion request to OpenAI
codepilot/tools/context_tools.py
CHANGED

@@ -86,7 +86,8 @@ def index_codebase(path: str = ".") -> str:
         })

     # Create and index hybrid retriever
-
+    # Weights tuned for code search: heavily favor BM25 (exact matches) over embeddings (semantic)
+    _hybrid_retriever = HybridRetriever(bm25_weight=0.85, embedding_weight=0.15)
     retrieval_stats = _hybrid_retriever.index_documents(documents)

     # Return summary

@@ -141,3 +142,20 @@ def search_codebase(query: str, top_k: int = 5) -> str:
             output.append(f"  {line}")

     return '\n'.join(output)
+
+
+def search_repository(query: str, top_k: int = 5) -> str:
+    """
+    Search the cloned GitHub repository using hybrid retrieval (BM25 + embeddings).
+
+    This is a wrapper that provides semantic search over cloned repositories.
+    Uses the same hybrid search as search_codebase.
+
+    Args:
+        query: What to search for (e.g., "Flask application class", "error handling")
+        top_k: Number of results to return (default: 5)
+
+    Returns:
+        Formatted search results with file paths, scores, and code snippets
+    """
+    return search_codebase(query, top_k)
codepilot/tools/file_tools.py
CHANGED

@@ -6,20 +6,35 @@ import subprocess
 import os


-def read_file(path):
+def read_file(path, max_lines=200):
     """
     Reads and returns the contents of a file.
+    Truncates large files to prevent context overflow.

     Args:
         path: File path to read
+        max_lines: Maximum lines to return (default 200)

     Returns:
         str: File contents or error message
     """
     try:
         with open(path, 'r') as f:
-
-
+            lines = f.readlines()
+
+        total_lines = len(lines)
+
+        # Truncate if too large
+        if total_lines > max_lines:
+            # Keep first half and last half
+            keep = max_lines // 2
+            content = ''.join(lines[:keep])
+            content += f'\n... [truncated {total_lines - max_lines} lines] ...\n\n'
+            content += ''.join(lines[-keep:])
+            return f"Successfully read file '{path}' ({total_lines} lines, showing first/last {keep}):\n\n{content}"
+        else:
+            content = ''.join(lines)
+            return f"Successfully read file '{path}':\n\n{content}"
     except FileNotFoundError:
         return f"Error: File '{path}' not found."
     except PermissionError:
codepilot/tools/github_tools.py
ADDED

@@ -0,0 +1,211 @@

"""
GitHub Repository Tools
Handles cloning and managing public GitHub repositories for CodePilot sessions
"""

import os
import re
import shutil
import subprocess
import tempfile
from typing import Optional, Tuple
import uuid


def extract_github_url(text: str) -> Optional[str]:
    """
    Extract a GitHub repository URL from text.

    Supports formats:
    - https://github.com/user/repo
    - https://github.com/user/repo.git
    - github.com/user/repo
    - http://github.com/user/repo

    Returns:
        GitHub URL if found, None otherwise
    """
    # Pattern to match GitHub URLs
    pattern = r'(?:https?://)?(?:www\.)?github\.com/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_.-]+)(?:\.git)?'
    match = re.search(pattern, text)

    if match:
        user = match.group(1)
        # removesuffix, not rstrip: rstrip('.git') strips *characters*, so a repo
        # named e.g. "git" would be reduced to an empty string
        repo = match.group(2).removesuffix('.git')
        return f"https://github.com/{user}/{repo}.git"

    return None


def get_repo_name(github_url: str) -> str:
    """Extract repository name from GitHub URL."""
    # Remove .git suffix if present (removesuffix, not rstrip: see above)
    url = github_url.removesuffix('.git')
    # Get the last part of the URL
    return url.split('/')[-1]


def clone_repository(github_url: str, base_dir: Optional[str] = None) -> Tuple[bool, str, str]:
    """
    Clone a public GitHub repository to a temporary directory.

    Args:
        github_url: The GitHub repository URL
        base_dir: Optional base directory for cloning (default: system temp)

    Returns:
        Tuple of (success: bool, path_or_error: str, repo_name: str)
    """
    repo_name = get_repo_name(github_url)

    # Create a unique session directory
    session_id = str(uuid.uuid4())[:8]

    if base_dir is None:
        # Use /tmp for cloud environments (more space than tempfile default)
        base_dir = "/tmp/codepilot_repos"

    # Ensure base directory exists
    os.makedirs(base_dir, exist_ok=True)

    # Create session-specific directory
    session_dir = os.path.join(base_dir, f"{repo_name}_{session_id}")

    try:
        # Clone with depth=1 for faster cloning (only latest commit)
        result = subprocess.run(
            ["git", "clone", "--depth", "1", github_url, session_dir],
            capture_output=True,
            text=True,
            timeout=120  # 2 minute timeout
        )

        if result.returncode != 0:
            error_msg = result.stderr or "Unknown error during clone"
            # Clean up failed clone
            if os.path.exists(session_dir):
                shutil.rmtree(session_dir, ignore_errors=True)
            return False, f"Clone failed: {error_msg}", repo_name

        return True, session_dir, repo_name

    except subprocess.TimeoutExpired:
        # Clean up on timeout
        if os.path.exists(session_dir):
            shutil.rmtree(session_dir, ignore_errors=True)
        return False, "Clone timed out (repository may be too large)", repo_name

    except Exception as e:
        # Clean up on any error
        if os.path.exists(session_dir):
            shutil.rmtree(session_dir, ignore_errors=True)
        return False, f"Clone error: {str(e)}", repo_name


def cleanup_repository(repo_path: str) -> bool:
    """
    Clean up a cloned repository.

    Args:
        repo_path: Path to the cloned repository

    Returns:
        True if cleanup successful, False otherwise
    """
    try:
        if os.path.exists(repo_path):
            shutil.rmtree(repo_path)
        return True
    except Exception:
        return False


def get_repo_info(repo_path: str) -> dict:
    """
    Get basic information about a cloned repository.

    Args:
        repo_path: Path to the cloned repository

    Returns:
        Dictionary with repo info
    """
    info = {
        "path": repo_path,
        "name": os.path.basename(repo_path).rsplit('_', 1)[0],  # Remove trailing session ID
        "files": [],
        "total_files": 0,
        "languages": set()
    }

    # File extension to language mapping
    ext_to_lang = {
        '.py': 'Python',
        '.js': 'JavaScript',
        '.ts': 'TypeScript',
        '.tsx': 'TypeScript',
        '.jsx': 'JavaScript',
        '.java': 'Java',
        '.go': 'Go',
        '.rs': 'Rust',
        '.cpp': 'C++',
        '.c': 'C',
        '.h': 'C/C++',
        '.rb': 'Ruby',
        '.php': 'PHP',
        '.swift': 'Swift',
        '.kt': 'Kotlin',
        '.cs': 'C#',
        '.html': 'HTML',
        '.css': 'CSS',
        '.scss': 'SCSS',
        '.md': 'Markdown',
        '.json': 'JSON',
        '.yaml': 'YAML',
        '.yml': 'YAML',
    }

    # Walk the repository
    for root, dirs, files in os.walk(repo_path):
        # Skip hidden directories and common non-code directories
        dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ['node_modules', 'venv', '__pycache__', 'dist', 'build']]

        for file in files:
            if not file.startswith('.'):
                info["total_files"] += 1
                ext = os.path.splitext(file)[1].lower()
                if ext in ext_to_lang:
                    info["languages"].add(ext_to_lang[ext])

                # Store relative path
                rel_path = os.path.relpath(os.path.join(root, file), repo_path)
                info["files"].append(rel_path)

    info["languages"] = list(info["languages"])

    return info


def validate_github_url(url: str) -> Tuple[bool, str]:
    """
    Validate that a URL is a valid public GitHub repository.

    Args:
        url: The URL to validate

    Returns:
        Tuple of (is_valid: bool, message: str)
    """
    if not url:
        return False, "No URL provided"

    # Check if it's a GitHub URL
    if 'github.com' not in url.lower():
        return False, "Not a GitHub URL"

    # Extract and validate format
    extracted = extract_github_url(url)
    if not extracted:
        return False, "Invalid GitHub URL format. Expected: github.com/user/repo"

    return True, extracted
codepilot/tools/registry.py
CHANGED
|
@@ -5,6 +5,7 @@ Maps tool names to their implementations and schemas
|
|
| 5 |
|
| 6 |
import os
|
| 7 |
from codepilot.tools.file_tools import read_file, write_file, run_command, search_code, list_files, git_status
|
|
|
|
| 8 |
from codepilot.sandbox.sandbox_tools import (
|
| 9 |
create_sandbox,
|
| 10 |
close_sandbox,
|
|
@@ -14,29 +15,120 @@ from codepilot.sandbox.sandbox_tools import (
|
|
| 14 |
)
|
| 15 |
from typing import Callable, List, Dict, Optional
|
| 16 |
|
| 17 |
-
#
|
| 18 |
-
|
| 19 |
-
_IS_PRODUCTION = os.getenv('RENDER_SERVICE_NAME') or os.getenv('RENDER') or os.getenv('SPACE_ID') or os.getenv('PORT')
|
| 20 |
|
| 21 |
-
#
|
| 22 |
-
|
| 23 |
-
from codepilot.tools.context_tools import search_codebase, index_codebase
|
| 24 |
-
else:
|
| 25 |
-
# Provide stub functions for production to avoid import errors
|
| 26 |
-
def search_codebase(query: str, top_k: int = 5) -> str:
|
| 27 |
-
return "⚠️ Codebase search is disabled in cloud mode (resource constraints)"
|
| 28 |
|
| 29 |
-
|
| 30 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 31 |
|
| 32 |
|
| 33 |
-
# Tool schemas for
|
| 34 |
TOOLS = [
|
| 35 |
{
|
| 36 |
"type": "function",
|
| 37 |
"function": {
|
| 38 |
"name": "read_file",
|
| 39 |
-
"description": "
|
| 40 |
"parameters": {
|
| 41 |
"type": "object",
|
| 42 |
"properties": {
|
|
@@ -225,7 +317,67 @@ TOOLS = [
|
|
| 225 |
"required": ["code"]
|
| 226 |
}
|
| 227 |
}
|
| 228 |
-
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 229 |
]
|
| 230 |
|
| 231 |
|
|
@@ -241,7 +393,10 @@ TOOL_FUNCTIONS = {
|
|
| 241 |
"index_codebase": index_codebase,
|
| 242 |
"upload_to_sandbox": upload_to_sandbox,
|
| 243 |
"execute_in_sandbox": execute_in_sandbox,
|
| 244 |
-
"run_command_in_sandbox": run_command_in_sandbox
|
|
|
|
|
|
|
|
|
|
| 245 |
}
|
| 246 |
|
| 247 |
|
|
@@ -250,7 +405,7 @@ def get_tools() -> List[Dict]:
|
|
| 250 |
Get all available tool schemas
|
| 251 |
|
| 252 |
Returns:
|
| 253 |
-
List of tool schema dictionaries for
|
| 254 |
"""
|
| 255 |
return TOOLS
|
| 256 |
|
|
|
|
| 5 |
|
| 6 |
import os
|
| 7 |
from codepilot.tools.file_tools import read_file, write_file, run_command, search_code, list_files, git_status
|
| 8 |
+
from codepilot.context.parser import CodeParser
|
| 9 |
from codepilot.sandbox.sandbox_tools import (
|
| 10 |
create_sandbox,
|
| 11 |
close_sandbox,
|
|
|
|
| 15 |
)
|
| 16 |
from typing import Callable, List, Dict, Optional
|
| 17 |
|
| 18 |
+
# Full context tools with BM25 + embeddings (requires 16GB+ RAM)
|
| 19 |
+
from codepilot.tools.context_tools import search_codebase, index_codebase, search_repository
|
|
|
|
| 20 |
|
| 21 |
+
# Initialize parser for file outline tools
|
| 22 |
+
_parser = CodeParser()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 23 |
|
| 24 |
+
|
| 25 |
+
def get_file_outline(path: str) -> str:
|
| 26 |
+
"""
|
| 27 |
    Get the structure/outline of a Python file without reading full contents.
    Returns classes, functions, methods with their signatures and docstrings.
    Much more token-efficient than read_file for understanding file structure.

    Args:
        path: Path to the Python file

    Returns:
        Formatted outline showing file structure
    """
    result = _parser.parse_file(path)

    if result.get('parse_errors'):
        return f"Error: {result['parse_errors'][0]}"

    lines = [f"# {path} ({result.get('total_lines', 0)} lines)\n"]

    # Show imports summary
    imports = result.get('imports', [])
    if imports:
        modules = set()
        for imp in imports:
            if imp['type'] == 'import':
                modules.add(imp['name'].split('.')[0])
            else:
                mod = imp.get('module', '')
                if mod:
                    modules.add(mod.split('.')[0])
        lines.append(f"Imports: {', '.join(sorted(modules)[:10])}")
        if len(modules) > 10:
            lines.append(f"  ... and {len(modules) - 10} more")
        lines.append("")

    # Show classes with methods
    for cls in result.get('classes', []):
        bases = f"({', '.join(cls['bases'])})" if cls['bases'] else ""
        decorators = '\n'.join(f"@{d}" for d in cls.get('decorators', []))
        if decorators:
            lines.append(decorators)
        lines.append(f"class {cls['name']}{bases}:  # lines {cls['start_line']}-{cls['end_line']}")
        if cls.get('docstring'):
            # Truncate long docstrings
            doc = cls['docstring'][:150].replace('\n', ' ')
            if len(cls['docstring']) > 150:
                doc += "..."
            lines.append(f'    """{doc}"""')
        for method in cls.get('methods', []):
            async_prefix = "async " if method.get('is_async') else ""
            lines.append(f"    {async_prefix}def {method['name']}()  # line {method['line']}")
        lines.append("")

    # Show standalone functions (those not nested inside any class)
    standalone_funcs = [f for f in result.get('functions', [])
                        if not any(f['start_line'] >= c['start_line'] and f['end_line'] <= c['end_line']
                                   for c in result.get('classes', []))]

    for func in standalone_funcs:
        params = ', '.join(func.get('parameters', []))
        async_prefix = "async " if func.get('is_async') else ""
        decorators = '\n'.join(f"@{d}" for d in func.get('decorators', []))
        if decorators:
            lines.append(decorators)
        lines.append(f"{async_prefix}def {func['name']}({params}):  # lines {func['start_line']}-{func['end_line']}")
        if func.get('docstring'):
            doc = func['docstring'][:100].replace('\n', ' ')
            if len(func['docstring']) > 100:
                doc += "..."
            lines.append(f'    """{doc}"""')
        lines.append("")

    # Show globals
    globals_list = result.get('globals', [])
    if globals_list:
        lines.append("# Global variables:")
        for g in globals_list[:10]:
            lines.append(f"  {g['name']}: {g.get('type', 'unknown')}  # line {g['line']}")
        if len(globals_list) > 10:
            lines.append(f"  ... and {len(globals_list) - 10} more")

    return '\n'.join(lines)
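The `result` dict that get_file_outline consumes (with `classes`, `functions`, `imports`, `globals`, and `total_lines` keys) suggests an AST-walking parser. A minimal sketch of how `_parser.parse_file` could produce that shape, using only the stdlib `ast` module; the project's actual parser (backed by tree-sitter, per requirements.txt) is not shown in this diff, so the helper name and details here are illustrative:

```python
import ast

def parse_file_sketch(source: str, path: str = "<memory>") -> dict:
    """Sketch: build the outline dict get_file_outline expects from source text."""
    try:
        tree = ast.parse(source)
    except SyntaxError as e:
        return {"parse_errors": [f"{path}:{e.lineno}: {e.msg}"]}

    result = {
        "parse_errors": [],
        "total_lines": len(source.splitlines()),
        "imports": [],
        "classes": [],
        "functions": [],
        "globals": [],
    }
    for node in tree.body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                result["imports"].append({"type": "import", "name": alias.name})
        elif isinstance(node, ast.ImportFrom):
            result["imports"].append({"type": "from", "module": node.module or ""})
        elif isinstance(node, ast.ClassDef):
            result["classes"].append({
                "name": node.name,
                "bases": [ast.unparse(b) for b in node.bases],
                "decorators": [ast.unparse(d) for d in node.decorator_list],
                "docstring": ast.get_docstring(node),
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "methods": [
                    {"name": m.name, "line": m.lineno,
                     "is_async": isinstance(m, ast.AsyncFunctionDef)}
                    for m in node.body
                    if isinstance(m, (ast.FunctionDef, ast.AsyncFunctionDef))
                ],
            })
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            result["functions"].append({
                "name": node.name,
                "parameters": [a.arg for a in node.args.args],
                "decorators": [ast.unparse(d) for d in node.decorator_list],
                "docstring": ast.get_docstring(node),
                "is_async": isinstance(node, ast.AsyncFunctionDef),
                "start_line": node.lineno,
                "end_line": node.end_lineno,
            })
    return result
```

A tree-sitter parser would additionally tolerate syntax errors and non-Python files, which is likely why the project uses it instead of `ast`.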


def get_code_chunk(path: str, name: str) -> str:
    """
    Extract a specific function or class from a file by name.
    Use this when you need to see the implementation of a specific function/class
    after using get_file_outline to identify what you need.

    Args:
        path: Path to the Python file
        name: Name of the function or class to extract

    Returns:
        The complete code for the specified function/class with relevant imports
    """
    return _parser.extract_code_chunk(path, name)
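The implementation of `_parser.extract_code_chunk` is not part of this diff; one plausible sketch of the lookup it performs, again with the stdlib `ast` module (operating on source text rather than a path, and without the "relevant imports" prepended):

```python
import ast

def extract_chunk_sketch(source: str, name: str) -> str:
    """Sketch: return the source of the named top-level or nested def/class."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) \
                and node.name == name:
            # get_source_segment slices the original text by the node's location
            return ast.get_source_segment(source, node) or ""
    return f"Error: '{name}' not found"
```

This is the second half of the outline-then-chunk workflow: get_file_outline identifies the name and line range, and this lookup pulls just that definition instead of the whole file.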


# Tool schemas for LLM function calling (compatible with Claude and OpenAI)
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "WARNING: This reads the ENTIRE file, which wastes tokens! PREFER get_file_outline (for structure) or get_code_chunk (for a specific function/class) instead. Only use read_file when you absolutely need the complete file contents.",
            "parameters": {
                "type": "object",
                "properties": {
                "required": ["code"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_repository",
            "description": "Search the cloned GitHub repository using hybrid retrieval (BM25 + semantic embeddings). Use this to find functions, classes, or code patterns in the cloned repo. More powerful than search_code: it finds both exact matches AND semantically related code.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "What to search for. Can be natural language (e.g., 'authentication logic', 'error handling') or specific terms (e.g., 'Flask class', 'route decorator')"
                    },
                    "top_k": {
                        "type": "integer",
                        "description": "Number of results to return (default: 5, max: 20)",
                        "default": 5
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_file_outline",
            "description": "Get the structure/outline of a Python file WITHOUT reading full contents. Returns classes, functions, and methods with signatures and docstrings. Use this FIRST to understand a file's structure before using read_file or get_code_chunk. Much more token-efficient than read_file (~50 tokens vs ~2000 tokens for a typical file).",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Path to the Python file to outline"
                    }
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_code_chunk",
            "description": "Extract a specific function or class from a file by name. Use this after get_file_outline to read just the code you need instead of the entire file. Returns the complete implementation with relevant imports.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Path to the Python file"
                    },
                    "name": {
                        "type": "string",
                        "description": "Name of the function or class to extract (e.g., 'MyClass', 'my_function')"
                    }
                },
                "required": ["path", "name"]
            }
        }
    },
]
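TOOLS is written in the OpenAI function-calling shape; the comment above calls it "compatible with Claude and OpenAI", and Anthropic's Messages API instead expects a flatter tool object with an `input_schema` key. A small adapter is one plausible way to bridge the two (a sketch; whether the project's claude_client.py does it this way is not shown in this diff):

```python
def to_claude_tools(openai_tools: list[dict]) -> list[dict]:
    """Sketch: convert OpenAI-style tool schemas to Anthropic's tool format."""
    claude_tools = []
    for tool in openai_tools:
        fn = tool["function"]
        claude_tools.append({
            "name": fn["name"],
            "description": fn["description"],
            # Anthropic nests the JSON Schema under "input_schema"
            "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
        })
    return claude_tools
```

Keeping a single schema list and converting at the client boundary avoids maintaining two copies of every tool definition.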


    "index_codebase": index_codebase,
    "upload_to_sandbox": upload_to_sandbox,
    "execute_in_sandbox": execute_in_sandbox,
    "run_command_in_sandbox": run_command_in_sandbox,
    "search_repository": search_repository,
    "get_file_outline": get_file_outline,
    "get_code_chunk": get_code_chunk
}


    Get all available tool schemas

    Returns:
        List of tool schema dictionaries for LLM function calling
    """
    return TOOLS
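The name-to-function mapping above exists so that a tool call returned by the LLM can be dispatched by string name. A sketch of that dispatch step (the function and argument names here are illustrative, not necessarily the project's API):

```python
import json

def dispatch_tool_call(registry: dict, name: str, arguments: str) -> str:
    """Sketch: execute one LLM tool call against a name->callable registry.

    `arguments` is the JSON string of keyword arguments that function-calling
    APIs typically return alongside the tool name.
    """
    func = registry.get(name)
    if func is None:
        return f"Error: unknown tool '{name}'"
    try:
        kwargs = json.loads(arguments) if arguments else {}
        return str(func(**kwargs))
    except (json.JSONDecodeError, TypeError) as e:
        # Bad JSON or mismatched parameters: report back to the model as text
        return f"Error calling {name}: {e}"
```

Returning error strings instead of raising lets the agent loop feed failures back to the model as tool output so it can retry with corrected arguments.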
requirements.txt
CHANGED

@@ -1,8 +1,9 @@
-#
-#
+# Full deployment requirements with embeddings support
+# For HuggingFace Spaces with 16GB+ RAM
 
 # Core
 openai>=1.0.0
+anthropic>=0.25.0
 python-dotenv>=1.2.0
 
 # E2B Sandbox
@@ -12,11 +13,16 @@ e2b-code-interpreter>=2.4.0
 langchain>=0.3.0
 langgraph>=0.2.0
 
-#
+# Search - BM25 + Embeddings
 rank-bm25>=0.2.2
+sentence-transformers>=2.2.0
+chromadb>=0.4.0
 
 # Chainlit UI
 chainlit>=1.0.0
 
 # For dependency graphs
 networkx>=3.0
+
+# Tree-sitter for AST parsing
+tree-sitter>=0.20.0