ayushm98 committed on
Commit 45bf590 · 1 Parent(s): 561b52e

v3.2: Complete multi-agent workflow with Explorer-first architecture

Features:
- ExplorerAgent for token-efficient codebase exploration
- Task classifier (explore vs code queries)
- Claude Sonnet 4.5 integration (200K context)
- Token-efficient tools (get_file_outline, get_code_chunk)
- Optimized workflow: Explorer → Planner → Coder → Reviewer
- Version tracking for debugging deployments

Architecture:
- Explorer does all searching (BM25 + embeddings)
- Planner creates plans (no tools, pure LLM)
- Coder implements (no search, uses exploration context)
- Reviewer validates (read-only tools)
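The routing described above can be sketched in a few lines. This is a hedged illustration only: `classify_task`, `run_workflow`, and the agent callables are stand-ins for the actual `ExplorerAgent`/`Orchestrator` classes, whose real signatures live in `codepilot/agents/`.

```python
# Hypothetical sketch of the v3.2 Explorer-first routing. Names are
# illustrative stand-ins, not the actual CodePilot API.

def classify_task(query: str) -> str:
    """Toy classifier: route pure search/understanding queries to 'explore'."""
    explore_markers = ("where", "find", "explain", "how does", "what is")
    return "explore" if query.lower().startswith(explore_markers) else "code"

def run_workflow(query, explorer, planner, coder, reviewer):
    # Explorer always runs first and owns all searching (BM25 + embeddings).
    context = explorer(query)
    if classify_task(query) == "explore":
        return {"answer": context}  # explore queries never reach the Coder
    plan = planner(query, context)   # Planner: no tools, pure LLM
    code = coder(plan, context)      # Coder: no search, reuses exploration context
    verdict = reviewer(code)         # Reviewer: read-only validation
    return {"plan": plan, "code": code, "review": verdict}
```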

.gitignore ADDED
@@ -0,0 +1,36 @@
+ # Environment
+ .env
+ .env.local
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ venv/
+ env/
+ .venv/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Project specific
+ .codepilot_cache/
+ .chainlit/
+
+ # Claude Code
+ .claude/
+
+ # Test files
+ manual_tests/
+
+ # Logs
+ *.log
CLAUDE.md ADDED
@@ -0,0 +1,171 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ **CodePilot** - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests autonomously.
+
+ **Tech Stack:** Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph
+ **Development Timeline:** 24-week phased implementation (currently in Phase 5: Chainlit UI - COMPLETE)
+
+ ## Architecture
+
+ This project follows a layered architecture (planned - see devon-project-plan.md for full roadmap):
+
+ ```
+ Multi-Agent System (Planner, Coder, Reviewer)
+
+ Context Engine (Hybrid Retrieval, AST-aware chunking)
+
+ Tool Layer (read_file, write_file, run_command, search_code)
+
+ E2B Sandbox (Isolated code execution)
+ ```
+
+ ## Current Implementation Status
+
+ **✅ COMPLETED PHASES:**
+
+ **Phase 1: Foundation (Weeks 1-3)**
+ - ✅ LLM client wrapper (`codepilot/llm/claude_client.py`) - Claude API with tool calling
+ - ✅ Tool registry (`codepilot/tools/registry.py`) - Function calling infrastructure
+ - ✅ Base agent (`codepilot/agents/base_agent.py`) - Core ReAct loop
+ - ✅ Core tools: `read_file`, `write_file`, `run_command`, `search_codebase`, `list_files`
+
+ **Phase 2: Context Engineering (Weeks 4-8)**
+ - ✅ BM25 keyword search (`codepilot/context/bm25_search.py`)
+ - ✅ Dense embeddings (`codepilot/context/embeddings.py`) - sentence-transformers
+ - ✅ Hybrid retrieval (`codepilot/context/retrieval.py`) - Combined BM25 + semantic search
+ - ✅ Code parser (`codepilot/context/parser.py`) - AST-aware chunking
+ - ✅ Codebase indexer (`codepilot/context/indexer.py`) - Full codebase indexing
+ - ✅ Context selector (`codepilot/context/selector.py`) - Smart context selection
+ - ✅ Context tools: `index_codebase`, `search_codebase`, `get_relevant_context`
+
+ **Phase 3: Multi-Agent Architecture (Weeks 9-12)**
+ - ✅ Planner agent (`codepilot/agents/planner_agent.py`) - Creates implementation plans
+ - ✅ Coder agent (`codepilot/agents/coder_agent.py`) - Writes and tests code
+ - ✅ Reviewer agent (`codepilot/agents/reviewer_agent.py`) - Code review and approval
+ - ✅ Orchestrator (`codepilot/agents/orchestrator.py`) - State machine coordination
+
+ **Phase 4: E2B Sandbox Integration (Weeks 13-14)**
+ - ✅ E2B sandbox manager (`codepilot/sandbox/e2b_sandbox.py`) - Isolated execution
+ - ✅ Sandbox tools (`codepilot/sandbox/sandbox_tools.py`) - upload, execute, run commands
+ - ✅ Integration with Coder agent - Automatic sandbox testing workflow
+
+ **Phase 5: Chainlit UI (Weeks 15-16)**
+ - ✅ Chainlit application (`chainlit_app.py`) - Interactive chat interface
+ - ✅ Real-time workflow visualization with Chainlit Steps
+ - ✅ Detailed agent progress tracking (Planner → Coder → Reviewer)
+ - ✅ Code preview and test results display
+ - ✅ User guide (`CHAINLIT_GUIDE.md`)
+
+ **NEXT PHASES:**
+
+ **Phase 6: GitHub Integration (Weeks 17-18)** - Not started
+ - GitHub webhooks for issue tracking
+ - Automated PR creation
+ - Branch management
+
+ **Phase 7: Evals & Benchmarks (Weeks 19-21)** - Not started
+ - SWE-bench evaluation
+ - Custom test suite
+
+ **Phase 8: Production Hardening (Weeks 22-24)** - Not started
+ - Error handling and retries
+ - Logging and monitoring
+ - Deployment configuration
+
+ ## Development Commands
+
+ **Setup:**
+ ```bash
+ python -m venv venv
+ source venv/bin/activate  # On Windows: venv\Scripts\activate
+ pip install -r requirements.txt
+ ```
+
+ **Verify setup:**
+ ```bash
+ python test_setup.py  # Checks that API keys are loaded correctly
+ ```
+
+ **Run Chainlit UI (Phase 5):**
+ ```bash
+ chainlit run chainlit_app.py
+ # Opens at http://localhost:8000
+ # See CHAINLIT_GUIDE.md for full usage guide
+ ```
+
+ **Test individual phases:**
+ ```bash
+ # Phase 2: Context Engineering
+ python test_context.py
+
+ # Phase 3: Multi-Agent Workflow
+ python test_multi_agent.py
+
+ # Phase 4: E2B Sandbox
+ python test_sandbox.py
+ python test_workflow_with_sandbox.py
+ ```
+
+ **Environment variables required in .env:**
+ ```
+ ANTHROPIC_API_KEY=sk-ant-...
+ E2B_API_KEY=e2b_...
+ ```
+
+ ## Project Phases (from devon-project-plan.md)
+
+ 1. **Phase 1 (Weeks 1-3):** Foundation - Basic agent loop, tool calling, LLM abstraction
+ 2. **Phase 2 (Weeks 4-8):** Context Engineering - Hybrid retrieval (BM25 + dense), AST-aware chunking
+ 3. **Phase 3 (Weeks 9-12):** Multi-Agent Architecture - Orchestrator with specialized agents
+ 4. **Phase 4 (Weeks 13-14):** E2B Sandbox Integration
+ 5. **Phase 5 (Weeks 15-16):** Chainlit UI
+ 6. **Phase 6 (Weeks 17-18):** GitHub Integration (webhooks, PRs)
+ 7. **Phase 7 (Weeks 19-21):** Evals & Benchmarks (SWE-bench)
+ 8. **Phase 8 (Weeks 22-24):** Production Hardening
+
+ ## Key Design Principles
+
+ **From the project plan:**
+ - **Focus on Context Engineering:** This is the differentiator, not UI/UX
+ - **ReAct Pattern:** Reason about what to do, Act with tools, observe results, repeat
+ - **AST-Aware Processing:** Parse code structurally, not as text (tree-sitter for multi-language support)
+ - **Hybrid Retrieval:** Combine BM25 (exact matches) + dense embeddings (semantic search)
+ - **Sandboxed Execution:** All code runs in E2B containers, never on host
+ - **Multi-Agent Orchestration:** Specialized agents (Planner, Coder, Reviewer) coordinated by orchestrator
+
+ ## Tool Schema Format
+
+ Tools follow Claude/Anthropic function calling format:
+ ```python
+ {
+     "type": "function",
+     "function": {
+         "name": "tool_name",
+         "description": "Clear description for LLM to understand when to use",
+         "parameters": {
+             "type": "object",
+             "properties": {...},
+             "required": [...]
+         }
+     }
+ }
+ ```
+
+ ## Implementation Notes
+
+ - All tool functions return formatted strings (success messages or errors)
+ - `write_file` auto-creates parent directories if needed
+ - `run_command` has 30-second timeout to prevent hanging
+ - Error handling uses specific exceptions (FileNotFoundError, PermissionError) before generic fallback
+
+ ## Important Files
+
+ - `devon-project-plan.md` - Complete 24-week implementation roadmap with architectural details
+ - `codepilot/llm/claude_client.py` - Claude API wrapper with tool calling
+ - `codepilot/agents/orchestrator.py` - Multi-agent state machine
+ - `requirements.txt` - Python dependencies (anthropic, e2b-code-interpreter, langchain, langgraph)
+ - `.env` - API keys (not committed, in .gitignore)
Dockerfile CHANGED
@@ -1,4 +1,5 @@
  # HuggingFace Spaces Dockerfile for CodePilot
+ # BUILD_VERSION: 7 (v3.2 coder no list_files)
  FROM python:3.11-slim

  # Set working directory
@@ -19,7 +20,7 @@ ENV HOME=/home/user \
  WORKDIR $HOME/app

  # Copy requirements first (for better caching)
- COPY --chown=user requirements-cloud.txt ./requirements.txt
+ COPY --chown=user requirements.txt ./requirements.txt

  # Install Python dependencies
  RUN pip install --no-cache-dir --upgrade pip && \
README.md CHANGED
@@ -56,7 +56,7 @@ User Request
  ## Tech Stack

  - **Python** - Core language
- - **OpenAI GPT-4** - LLM for agent reasoning
+ - **Claude Sonnet 4.5** - LLM for agent reasoning (Anthropic API)
  - **LangChain/LangGraph** - Agent orchestration
  - **E2B** - Sandboxed code execution
  - **Chainlit** - Chat UI
@@ -65,7 +65,7 @@ User Request

  | Variable | Description |
  |----------|-------------|
- | `OPENAI_API_KEY` | Your OpenAI API key |
+ | `ANTHROPIC_API_KEY` | Your Anthropic API key |
  | `E2B_API_KEY` | Your E2B sandbox API key |

  ## License
chainlit_app.py CHANGED
@@ -17,16 +17,34 @@ from contextlib import redirect_stdout, redirect_stderr
  import asyncio
  from concurrent.futures import ThreadPoolExecutor

- # Check if running in production BEFORE importing heavy dependencies
- # Detects: Render, HuggingFace Spaces, or any cloud with PORT env var
- IS_PRODUCTION = os.getenv('RENDER_SERVICE_NAME') or os.getenv('RENDER') or os.getenv('SPACE_ID') or os.getenv('PORT')
-
- # Only import heavy ML dependencies in local development
- if not IS_PRODUCTION:
-     from codepilot.tools.context_tools import index_codebase
-
- # Import orchestrator (lighter weight)
- from codepilot.agents.orchestrator import Orchestrator
+ # ============================================================
+ # STARTUP VERSION CHECK - Change this to detect if rebuild worked
+ # ============================================================
+ APP_VERSION = "3.2.0-coder-no-list"
+ BUILD_ID = "2024-12-19-v6"
+ print("=" * 60)
+ print(f"[STARTUP] CodePilot Chainlit App")
+ print(f"[STARTUP] APP_VERSION: {APP_VERSION}")
+ print(f"[STARTUP] BUILD_ID: {BUILD_ID}")
+ print("=" * 60)
+ # ============================================================
+
+ # Import full context tools (embeddings + BM25) - requires 16GB+ RAM
+ from codepilot.tools.context_tools import index_codebase
+
+ # Import orchestrator
+ from codepilot.agents.orchestrator import Orchestrator, ORCHESTRATOR_VERSION
+
+ # Print orchestrator version for debugging
+ print(f"[STARTUP] ORCHESTRATOR_VERSION: {ORCHESTRATOR_VERSION}")
+
+ # Import GitHub tools for repo cloning
+ from codepilot.tools.github_tools import (
+     extract_github_url,
+     clone_repository,
+     get_repo_info,
+     cleanup_repository
+ )


  # Authentication disabled for now - uncomment to enable password protection
@@ -56,54 +74,39 @@ async def start():
      print("[CHAINLIT] on_chat_start triggered")  # Debug log

      await cl.Message(
-         content="# 🤖 CodePilot - Autonomous AI Coding Agent\n\n"
+         content=f"# CodePilot - Autonomous AI Coding Agent\n\n"
+                 f"**Version:** `{APP_VERSION}` | **Build:** `{BUILD_ID}`\n\n"
                  "I can help you write code, fix bugs, and implement features!\n\n"
-                 "**How it works:**\n"
-                 "1. 🤔 **Planner** - Searches codebase and creates implementation plan\n"
-                 "2. 💻 **Coder** - Writes code locally, uploads to sandbox, runs tests\n"
-                 "3. 👁️ **Reviewer** - Reviews tested code and decides approval\n\n"
-                 "**What I can do:**\n"
-                 "- Write new functions and features\n"
-                 "- Fix bugs and add error handling\n"
-                 "- Create tests and verify code works\n"
-                 "- Search and understand your codebase\n\n"
-                 "**Ready!** What would you like me to build?"
+                 "**How to use:**\n"
+                 "1. Paste a **public GitHub URL** and I'll clone and analyze it\n"
+                 "2. Tell me what you want to build or fix\n"
+                 "3. Watch my agents (Planner > Coder > Reviewer) work!\n\n"
+                 "**Example:**\n"
+                 "```\nAnalyze https://github.com/user/repo and add error handling to the API endpoints\n```\n\n"
+                 "**Ready!** Paste a GitHub URL or describe your task."
      ).send()

      print("[CHAINLIT] Welcome message sent")  # Debug log

-     # Skip indexing on deployment to avoid startup issues (using module-level constant)
-     if IS_PRODUCTION:
-         print(f"[CHAINLIT] Running in production mode (PORT={os.getenv('PORT')}) - skipping codebase indexing")
-         await cl.Message(content="ℹ️ Running in cloud mode - codebase indexing disabled").send()
-         cl.user_session.set("orchestrator", Orchestrator(max_iterations=3))
-         cl.user_session.set("ready", True)
-         print("[CHAINLIT] Orchestrator created, ready=True")
-         return
-
-     # Index codebase in background (only in local development)
-     index_msg = await cl.Message(content="🔍 Indexing codebase...").send()
-
-     try:
-         # Get project root
-         project_root = os.path.dirname(os.path.abspath(__file__))
-         index_result = index_codebase(project_root)
-
-         # Update message content
-         index_msg.content = f"✅ Codebase indexed!\n```\n{index_result}\n```"
-         await index_msg.update()
-
-         # Store orchestrator in session (reduced iterations to save API credits)
-         cl.user_session.set("orchestrator", Orchestrator(max_iterations=3))
-         cl.user_session.set("ready", True)
-
-     except Exception as e:
-         # Update message content
-         index_msg.content = f"⚠️ Indexing failed (will continue anyway):\n```\n{str(e)}\n```"
-         await index_msg.update()
-         # Still create orchestrator even if indexing fails
-         cl.user_session.set("orchestrator", Orchestrator(max_iterations=10))
-         cl.user_session.set("ready", True)
+     # Initialize session variables
+     cl.user_session.set("repo_path", None)
+     cl.user_session.set("repo_info", None)
+
+     # Skip self-indexing - agents will only work with cloned GitHub repos
+     # Create orchestrator and mark as ready
+     cl.user_session.set("orchestrator", Orchestrator(max_iterations=3))
+     cl.user_session.set("ready", True)
+     print("[CHAINLIT] Orchestrator created, ready for GitHub repos")


+ @cl.on_chat_end
+ async def end():
+     """Cleanup when chat ends."""
+     # Clean up any cloned repositories
+     repo_path = cl.user_session.get("repo_path")
+     if repo_path:
+         print(f"[CHAINLIT] Cleaning up repo: {repo_path}")
+         cleanup_repository(repo_path)
+
+
  @cl.on_message
@@ -112,12 +115,112 @@ async def main(message: cl.Message):

      # Check if ready
      if not cl.user_session.get("ready"):
-         await cl.Message(content="⚠️ System is still initializing, please wait...").send()
+         await cl.Message(content="System is still initializing, please wait...").send()
          return

      # Get orchestrator
      orchestrator: Orchestrator = cl.user_session.get("orchestrator")

+     # Check for GitHub URL in message
+     github_url = extract_github_url(message.content)
+     task_context = ""
+
+     if github_url:
+         # Clone the repository
+         clone_msg = await cl.Message(content=f"Cloning repository: `{github_url}`...").send()
+
+         success, result, repo_name = clone_repository(github_url)
+
+         if success:
+             repo_path = result
+             repo_info = get_repo_info(repo_path)
+
+             # Store in session
+             cl.user_session.set("repo_path", repo_path)
+             cl.user_session.set("repo_info", repo_info)
+
+             # Index the repository for search (full BM25 + embeddings)
+             try:
+                 index_result = index_codebase(repo_path)
+                 print(f"[CHAINLIT] Repository indexed: {index_result}")
+             except Exception as e:
+                 print(f"[CHAINLIT] Indexing failed (non-critical): {e}")
+
+             # Create context for the task (limited to avoid token overflow)
+             languages = ", ".join(repo_info["languages"][:5]) if repo_info["languages"] else "Unknown"
+             # Only include first 20 files to keep context small
+             sample_files = repo_info["files"][:20] if repo_info["files"] else []
+             files_preview = "\n".join(f" - {f}" for f in sample_files)
+             if len(repo_info["files"]) > 20:
+                 files_preview += f"\n ... and {len(repo_info['files']) - 20} more files"
+
+             task_context = f"""
+ [REPOSITORY CONTEXT]
+ Repository: {repo_name}
+ Path: {repo_path}
+ Total Files: {repo_info['total_files']}
+ Languages: {languages}
+ Sample Files:
+ {files_preview}
+
+ AVAILABLE TOOLS:
+ - search_repository: Search this cloned repository using BM25 keyword matching (use this to find functions, classes, or code patterns in the Flask repo)
+ - read_file: Read a specific file (use full path: {repo_path}/filename.py)
+ - search_code: Grep for exact pattern matches in the repository
+ """
+             # Update clone message
+             clone_msg.content = f"**Repository cloned successfully!**\n\n" \
+                                 f"- **Name:** {repo_name}\n" \
+                                 f"- **Files:** {repo_info['total_files']}\n" \
+                                 f"- **Languages:** {languages}\n" \
+                                 f"- **Path:** `{repo_path}`"
+             await clone_msg.update()
+
+         else:
+             # Clone failed
+             clone_msg.content = f"**Failed to clone repository**\n\n{result}\n\n" \
+                                 f"Make sure the repository is public and the URL is correct."
+             await clone_msg.update()
+             return
+
+     # Check if we have a repo from previous message
+     elif cl.user_session.get("repo_path"):
+         repo_path = cl.user_session.get("repo_path")
+         repo_info = cl.user_session.get("repo_info")
+         if repo_info:
+             languages = ", ".join(repo_info["languages"][:5]) if repo_info["languages"] else "Unknown"
+             task_context = f"""
+ [REPOSITORY CONTEXT]
+ Repository: {repo_info['name']}
+ Path: {repo_path}
+ Total Files: {repo_info['total_files']}
+ Languages: {languages}
+
+ AVAILABLE TOOLS:
+ - search_repository: Search this cloned repository using BM25 keyword matching (use this to find functions, classes, or code patterns in the Flask repo)
+ - read_file: Read a specific file (use full path: {repo_path}/filename.py)
+ - search_code: Grep for exact pattern matches in the repository
+ """
+
+     # Prepare the full task with context
+     # Remove the GitHub URL from the message to get just the user's query
+     user_query = message.content
+     print(f"[DEBUG] Original message.content: '{message.content}'")
+     print(f"[DEBUG] GitHub URL found: '{github_url}'")
+
+     if github_url:
+         # Remove the URL from the message to get the actual task
+         import re
+         user_query = re.sub(r'https?://github\.com/[^\s]+', '', user_query).strip()
+         print(f"[DEBUG] After URL removal: '{user_query}'")
+
+     full_task = task_context + "\n\n" + user_query if task_context else user_query
+
+     print(f"[DEBUG] task_context exists: {bool(task_context)}")
+     print(f"[DEBUG] task_context length: {len(task_context) if task_context else 0}")
+     print(f"[DEBUG] Final user_query: '{user_query}'")
+     print(f"[DEBUG] Full task (first 500 chars): '{full_task[:500]}...'")
+
      # Create a message for streaming logs
      log_msg = cl.Message(content="")
      await log_msg.send()
@@ -130,10 +233,10 @@ async def main(message: cl.Message):
          """Run orchestrator in thread and capture output."""
          try:
              with redirect_stdout(captured_output), redirect_stderr(captured_output):
-                 return orchestrator.run(message.content)
+                 return orchestrator.run(full_task)
          except Exception as e:
              # Capture any exceptions from orchestrator
-             print(f"Error in orchestrator: {str(e)}")
+             print(f"Error in orchestrator: {str(e)}")
              import traceback
              traceback.print_exc()
              raise
@@ -165,10 +268,10 @@ async def main(message: cl.Message):
      filtered_lines = []
      for line in accumulated_logs.split('\n'):
          # Extract token usage before filtering (only count each line once!)
-         if '📊 Tokens:' in line and line not in seen_token_lines:
+         if 'Tokens:' in line and line not in seen_token_lines:
              seen_token_lines.add(line)  # Mark as counted
              try:
-                 # Parse: "📊 Tokens: 505 prompt + 20 completion = 525 total"
+                 # Parse: "Tokens: 505 prompt + 20 completion = 525 total"
                  parts = line.split('Tokens:')[1].strip()
                  prompt = int(parts.split('prompt')[0].strip())
                  completion = int(parts.split('+')[1].split('completion')[0].strip())
@@ -179,24 +282,25 @@ async def main(message: cl.Message):
                  pass

          # Skip token counts, progress bars, and verbose details
-         if any(skip in line for skip in ['📊 Tokens:', 'Batches:', '|##', 'it/s]']):
+         if any(skip in line for skip in ['Tokens:', 'Batches:', '|##', 'it/s]']):
              continue
          # Keep important lines
          if any(keep in line for keep in [
-             '[ORCHESTRATOR]', '[PLANNER]', '[CODER]', '[REVIEWER]',
-             'Calling tool:', 'Tool', 'Transitioning', 'APPROVED', 'REJECTED'
+             '[CLASSIFIER]', '[ORCHESTRATOR]', '[PLANNER]', '[CODER]', '[REVIEWER]',
+             '[EXPLORER]', 'Calling tool:', 'Tool', 'Transitioning', 'APPROVED', 'REJECTED',
+             '[GITHUB]', 'Cloning', 'Repository'
          ]):
              filtered_lines.append(line)

      filtered_output = '\n'.join(filtered_lines)

-     # Calculate cost (GPT-3.5-turbo pricing: $0.0015/1K input, $0.002/1K output)
-     input_cost = (total_prompt_tokens / 1000) * 0.0015
-     output_cost = (total_completion_tokens / 1000) * 0.002
+     # Calculate cost (Claude Sonnet 4.5 pricing: $3/1M input, $15/1M output)
+     input_cost = (total_prompt_tokens / 1000000) * 3.0
+     output_cost = (total_completion_tokens / 1000000) * 15.0
      total_cost = input_cost + output_cost

      # Add usage summary to logs
-     usage_summary = f"\n\n💰 CREDITS USED:\n"
+     usage_summary = f"\n\nCREDITS USED:\n"
      usage_summary += f"  Input: {total_prompt_tokens:,} tokens (${input_cost:.4f})\n"
      usage_summary += f"  Output: {total_completion_tokens:,} tokens (${output_cost:.4f})\n"
      usage_summary += f"  Total: {total_tokens:,} tokens (${total_cost:.4f})"
@@ -212,42 +316,42 @@ async def main(message: cl.Message):
      final_logs = captured_output.getvalue()

      # Update with final logs
-     log_msg.content = f"## 📋 Execution Log\n```\n{final_logs}\n```"
+     log_msg.content = f"## Execution Log\n```\n{final_logs}\n```"
      await log_msg.update()

      # Send results summary
      summary_lines = []

      if result.get('plan'):
-         summary_lines.append("## 🤔 Planner")
-         summary_lines.append(f"Plan created ({len(result['plan'])} chars)\n")
+         summary_lines.append("## Planner")
+         summary_lines.append(f"Plan created ({len(result['plan'])} chars)\n")

      if result.get('code_changes'):
-         summary_lines.append("## 💻 Coder")
-         summary_lines.append(f"Created {len(result['code_changes'])} file(s):")
+         summary_lines.append("## Coder")
+         summary_lines.append(f"Created {len(result['code_changes'])} file(s):")
          for file_path in result['code_changes'].keys():
              summary_lines.append(f"  - {file_path}")
          summary_lines.append("")

      if result.get('review_feedback'):
-         summary_lines.append("## 👁️ Reviewer")
+         summary_lines.append("## Reviewer")
          if result.get('success'):
-             summary_lines.append("Code approved")
+             summary_lines.append("Code approved")
          else:
-             summary_lines.append("⚠️ Needs revision")
+             summary_lines.append("Needs revision")
          summary_lines.append("")

-     summary_lines.append("## 🎯 Result")
+     summary_lines.append("## Result")
      if result.get('success'):
-         summary_lines.append(f"**Success** (Iterations: {result.get('iterations', 'N/A')})")
+         summary_lines.append(f"**Success** (Iterations: {result.get('iterations', 'N/A')})")
      else:
-         summary_lines.append(f"⚠️ **Incomplete** (Iterations: {result.get('iterations', 'N/A')})")
+         summary_lines.append(f"**Incomplete** (Iterations: {result.get('iterations', 'N/A')})")

-     # Add final cost summary
-     summary_lines.append("\n## 💰 API Credits Used (GPT-3.5-Turbo)")
+     # Add final cost summary (Claude Sonnet 4.5 pricing: $3/1M input, $15/1M output)
+     summary_lines.append("\n## API Credits Used (Claude Sonnet 4.5)")
      summary_lines.append(f"**Total Tokens:** {total_tokens:,}")
-     summary_lines.append(f"- Input: {total_prompt_tokens:,} tokens (${(total_prompt_tokens/1000)*0.0015:.4f})")
-     summary_lines.append(f"- Output: {total_completion_tokens:,} tokens (${(total_completion_tokens/1000)*0.002:.4f})")
+     summary_lines.append(f"- Input: {total_prompt_tokens:,} tokens (${(total_prompt_tokens/1000000)*3.0:.4f})")
+     summary_lines.append(f"- Output: {total_completion_tokens:,} tokens (${(total_completion_tokens/1000000)*15.0:.4f})")
      summary_lines.append(f"\n**Estimated Cost:** ${total_cost:.4f}")

      await cl.Message(content="\n".join(summary_lines)).send()
@@ -258,9 +362,9 @@ async def main(message: cl.Message):
      error_type = type(e).__name__

      if "rate_limit" in error_message.lower() or "429" in error_message:
-         user_message = f"""## ⏱️ Rate Limit Reached
+         user_message = f"""## Rate Limit Reached

- OpenAI API rate limit exceeded. This happens when too many requests are made in a short time.
+ Claude API rate limit exceeded. This happens when too many requests are made in a short time.

  **What to do:**
  - Wait a few minutes and try again
@@ -272,15 +376,15 @@ OpenAI API rate limit exceeded. This happens when too many requests are made in a short time.
  {error_message}
  ```
  """
-     elif "insufficient_quota" in error_message.lower():
-         user_message = f"""## 💳 API Credits Exhausted
+     elif "insufficient_quota" in error_message.lower() or "credit" in error_message.lower():
+         user_message = f"""## API Credits Exhausted

- Your OpenAI API credits have been exhausted.
+ Your Anthropic API credits have been exhausted.

  **What to do:**
- - Add credits to your OpenAI account at https://platform.openai.com/account/billing
- - Check your usage at https://platform.openai.com/usage
- - Current model: GPT-3.5-turbo (~$0.02 per task)
+ - Add credits to your Anthropic account at https://console.anthropic.com/settings/billing
+ - Check your usage at https://console.anthropic.com/settings/usage
+ - Current model: Claude Sonnet 4.5 (~$0.20 per task)

  **Error details:**
  ```
@@ -288,13 +392,13 @@ Your OpenAI API credits have been exhausted.
  ```
  """
      elif "api_key" in error_message.lower() or "authentication" in error_message.lower():
-         user_message = f"""## 🔑 API Key Error
+         user_message = f"""## API Key Error

- There's an issue with your OpenAI API key.
+ There's an issue with your Anthropic API key.

  **What to do:**
- - Verify your OPENAI_API_KEY in .env file
- - Check that the key is valid at https://platform.openai.com/api-keys
+ - Verify your ANTHROPIC_API_KEY in .env file
+ - Check that the key is valid at https://console.anthropic.com/settings/keys
  - Restart the application after updating .env

  **Error details:**
@@ -303,7 +407,7 @@ There's an issue with your OpenAI API key.
  ```
  """
      elif "timeout" in error_message.lower():
-         user_message = f"""## Request Timeout
+         user_message = f"""## Request Timeout

  The operation took too long and timed out.

@@ -319,7 +423,7 @@ The operation took too long and timed out.
  """
      else:
          # Generic error with helpful context
-         user_message = f"""## Error Occurred
+         user_message = f"""## Error Occurred

  An unexpected error occurred during execution.

codepilot/agents/__init__.py CHANGED
@@ -0,0 +1,24 @@
+ """
+ CodePilot Agents Module
+
+ This module contains all agent implementations:
+ - ExplorerAgent: Lightweight agent for search/exploration queries
+ - PlannerAgent: Creates implementation plans
+ - CoderAgent: Implements code based on plans
+ - ReviewerAgent: Reviews code for quality
+ - Orchestrator: Routes tasks and manages multi-agent workflow
+ """
+
+ from codepilot.agents.explorer_agent import ExplorerAgent
+ from codepilot.agents.planner_agent import PlannerAgent
+ from codepilot.agents.coder_agent import CoderAgent
+ from codepilot.agents.reviewer_agent import ReviewerAgent
+ from codepilot.agents.orchestrator import Orchestrator
+
+ __all__ = [
+ "ExplorerAgent",
+ "PlannerAgent",
+ "CoderAgent",
+ "ReviewerAgent",
+ "Orchestrator"
+ ]
codepilot/agents/base_agent.py CHANGED
@@ -12,18 +12,22 @@ from codepilot.tools.registry import get_tools, get_tool_function
  class Agent:
  """Main agent that executes tasks using LLM and tools"""

- def __init__(self, model: str = "gpt-3.5-turbo", max_iterations: int = 10):
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929", max_iterations: int = 10):
  """
  Initialize the agent

  Args:
- model: OpenAI model to use
+ model: LLM model to use (default: Claude Sonnet 4.5)
  max_iterations: Maximum number of LLM calls to prevent infinite loops
  """
  print("🚀 Initializing Agent...")

- # Initialize components
- self.client = OpenAIClient(model=model)
+ # Initialize components - use Claude by default
+ from codepilot.llm.claude_client import ClaudeClient
+ if "claude" in model.lower():
+ self.client = ClaudeClient(model=model)
+ else:
+ self.client = OpenAIClient(model=model)
  self.conversation = ConversationManager()
  self.tools = get_tools()
  self.max_iterations = max_iterations
@@ -52,7 +56,7 @@ class Agent:
  for iteration in range(1, self.max_iterations + 1):
  print(f"\n--- Iteration {iteration}/{self.max_iterations} ---")

- # Call OpenAI with current conversation and tools
+ # Call LLM with current conversation and tools
  response = self.client.chat(
  messages=self.conversation.get_messages(),
  tools=self.tools
@@ -87,7 +91,10 @@ class Agent:
  for tool_call in tool_calls:
  self._execute_tool_call(tool_call)

- # Continue loop - send results back to OpenAI
+ # Trim conversation to prevent context overflow (optimized for Claude's 200K context)
+ self.conversation.trim_messages(keep_recent=8)
+
+ # Continue loop - send results back to LLM
  continue

  else:
@@ -106,7 +113,7 @@ class Agent:
  Execute a single tool call

  Args:
- tool_call: Tool call object from OpenAI response
+ tool_call: Tool call object from LLM response
  """
  tool_id = tool_call.id
  tool_name = tool_call.function.name
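The client selection in `__init__` above reduces to a simple substring check on the model id. A minimal sketch of that routing rule (client names only, no API calls):

```python
def pick_client(model: str) -> str:
    # Route Claude model ids to the Anthropic-backed client,
    # everything else falls back to the OpenAI client.
    return "ClaudeClient" if "claude" in model.lower() else "OpenAIClient"

print(pick_client("claude-sonnet-4-5-20250929"))  # → ClaudeClient
print(pick_client("gpt-3.5-turbo"))               # → OpenAIClient
```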
codepilot/agents/coder_agent.py CHANGED
@@ -1,105 +1,128 @@
  """
- Coder Agent - Implements code based on plans
+ Coder Agent - Implements code based on plans (v3.0)

  The Coder's job:
  1. Read the plan from Planner
- 2. Search/read existing code to understand it
+ 2. Use exploration context (already searched by Explorer)
  3. Write code changes to implement the plan
- 4. Follow best practices and coding standards
+ 4. Test in sandbox

- Tools it has access to:
- - search_codebase (find relevant files)
- - read_file (understand existing code)
- - write_file (implement changes)
- - list_files (explore structure)
+ v3.0 Changes:
+ - Removed search tools (Explorer already searched)
+ - Receives exploration_context from orchestrator
+ - Focused only on reading/writing/testing
  """

  from codepilot.llm.client import OpenAIClient
+ from codepilot.llm.claude_client import ClaudeClient
  from codepilot.tools.registry import get_tools, get_tool_function
  from codepilot.agents.conversation import ConversationManager
- from typing import Dict, Any
+ from typing import Dict, Any, Optional
  import json


- # Coder's specialized system prompt
+ # Coder's specialized system prompt (v3.2 - no search, no list_files, uses exploration context)
  CODER_SYSTEM_PROMPT = """You are an expert software engineer and implementation specialist.

- Your ONLY job is to write code that implements the given plan. You do NOT create plans yourself.
+ Your ONLY job is to write code that implements the given plan. You do NOT explore or search.

- When given a plan:
- 1. Read and understand each step carefully
- 2. Search the codebase to find relevant files
- 3. Read existing files to understand the current implementation
- 4. Write clean, well-structured code that follows the plan
- 5. Make incremental changes, one step at a time
+ === CRITICAL: USE THE PROVIDED CONTEXT ===
+ The Explorer agent has ALREADY searched the codebase for you. All file paths and code patterns are in the EXPLORATION RESULTS below.

- Your code should be:
- - Clean and readable (follow existing code style)
- - Well-tested (add error handling)
- - Documented (add comments for complex logic)
- - Minimal (only change what's necessary)
+ DO NOT:
+ - Navigate directories (no list_files)
+ - Search for files (no searching)
+ - Explore the codebase
+
+ DO:
+ - Use the exact file paths from exploration results
+ - Start writing code immediately
+ - Follow the plan step by step

- IMPORTANT RULES:
- - Follow the plan exactly - don't add extra features
- - Match the existing code style in each file
- - Test your changes mentally before writing
- - If you need clarification on the plan, state what's unclear
+ === WORKFLOW ===
+ 1. Read the exploration results - they contain all file paths you need
+ 2. If modifying existing code: use get_code_chunk to read the specific function
+ 3. Write your changes with write_file using paths from exploration
+ 4. Test in sandbox if needed

- Tools available to you:
- - search_codebase: Find existing code
- - read_file: Understand current implementation
+ === TOOLS ===
+ - get_file_outline: See file structure (use if unsure about a file)
+ - get_code_chunk: Read ONE specific function/class
+ - read_file: Read entire file (only when rewriting whole file)
  - write_file: Create or modify files
- - list_files: Explore directory structure
- - upload_to_sandbox: Upload files to isolated testing environment
- - run_command_in_sandbox: Run commands safely in sandbox (e.g., pytest, python test.py)
- - execute_in_sandbox: Execute Python code snippets for quick testing
-
- IMPORTANT: Always test your code in the sandbox before submitting!
- 1. Write the file locally (write_file)
- 2. Upload to sandbox (upload_to_sandbox)
- 3. Run tests in sandbox (run_command_in_sandbox)
- 4. Fix any issues before marking as complete
+ - upload_to_sandbox: Upload files for testing
+ - run_command_in_sandbox: Run tests in sandbox
+ - execute_in_sandbox: Execute Python snippets
+
+ === SANDBOX WORKFLOW ===
+ When testing in sandbox:
+ 1. Upload with RELATIVE path: upload_to_sandbox(path="file.py", content=code)
+ 2. Run with RELATIVE path: run_command_in_sandbox(command="python file.py")
+ 3. The sandbox CANNOT access /tmp/codepilot_repos/ - use simple filenames!
+
+ Your code should be:
+ - Clean (follow existing code style)
+ - Minimal (only change what's necessary)
+ - Follow the plan exactly
+
+ START CODING IMMEDIATELY - do not explore!
  """


  class CoderAgent:
  """
- Coder Agent - Implements code based on plans.
+ Coder Agent - Implements code based on plans (v3.0).

  This agent is specialized for coding. It has:
- - Custom system prompt (engineer mindset)
- - Write access tools (can modify files)
- - Single responsibility (implementation only)
+ - NO search tools (Explorer already searched)
+ - Receives exploration_context
+ - Write access + sandbox execution
  """

- def __init__(self, model: str = "gpt-3.5-turbo"):
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
  """
  Initialize Coder agent.

  Args:
- model: LLM model to use
+ model: LLM model to use (default: Claude Sonnet 4.5)
  """
- self.client = OpenAIClient(model=model)
+ # Use Claude client for Claude models, OpenAI client as fallback
+ if "claude" in model.lower():
+ self.client = ClaudeClient(model=model)
+ else:
+ self.client = OpenAIClient(model=model)
+
  self.conversation = ConversationManager()

- # Coder gets read + write tools + sandbox execution (safe testing)
+ # v3.2: Removed list_files - Explorer provides all paths needed
+ # Coder only needs: read, write, and sandbox tools
  self.allowed_tools = [
- "search_codebase",
- "read_file",
- "write_file",
- "list_files",
- "upload_to_sandbox",
- "run_command_in_sandbox",
- "execute_in_sandbox"
+ "get_file_outline", # Get file structure without full code
+ "get_code_chunk", # Extract specific function/class by name
+ "read_file", # Full file contents (use sparingly)
+ "write_file", # Create or modify files
+ # "list_files" REMOVED - use exploration context instead
+ "upload_to_sandbox", # Upload files for testing
+ "run_command_in_sandbox", # Run tests in sandbox
+ "execute_in_sandbox" # Execute Python snippets
  ]

- def run(self, plan: str, task: str, review_feedback: str = None) -> Dict[str, str]:
+ def run(
+ self,
+ plan: str,
+ task: str,
+ exploration_context: Optional[str] = None,
+ review_feedback: Optional[str] = None
+ ) -> Dict[str, str]:
  """
  Implement the given plan.

+ v3.0: Now receives exploration_context so it doesn't need to search.
+
  Args:
  plan: Implementation plan from Planner
  task: Original task description (for context)
+ exploration_context: Context gathered by Explorer agent
  review_feedback: Optional feedback from Reviewer if code was rejected

  Returns:
@@ -111,24 +134,34 @@ class CoderAgent:
  # Add system prompt
  self.conversation.add_message("system", CODER_SYSTEM_PROMPT)

- # Build user prompt with task, plan, and optionally review feedback
- user_prompt = f"""Original Task: {task}
+ # Build user prompt with exploration context, task, plan
+ user_prompt = f"""=== ORIGINAL TASK ===
+ {task}
+
+ """
+ # Add exploration context if available
+ if exploration_context:
+ user_prompt += f"""=== EXPLORATION RESULTS (from Explorer agent) ===
+ {exploration_context}
+
+ """

- Implementation Plan:
- {plan}"""
+ user_prompt += f"""=== IMPLEMENTATION PLAN (from Planner agent) ===
+ {plan}
+
+ """

  # If this is a rework (Reviewer rejected the code), include feedback
  if review_feedback:
- user_prompt += f"""
-
- IMPORTANT - REVIEWER FEEDBACK (CODE WAS REJECTED):
+ user_prompt += f"""=== REVIEWER FEEDBACK (CODE WAS REJECTED) ===
  {review_feedback}

  Please fix the issues mentioned by the Reviewer and resubmit the code."""
  else:
- user_prompt += """
-
- Please implement this plan step by step. Write clean, well-structured code that follows the plan."""
+ user_prompt += """Please implement this plan step by step.
+ Use the exploration results to understand the codebase structure.
+ Write clean, well-structured code that follows the plan.
+ Test your code in the sandbox before finishing."""

  self.conversation.add_message("user", user_prompt)

@@ -142,8 +175,8 @@ Please implement this plan step by step. Write clean, well-structured code that
  # Track which files were modified
  modified_files = {}

- # Run coding loop (agent reads code, writes changes)
- max_iterations = 15 # Coder might need more iterations than planner
+ # Run coding loop
+ max_iterations = 15
  for iteration in range(max_iterations):
  # Call LLM
  response = self.client.chat(
@@ -163,7 +196,6 @@ Please implement this plan step by step. Write clean, well-structured code that
  # Check if done
  if finish_reason == "stop":
- # Agent finished coding
  print(f"[CODER] Finished implementation")
  return modified_files

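The prompt assembly in the new `run()` can be sketched as a standalone function, mirroring the section markers shown in the diff (the argument values below are illustrative only):

```python
def build_coder_prompt(task, plan, exploration_context=None, review_feedback=None):
    # Mirrors the Coder's user-prompt layout: task, optional exploration
    # results, plan, then either reviewer feedback or the default instruction.
    prompt = f"=== ORIGINAL TASK ===\n{task}\n\n"
    if exploration_context:
        prompt += f"=== EXPLORATION RESULTS (from Explorer agent) ===\n{exploration_context}\n\n"
    prompt += f"=== IMPLEMENTATION PLAN (from Planner agent) ===\n{plan}\n\n"
    if review_feedback:
        prompt += f"=== REVIEWER FEEDBACK (CODE WAS REJECTED) ===\n{review_feedback}"
    else:
        prompt += "Please implement this plan step by step."
    return prompt

p = build_coder_prompt("Add logging", "1. Edit app.py", exploration_context="app.py: main()")
print("EXPLORATION RESULTS" in p)  # → True
```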
codepilot/agents/conversation.py CHANGED
@@ -1,6 +1,6 @@
  """
  Conversation Manager
- Handles conversation history in OpenAI's message format
+ Handles conversation history in standard LLM message format
  """

  from typing import List, Dict, Any
@@ -58,13 +58,13 @@ class ConversationManager:
  Add an assistant message with tool calls

  Args:
- tool_calls: List of tool call objects from OpenAI response
+ tool_calls: List of tool call objects from LLM response
  """
  # Extract tool call info for logging
  tool_names = [tc.function.name for tc in tool_calls]
  print(f"🔧 Assistant calling tools: {tool_names}")

- # OpenAI requires this specific format
+ # Standard tool call format (converted by client for Claude)
  self.messages.append({
  "role": "assistant",
  "content": None, # No text content when making tool calls
@@ -86,7 +86,7 @@ class ConversationManager:
  Add a tool execution result to the conversation

  Args:
- tool_call_id: The ID of the tool call (from OpenAI)
+ tool_call_id: The ID of the tool call (from LLM)
  tool_name: Name of the tool that was executed
  result: The result string from the tool
  """
@@ -109,6 +109,25 @@ class ConversationManager:
  """
  return self.messages

+ def trim_messages(self, keep_recent: int = 10):
+ """
+ Trim conversation history to prevent context overflow.
+ Keeps the system message and the most recent N messages.
+
+ Args:
+ keep_recent: Number of recent messages to keep (default: 10)
+ """
+ if len(self.messages) <= keep_recent + 1:
+ return # No need to trim
+
+ # Keep system message (first) + recent messages
+ system_msg = [self.messages[0]] if self.messages and self.messages[0].get("role") == "system" else []
+ recent_msgs = self.messages[-keep_recent:]
+
+ old_count = len(self.messages)
+ self.messages = system_msg + recent_msgs
+ print(f"✂️ Trimmed conversation: {old_count} → {len(self.messages)} messages")
+
  def clear(self):
  """Clear all messages from history"""
  self.messages = []
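The new `trim_messages` logic can be exercised in isolation. A minimal sketch of the same keep-system-plus-recent-N rule, with a synthetic history:

```python
def trim(messages, keep_recent=10):
    # Same rule as trim_messages: keep the leading system message
    # plus the most recent keep_recent messages.
    if len(messages) <= keep_recent + 1:
        return messages
    system = [messages[0]] if messages and messages[0].get("role") == "system" else []
    return system + messages[-keep_recent:]

history = [{"role": "system", "content": "prompt"}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(20)
]
trimmed = trim(history, keep_recent=8)
print(len(trimmed))  # → 9 (system message + 8 most recent)
```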
codepilot/agents/explorer_agent.py ADDED
@@ -0,0 +1,168 @@
+ """
+ Explorer Agent - Lightweight agent for search and exploration queries
+
+ The Explorer's job:
+ 1. Search the codebase to find relevant code
+ 2. Answer questions about the codebase
+ 3. Explain how code works
+
+ This agent is used for queries like:
+ - "Find the Flask class"
+ - "Where is the login function?"
+ - "Explain how routing works"
+
+ It does NOT write code - just explores and explains.
+ """
+
+ from codepilot.llm.client import OpenAIClient
+ from codepilot.llm.claude_client import ClaudeClient
+ from codepilot.tools.registry import get_tools, get_tool_function
+ from codepilot.agents.conversation import ConversationManager
+ import json
+
+
+ # Explorer's specialized system prompt - optimized for token efficiency
+ EXPLORER_SYSTEM_PROMPT = """You are a code exploration expert.
+
+ Your job is to search codebases and answer questions about code.
+ You do NOT write code or create plans - just find and explain.
+
+ === TOKEN-EFFICIENT WORKFLOW ===
+ 1. Use search_code or search_repository to find relevant files
+ 2. Use get_file_outline to see file structure (~50 tokens, NOT full code)
+ 3. Use get_code_chunk to read ONLY the specific function/class you need
+ 4. Provide a clear, concise answer
+
+ NEVER use read_file - it wastes tokens by reading entire files!
+
+ === TOOLS ===
+ - get_file_outline: See file structure WITHOUT code - USE THIS!
+ - get_code_chunk: Read ONE specific function/class - USE THIS!
+ - search_code: Grep for exact patterns (e.g., "^class Flask")
+ - search_repository: Semantic search (BM25 + embeddings)
+ - list_files: List directory contents
+
+ === RESPONSE FORMAT ===
+ After finding the answer, respond with:
+ 1. What you found (file path, line numbers)
+ 2. Brief explanation of how it works
+ 3. Key code snippets if relevant
+
+ Be concise. Answer the question directly.
+ """
+
+
+ class ExplorerAgent:
+ """
+ Explorer Agent - Lightweight agent for search/exploration queries.
+
+ This agent is specialized for exploration. It has:
+ - Minimal system prompt (token-efficient)
+ - Read-only tools (no write access)
+ - Fewer iterations (max 5)
+ - No read_file (forces use of efficient tools)
+ """
+
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
+ """
+ Initialize Explorer agent.
+
+ Args:
+ model: LLM model to use (default: Claude Sonnet 4.5)
+ """
+ # Use Claude client for Claude models, OpenAI client as fallback
+ if "claude" in model.lower():
+ self.client = ClaudeClient(model=model)
+ else:
+ self.client = OpenAIClient(model=model)
+
+ self.conversation = ConversationManager()
+
+ # Explorer only gets token-efficient read-only tools
+ # Intentionally excludes read_file to force efficient tool usage
+ self.allowed_tools = [
+ "get_file_outline", # File structure without code
+ "get_code_chunk", # Specific function/class only
+ "search_code", # Grep pattern matching
+ "search_repository", # Semantic search
+ "list_files" # Directory listing
+ ]
+
+ def run(self, query: str) -> str:
+ """
+ Explore the codebase to answer a query.
+
+ Args:
+ query: User's question (e.g., "Find the Flask class")
+
+ Returns:
+ Answer as a string
+ """
+ # Reset conversation
+ self.conversation = ConversationManager()
+
+ # Add system prompt
+ self.conversation.add_message("system", EXPLORER_SYSTEM_PROMPT)
+
+ # Add user query
+ self.conversation.add_message("user", query)
+
+ # Get only the tools this agent is allowed to use
+ all_tools = get_tools()
+ explorer_tools = [
+ tool for tool in all_tools
+ if tool['function']['name'] in self.allowed_tools
+ ]
+
+ # Run exploration loop (fewer iterations than other agents)
+ max_iterations = 5
+ for iteration in range(max_iterations):
+ # Call LLM
+ response = self.client.chat(
+ messages=self.conversation.get_messages(),
+ tools=explorer_tools
+ )
+
+ finish_reason = response.choices[0].finish_reason
+ message = response.choices[0].message
+
+ # Add assistant response to conversation
+ self.conversation.add_message(
+ role="assistant",
+ content=message.content,
+ tool_calls=message.tool_calls
+ )
+
+ # Check if done
+ if finish_reason == "stop":
+ # Agent finished exploring
+ return message.content
+
+ # Execute tool calls
+ if finish_reason == "tool_calls":
+ for tool_call in message.tool_calls:
+ tool_name = tool_call.function.name
+ tool_args = json.loads(tool_call.function.arguments)
+
+ print(f"[EXPLORER] Calling tool: {tool_name}({tool_args})")
+
+ # Execute tool
+ tool_func = get_tool_function(tool_name)
+ if tool_func:
+ result = tool_func(**tool_args)
+ else:
+ result = f"Error: Tool {tool_name} not found"
+
+ # Add tool result to conversation
+ self.conversation.add_tool_result(
+ tool_call_id=tool_call.id,
+ tool_name=tool_name,
+ result=str(result)
+ )
+
+ # If we hit max iterations, return what we have
+ return "I found some information but couldn't complete the search. Please try a more specific query."
+
+ def get_tool_access(self) -> list:
+ """Return list of tools this agent can access."""
+ return self.allowed_tools
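The per-agent tool whitelisting used in `run()` above is a simple filter over the shared registry's tool schemas. A self-contained sketch with a mocked registry (the tool list below is illustrative, not the real registry output):

```python
# Mocked registry output in the OpenAI-style tool schema shape the agents filter on.
all_tools = [
    {"function": {"name": "get_file_outline"}},
    {"function": {"name": "write_file"}},
    {"function": {"name": "search_code"}},
]

# Explorer's read-only whitelist from the code above.
allowed = ["get_file_outline", "get_code_chunk", "search_code", "search_repository", "list_files"]

explorer_tools = [t for t in all_tools if t["function"]["name"] in allowed]
print([t["function"]["name"] for t in explorer_tools])  # → ['get_file_outline', 'search_code']
```

Note that `write_file` is dropped, which is what keeps the Explorer read-only regardless of what the registry exposes.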
codepilot/agents/orchestrator.py CHANGED
@@ -8,16 +8,22 @@ The orchestrator is the "brain" that:
8
  4. Handles the overall task flow
9
  """
10
 
 
 
 
 
11
  from enum import Enum
12
  from typing import Dict, Any, Optional
13
  from dataclasses import dataclass
14
  from codepilot.agents.planner_agent import PlannerAgent
15
  from codepilot.agents.coder_agent import CoderAgent
16
  from codepilot.agents.reviewer_agent import ReviewerAgent
 
17
 
18
 
19
  class AgentState(Enum):
20
  """Possible states in the multi-agent workflow"""
 
21
  PLANNING = "planning"
22
  CODING = "coding"
23
  REVIEWING = "reviewing"
@@ -33,7 +39,8 @@ class TaskContext:
33
  Think of this as a clipboard that agents write to and read from.
34
  """
35
  task_description: str # Original task from user
36
- plan: Optional[str] = None # Created by Planner
 
37
  code_changes: Optional[Dict[str, str]] = None # Created by Coder
38
  review_feedback: Optional[str] = None # Created by Reviewer
39
  error_message: Optional[str] = None # Set if something fails
@@ -48,14 +55,16 @@ class Orchestrator:
48
  """
49
  Orchestrator manages the multi-agent workflow.
50
 
51
- Flow:
52
- 1. Start in PLANNING state
53
- 2. Call Planner agent → get plan
54
- 3. Transition to CODING state
55
- 4. Call Coder agent → get code
56
- 5. Transition to REVIEWING state
57
- 6. Call Reviewer agent → get feedback
58
- 7. If approved COMPLETE
 
 
59
  If rejected → back to CODING (loop)
60
  """
61
 
@@ -71,24 +80,178 @@ class Orchestrator:
71
  self.max_iterations = max_iterations
72
  self.context = None
73
 
74
- # Create agent instances
75
- self.planner = PlannerAgent()
76
- self.coder = CoderAgent()
77
- self.reviewer = ReviewerAgent()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
78
 
79
  def run(self, task: str) -> Dict[str, Any]:
80
  """
81
  Run the multi-agent workflow for a task.
82
 
 
 
 
 
83
  Args:
84
  task: User's task description (e.g., "Add a login feature")
85
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
86
  Returns:
87
  Result dict with status, changes, and messages
88
  """
89
  # Initialize context
90
  self.context = TaskContext(task_description=task)
91
- self.state = AgentState.PLANNING
92
 
93
  # Main state machine loop
94
  while self.state not in [AgentState.COMPLETE, AgentState.FAILED]:
@@ -99,7 +262,10 @@ class Orchestrator:
99
  break
100
 
101
  # Execute current state
102
- if self.state == AgentState.PLANNING:
 
 
 
103
  self._execute_planning()
104
 
105
  elif self.state == AgentState.CODING:
@@ -113,22 +279,49 @@ class Orchestrator:
113
  # Return final result
114
  return self._build_result()
115
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
116
  def _execute_planning(self):
117
  """
118
  Execute planning state: call Planner agent.
119
 
120
- Planner's job:
121
- - Understand the task
122
- - Search codebase for relevant files
123
- - Create step-by-step plan
124
 
125
  Transition: Always go to CODING next
126
  """
127
  print(f"\n[ORCHESTRATOR] State: PLANNING")
128
- print(f"[ORCHESTRATOR] Task: {self.context.task_description}")
129
 
130
- # Call the real Planner agent!
131
- self.context.plan = self.planner.run(self.context.task_description)
 
 
 
132
 
133
  # Transition to coding
134
  self.state = AgentState.CODING
@@ -138,10 +331,11 @@ class Orchestrator:
138
  """
139
  Execute coding state: call Coder agent.
140
 
141
- Coder's job:
142
- - Read the plan
143
- - Read relevant files
144
- - Write code changes
 
145
 
146
  Transition: Always go to REVIEWING next
147
  """
@@ -149,14 +343,15 @@ class Orchestrator:
149
 
150
  # Check if this is a rework (Reviewer rejected previous code)
151
  if self.context.review_feedback:
152
- print(f"[ORCHESTRATOR] Passing plan + REVIEWER FEEDBACK to Coder agent...")
153
  else:
154
- print(f"[ORCHESTRATOR] Passing plan to Coder agent...")
155
 
156
- # Call the real Coder agent (with review feedback if available)!
157
  self.context.code_changes = self.coder.run(
158
  plan=self.context.plan,
159
  task=self.context.task_description,
 
160
  review_feedback=self.context.review_feedback
161
  )
162
 
 
8
  4. Handles the overall task flow
9
  """
10
 
11
+ # VERSION CHECK - If you see this, new code is running!
12
+ ORCHESTRATOR_VERSION = "3.2.0-coder-no-list"
13
+ print(f"[ORCHESTRATOR] ========== LOADING VERSION {ORCHESTRATOR_VERSION} ==========")
14
+
15
  from enum import Enum
16
  from typing import Dict, Any, Optional
17
  from dataclasses import dataclass
18
  from codepilot.agents.planner_agent import PlannerAgent
19
  from codepilot.agents.coder_agent import CoderAgent
20
  from codepilot.agents.reviewer_agent import ReviewerAgent
21
+ from codepilot.agents.explorer_agent import ExplorerAgent
22
 
23
 
24
  class AgentState(Enum):
25
  """Possible states in the multi-agent workflow"""
26
+ EXPLORING = "exploring" # NEW - Explorer gathers context first
27
  PLANNING = "planning"
28
  CODING = "coding"
29
  REVIEWING = "reviewing"
 
39
  Think of this as a clipboard that agents write to and read from.
40
  """
41
  task_description: str # Original task from user
42
+ exploration_context: Optional[str] = None # NEW - Created by Explorer
43
+ plan: Optional[str] = None # Created by Planner (uses exploration_context)
44
  code_changes: Optional[Dict[str, str]] = None # Created by Coder
45
  review_feedback: Optional[str] = None # Created by Reviewer
46
  error_message: Optional[str] = None # Set if something fails
 
55
  """
56
  Orchestrator manages the multi-agent workflow.
57
 
58
+ Flow (v3.0 - Explorer First):
59
+ 1. Start in EXPLORING state
60
+ 2. Call Explorer agent → gather codebase context (token-efficient)
61
+ 3. Transition to PLANNING state
62
+ 4. Call Planner agent (no tools, pure LLM) → get plan based on exploration
63
+ 5. Transition to CODING state
64
+ 6. Call Coder agent → get code
65
+ 7. Transition to REVIEWING state
66
+ 8. Call Reviewer agent → get feedback
67
+ 9. If approved → COMPLETE
68
  If rejected → back to CODING (loop)
69
  """
70
 
 
80
  self.max_iterations = max_iterations
81
  self.context = None
82
 
83
+ # Create agent instances (Claude Sonnet 4.5, 200K context)
84
+ self.explorer = ExplorerAgent(model="claude-sonnet-4-5-20250929")  # Exploration via token-efficient tools
85
+ self.planner = PlannerAgent(model="claude-sonnet-4-5-20250929")
86
+ self.coder = CoderAgent(model="claude-sonnet-4-5-20250929")
87
+ self.reviewer = ReviewerAgent(model="claude-sonnet-4-5-20250929")
88
+
89
+ def classify_task(self, task: str) -> str:
90
+ """
91
+ Classify task as 'explore' or 'code'.
92
+
93
+ Exploration tasks: find, search, explain, what is, where is
94
+ Code tasks: add, create, implement, fix, modify
95
+
96
+ Args:
97
+ task: User's task description (may include context prefix)
98
+
99
+ Returns:
100
+ 'explore' or 'code'
101
+ """
102
+ print(f"[CLASSIFIER] ########## CLASSIFIER v2.0 START ##########")
103
+ print(f"[CLASSIFIER] Raw task length: {len(task)} chars")
104
+ print(f"[CLASSIFIER] Has [REPOSITORY CONTEXT]: {'[REPOSITORY CONTEXT]' in task}")
105
+
106
+ # Extract just the user's query (after any context sections)
107
+ task_to_check = task
108
+
109
+ # If task has repository context, extract just the user query
110
+ if "[REPOSITORY CONTEXT]" in task:
111
+ print(f"[CLASSIFIER] Extracting user query from context...")
112
+ # Split by double newline and take the last non-empty part
113
+ parts = task.split("\n\n")
114
+ print(f"[CLASSIFIER] Found {len(parts)} parts after splitting")
115
+
116
+ # Get the last substantial part (user's actual query)
117
+ for i, part in enumerate(reversed(parts)):
118
+ part = part.strip()
119
+ print(f"[CLASSIFIER] Checking part {i}: '{part[:50]}...' (len={len(part)})")
120
+ if part and not part.startswith("[") and not part.startswith("AVAILABLE"):
121
+ task_to_check = part
122
+ print(f"[CLASSIFIER] Selected user query: '{part[:80]}...'")
123
+ break
124
+ else:
125
+ print(f"[CLASSIFIER] No context prefix, using raw task")
126
+
127
+ task_lower = task_to_check.lower().strip()
128
+
129
+ # Get just the first few words to determine intent
130
+ first_words = task_lower.split()[:5]
131
+ first_part = ' '.join(first_words)
132
+
133
+ print(f"[CLASSIFIER] Final query: '{task_to_check[:100]}'")
134
+ print(f"[CLASSIFIER] First 5 words: '{first_part}'")
135
+
136
+ # EXPLORE patterns - check these FIRST (questions about code)
137
+ # These indicate the user wants to understand/find something, not change it
138
+ explore_starters = [
139
+ "find", "search", "where", "what", "how", "why",
140
+ "explain", "show", "describe", "look", "locate",
141
+ "understand", "tell", "list", "which", "does", "is there",
142
+ "can you find", "can you show", "can you explain",
143
+ "i want to know", "i want to understand", "i want to find",
144
+ "help me find", "help me understand"
145
+ ]
146
+
147
+ # Check if query STARTS with an explore pattern
148
+ for pattern in explore_starters:
149
+ if task_lower.startswith(pattern) or first_part.startswith(pattern):
150
+ print(f"[CLASSIFIER] >>>>>> RESULT: EXPLORE (starts with '{pattern}') <<<<<<")
151
+ return "explore"
152
+
153
+ # Also check for question words anywhere in short queries
154
+ if len(task_lower) < 150: # Short queries are usually questions
155
+ question_indicators = ["where is", "what is", "how does", "how do", "how is",
156
+ "what does", "which file", "which function", "which class",
157
+ "is there", "are there", "can you find", "can you show"]
158
+ for indicator in question_indicators:
159
+ if indicator in task_lower:
160
+ print(f"[CLASSIFIER] >>>>>> RESULT: EXPLORE (contains '{indicator}') <<<<<<")
161
+ return "explore"
162
+
163
+ # CODE patterns - these indicate the user wants to modify/create something
164
+ # Use word boundaries to avoid false matches (e.g., "implemented" shouldn't match "implement")
165
+ code_starters = [
166
+ "add", "create", "implement", "fix", "modify", "change",
167
+ "update", "refactor", "write", "build", "delete", "remove",
168
+ "make", "develop", "insert", "append", "edit", "replace"
169
+ ]
170
+
171
+ # Check if query STARTS with a code action word
172
+ for pattern in code_starters:
173
+ if task_lower.startswith(pattern + " ") or task_lower.startswith(pattern + "\n"):
174
+ print(f"[CLASSIFIER] >>>>>> RESULT: CODE (starts with '{pattern}') <<<<<<")
175
+ return "code"
176
+
177
+ # Check for action phrases that indicate coding intent
178
+ code_phrases = [
179
+ "i want to add", "i want to create", "i want to implement",
180
+ "i want to fix", "i want to modify", "i want to change",
181
+ "i need to add", "i need to create", "i need to implement",
182
+ "please add", "please create", "please implement", "please fix",
183
+ "can you add", "can you create", "can you implement", "can you fix"
184
+ ]
185
+
186
+ for phrase in code_phrases:
187
+ if phrase in task_lower:
188
+ print(f"[CLASSIFIER] >>>>>> RESULT: CODE (contains '{phrase}') <<<<<<")
189
+ return "code"
190
+
191
+ # Default: short queries without action words are likely exploration
192
+ if len(task_lower) < 100:
193
+ print(f"[CLASSIFIER] >>>>>> RESULT: EXPLORE (short query default) <<<<<<")
194
+ return "explore"
195
+
196
+ # Longer queries default to code (probably detailed requirements)
197
+ print(f"[CLASSIFIER] >>>>>> RESULT: CODE (long query default) <<<<<<")
198
+ return "code"
199
 
200
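Stripped of logging and the repository-context handling, the classifier above reduces to prefix checks plus a length heuristic. A simplified sketch (keyword lists abbreviated from the full sets in `classify_task`):

```python
# Simplified sketch of the explore/code classifier; omits the
# [REPOSITORY CONTEXT] extraction and the mid-string question checks.
EXPLORE_STARTERS = ("find", "search", "where", "what", "how", "explain")
CODE_STARTERS = ("add", "create", "implement", "fix", "modify", "update")

def classify(task: str) -> str:
    t = task.lower().strip()
    if t.startswith(EXPLORE_STARTERS):
        return "explore"
    # Require a trailing space so "implemented ..." doesn't match "implement"
    if any(t.startswith(p + " ") for p in CODE_STARTERS):
        return "code"
    # Short queries without action words default to exploration
    return "explore" if len(t) < 100 else "code"

print(classify("where is the login handler?"))  # explore
print(classify("add a logout endpoint"))        # code
```

Checking explore patterns first matters: "how do I add a field?" is a question about the codebase, not a request to modify it.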
  def run(self, task: str) -> Dict[str, Any]:
201
  """
202
  Run the multi-agent workflow for a task.
203
 
204
+ First classifies the task:
205
+ - 'explore' → Uses lightweight ExplorerAgent only
206
+ - 'code' → Uses full Explorer → Planner → Coder → Reviewer pipeline
207
+
208
  Args:
209
  task: User's task description (e.g., "Add a login feature")
210
 
211
+ Returns:
212
+ Result dict with status, changes, and messages
213
+ """
214
+ # Classify the task first
215
+ task_type = self.classify_task(task)
216
+
217
+ if task_type == "explore":
218
+ # Use lightweight Explorer agent for search/explain queries
219
+ print(f"\n[ORCHESTRATOR] Task type: EXPLORE (using Explorer agent)")
220
+ print(f"[ORCHESTRATOR] Query: {task}")
221
+
222
+ answer = self.explorer.run(task)
223
+
224
+ return {
225
+ 'status': 'complete',
226
+ 'success': True,
227
+ 'task': task,
228
+ 'plan': answer, # Explorer's answer goes in plan field
229
+ 'code_changes': None,
230
+ 'review_feedback': None,
231
+ 'error': None,
232
+ 'iterations': 1
233
+ }
234
+
235
+ # Full workflow for code tasks: Explorer → Planner → Coder → Reviewer
236
+ print(f"\n[ORCHESTRATOR] Task type: CODE (using full workflow)")
237
+ return self._run_full_workflow(task)
238
+
239
+ def _run_full_workflow(self, task: str) -> Dict[str, Any]:
240
+ """
241
+ Run the full Explorer → Planner → Coder → Reviewer workflow.
242
+
243
+ v3.0: Now starts with Explorer to gather context efficiently,
244
+ then Planner creates plan based on exploration (no tools).
245
+
246
+ Args:
247
+ task: User's task description
248
+
249
  Returns:
250
  Result dict with status, changes, and messages
251
  """
252
  # Initialize context
253
  self.context = TaskContext(task_description=task)
254
+ self.state = AgentState.EXPLORING # v3.0: Start with EXPLORING
255
 
256
  # Main state machine loop
257
  while self.state not in [AgentState.COMPLETE, AgentState.FAILED]:
 
262
  break
263
 
264
  # Execute current state
265
+ if self.state == AgentState.EXPLORING:
266
+ self._execute_exploring() # NEW - Explorer first
267
+
268
+ elif self.state == AgentState.PLANNING:
269
  self._execute_planning()
270
 
271
  elif self.state == AgentState.CODING:
 
279
  # Return final result
280
  return self._build_result()
281
 
282
+ def _execute_exploring(self):
283
+ """
284
+ Execute exploring state: call Explorer agent to gather context.
285
+
286
+ Explorer's job (v3.0):
287
+ - Search codebase efficiently using token-optimized tools
288
+ - Find relevant files, functions, and patterns
289
+ - Return context summary for Planner to use
290
+
291
+ Transition: Always go to PLANNING next
292
+ """
293
+ print(f"\n[ORCHESTRATOR] State: EXPLORING")
294
+ print(f"[ORCHESTRATOR] Running Explorer to gather codebase context...")
295
+
296
+ # Run Explorer to gather context (uses token-efficient tools)
297
+ exploration_result = self.explorer.run(self.context.task_description)
298
+
299
+ # Store exploration context for Planner to use
300
+ self.context.exploration_context = exploration_result
301
+
302
+ # Transition to planning
303
+ self.state = AgentState.PLANNING
304
+ print(f"[ORCHESTRATOR] Exploration complete. Transitioning to PLANNING")
305
+
306
  def _execute_planning(self):
307
  """
308
  Execute planning state: call Planner agent.
309
 
310
+ Planner's job (v3.0):
311
+ - Receive exploration context from Explorer
312
+ - Create step-by-step plan based on exploration (NO TOOLS)
313
+ - Pure LLM reasoning - no searching
314
 
315
  Transition: Always go to CODING next
316
  """
317
  print(f"\n[ORCHESTRATOR] State: PLANNING")
318
+ print(f"[ORCHESTRATOR] Using exploration context to create plan (no tools)...")
319
 
320
+ # Call the Planner with exploration context (v3.0: Planner has no tools)
321
+ self.context.plan = self.planner.run(
322
+ task=self.context.task_description,
323
+ exploration_context=self.context.exploration_context
324
+ )
325
 
326
  # Transition to coding
327
  self.state = AgentState.CODING
 
331
  """
332
  Execute coding state: call Coder agent.
333
 
334
+ Coder's job (v3.0):
335
+ - Receive exploration context and plan
336
+ - Read/write files to implement the plan
337
+ - Test in sandbox
338
+ - NO searching (Explorer already did that)
339
 
340
  Transition: Always go to REVIEWING next
341
  """
 
343
 
344
  # Check if this is a rework (Reviewer rejected previous code)
345
  if self.context.review_feedback:
346
+ print(f"[ORCHESTRATOR] Passing exploration + plan + REVIEWER FEEDBACK to Coder...")
347
  else:
348
+ print(f"[ORCHESTRATOR] Passing exploration context + plan to Coder (no search needed)...")
349
 
350
+ # Call the Coder with exploration context (v3.0: Coder doesn't search)
351
  self.context.code_changes = self.coder.run(
352
  plan=self.context.plan,
353
  task=self.context.task_description,
354
+ exploration_context=self.context.exploration_context,
355
  review_feedback=self.context.review_feedback
356
  )
357
 
codepilot/agents/planner_agent.py CHANGED
@@ -1,157 +1,128 @@
1
  """
2
- Planner Agent - Creates implementation plans
3
 
4
  The Planner's job:
5
- 1. Understand the task
6
- 2. Search the codebase to see what exists
7
- 3. Create a detailed, step-by-step plan
8
-
9
- Tools it has access to:
10
- - search_codebase (hybrid retrieval)
11
- - read_file (to understand existing code)
12
- - list_files (to explore structure)
 
13
  """
14
 
15
  from codepilot.llm.client import OpenAIClient
16
- from codepilot.tools.registry import get_tools, get_tool_function
17
  from codepilot.agents.conversation import ConversationManager
18
- from typing import Dict, Any
19
- import json
20
 
21
 
22
- # Planner's specialized system prompt
23
  PLANNER_SYSTEM_PROMPT = """You are a senior software architect and planning expert.
24
 
25
- Your ONLY job is to create detailed implementation plans. You do NOT write code.
26
 
27
- When given a task:
28
- 1. First, search the codebase to understand what already exists
29
- 2. Identify which files need to be modified or created
30
- 3. Break down the task into clear, specific steps
31
- 4. Consider dependencies and potential risks
32
 
33
- Your plan should be:
34
- - Specific (mention exact file names, function names)
35
- - Ordered (steps build on each other)
36
- - Complete (covers all aspects of the task)
37
- - Realistic (considers existing code structure)
 
38
 
39
- Output your plan as a numbered list of steps.
40
 
41
- Tools available to you:
42
- - search_codebase: Search for existing code (use this first!)
43
- - read_file: Read specific files to understand them
44
- - list_files: Explore directory structure
45
-
46
- You do NOT have write_file or run_command - you only plan, never execute.
47
  """
48
 
49
 
50
  class PlannerAgent:
51
  """
52
- Planner Agent - Creates implementation plans.
53
 
54
  This agent is specialized for planning. It has:
55
- - Custom system prompt (architect mindset)
56
- - Limited tools (read-only)
57
- - Single responsibility (planning only)
 
58
  """
59
 
60
- def __init__(self, model: str = "gpt-3.5-turbo"):
61
  """
62
  Initialize Planner agent.
63
 
64
  Args:
65
- model: LLM model to use
66
  """
67
- self.client = OpenAIClient(model=model)
68
- self.conversation = ConversationManager()
69
-
70
- # Planner only gets read-only tools
71
- self.allowed_tools = [
72
- "search_codebase",
73
- "read_file",
74
- "list_files"
75
- ]
76
 
77
- def run(self, task: str) -> str:
78
  """
79
- Create a plan for the given task.
 
 
80
 
81
  Args:
82
  task: Task description (e.g., "Add login feature")
 
83
 
84
  Returns:
85
  Detailed implementation plan as a string
86
  """
87
- # Reset conversation
88
- self.conversation = ConversationManager()
89
-
90
- # Add system prompt
91
- self.conversation.add_message("system", PLANNER_SYSTEM_PROMPT)
92
-
93
- # Add user task
94
- user_prompt = f"""Task: {task}
95
-
96
- Please create a detailed implementation plan. Start by searching the codebase to understand what exists."""
97
- self.conversation.add_message("user", user_prompt)
98
-
99
- # Get only the tools this agent is allowed to use
100
- all_tools = get_tools()
101
- planner_tools = [
102
- tool for tool in all_tools
103
- if tool['function']['name'] in self.allowed_tools
104
- ]
105
-
106
- # Run planning loop (agent explores codebase, then creates plan)
107
- max_iterations = 10
108
- for iteration in range(max_iterations):
109
- # Call LLM
110
- response = self.client.chat(
111
- messages=self.conversation.get_messages(),
112
- tools=planner_tools
113
- )
114
-
115
- finish_reason = response.choices[0].finish_reason
116
- message = response.choices[0].message
117
-
118
- # Add assistant response to conversation
119
- self.conversation.add_message(
120
- role="assistant",
121
- content=message.content,
122
- tool_calls=message.tool_calls
123
- )
124
-
125
- # Check if done
126
- if finish_reason == "stop":
127
- # Agent finished planning
128
- return message.content
129
-
130
- # Execute tool calls
131
- if finish_reason == "tool_calls":
132
- for tool_call in message.tool_calls:
133
- tool_name = tool_call.function.name
134
- tool_args = json.loads(tool_call.function.arguments)
135
-
136
- print(f"[PLANNER] Calling tool: {tool_name}({tool_args})")
137
-
138
- # Execute tool
139
- tool_func = get_tool_function(tool_name)
140
- if tool_func:
141
- result = tool_func(**tool_args)
142
- else:
143
- result = f"Error: Tool {tool_name} not found"
144
-
145
- # Add tool result to conversation
146
- self.conversation.add_tool_result(
147
- tool_call_id=tool_call.id,
148
- tool_name=tool_name,
149
- result=str(result)
150
- )
151
-
152
- # If we hit max iterations, return what we have
153
- return "Error: Planner exceeded max iterations"
154
 
155
  def get_tool_access(self) -> list:
156
  """Return list of tools this agent can access."""
157
- return self.allowed_tools
 
1
  """
2
+ Planner Agent - Creates implementation plans (v3.0 - Pure LLM, No Tools)
3
 
4
  The Planner's job:
5
+ 1. Receive exploration context from Explorer agent
6
+ 2. Create a detailed, step-by-step implementation plan
7
+ 3. NO searching - Explorer already did that
8
+
9
+ v3.0 Changes:
10
+ - Removed all tools (pure LLM reasoning)
11
+ - Receives exploration_context from Explorer
12
+ - Single LLM call instead of tool loop
13
+ - ~90% token reduction vs v2.0
14
  """
15
 
16
  from codepilot.llm.client import OpenAIClient
17
+ from codepilot.llm.claude_client import ClaudeClient
18
  from codepilot.agents.conversation import ConversationManager
19
+ from typing import Optional
 
20
 
21
 
22
+ # Planner's system prompt (v3.0 - no tools, just planning)
23
  PLANNER_SYSTEM_PROMPT = """You are a senior software architect and planning expert.
24
 
25
+ Your ONLY job is to create detailed implementation plans based on the exploration context provided.
26
 
27
+ You do NOT have any tools. The Explorer agent has already searched the codebase for you.
28
+ Use the EXPLORATION RESULTS to understand the codebase structure and create your plan.
 
 
 
29
 
30
+ === YOUR PLAN SHOULD INCLUDE ===
31
+ 1. OVERVIEW: Brief summary of what needs to be done
32
+ 2. FILES TO MODIFY: List each file with specific changes needed
33
+ 3. IMPLEMENTATION STEPS: Ordered steps with exact details:
34
+ - File path
35
+ - Function/class to modify or create
36
+ - What code to add/change
37
+ - Line numbers if provided in exploration
38
+ 4. TESTING: How to verify the changes work
39
 
40
+ === PLAN QUALITY REQUIREMENTS ===
41
+ - Be SPECIFIC: Include exact file names, function names, line numbers
42
+ - Be ORDERED: Steps should build on each other logically
43
+ - Be COMPLETE: Cover all aspects of the task
44
+ - Be CONCISE: Don't repeat information from exploration
45
 
46
+ You do NOT write code - just create the plan for the Coder agent to follow.
 
 
 
 
 
47
  """
48
 
49
 
50
  class PlannerAgent:
51
  """
52
+ Planner Agent - Creates implementation plans (v3.0).
53
 
54
  This agent is specialized for planning. It has:
55
+ - NO tools (pure LLM reasoning)
56
+ - Receives exploration context from Explorer
57
+ - Single LLM call (no iteration loop)
58
+ - Maximum token efficiency
59
  """
60
 
61
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
62
  """
63
  Initialize Planner agent.
64
 
65
  Args:
66
+ model: LLM model to use (default: Claude Sonnet 4.5)
67
  """
68
+ # Use Claude client for Claude models, OpenAI client as fallback
69
+ if "claude" in model.lower():
70
+ self.client = ClaudeClient(model=model)
71
+ else:
72
+ self.client = OpenAIClient(model=model)
 
 
 
 
73
 
74
+ def run(self, task: str, exploration_context: Optional[str] = None) -> str:
75
  """
76
+ Create a plan for the given task using exploration context.
77
+
78
+ v3.0: No tools - pure LLM reasoning based on Explorer's findings.
79
 
80
  Args:
81
  task: Task description (e.g., "Add login feature")
82
+ exploration_context: Context gathered by Explorer agent
83
 
84
  Returns:
85
  Detailed implementation plan as a string
86
  """
87
+ print(f"[PLANNER] Creating plan based on exploration context (no tools)")
88
+
89
+ # Build the prompt with exploration context
90
+ if exploration_context:
91
+ user_prompt = f"""=== EXPLORATION RESULTS ===
92
+ {exploration_context}
93
+
94
+ === TASK ===
95
+ {task}
96
+
97
+ Based on the exploration results above, create a detailed implementation plan.
98
+ Include specific file paths, function names, and step-by-step instructions for the Coder agent.
99
+ """
100
+ else:
101
+ # Fallback if no exploration context (shouldn't happen in v3.0)
102
+ user_prompt = f"""=== TASK ===
103
+ {task}
104
+
105
+ Create a detailed implementation plan for this task.
106
+ Note: No exploration context was provided, so make reasonable assumptions about the codebase structure.
107
+ """
108
+
109
+ # Create conversation with system prompt and user message
110
+ conversation = ConversationManager()
111
+ conversation.add_message("system", PLANNER_SYSTEM_PROMPT)
112
+ conversation.add_message("user", user_prompt)
113
+
114
+ # Single LLM call - no tools, no iteration loop
115
+ response = self.client.chat(
116
+ messages=conversation.get_messages(),
117
+ tools=None, # NO TOOLS - pure reasoning
118
+ max_tokens=2000 # Enough for a detailed plan
119
+ )
120
+
121
+ plan = response.choices[0].message.content
122
+ print(f"[PLANNER] Plan created successfully")
123
+
124
+ return plan
 
125
 
126
  def get_tool_access(self) -> list:
127
  """Return list of tools this agent can access."""
128
+ return [] # v3.0: No tools
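The v3.0 Planner's single-call pattern — prepend the Explorer's findings, one chat completion, no tool loop — comes down to prompt assembly. The `build_planner_prompt` helper below is hypothetical (the real code inlines this in `run`), but the prompt shape matches the diff above:

```python
def build_planner_prompt(task: str, exploration_context: str = None) -> str:
    """Assemble the Planner's user prompt from task + exploration context."""
    if exploration_context:
        return (
            "=== EXPLORATION RESULTS ===\n"
            f"{exploration_context}\n\n"
            "=== TASK ===\n"
            f"{task}\n\n"
            "Based on the exploration results above, create a detailed "
            "implementation plan."
        )
    # Fallback when no exploration context is available
    return (
        "=== TASK ===\n"
        f"{task}\n\n"
        "Create a detailed implementation plan for this task."
    )

prompt = build_planner_prompt("Add login", "auth.py defines class User")
```

Because the exploration context is a plain string rather than a tool loop, the Planner costs exactly one LLM call per task, which is where the claimed token reduction over v2.0 comes from.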
codepilot/agents/reviewer_agent.py CHANGED
@@ -13,6 +13,7 @@ Tools it has access to:
13
  """
14
 
15
  from codepilot.llm.client import OpenAIClient
 
16
  from codepilot.tools.registry import get_tools, get_tool_function
17
  from codepilot.agents.conversation import ConversationManager
18
  from typing import Dict, Any, Tuple
@@ -24,41 +25,37 @@ REVIEWER_SYSTEM_PROMPT = """You are a senior code reviewer and quality assurance
24
 
25
  Your ONLY job is to review code changes and provide feedback. You do NOT write code yourself.
26
 
27
  When given code changes:
28
- 1. Read each changed file carefully
29
- 2. Check for common issues:
30
- - Bugs and logic errors
31
- - Security vulnerabilities (SQL injection, XSS, etc.)
32
- - Missing error handling
33
- - Poor naming or unclear code
34
- - Code that doesn't match the plan
35
- 3. Decide: APPROVE or REJECT
36
- 4. If rejecting, provide specific, actionable feedback
37
-
38
- Your review should be:
39
- - Thorough (check all aspects of the code)
40
- - Specific (point to exact issues with line numbers if possible)
41
- - Constructive (explain WHY something is wrong and HOW to fix it)
42
- - Fair (don't reject for minor style issues)
43
 
44
  DECISION CRITERIA:
45
- ✅ APPROVE if:
46
- - Code works correctly
47
- - No security issues
48
- - Follows the plan
49
- - Has basic error handling
50
- - Is reasonably readable
51
-
52
- REJECT if:
53
- - Code has bugs
54
- - Security vulnerabilities exist
55
- - Doesn't implement the plan
56
- - Missing critical error handling
57
- - Code is unclear or confusing
58
-
59
- Tools available to you:
60
- - read_file: Read files to understand full context
61
- - search_codebase: Check for similar patterns in the codebase
62
 
63
  You do NOT have write_file - you only review, never modify code.
64
  """
@@ -74,20 +71,27 @@ class ReviewerAgent:
74
  - Single responsibility (review only)
75
  """
76
 
77
- def __init__(self, model: str = "gpt-3.5-turbo"):
78
  """
79
  Initialize Reviewer agent.
80
 
81
  Args:
82
- model: LLM model to use
83
  """
84
- self.client = OpenAIClient(model=model)
85
  self.conversation = ConversationManager()
86
 
87
  # Reviewer only gets read-only tools
88
  self.allowed_tools = [
89
- "read_file",
90
- "search_codebase"
 
 
91
  ]
92
 
93
  def run(self, code_changes: Dict[str, str], plan: str, task: str) -> Tuple[bool, str]:
 
13
  """
14
 
15
  from codepilot.llm.client import OpenAIClient
16
+ from codepilot.llm.claude_client import ClaudeClient
17
  from codepilot.tools.registry import get_tools, get_tool_function
18
  from codepilot.agents.conversation import ConversationManager
19
  from typing import Dict, Any, Tuple
 
25
 
26
  Your ONLY job is to review code changes and provide feedback. You do NOT write code yourself.
27
 
28
+ === CRITICAL: TOKEN-EFFICIENT FILE READING ===
29
+ 1. NEVER use read_file as your first choice!
30
+ 2. ALWAYS use get_file_outline FIRST to see file structure (~50 tokens vs ~2000 tokens)
31
+ 3. THEN use get_code_chunk to read ONLY the specific function/class you need to review
32
+ 4. ONLY use read_file if you absolutely need the ENTIRE file (rare!)
33
+
34
+ CORRECT workflow:
35
+ get_file_outline("file.py") → See structure
36
+ get_code_chunk("file.py", "my_func") → Review just that function
37
+
38
+ WRONG workflow:
39
+ read_file("file.py") → Wastes 2000+ tokens!
40
+
41
+ === REVIEW WORKFLOW ===
42
  When given code changes:
43
+ 1. The code changes are already provided in the prompt - review those first
44
+ 2. If you need more context, use get_file_outline then get_code_chunk
45
+ 3. Check for: bugs, security issues, missing error handling, plan compliance
46
+ 4. Decide: APPROVE or REJECT
 
47
 
48
  DECISION CRITERIA:
49
+ ✅ APPROVE if: Code works, no security issues, follows plan, has error handling
50
+ ❌ REJECT if: Has bugs, security issues, doesn't follow plan, unclear code
51
+
52
+ === TOOLS ===
53
+ - get_file_outline: Get file structure WITHOUT code - USE THIS FIRST!
54
+ - get_code_chunk: Extract ONE specific function/class - USE THIS SECOND!
55
+ - read_file: Read ENTIRE file - AVOID THIS!
56
+ - search_repository: Find similar patterns
57
+
58
+ End your review with: "DECISION: APPROVE" or "DECISION: REJECT"
 
59
 
60
  You do NOT have write_file - you only review, never modify code.
61
  """
 
71
  - Single responsibility (review only)
72
  """
73
 
74
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
75
  """
76
  Initialize Reviewer agent.
77
 
78
  Args:
79
+ model: LLM model to use (default: Claude Sonnet 4.5)
80
  """
81
+ # Use Claude client for Claude models, OpenAI client as fallback
82
+ if "claude" in model.lower():
83
+ self.client = ClaudeClient(model=model)
84
+ else:
85
+ self.client = OpenAIClient(model=model)
86
+
87
  self.conversation = ConversationManager()
88
 
89
  # Reviewer only gets read-only tools
90
  self.allowed_tools = [
91
+ "get_file_outline", # Get file structure without full code (token-efficient!)
92
+ "get_code_chunk", # Extract specific function/class by name
93
+ "read_file", # Full file contents (use sparingly)
94
+ "search_repository"
95
  ]
96
 
97
  def run(self, code_changes: Dict[str, str], plan: str, task: str) -> Tuple[bool, str]:
codepilot/context/indexer.py CHANGED
@@ -48,11 +48,15 @@ class CodebaseIndexer:
48
  # Skip unwanted directories (modify dirs in-place)
49
  dirs[:] = [d for d in dirs if d not in [
50
  '__pycache__', 'venv', 'node_modules', '.git',
51
- '.pytest_cache', '.mypy_cache'
52
  ]]
53
 
54
  # Process each file
55
  for file in files:
 
 
 
 
56
  # Check if file has matching extension
57
  if any(file.endswith(ext) for ext in file_extensions):
58
  file_path = os.path.join(root, file)
 
48
  # Skip unwanted directories (modify dirs in-place)
49
  dirs[:] = [d for d in dirs if d not in [
50
  '__pycache__', 'venv', 'node_modules', '.git',
51
+ '.pytest_cache', '.mypy_cache', 'tests', 'test'
52
  ]]
53
 
54
  # Process each file
55
  for file in files:
56
+ # Skip test files
57
+ if file.startswith('test_') or file.endswith('_test.py'):
58
+ continue
59
+
60
  # Check if file has matching extension
61
  if any(file.endswith(ext) for ext in file_extensions):
62
  file_path = os.path.join(root, file)
codepilot/llm/claude_client.py ADDED
@@ -0,0 +1,235 @@
1
+ """
2
+ Claude Client Wrapper
3
+ Handles all communication with Anthropic's Claude API
4
+ """
5
+
6
+ import os
7
+ import json
8
+ from dotenv import load_dotenv
9
+ from anthropic import Anthropic
10
+ from typing import List, Dict, Optional
11
+
12
+ load_dotenv()
13
+
14
+
15
+ class ClaudeClient:
16
+ """Wrapper for Anthropic Claude API calls"""
17
+
18
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
19
+ """
20
+ Initialize Claude client
21
+
22
+ Args:
23
+ model: Claude model to use (default: claude-sonnet-4-5-20250929)
24
+ """
25
+ self.api_key = os.getenv('ANTHROPIC_API_KEY')
26
+
27
+ if not self.api_key:
28
+ raise ValueError("ANTHROPIC_API_KEY not found in environment variables")
29
+
30
+ self.client = Anthropic(api_key=self.api_key)
31
+ self.model = model
32
+
33
+ print(f"✅ Claude Client initialized with model: {self.model}")
34
+
35
+ def chat(
36
+ self,
37
+ messages: List[Dict[str, str]],
38
+ tools: Optional[List[Dict]] = None,
39
+ temperature: float = 0.7,
40
+ max_tokens: int = 1000
41
+ ):
42
+ """
43
+ Send a chat completion request to Claude
44
+
45
+ Args:
46
+ messages: List of message dicts with 'role' and 'content'
47
+ tools: Optional list of tool definitions for function calling
48
+ temperature: Randomness (0-1, lower = more focused)
49
+ max_tokens: Maximum tokens in response
50
+
51
+ Returns:
52
+ Response object compatible with OpenAI format
53
+ """
54
+ try:
55
+ # Separate system message from conversation and convert tool messages
56
+ system_message = ""
57
+ conversation_messages = []
58
+ pending_tool_results = []
59
+
60
+ for msg in messages:
61
+ if msg.get("role") == "system":
62
+ system_message = msg.get("content", "")
63
+ elif msg.get("role") == "tool":
64
+ # Collect tool results to group them
65
+ pending_tool_results.append({
66
+ "type": "tool_result",
67
+ "tool_use_id": msg.get("tool_call_id"),
68
+ "content": msg.get("content", "")
69
+ })
70
+ elif msg.get("role") == "assistant" and msg.get("tool_calls"):
71
+ # Convert OpenAI tool_calls to Claude tool_use format
72
+ # Flush pending tool results first
73
+ if pending_tool_results:
74
+ conversation_messages.append({
75
+ "role": "user",
76
+ "content": pending_tool_results
77
+ })
78
+ pending_tool_results = []
79
+
80
+ # Convert tool_calls to content blocks with tool_use
81
+ content_blocks = []
82
+ if msg.get("content"):
83
+ content_blocks.append({"type": "text", "text": msg.get("content")})
84
+
85
+ for tc in msg.get("tool_calls", []):
86
+ # Handle both dict and object formats
87
+ if isinstance(tc, dict):
88
+ tool_id = tc.get("id")
89
+ func = tc.get("function", {})
90
+ func_name = func.get("name")
91
+ func_args = func.get("arguments", "{}")
92
+ else:
93
+ tool_id = tc.id
94
+ func_name = tc.function.name
95
+ func_args = tc.function.arguments
96
+
97
+ content_blocks.append({
98
+ "type": "tool_use",
99
+ "id": tool_id,
100
+ "name": func_name,
101
+ "input": json.loads(func_args) if isinstance(func_args, str) else func_args
102
+ })
103
+
104
+ conversation_messages.append({
105
+ "role": "assistant",
106
+ "content": content_blocks
107
+ })
108
+ else:
109
+ # Flush pending tool results before adding non-tool message
110
+ if pending_tool_results:
111
+ conversation_messages.append({
112
+ "role": "user",
113
+ "content": pending_tool_results
114
+ })
115
+ pending_tool_results = []
116
+ conversation_messages.append(msg)
117
+
118
+ # Flush any remaining tool results
119
+ if pending_tool_results:
120
+ conversation_messages.append({
121
+ "role": "user",
122
+ "content": pending_tool_results
123
+ })
124
+
125
+ # Build request parameters
126
+ request_params = {
127
+ "model": self.model,
128
+ "messages": conversation_messages,
129
+ "temperature": temperature,
130
+ "max_tokens": max_tokens
131
+ }
132
+
133
+ # Add system message if present
134
+ if system_message:
135
+ request_params["system"] = system_message
136
+
137
+ # Add tools if provided (convert from OpenAI format to Claude format)
138
+ if tools:
139
+ claude_tools = self._convert_tools_to_claude_format(tools)
140
+ request_params["tools"] = claude_tools
141
+
142
+ # Make API call
143
+ response = self.client.messages.create(**request_params)
144
+
145
+ # Convert Claude response to OpenAI-compatible format
146
+ openai_compatible_response = self._convert_to_openai_format(response)
147
+
148
+ # Print token usage for cost tracking
149
+ usage = response.usage
150
+ print(f"📊 Tokens: {usage.input_tokens} prompt + {usage.output_tokens} completion = {usage.input_tokens + usage.output_tokens} total")
151
+
152
+ return openai_compatible_response
153
+
154
+ except Exception as e:
155
+ print(f"❌ Claude API Error: {e}")
156
+ raise
157
+
158
+ def _convert_tools_to_claude_format(self, openai_tools: List[Dict]) -> List[Dict]:
159
+ """Convert OpenAI tool format to Claude tool format"""
160
+ claude_tools = []
161
+
162
+ for tool in openai_tools:
163
+ if tool.get("type") == "function":
164
+ func = tool.get("function", {})
165
+ claude_tool = {
166
+ "name": func.get("name"),
167
+ "description": func.get("description"),
168
+ "input_schema": func.get("parameters", {})
169
+ }
170
+ claude_tools.append(claude_tool)
171
+
172
+ return claude_tools
173
+
174
+ def _convert_to_openai_format(self, claude_response):
175
+ """Convert Claude response to OpenAI-compatible format"""
176
+ import json
177
+
178
+ # Create simple dict-based objects that provide attribute access
179
+ class DictObject(dict):
180
+ """Object that behaves like both dict and object (JSON serializable)"""
181
+ def __init__(self, **kwargs):
182
+ super().__init__(kwargs)
183
+ self.__dict__ = self
184
+
185
+ # Extract content and tool calls from Claude response
186
+ content_parts = []
187
+ tool_calls = []
188
+
189
+ for block in claude_response.content:
190
+ if block.type == "text":
191
+ content_parts.append(block.text)
192
+ elif block.type == "tool_use":
193
+ # Convert to OpenAI tool call format
194
+ tool_call = DictObject(
195
+ id=block.id,
196
+ type="function",
197
+ function=DictObject(
198
+ name=block.name,
199
+ arguments=json.dumps(block.input)
200
+ )
201
+ )
202
+ tool_calls.append(tool_call)
203
+
204
+ # Determine finish reason
205
+ finish_reason = "stop"
206
+ if claude_response.stop_reason == "tool_use":
207
+ finish_reason = "tool_calls"
208
+ elif claude_response.stop_reason == "max_tokens":
209
+ finish_reason = "length"
210
+
211
+ # Build message
212
+ message = DictObject(
213
+ role="assistant",
214
+ content="\n".join(content_parts) if content_parts else None,
215
+ tool_calls=tool_calls if tool_calls else None
216
+ )
217
+
218
+ # Build choice
219
+ choice = DictObject(
220
+ message=message,
221
+ finish_reason=finish_reason
222
+ )
223
+
224
+ # Build usage
225
+ usage = DictObject(
226
+ prompt_tokens=claude_response.usage.input_tokens,
227
+ completion_tokens=claude_response.usage.output_tokens,
228
+ total_tokens=claude_response.usage.input_tokens + claude_response.usage.output_tokens
229
+ )
230
+
231
+ # Build response
232
+ return DictObject(
233
+ choices=[choice],
234
+ usage=usage
235
+ )
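The OpenAI-to-Claude schema translation in `_convert_tools_to_claude_format` can be checked in isolation. The sketch below mirrors it as a free function; the sample `read_file` schema is illustrative, not taken from the repo:

```python
# Minimal sketch of the OpenAI -> Claude tool-schema conversion above:
# Claude expects "input_schema" where OpenAI nests "parameters" under "function".
def convert_tools_to_claude_format(openai_tools):
    claude_tools = []
    for tool in openai_tools:
        if tool.get("type") == "function":
            func = tool.get("function", {})
            claude_tools.append({
                "name": func.get("name"),
                "description": func.get("description"),
                "input_schema": func.get("parameters", {}),
            })
    return claude_tools

# Hypothetical OpenAI-style tool definition for demonstration
openai_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

claude_tool = convert_tools_to_claude_format([openai_tool])[0]
```

The flattening is lossless in the other direction too, which is what lets `_convert_to_openai_format` hand Claude responses to agent code written against the OpenAI shape.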
codepilot/llm/client.py CHANGED
@@ -36,7 +36,7 @@ class OpenAIClient:
         messages: List[Dict[str, str]],
         tools: Optional[List[Dict]] = None,
         temperature: float = 0.7,
-        max_tokens: int = 2000
+        max_tokens: int = 800
     ) -> openai.types.chat.ChatCompletion:
         """
         Send a chat completion request to OpenAI
codepilot/tools/context_tools.py CHANGED
@@ -86,7 +86,8 @@ def index_codebase(path: str = ".") -> str:
         })
 
     # Create and index hybrid retriever
-    _hybrid_retriever = HybridRetriever()
+    # Weights tuned for code search: heavily favor BM25 (exact matches) over embeddings (semantic)
+    _hybrid_retriever = HybridRetriever(bm25_weight=0.85, embedding_weight=0.15)
     retrieval_stats = _hybrid_retriever.index_documents(documents)
 
     # Return summary
@@ -141,3 +142,20 @@ def search_codebase(query: str, top_k: int = 5) -> str:
             output.append(f" {line}")
 
     return '\n'.join(output)
+
+
+def search_repository(query: str, top_k: int = 5) -> str:
+    """
+    Search the cloned GitHub repository using hybrid retrieval (BM25 + embeddings).
+
+    This is a wrapper that provides semantic search over cloned repositories.
+    Uses the same hybrid search as search_codebase.
+
+    Args:
+        query: What to search for (e.g., "Flask application class", "error handling")
+        top_k: Number of results to return (default: 5)
+
+    Returns:
+        Formatted search results with file paths, scores, and code snippets
+    """
+    return search_codebase(query, top_k)
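The `bm25_weight=0.85 / embedding_weight=0.15` split amounts to a weighted linear fusion of the two retrievers' scores. A minimal sketch, assuming scores are already normalized to [0, 1] (HybridRetriever's actual normalization and the `fuse_scores` name are assumptions, not from the repo):

```python
# Hedged sketch of weighted score fusion with the 0.85/0.15 split above.
# Documents missing from one ranking contribute a score of 0 for that retriever.
def fuse_scores(bm25_scores, embedding_scores,
                bm25_weight=0.85, embedding_weight=0.15):
    fused = {}
    for doc_id in set(bm25_scores) | set(embedding_scores):
        fused[doc_id] = (bm25_weight * bm25_scores.get(doc_id, 0.0)
                         + embedding_weight * embedding_scores.get(doc_id, 0.0))
    # Highest fused score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# An exact-match hit ("a.py") outranks a purely semantic hit ("c.py")
ranked = fuse_scores({"a.py": 1.0, "b.py": 0.2},
                     {"b.py": 1.0, "c.py": 0.9})
```

With this weighting, a document that only matches semantically can score at most 0.15, which is the intended bias: for code, exact identifier matches beat fuzzy similarity.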
codepilot/tools/file_tools.py CHANGED
@@ -6,20 +6,35 @@ import subprocess
 import os
 
 
-def read_file(path):
+def read_file(path, max_lines=200):
     """
     Reads and returns the contents of a file.
+    Truncates large files to prevent context overflow.
 
     Args:
         path: File path to read
+        max_lines: Maximum lines to return (default 200)
 
     Returns:
         str: File contents or error message
     """
     try:
         with open(path, 'r') as f:
-            content = f.read()
-        return f"Successfully read file '{path}':\n\n{content}"
+            lines = f.readlines()
+
+        total_lines = len(lines)
+
+        # Truncate if too large
+        if total_lines > max_lines:
+            # Keep first half and last half
+            keep = max_lines // 2
+            content = ''.join(lines[:keep])
+            content += f'\n... [truncated {total_lines - max_lines} lines] ...\n\n'
+            content += ''.join(lines[-keep:])
+            return f"Successfully read file '{path}' ({total_lines} lines, showing first/last {keep}):\n\n{content}"
+        else:
+            content = ''.join(lines)
+            return f"Successfully read file '{path}':\n\n{content}"
     except FileNotFoundError:
         return f"Error: File '{path}' not found."
     except PermissionError:
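The truncation keeps `max_lines // 2` lines from each end of the file, so headers/imports and trailing definitions both survive. A small sketch of just that slicing logic (the `truncate_lines` helper is illustrative, not a function in the repo):

```python
# Hedged sketch of read_file's first-half/last-half truncation:
# keep max_lines // 2 lines from each end, report how many were dropped.
def truncate_lines(lines, max_lines=200):
    total = len(lines)
    if total <= max_lines:
        return lines, 0
    keep = max_lines // 2
    return lines[:keep] + lines[-keep:], total - max_lines

lines = [f"line {i}\n" for i in range(10)]
kept, dropped = truncate_lines(lines, max_lines=4)
```

With 10 lines and `max_lines=4`, the first two and last two lines are kept and six are dropped, mirroring the `[truncated N lines]` marker the tool inserts.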
codepilot/tools/github_tools.py ADDED
@@ -0,0 +1,211 @@
+"""
+GitHub Repository Tools
+Handles cloning and managing public GitHub repositories for CodePilot sessions
+"""
+
+import os
+import re
+import shutil
+import subprocess
+import tempfile
+from typing import Optional, Tuple
+import uuid
+
+
+def extract_github_url(text: str) -> Optional[str]:
+    """
+    Extract a GitHub repository URL from text.
+
+    Supports formats:
+    - https://github.com/user/repo
+    - https://github.com/user/repo.git
+    - github.com/user/repo
+    - http://github.com/user/repo
+
+    Returns:
+        GitHub URL if found, None otherwise
+    """
+    # Pattern to match GitHub URLs
+    pattern = r'(?:https?://)?(?:www\.)?github\.com/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_.-]+)(?:\.git)?'
+    match = re.search(pattern, text)
+
+    if match:
+        user = match.group(1)
+        repo = match.group(2).rstrip('.git')
+        return f"https://github.com/{user}/{repo}.git"
+
+    return None
+
+
+def get_repo_name(github_url: str) -> str:
+    """Extract repository name from GitHub URL."""
+    # Remove .git suffix if present
+    url = github_url.rstrip('.git')
+    # Get the last part of the URL
+    return url.split('/')[-1]
+
+
+def clone_repository(github_url: str, base_dir: Optional[str] = None) -> Tuple[bool, str, str]:
+    """
+    Clone a public GitHub repository to a temporary directory.
+
+    Args:
+        github_url: The GitHub repository URL
+        base_dir: Optional base directory for cloning (default: system temp)
+
+    Returns:
+        Tuple of (success: bool, path_or_error: str, repo_name: str)
+    """
+    repo_name = get_repo_name(github_url)
+
+    # Create a unique session directory
+    session_id = str(uuid.uuid4())[:8]
+
+    if base_dir is None:
+        # Use /tmp for cloud environments (more space than tempfile default)
+        base_dir = "/tmp/codepilot_repos"
+
+    # Ensure base directory exists
+    os.makedirs(base_dir, exist_ok=True)
+
+    # Create session-specific directory
+    session_dir = os.path.join(base_dir, f"{repo_name}_{session_id}")
+
+    try:
+        # Clone with depth=1 for faster cloning (only latest commit)
+        result = subprocess.run(
+            ["git", "clone", "--depth", "1", github_url, session_dir],
+            capture_output=True,
+            text=True,
+            timeout=120  # 2 minute timeout
+        )
+
+        if result.returncode != 0:
+            error_msg = result.stderr or "Unknown error during clone"
+            # Clean up failed clone
+            if os.path.exists(session_dir):
+                shutil.rmtree(session_dir, ignore_errors=True)
+            return False, f"Clone failed: {error_msg}", repo_name
+
+        return True, session_dir, repo_name
+
+    except subprocess.TimeoutExpired:
+        # Clean up on timeout
+        if os.path.exists(session_dir):
+            shutil.rmtree(session_dir, ignore_errors=True)
+        return False, "Clone timed out (repository may be too large)", repo_name
+
+    except Exception as e:
+        # Clean up on any error
+        if os.path.exists(session_dir):
+            shutil.rmtree(session_dir, ignore_errors=True)
+        return False, f"Clone error: {str(e)}", repo_name
+
+
+def cleanup_repository(repo_path: str) -> bool:
+    """
+    Clean up a cloned repository.
+
+    Args:
+        repo_path: Path to the cloned repository
+
+    Returns:
+        True if cleanup successful, False otherwise
+    """
+    try:
+        if os.path.exists(repo_path):
+            shutil.rmtree(repo_path)
+        return True
+    except Exception:
+        return False
+
+
+def get_repo_info(repo_path: str) -> dict:
+    """
+    Get basic information about a cloned repository.
+
+    Args:
+        repo_path: Path to the cloned repository
+
+    Returns:
+        Dictionary with repo info
+    """
+    info = {
+        "path": repo_path,
+        "name": os.path.basename(repo_path).split('_')[0],  # Remove session ID
+        "files": [],
+        "total_files": 0,
+        "languages": set()
+    }
+
+    # File extension to language mapping
+    ext_to_lang = {
+        '.py': 'Python',
+        '.js': 'JavaScript',
+        '.ts': 'TypeScript',
+        '.tsx': 'TypeScript',
+        '.jsx': 'JavaScript',
+        '.java': 'Java',
+        '.go': 'Go',
+        '.rs': 'Rust',
+        '.cpp': 'C++',
+        '.c': 'C',
+        '.h': 'C/C++',
+        '.rb': 'Ruby',
+        '.php': 'PHP',
+        '.swift': 'Swift',
+        '.kt': 'Kotlin',
+        '.cs': 'C#',
+        '.html': 'HTML',
+        '.css': 'CSS',
+        '.scss': 'SCSS',
+        '.md': 'Markdown',
+        '.json': 'JSON',
+        '.yaml': 'YAML',
+        '.yml': 'YAML',
+    }
+
+    # Walk the repository
+    for root, dirs, files in os.walk(repo_path):
+        # Skip hidden directories and common non-code directories
+        dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ['node_modules', 'venv', '__pycache__', 'dist', 'build']]
+
+        for file in files:
+            if not file.startswith('.'):
+                info["total_files"] += 1
+                ext = os.path.splitext(file)[1].lower()
+                if ext in ext_to_lang:
+                    info["languages"].add(ext_to_lang[ext])
+
+                # Store relative path
+                rel_path = os.path.relpath(os.path.join(root, file), repo_path)
+                info["files"].append(rel_path)
+
+    info["languages"] = list(info["languages"])
+
+    return info
+
+
+def validate_github_url(url: str) -> Tuple[bool, str]:
+    """
+    Validate that a URL is a valid public GitHub repository.
+
+    Args:
+        url: The URL to validate
+
+    Returns:
+        Tuple of (is_valid: bool, message: str)
+    """
+    if not url:
+        return False, "No URL provided"
+
+    # Check if it's a GitHub URL
+    if 'github.com' not in url.lower():
+        return False, "Not a GitHub URL"
+
+    # Extract and validate format
+    extracted = extract_github_url(url)
+    if not extracted:
+        return False, "Invalid GitHub URL format. Expected: github.com/user/repo"
+
+    return True, extracted
+
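The URL normalization in `extract_github_url` can be exercised standalone. One caveat worth noting: the `rstrip('.git')` in the diff strips a *character set*, not a suffix, so repo names ending in `g`, `i`, `t`, or `.` would be mangled; the sketch below therefore uses `str.removesuffix` (Python 3.9+) for the same step:

```python
import re

# Hedged sketch of extract_github_url's normalization, using removesuffix
# instead of rstrip('.git') to strip the suffix safely.
def extract_github_url(text):
    pattern = r'(?:https?://)?(?:www\.)?github\.com/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_.-]+)(?:\.git)?'
    match = re.search(pattern, text)
    if not match:
        return None
    user = match.group(1)
    repo = match.group(2).removesuffix('.git')
    # Normalize every accepted input form to a canonical clone URL
    return f"https://github.com/{user}/{repo}.git"
```

Bare `github.com/user/repo` mentions, full `https://` URLs, and `.git` suffixes all normalize to the same clone URL, which is what `clone_repository` expects.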
codepilot/tools/registry.py CHANGED
@@ -5,6 +5,7 @@ Maps tool names to their implementations and schemas
 
 import os
 from codepilot.tools.file_tools import read_file, write_file, run_command, search_code, list_files, git_status
+from codepilot.context.parser import CodeParser
 from codepilot.sandbox.sandbox_tools import (
     create_sandbox,
     close_sandbox,
@@ -14,29 +15,120 @@ from codepilot.sandbox.sandbox_tools import (
 )
 from typing import Callable, List, Dict, Optional
 
-# Check if running in production BEFORE importing heavy ML dependencies
-# Detects: Render, HuggingFace Spaces, or any cloud with PORT env var
-_IS_PRODUCTION = os.getenv('RENDER_SERVICE_NAME') or os.getenv('RENDER') or os.getenv('SPACE_ID') or os.getenv('PORT')
-
-# Only import heavy context_tools (sentence-transformers, torch) in local development
-if not _IS_PRODUCTION:
-    from codepilot.tools.context_tools import search_codebase, index_codebase
-else:
-    # Provide stub functions for production to avoid import errors
-    def search_codebase(query: str, top_k: int = 5) -> str:
-        return "⚠️ Codebase search is disabled in cloud mode (resource constraints)"
-
-    def index_codebase(root_path: str) -> str:
-        return "⚠️ Codebase indexing is disabled in cloud mode (resource constraints)"
+# Full context tools with BM25 + embeddings (requires 16GB+ RAM)
+from codepilot.tools.context_tools import search_codebase, index_codebase, search_repository
+
+# Initialize parser for file outline tools
+_parser = CodeParser()
+
+
+def get_file_outline(path: str) -> str:
+    """
+    Get the structure/outline of a Python file without reading full contents.
+    Returns classes, functions, methods with their signatures and docstrings.
+    Much more token-efficient than read_file for understanding file structure.
+
+    Args:
+        path: Path to the Python file
+
+    Returns:
+        Formatted outline showing file structure
+    """
+    result = _parser.parse_file(path)
+
+    if result.get('parse_errors'):
+        return f"Error: {result['parse_errors'][0]}"
+
+    lines = [f"# {path} ({result.get('total_lines', 0)} lines)\n"]
+
+    # Show imports summary
+    imports = result.get('imports', [])
+    if imports:
+        modules = set()
+        for imp in imports:
+            if imp['type'] == 'import':
+                modules.add(imp['name'].split('.')[0])
+            else:
+                mod = imp.get('module', '')
+                if mod:
+                    modules.add(mod.split('.')[0])
+        lines.append(f"Imports: {', '.join(sorted(modules)[:10])}")
+        if len(modules) > 10:
+            lines.append(f"  ... and {len(modules) - 10} more")
+        lines.append("")
+
+    # Show classes with methods
+    for cls in result.get('classes', []):
+        bases = f"({', '.join(cls['bases'])})" if cls['bases'] else ""
+        decorators = '\n'.join(f"@{d}" for d in cls.get('decorators', []))
+        if decorators:
+            lines.append(decorators)
+        lines.append(f"class {cls['name']}{bases}:  # lines {cls['start_line']}-{cls['end_line']}")
+        if cls.get('docstring'):
+            # Truncate long docstrings
+            doc = cls['docstring'][:150].replace('\n', ' ')
+            if len(cls['docstring']) > 150:
+                doc += "..."
+            lines.append(f'    """{doc}"""')
+        for method in cls.get('methods', []):
+            async_prefix = "async " if method.get('is_async') else ""
+            lines.append(f"    {async_prefix}def {method['name']}()  # line {method['line']}")
+        lines.append("")
+
+    # Show standalone functions
+    standalone_funcs = [f for f in result.get('functions', [])
+                        if not any(f['start_line'] >= c['start_line'] and f['end_line'] <= c['end_line']
+                                   for c in result.get('classes', []))]
+
+    for func in standalone_funcs:
+        params = ', '.join(func.get('parameters', []))
+        async_prefix = "async " if func.get('is_async') else ""
+        decorators = '\n'.join(f"@{d}" for d in func.get('decorators', []))
+        if decorators:
+            lines.append(decorators)
+        lines.append(f"{async_prefix}def {func['name']}({params}):  # lines {func['start_line']}-{func['end_line']}")
+        if func.get('docstring'):
+            doc = func['docstring'][:100].replace('\n', ' ')
+            if len(func['docstring']) > 100:
+                doc += "..."
+            lines.append(f'    """{doc}"""')
+        lines.append("")
+
+    # Show globals
+    globals_list = result.get('globals', [])
+    if globals_list:
+        lines.append("# Global variables:")
+        for g in globals_list[:10]:
+            lines.append(f"    {g['name']}: {g.get('type', 'unknown')}  # line {g['line']}")
+        if len(globals_list) > 10:
+            lines.append(f"    ... and {len(globals_list) - 10} more")
+
+    return '\n'.join(lines)
+
+
+def get_code_chunk(path: str, name: str) -> str:
+    """
+    Extract a specific function or class from a file by name.
+    Use this when you need to see the implementation of a specific function/class
+    after using get_file_outline to identify what you need.
+
+    Args:
+        path: Path to the Python file
+        name: Name of the function or class to extract
+
+    Returns:
+        The complete code for the specified function/class with relevant imports
+    """
+    return _parser.extract_code_chunk(path, name)
 
 
-# Tool schemas for OpenAI function calling
+# Tool schemas for LLM function calling (compatible with Claude and OpenAI)
 TOOLS = [
     {
         "type": "function",
         "function": {
             "name": "read_file",
-            "description": "Reads the contents of a file at the specified path. Use this when you need to view or analyze file contents.",
+            "description": "WARNING: This reads the ENTIRE file which wastes tokens! PREFER get_file_outline (for structure) or get_code_chunk (for specific function/class) instead. Only use read_file when you absolutely need the complete file contents.",
             "parameters": {
                 "type": "object",
                 "properties": {
@@ -225,7 +317,67 @@ TOOLS = [
                 "required": ["code"]
             }
         }
-    }
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "search_repository",
+            "description": "Search the cloned GitHub repository using hybrid retrieval (BM25 + semantic embeddings). Use this to find functions, classes, or code patterns in the cloned repo. More powerful than search_code - finds both exact matches AND semantically related code.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {
+                        "type": "string",
+                        "description": "What to search for. Can be natural language (e.g., 'authentication logic', 'error handling') or specific terms (e.g., 'Flask class', 'route decorator')"
+                    },
+                    "top_k": {
+                        "type": "integer",
+                        "description": "Number of results to return (default: 5, max: 20)",
+                        "default": 5
+                    }
+                },
+                "required": ["query"]
+            }
+        }
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "get_file_outline",
+            "description": "Get the structure/outline of a Python file WITHOUT reading full contents. Returns classes, functions, methods with signatures and docstrings. Use this FIRST to understand a file's structure before using read_file or get_code_chunk. Much more token-efficient than read_file (~50 tokens vs ~2000 tokens for a typical file).",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "path": {
+                        "type": "string",
+                        "description": "Path to the Python file to outline"
+                    }
+                },
+                "required": ["path"]
+            }
+        }
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "get_code_chunk",
+            "description": "Extract a specific function or class from a file by name. Use this after get_file_outline to read just the code you need instead of the entire file. Returns the complete implementation with relevant imports.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "path": {
+                        "type": "string",
+                        "description": "Path to the Python file"
+                    },
+                    "name": {
+                        "type": "string",
+                        "description": "Name of the function or class to extract (e.g., 'MyClass', 'my_function')"
+                    }
+                },
+                "required": ["path", "name"]
+            }
+        }
+    },
 ]
@@ -241,7 +393,10 @@ TOOL_FUNCTIONS = {
     "index_codebase": index_codebase,
     "upload_to_sandbox": upload_to_sandbox,
     "execute_in_sandbox": execute_in_sandbox,
-    "run_command_in_sandbox": run_command_in_sandbox
+    "run_command_in_sandbox": run_command_in_sandbox,
+    "search_repository": search_repository,
+    "get_file_outline": get_file_outline,
+    "get_code_chunk": get_code_chunk
 }
@@ -250,7 +405,7 @@ def get_tools() -> List[Dict]:
     Get all available tool schemas
 
     Returns:
-        List of tool schema dictionaries for OpenAI
+        List of tool schema dictionaries for LLM function calling
     """
     return TOOLS
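The token-saving idea behind `get_file_outline` (structure with line ranges instead of full file contents) can be sketched with only the stdlib `ast` module; the repo's `CodeParser` is not shown in this diff, so the `outline` helper below is illustrative, not its actual implementation:

```python
import ast

# Hedged sketch of the get_file_outline idea using stdlib ast:
# emit class/function names with line ranges instead of full source.
def outline(source, path="<memory>"):
    tree = ast.parse(source)
    lines = [f"# {path}"]
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:  # lines {node.lineno}-{node.end_lineno}")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append(f"    def {item.name}()  # line {item.lineno}")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(f"def {node.name}():  # lines {node.lineno}-{node.end_lineno}")
    return "\n".join(lines)

sample = (
    "class Greeter:\n"
    "    def greet(self):\n"
    "        return 'hi'\n"
    "\n"
    "def main():\n"
    "    pass\n"
)
```

An agent reads this compact outline first, then calls `get_code_chunk(path, name)` for just the one definition it needs, which is where the claimed token savings over `read_file` come from.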
 
requirements.txt CHANGED
@@ -1,8 +1,9 @@
-# Cloud deployment requirements (lightweight - no PyTorch/sentence-transformers)
-# These are only the essential packages needed for HuggingFace Spaces
+# Full deployment requirements with embeddings support
+# For HuggingFace Spaces with 16GB+ RAM
 
 # Core
 openai>=1.0.0
+anthropic>=0.25.0
 python-dotenv>=1.2.0
 
 # E2B Sandbox
@@ -12,11 +13,16 @@ e2b-code-interpreter>=2.4.0
 langchain>=0.3.0
 langgraph>=0.2.0
 
-# Lightweight search (no embeddings in cloud mode)
+# Search - BM25 + Embeddings
 rank-bm25>=0.2.2
+sentence-transformers>=2.2.0
+chromadb>=0.4.0
 
 # Chainlit UI
 chainlit>=1.0.0
 
 # For dependency graphs
 networkx>=3.0
+
+# Tree-sitter for AST parsing
+tree-sitter>=0.20.0