ayushm98 committed on
Commit 45bf590 · 1 Parent(s): 561b52e

v3.2: Complete multi-agent workflow with Explorer-first architecture

Features:
- ExplorerAgent for token-efficient codebase exploration
- Task classifier (explore vs code queries)
- Claude Sonnet 4.5 integration (200K context)
- Token-efficient tools (get_file_outline, get_code_chunk)
- Optimized workflow: Explorer → Planner → Coder → Reviewer
- Version tracking for debugging deployments

Architecture:
- Explorer does all searching (BM25 + embeddings)
- Planner creates plans (no tools, pure LLM)
- Coder implements (no search, uses exploration context)
- Reviewer validates (read-only tools)
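The routing described above can be sketched in a few lines. This is a hedged illustration only: `classify_task`, `run_workflow`, and the agent callables are stand-ins for the actual `ExplorerAgent`/`Orchestrator` classes, whose real signatures live in `codepilot/agents/`.

```python
# Hypothetical sketch of the v3.2 Explorer-first routing. Names are
# illustrative stand-ins, not the actual CodePilot API.

def classify_task(query: str) -> str:
    """Toy classifier: route pure search/understanding queries to 'explore'."""
    explore_markers = ("where", "find", "explain", "how does", "what is")
    return "explore" if query.lower().startswith(explore_markers) else "code"

def run_workflow(query, explorer, planner, coder, reviewer):
    # Explorer always runs first and owns all searching (BM25 + embeddings).
    context = explorer(query)
    if classify_task(query) == "explore":
        return {"answer": context}  # explore queries never reach the Coder
    plan = planner(query, context)   # Planner: no tools, pure LLM
    code = coder(plan, context)      # Coder: no search, reuses exploration context
    verdict = reviewer(code)         # Reviewer: read-only validation
    return {"plan": plan, "code": code, "review": verdict}
```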

.gitignore ADDED
@@ -0,0 +1,36 @@
+ # Environment
+ .env
+ .env.local
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ venv/
+ env/
+ .venv/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Project specific
+ .codepilot_cache/
+ .chainlit/
+
+ # Claude Code
+ .claude/
+
+ # Test files
+ manual_tests/
+
+ # Logs
+ *.log
CLAUDE.md ADDED
@@ -0,0 +1,171 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ **CodePilot** - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests autonomously.
+
+ **Tech Stack:** Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph
+ **Development Timeline:** 24-week phased implementation (currently in Phase 5: Chainlit UI - COMPLETE)
+
+ ## Architecture
+
+ This project follows a layered architecture (planned - see devon-project-plan.md for full roadmap):
+
+ ```
+ Multi-Agent System (Planner, Coder, Reviewer)
+
+ Context Engine (Hybrid Retrieval, AST-aware chunking)
+
+ Tool Layer (read_file, write_file, run_command, search_code)
+
+ E2B Sandbox (Isolated code execution)
+ ```
+
+ ## Current Implementation Status
+
+ **✅ COMPLETED PHASES:**
+
+ **Phase 1: Foundation (Weeks 1-3)**
+ - ✅ LLM client wrapper (`codepilot/llm/claude_client.py`) - Claude API with tool calling
+ - ✅ Tool registry (`codepilot/tools/registry.py`) - Function calling infrastructure
+ - ✅ Base agent (`codepilot/agents/base_agent.py`) - Core ReAct loop
+ - ✅ Core tools: `read_file`, `write_file`, `run_command`, `search_codebase`, `list_files`
+
+ **Phase 2: Context Engineering (Weeks 4-8)**
+ - ✅ BM25 keyword search (`codepilot/context/bm25_search.py`)
+ - ✅ Dense embeddings (`codepilot/context/embeddings.py`) - sentence-transformers
+ - ✅ Hybrid retrieval (`codepilot/context/retrieval.py`) - Combined BM25 + semantic search
+ - ✅ Code parser (`codepilot/context/parser.py`) - AST-aware chunking
+ - ✅ Codebase indexer (`codepilot/context/indexer.py`) - Full codebase indexing
+ - ✅ Context selector (`codepilot/context/selector.py`) - Smart context selection
+ - ✅ Context tools: `index_codebase`, `search_codebase`, `get_relevant_context`
+
+ **Phase 3: Multi-Agent Architecture (Weeks 9-12)**
+ - ✅ Planner agent (`codepilot/agents/planner_agent.py`) - Creates implementation plans
+ - ✅ Coder agent (`codepilot/agents/coder_agent.py`) - Writes and tests code
+ - ✅ Reviewer agent (`codepilot/agents/reviewer_agent.py`) - Code review and approval
+ - ✅ Orchestrator (`codepilot/agents/orchestrator.py`) - State machine coordination
+
+ **Phase 4: E2B Sandbox Integration (Weeks 13-14)**
+ - ✅ E2B sandbox manager (`codepilot/sandbox/e2b_sandbox.py`) - Isolated execution
+ - ✅ Sandbox tools (`codepilot/sandbox/sandbox_tools.py`) - upload, execute, run commands
+ - ✅ Integration with Coder agent - Automatic sandbox testing workflow
+
+ **Phase 5: Chainlit UI (Weeks 15-16)**
+ - ✅ Chainlit application (`chainlit_app.py`) - Interactive chat interface
+ - ✅ Real-time workflow visualization with Chainlit Steps
+ - ✅ Detailed agent progress tracking (Planner → Coder → Reviewer)
+ - ✅ Code preview and test results display
+ - ✅ User guide (`CHAINLIT_GUIDE.md`)
+
+ **NEXT PHASES:**
+
+ **Phase 6: GitHub Integration (Weeks 17-18)** - Not started
+ - GitHub webhooks for issue tracking
+ - Automated PR creation
+ - Branch management
+
+ **Phase 7: Evals & Benchmarks (Weeks 19-21)** - Not started
+ - SWE-bench evaluation
+ - Custom test suite
+
+ **Phase 8: Production Hardening (Weeks 22-24)** - Not started
+ - Error handling and retries
+ - Logging and monitoring
+ - Deployment configuration
+
+ ## Development Commands
+
+ **Setup:**
+ ```bash
+ python -m venv venv
+ source venv/bin/activate  # On Windows: venv\Scripts\activate
+ pip install -r requirements.txt
+ ```
+
+ **Verify setup:**
+ ```bash
+ python test_setup.py  # Checks that API keys are loaded correctly
+ ```
+
+ **Run Chainlit UI (Phase 5):**
+ ```bash
+ chainlit run chainlit_app.py
+ # Opens at http://localhost:8000
+ # See CHAINLIT_GUIDE.md for full usage guide
+ ```
+
+ **Test individual phases:**
+ ```bash
+ # Phase 2: Context Engineering
+ python test_context.py
+
+ # Phase 3: Multi-Agent Workflow
+ python test_multi_agent.py
+
+ # Phase 4: E2B Sandbox
+ python test_sandbox.py
+ python test_workflow_with_sandbox.py
+ ```
+
+ **Environment variables required in .env:**
+ ```
+ ANTHROPIC_API_KEY=sk-ant-...
+ E2B_API_KEY=e2b_...
+ ```
+
+ ## Project Phases (from devon-project-plan.md)
+
+ 1. **Phase 1 (Weeks 1-3):** Foundation - Basic agent loop, tool calling, LLM abstraction
+ 2. **Phase 2 (Weeks 4-8):** Context Engineering - Hybrid retrieval (BM25 + dense), AST-aware chunking
+ 3. **Phase 3 (Weeks 9-12):** Multi-Agent Architecture - Orchestrator with specialized agents
+ 4. **Phase 4 (Weeks 13-14):** E2B Sandbox Integration
+ 5. **Phase 5 (Weeks 15-16):** Chainlit UI
+ 6. **Phase 6 (Weeks 17-18):** GitHub Integration (webhooks, PRs)
+ 7. **Phase 7 (Weeks 19-21):** Evals & Benchmarks (SWE-bench)
+ 8. **Phase 8 (Weeks 22-24):** Production Hardening
+
+ ## Key Design Principles
+
+ **From the project plan:**
+ - **Focus on Context Engineering:** This is the differentiator, not UI/UX
+ - **ReAct Pattern:** Reason about what to do, Act with tools, observe results, repeat
+ - **AST-Aware Processing:** Parse code structurally, not as text (tree-sitter for multi-language support)
+ - **Hybrid Retrieval:** Combine BM25 (exact matches) + dense embeddings (semantic search)
+ - **Sandboxed Execution:** All code runs in E2B containers, never on host
+ - **Multi-Agent Orchestration:** Specialized agents (Planner, Coder, Reviewer) coordinated by orchestrator
+
+ ## Tool Schema Format
+
+ Tools follow Claude/Anthropic function calling format:
+ ```python
+ {
+     "type": "function",
+     "function": {
+         "name": "tool_name",
+         "description": "Clear description for LLM to understand when to use",
+         "parameters": {
+             "type": "object",
+             "properties": {...},
+             "required": [...]
+         }
+     }
+ }
+ ```
+
+ ## Implementation Notes
+
+ - All tool functions return formatted strings (success messages or errors)
+ - `write_file` auto-creates parent directories if needed
+ - `run_command` has 30-second timeout to prevent hanging
+ - Error handling uses specific exceptions (FileNotFoundError, PermissionError) before generic fallback
+
+ ## Important Files
+
+ - `devon-project-plan.md` - Complete 24-week implementation roadmap with architectural details
+ - `codepilot/llm/claude_client.py` - Claude API wrapper with tool calling
+ - `codepilot/agents/orchestrator.py` - Multi-agent state machine
+ - `requirements.txt` - Python dependencies (anthropic, e2b-code-interpreter, langchain, langgraph)
+ - `.env` - API keys (not committed, in .gitignore)
Dockerfile CHANGED
@@ -1,4 +1,5 @@
  # HuggingFace Spaces Dockerfile for CodePilot
+ # BUILD_VERSION: 7 (v3.2 coder no list_files)
  FROM python:3.11-slim

  # Set working directory
@@ -19,7 +20,7 @@ ENV HOME=/home/user \
  WORKDIR $HOME/app

  # Copy requirements first (for better caching)
- COPY --chown=user requirements-cloud.txt ./requirements.txt
+ COPY --chown=user requirements.txt ./requirements.txt

  # Install Python dependencies
  RUN pip install --no-cache-dir --upgrade pip && \
README.md CHANGED
@@ -56,7 +56,7 @@ User Request
  ## Tech Stack

  - **Python** - Core language
- - **OpenAI GPT-4** - LLM for agent reasoning
+ - **Claude Sonnet 4.5** - LLM for agent reasoning (Anthropic API)
  - **LangChain/LangGraph** - Agent orchestration
  - **E2B** - Sandboxed code execution
  - **Chainlit** - Chat UI
@@ -65,7 +65,7 @@ User Request

  | Variable | Description |
  |----------|-------------|
- | `OPENAI_API_KEY` | Your OpenAI API key |
+ | `ANTHROPIC_API_KEY` | Your Anthropic API key |
  | `E2B_API_KEY` | Your E2B sandbox API key |

  ## License
chainlit_app.py CHANGED
@@ -17,16 +17,34 @@ from contextlib import redirect_stdout, redirect_stderr
  import asyncio
  from concurrent.futures import ThreadPoolExecutor

- # Check if running in production BEFORE importing heavy dependencies
- # Detects: Render, HuggingFace Spaces, or any cloud with PORT env var
- IS_PRODUCTION = os.getenv('RENDER_SERVICE_NAME') or os.getenv('RENDER') or os.getenv('SPACE_ID') or os.getenv('PORT')
-
- # Only import heavy ML dependencies in local development
- if not IS_PRODUCTION:
-     from codepilot.tools.context_tools import index_codebase
-
- # Import orchestrator (lighter weight)
- from codepilot.agents.orchestrator import Orchestrator
+ # ============================================================
+ # STARTUP VERSION CHECK - Change this to detect if rebuild worked
+ # ============================================================
+ APP_VERSION = "3.2.0-coder-no-list"
+ BUILD_ID = "2024-12-19-v6"
+ print("=" * 60)
+ print(f"[STARTUP] CodePilot Chainlit App")
+ print(f"[STARTUP] APP_VERSION: {APP_VERSION}")
+ print(f"[STARTUP] BUILD_ID: {BUILD_ID}")
+ print("=" * 60)
+ # ============================================================
+
+ # Import full context tools (embeddings + BM25) - requires 16GB+ RAM
+ from codepilot.tools.context_tools import index_codebase
+
+ # Import orchestrator
+ from codepilot.agents.orchestrator import Orchestrator, ORCHESTRATOR_VERSION
+
+ # Print orchestrator version for debugging
+ print(f"[STARTUP] ORCHESTRATOR_VERSION: {ORCHESTRATOR_VERSION}")
+
+ # Import GitHub tools for repo cloning
+ from codepilot.tools.github_tools import (
+     extract_github_url,
+     clone_repository,
+     get_repo_info,
+     cleanup_repository
+ )


  # Authentication disabled for now - uncomment to enable password protection
@@ -56,54 +74,39 @@ async def start():
      print("[CHAINLIT] on_chat_start triggered")  # Debug log

      await cl.Message(
-         content="# 🤖 CodePilot - Autonomous AI Coding Agent\n\n"
+         content=f"# CodePilot - Autonomous AI Coding Agent\n\n"
+                 f"**Version:** `{APP_VERSION}` | **Build:** `{BUILD_ID}`\n\n"
                  "I can help you write code, fix bugs, and implement features!\n\n"
-                 "**How it works:**\n"
-                 "1. 🤔 **Planner** - Searches codebase and creates implementation plan\n"
-                 "2. 💻 **Coder** - Writes code locally, uploads to sandbox, runs tests\n"
-                 "3. 👁️ **Reviewer** - Reviews tested code and decides approval\n\n"
-                 "**What I can do:**\n"
-                 "- Write new functions and features\n"
-                 "- Fix bugs and add error handling\n"
-                 "- Create tests and verify code works\n"
-                 "- Search and understand your codebase\n\n"
-                 "**Ready!** What would you like me to build?"
+                 "**How to use:**\n"
+                 "1. Paste a **public GitHub URL** and I'll clone and analyze it\n"
+                 "2. Tell me what you want to build or fix\n"
+                 "3. Watch my agents (Planner > Coder > Reviewer) work!\n\n"
+                 "**Example:**\n"
+                 "```\nAnalyze https://github.com/user/repo and add error handling to the API endpoints\n```\n\n"
+                 "**Ready!** Paste a GitHub URL or describe your task."
      ).send()

      print("[CHAINLIT] Welcome message sent")  # Debug log

-     # Skip indexing on deployment to avoid startup issues (using module-level constant)
-     if IS_PRODUCTION:
-         print(f"[CHAINLIT] Running in production mode (PORT={os.getenv('PORT')}) - skipping codebase indexing")
-         await cl.Message(content="ℹ️ Running in cloud mode - codebase indexing disabled").send()
-         cl.user_session.set("orchestrator", Orchestrator(max_iterations=3))
-         cl.user_session.set("ready", True)
-         print("[CHAINLIT] Orchestrator created, ready=True")
-         return
-
-     # Index codebase in background (only in local development)
-     index_msg = await cl.Message(content="🔍 Indexing codebase...").send()
-
-     try:
-         # Get project root
-         project_root = os.path.dirname(os.path.abspath(__file__))
-         index_result = index_codebase(project_root)
-
-         # Update message content
-         index_msg.content = f"✅ Codebase indexed!\n```\n{index_result}\n```"
-         await index_msg.update()
-
-         # Store orchestrator in session (reduced iterations to save API credits)
-         cl.user_session.set("orchestrator", Orchestrator(max_iterations=3))
-         cl.user_session.set("ready", True)
-
-     except Exception as e:
-         # Update message content
-         index_msg.content = f"⚠️ Indexing failed (will continue anyway):\n```\n{str(e)}\n```"
-         await index_msg.update()
-         # Still create orchestrator even if indexing fails
-         cl.user_session.set("orchestrator", Orchestrator(max_iterations=10))
-         cl.user_session.set("ready", True)
+     # Initialize session variables
+     cl.user_session.set("repo_path", None)
+     cl.user_session.set("repo_info", None)
+
+     # Skip self-indexing - agents will only work with cloned GitHub repos
+     # Create orchestrator and mark as ready
+     cl.user_session.set("orchestrator", Orchestrator(max_iterations=3))
+     cl.user_session.set("ready", True)
+     print("[CHAINLIT] Orchestrator created, ready for GitHub repos")


+ @cl.on_chat_end
+ async def end():
+     """Cleanup when chat ends."""
+     # Clean up any cloned repositories
+     repo_path = cl.user_session.get("repo_path")
+     if repo_path:
+         print(f"[CHAINLIT] Cleaning up repo: {repo_path}")
+         cleanup_repository(repo_path)
+
+
  @cl.on_message
@@ -112,12 +115,112 @@ async def main(message: cl.Message):

      # Check if ready
      if not cl.user_session.get("ready"):
-         await cl.Message(content="⚠️ System is still initializing, please wait...").send()
+         await cl.Message(content="System is still initializing, please wait...").send()
          return

      # Get orchestrator
      orchestrator: Orchestrator = cl.user_session.get("orchestrator")

+     # Check for GitHub URL in message
+     github_url = extract_github_url(message.content)
+     task_context = ""
+
+     if github_url:
+         # Clone the repository
+         clone_msg = await cl.Message(content=f"Cloning repository: `{github_url}`...").send()
+
+         success, result, repo_name = clone_repository(github_url)
+
+         if success:
+             repo_path = result
+             repo_info = get_repo_info(repo_path)
+
+             # Store in session
+             cl.user_session.set("repo_path", repo_path)
+             cl.user_session.set("repo_info", repo_info)
+
+             # Index the repository for search (full BM25 + embeddings)
+             try:
+                 index_result = index_codebase(repo_path)
+                 print(f"[CHAINLIT] Repository indexed: {index_result}")
+             except Exception as e:
+                 print(f"[CHAINLIT] Indexing failed (non-critical): {e}")
+
+             # Create context for the task (limited to avoid token overflow)
+             languages = ", ".join(repo_info["languages"][:5]) if repo_info["languages"] else "Unknown"
+             # Only include first 20 files to keep context small
+             sample_files = repo_info["files"][:20] if repo_info["files"] else []
+             files_preview = "\n".join(f" - {f}" for f in sample_files)
+             if len(repo_info["files"]) > 20:
+                 files_preview += f"\n ... and {len(repo_info['files']) - 20} more files"
+
+             task_context = f"""
+ [REPOSITORY CONTEXT]
+ Repository: {repo_name}
+ Path: {repo_path}
+ Total Files: {repo_info['total_files']}
+ Languages: {languages}
+ Sample Files:
+ {files_preview}
+
+ AVAILABLE TOOLS:
+ - search_repository: Search this cloned repository using BM25 keyword matching (use this to find functions, classes, or code patterns in the Flask repo)
+ - read_file: Read a specific file (use full path: {repo_path}/filename.py)
+ - search_code: Grep for exact pattern matches in the repository
+ """
+             # Update clone message
+             clone_msg.content = f"**Repository cloned successfully!**\n\n" \
+                                 f"- **Name:** {repo_name}\n" \
+                                 f"- **Files:** {repo_info['total_files']}\n" \
+                                 f"- **Languages:** {languages}\n" \
+                                 f"- **Path:** `{repo_path}`"
+             await clone_msg.update()
+
+         else:
+             # Clone failed
+             clone_msg.content = f"**Failed to clone repository**\n\n{result}\n\n" \
+                                 f"Make sure the repository is public and the URL is correct."
+             await clone_msg.update()
+             return
+
+     # Check if we have a repo from previous message
+     elif cl.user_session.get("repo_path"):
+         repo_path = cl.user_session.get("repo_path")
+         repo_info = cl.user_session.get("repo_info")
+         if repo_info:
+             languages = ", ".join(repo_info["languages"][:5]) if repo_info["languages"] else "Unknown"
+             task_context = f"""
+ [REPOSITORY CONTEXT]
+ Repository: {repo_info['name']}
+ Path: {repo_path}
+ Total Files: {repo_info['total_files']}
+ Languages: {languages}
+
+ AVAILABLE TOOLS:
+ - search_repository: Search this cloned repository using BM25 keyword matching (use this to find functions, classes, or code patterns in the Flask repo)
+ - read_file: Read a specific file (use full path: {repo_path}/filename.py)
+ - search_code: Grep for exact pattern matches in the repository
+ """
+
+     # Prepare the full task with context
+     # Remove the GitHub URL from the message to get just the user's query
+     user_query = message.content
+     print(f"[DEBUG] Original message.content: '{message.content}'")
+     print(f"[DEBUG] GitHub URL found: '{github_url}'")
+
+     if github_url:
+         # Remove the URL from the message to get the actual task
+         import re
+         user_query = re.sub(r'https?://github\.com/[^\s]+', '', user_query).strip()
+         print(f"[DEBUG] After URL removal: '{user_query}'")
+
+     full_task = task_context + "\n\n" + user_query if task_context else user_query
+
+     print(f"[DEBUG] task_context exists: {bool(task_context)}")
+     print(f"[DEBUG] task_context length: {len(task_context) if task_context else 0}")
+     print(f"[DEBUG] Final user_query: '{user_query}'")
+     print(f"[DEBUG] Full task (first 500 chars): '{full_task[:500]}...'")
+
      # Create a message for streaming logs
      log_msg = cl.Message(content="")
      await log_msg.send()
@@ -130,10 +233,10 @@ async def main(message: cl.Message):
          """Run orchestrator in thread and capture output."""
          try:
              with redirect_stdout(captured_output), redirect_stderr(captured_output):
-                 return orchestrator.run(message.content)
+                 return orchestrator.run(full_task)
          except Exception as e:
              # Capture any exceptions from orchestrator
-             print(f"Error in orchestrator: {str(e)}")
+             print(f"Error in orchestrator: {str(e)}")
              import traceback
              traceback.print_exc()
              raise
@@ -165,10 +268,10 @@ async def main(message: cl.Message):
      filtered_lines = []
      for line in accumulated_logs.split('\n'):
          # Extract token usage before filtering (only count each line once!)
-         if '📊 Tokens:' in line and line not in seen_token_lines:
+         if 'Tokens:' in line and line not in seen_token_lines:
              seen_token_lines.add(line)  # Mark as counted
              try:
-                 # Parse: "📊 Tokens: 505 prompt + 20 completion = 525 total"
+                 # Parse: "Tokens: 505 prompt + 20 completion = 525 total"
                  parts = line.split('Tokens:')[1].strip()
                  prompt = int(parts.split('prompt')[0].strip())
                  completion = int(parts.split('+')[1].split('completion')[0].strip())
@@ -179,24 +282,25 @@ async def main(message: cl.Message):
                  pass

          # Skip token counts, progress bars, and verbose details
-         if any(skip in line for skip in ['📊 Tokens:', 'Batches:', '|##', 'it/s]']):
+         if any(skip in line for skip in ['Tokens:', 'Batches:', '|##', 'it/s]']):
              continue
          # Keep important lines
          if any(keep in line for keep in [
-             '[ORCHESTRATOR]', '[PLANNER]', '[CODER]', '[REVIEWER]',
-             'Calling tool:', 'Tool', 'Transitioning', 'APPROVED', 'REJECTED'
+             '[CLASSIFIER]', '[ORCHESTRATOR]', '[PLANNER]', '[CODER]', '[REVIEWER]',
+             '[EXPLORER]', 'Calling tool:', 'Tool', 'Transitioning', 'APPROVED', 'REJECTED',
+             '[GITHUB]', 'Cloning', 'Repository'
          ]):
              filtered_lines.append(line)

      filtered_output = '\n'.join(filtered_lines)

-     # Calculate cost (GPT-3.5-turbo pricing: $0.0015/1K input, $0.002/1K output)
-     input_cost = (total_prompt_tokens / 1000) * 0.0015
-     output_cost = (total_completion_tokens / 1000) * 0.002
+     # Calculate cost (Claude Sonnet 4.5 pricing: $3/1M input, $15/1M output)
+     input_cost = (total_prompt_tokens / 1000000) * 3.0
+     output_cost = (total_completion_tokens / 1000000) * 15.0
      total_cost = input_cost + output_cost

      # Add usage summary to logs
-     usage_summary = f"\n\n💰 CREDITS USED:\n"
+     usage_summary = f"\n\nCREDITS USED:\n"
      usage_summary += f"  Input: {total_prompt_tokens:,} tokens (${input_cost:.4f})\n"
      usage_summary += f"  Output: {total_completion_tokens:,} tokens (${output_cost:.4f})\n"
      usage_summary += f"  Total: {total_tokens:,} tokens (${total_cost:.4f})"
@@ -212,42 +316,42 @@ async def main(message: cl.Message):
      final_logs = captured_output.getvalue()

      # Update with final logs
-     log_msg.content = f"## 📋 Execution Log\n```\n{final_logs}\n```"
+     log_msg.content = f"## Execution Log\n```\n{final_logs}\n```"
      await log_msg.update()

      # Send results summary
      summary_lines = []

      if result.get('plan'):
-         summary_lines.append("## 🤔 Planner")
-         summary_lines.append(f"Plan created ({len(result['plan'])} chars)\n")
+         summary_lines.append("## Planner")
+         summary_lines.append(f"Plan created ({len(result['plan'])} chars)\n")

      if result.get('code_changes'):
-         summary_lines.append("## 💻 Coder")
-         summary_lines.append(f"Created {len(result['code_changes'])} file(s):")
+         summary_lines.append("## Coder")
+         summary_lines.append(f"Created {len(result['code_changes'])} file(s):")
          for file_path in result['code_changes'].keys():
              summary_lines.append(f"  - {file_path}")
          summary_lines.append("")

      if result.get('review_feedback'):
-         summary_lines.append("## 👁️ Reviewer")
+         summary_lines.append("## Reviewer")
          if result.get('success'):
-             summary_lines.append("Code approved")
+             summary_lines.append("Code approved")
          else:
-             summary_lines.append("⚠️ Needs revision")
+             summary_lines.append("Needs revision")
          summary_lines.append("")

-     summary_lines.append("## 🎯 Result")
+     summary_lines.append("## Result")
      if result.get('success'):
-         summary_lines.append(f"**Success** (Iterations: {result.get('iterations', 'N/A')})")
+         summary_lines.append(f"**Success** (Iterations: {result.get('iterations', 'N/A')})")
      else:
-         summary_lines.append(f"⚠️ **Incomplete** (Iterations: {result.get('iterations', 'N/A')})")
+         summary_lines.append(f"**Incomplete** (Iterations: {result.get('iterations', 'N/A')})")

-     # Add final cost summary
-     summary_lines.append("\n## 💰 API Credits Used (GPT-3.5-Turbo)")
+     # Add final cost summary (Claude Sonnet 4.5 pricing: $3/1M input, $15/1M output)
+     summary_lines.append("\n## API Credits Used (Claude Sonnet 4.5)")
      summary_lines.append(f"**Total Tokens:** {total_tokens:,}")
-     summary_lines.append(f"- Input: {total_prompt_tokens:,} tokens (${(total_prompt_tokens/1000)*0.0015:.4f})")
-     summary_lines.append(f"- Output: {total_completion_tokens:,} tokens (${(total_completion_tokens/1000)*0.002:.4f})")
+     summary_lines.append(f"- Input: {total_prompt_tokens:,} tokens (${(total_prompt_tokens/1000000)*3.0:.4f})")
+     summary_lines.append(f"- Output: {total_completion_tokens:,} tokens (${(total_completion_tokens/1000000)*15.0:.4f})")
      summary_lines.append(f"\n**Estimated Cost:** ${total_cost:.4f}")

      await cl.Message(content="\n".join(summary_lines)).send()
@@ -258,9 +362,9 @@ async def main(message: cl.Message):
      error_type = type(e).__name__

      if "rate_limit" in error_message.lower() or "429" in error_message:
-         user_message = f"""## ⏱️ Rate Limit Reached
+         user_message = f"""## Rate Limit Reached

- OpenAI API rate limit exceeded. This happens when too many requests are made in a short time.
+ Claude API rate limit exceeded. This happens when too many requests are made in a short time.

  **What to do:**
  - Wait a few minutes and try again
@@ -272,15 +376,15 @@ OpenAI API rate limit exceeded. This happens when too many requests are made in a short time.
  {error_message}
  ```
  """
-     elif "insufficient_quota" in error_message.lower():
-         user_message = f"""## 💳 API Credits Exhausted
+     elif "insufficient_quota" in error_message.lower() or "credit" in error_message.lower():
+         user_message = f"""## API Credits Exhausted

- Your OpenAI API credits have been exhausted.
+ Your Anthropic API credits have been exhausted.

  **What to do:**
- - Add credits to your OpenAI account at https://platform.openai.com/account/billing
- - Check your usage at https://platform.openai.com/usage
- - Current model: GPT-3.5-turbo (~$0.02 per task)
+ - Add credits to your Anthropic account at https://console.anthropic.com/settings/billing
+ - Check your usage at https://console.anthropic.com/settings/usage
+ - Current model: Claude Sonnet 4.5 (~$0.20 per task)

  **Error details:**
  ```
@@ -288,13 +392,13 @@ Your OpenAI API credits have been exhausted.
  ```
  """
      elif "api_key" in error_message.lower() or "authentication" in error_message.lower():
-         user_message = f"""## 🔑 API Key Error
+         user_message = f"""## API Key Error

- There's an issue with your OpenAI API key.
+ There's an issue with your Anthropic API key.

  **What to do:**
- - Verify your OPENAI_API_KEY in .env file
- - Check that the key is valid at https://platform.openai.com/api-keys
+ - Verify your ANTHROPIC_API_KEY in .env file
+ - Check that the key is valid at https://console.anthropic.com/settings/keys
  - Restart the application after updating .env

  **Error details:**
@@ -303,7 +407,7 @@ There's an issue with your OpenAI API key.
  ```
  """
      elif "timeout" in error_message.lower():
-         user_message = f"""## Request Timeout
+         user_message = f"""## Request Timeout

  The operation took too long and timed out.

@@ -319,7 +423,7 @@ The operation took too long and timed out.
  """
      else:
          # Generic error with helpful context
-         user_message = f"""## Error Occurred
+         user_message = f"""## Error Occurred

  An unexpected error occurred during execution.

codepilot/agents/__init__.py CHANGED
@@ -0,0 +1,24 @@
+ """
+ CodePilot Agents Module
+
+ This module contains all agent implementations:
+ - ExplorerAgent: Lightweight agent for search/exploration queries
+ - PlannerAgent: Creates implementation plans
+ - CoderAgent: Implements code based on plans
+ - ReviewerAgent: Reviews code for quality
+ - Orchestrator: Routes tasks and manages multi-agent workflow
+ """
+
+ from codepilot.agents.explorer_agent import ExplorerAgent
+ from codepilot.agents.planner_agent import PlannerAgent
+ from codepilot.agents.coder_agent import CoderAgent
+ from codepilot.agents.reviewer_agent import ReviewerAgent
+ from codepilot.agents.orchestrator import Orchestrator
+
+ __all__ = [
+ "ExplorerAgent",
+ "PlannerAgent",
+ "CoderAgent",
+ "ReviewerAgent",
+ "Orchestrator"
+ ]
codepilot/agents/base_agent.py CHANGED
@@ -12,18 +12,22 @@ from codepilot.tools.registry import get_tools, get_tool_function
  class Agent:
  """Main agent that executes tasks using LLM and tools"""

- def __init__(self, model: str = "gpt-3.5-turbo", max_iterations: int = 10):
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929", max_iterations: int = 10):
  """
  Initialize the agent

  Args:
- model: OpenAI model to use
+ model: LLM model to use (default: Claude Sonnet 4.5)
  max_iterations: Maximum number of LLM calls to prevent infinite loops
  """
  print("🚀 Initializing Agent...")

- # Initialize components
- self.client = OpenAIClient(model=model)
+ # Initialize components - use Claude by default
+ from codepilot.llm.claude_client import ClaudeClient
+ if "claude" in model.lower():
+ self.client = ClaudeClient(model=model)
+ else:
+ self.client = OpenAIClient(model=model)
  self.conversation = ConversationManager()
  self.tools = get_tools()
  self.max_iterations = max_iterations
@@ -52,7 +56,7 @@ class Agent:
  for iteration in range(1, self.max_iterations + 1):
  print(f"\n--- Iteration {iteration}/{self.max_iterations} ---")

- # Call OpenAI with current conversation and tools
+ # Call LLM with current conversation and tools
  response = self.client.chat(
  messages=self.conversation.get_messages(),
  tools=self.tools
@@ -87,7 +91,10 @@ class Agent:
  for tool_call in tool_calls:
  self._execute_tool_call(tool_call)

- # Continue loop - send results back to OpenAI
+ # Trim conversation to prevent context overflow (optimized for Claude's 200K context)
+ self.conversation.trim_messages(keep_recent=8)
+
+ # Continue loop - send results back to LLM
  continue

  else:
@@ -106,7 +113,7 @@ class Agent:
  Execute a single tool call

  Args:
- tool_call: Tool call object from OpenAI response
+ tool_call: Tool call object from LLM response
  """
  tool_id = tool_call.id
  tool_name = tool_call.function.name
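The client selection in `__init__` above reduces to a simple substring check on the model id. A minimal sketch of that routing rule (client names only, no API calls):

```python
def pick_client(model: str) -> str:
    # Route Claude model ids to the Anthropic-backed client,
    # everything else falls back to the OpenAI client.
    return "ClaudeClient" if "claude" in model.lower() else "OpenAIClient"

print(pick_client("claude-sonnet-4-5-20250929"))  # → ClaudeClient
print(pick_client("gpt-3.5-turbo"))               # → OpenAIClient
```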
codepilot/agents/coder_agent.py CHANGED
@@ -1,105 +1,128 @@
  """
- Coder Agent - Implements code based on plans
+ Coder Agent - Implements code based on plans (v3.0)

  The Coder's job:
  1. Read the plan from Planner
- 2. Search/read existing code to understand it
+ 2. Use exploration context (already searched by Explorer)
  3. Write code changes to implement the plan
- 4. Follow best practices and coding standards
+ 4. Test in sandbox

- Tools it has access to:
- - search_codebase (find relevant files)
- - read_file (understand existing code)
- - write_file (implement changes)
- - list_files (explore structure)
+ v3.0 Changes:
+ - Removed search tools (Explorer already searched)
+ - Receives exploration_context from orchestrator
+ - Focused only on reading/writing/testing
  """

  from codepilot.llm.client import OpenAIClient
+ from codepilot.llm.claude_client import ClaudeClient
  from codepilot.tools.registry import get_tools, get_tool_function
  from codepilot.agents.conversation import ConversationManager
- from typing import Dict, Any
+ from typing import Dict, Any, Optional
  import json


- # Coder's specialized system prompt
+ # Coder's specialized system prompt (v3.2 - no search, no list_files, uses exploration context)
  CODER_SYSTEM_PROMPT = """You are an expert software engineer and implementation specialist.

- Your ONLY job is to write code that implements the given plan. You do NOT create plans yourself.
+ Your ONLY job is to write code that implements the given plan. You do NOT explore or search.

- When given a plan:
- 1. Read and understand each step carefully
- 2. Search the codebase to find relevant files
- 3. Read existing files to understand the current implementation
- 4. Write clean, well-structured code that follows the plan
- 5. Make incremental changes, one step at a time
+ === CRITICAL: USE THE PROVIDED CONTEXT ===
+ The Explorer agent has ALREADY searched the codebase for you. All file paths and code patterns are in the EXPLORATION RESULTS below.

- Your code should be:
- - Clean and readable (follow existing code style)
- - Well-tested (add error handling)
- - Documented (add comments for complex logic)
- - Minimal (only change what's necessary)
+ DO NOT:
+ - Navigate directories (no list_files)
+ - Search for files (no searching)
+ - Explore the codebase
+
+ DO:
+ - Use the exact file paths from exploration results
+ - Start writing code immediately
+ - Follow the plan step by step

- IMPORTANT RULES:
- - Follow the plan exactly - don't add extra features
- - Match the existing code style in each file
- - Test your changes mentally before writing
- - If you need clarification on the plan, state what's unclear
+ === WORKFLOW ===
+ 1. Read the exploration results - they contain all file paths you need
+ 2. If modifying existing code: use get_code_chunk to read the specific function
+ 3. Write your changes with write_file using paths from exploration
+ 4. Test in sandbox if needed

- Tools available to you:
- - search_codebase: Find existing code
- - read_file: Understand current implementation
+ === TOOLS ===
+ - get_file_outline: See file structure (use if unsure about a file)
+ - get_code_chunk: Read ONE specific function/class
+ - read_file: Read entire file (only when rewriting whole file)
  - write_file: Create or modify files
- - list_files: Explore directory structure
- - upload_to_sandbox: Upload files to isolated testing environment
- - run_command_in_sandbox: Run commands safely in sandbox (e.g., pytest, python test.py)
- - execute_in_sandbox: Execute Python code snippets for quick testing
-
- IMPORTANT: Always test your code in the sandbox before submitting!
- 1. Write the file locally (write_file)
- 2. Upload to sandbox (upload_to_sandbox)
- 3. Run tests in sandbox (run_command_in_sandbox)
- 4. Fix any issues before marking as complete
+ - upload_to_sandbox: Upload files for testing
+ - run_command_in_sandbox: Run tests in sandbox
+ - execute_in_sandbox: Execute Python snippets
+
+ === SANDBOX WORKFLOW ===
+ When testing in sandbox:
+ 1. Upload with RELATIVE path: upload_to_sandbox(path="file.py", content=code)
+ 2. Run with RELATIVE path: run_command_in_sandbox(command="python file.py")
+ 3. The sandbox CANNOT access /tmp/codepilot_repos/ - use simple filenames!
+
+ Your code should be:
+ - Clean (follow existing code style)
+ - Minimal (only change what's necessary)
+ - Follow the plan exactly
+
+ START CODING IMMEDIATELY - do not explore!
  """


  class CoderAgent:
  """
- Coder Agent - Implements code based on plans.
+ Coder Agent - Implements code based on plans (v3.0).

  This agent is specialized for coding. It has:
- - Custom system prompt (engineer mindset)
- - Write access tools (can modify files)
- - Single responsibility (implementation only)
+ - NO search tools (Explorer already searched)
+ - Receives exploration_context
+ - Write access + sandbox execution
  """

- def __init__(self, model: str = "gpt-3.5-turbo"):
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
  """
  Initialize Coder agent.

  Args:
- model: LLM model to use
+ model: LLM model to use (default: Claude Sonnet 4.5)
  """
- self.client = OpenAIClient(model=model)
+ # Use Claude client for Claude models, OpenAI client as fallback
+ if "claude" in model.lower():
+ self.client = ClaudeClient(model=model)
+ else:
+ self.client = OpenAIClient(model=model)
+
  self.conversation = ConversationManager()

- # Coder gets read + write tools + sandbox execution (safe testing)
+ # v3.2: Removed list_files - Explorer provides all paths needed
+ # Coder only needs: read, write, and sandbox tools
  self.allowed_tools = [
- "search_codebase",
- "read_file",
- "write_file",
- "list_files",
- "upload_to_sandbox",
- "run_command_in_sandbox",
- "execute_in_sandbox"
+ "get_file_outline", # Get file structure without full code
+ "get_code_chunk", # Extract specific function/class by name
+ "read_file", # Full file contents (use sparingly)
+ "write_file", # Create or modify files
+ # "list_files" REMOVED - use exploration context instead
+ "upload_to_sandbox", # Upload files for testing
+ "run_command_in_sandbox", # Run tests in sandbox
+ "execute_in_sandbox" # Execute Python snippets
  ]

- def run(self, plan: str, task: str, review_feedback: str = None) -> Dict[str, str]:
+ def run(
+ self,
+ plan: str,
+ task: str,
+ exploration_context: Optional[str] = None,
+ review_feedback: Optional[str] = None
+ ) -> Dict[str, str]:
  """
  Implement the given plan.

+ v3.0: Now receives exploration_context so it doesn't need to search.
+
  Args:
  plan: Implementation plan from Planner
  task: Original task description (for context)
+ exploration_context: Context gathered by Explorer agent
  review_feedback: Optional feedback from Reviewer if code was rejected

  Returns:
@@ -111,24 +134,34 @@ class CoderAgent:
  # Add system prompt
  self.conversation.add_message("system", CODER_SYSTEM_PROMPT)

- # Build user prompt with task, plan, and optionally review feedback
- user_prompt = f"""Original Task: {task}
+ # Build user prompt with exploration context, task, plan
+ user_prompt = f"""=== ORIGINAL TASK ===
+ {task}
+
+ """
+ # Add exploration context if available
+ if exploration_context:
+ user_prompt += f"""=== EXPLORATION RESULTS (from Explorer agent) ===
+ {exploration_context}
+
+ """

- Implementation Plan:
- {plan}"""
+ user_prompt += f"""=== IMPLEMENTATION PLAN (from Planner agent) ===
+ {plan}
+
+ """

  # If this is a rework (Reviewer rejected the code), include feedback
  if review_feedback:
- user_prompt += f"""
-
- IMPORTANT - REVIEWER FEEDBACK (CODE WAS REJECTED):
+ user_prompt += f"""=== REVIEWER FEEDBACK (CODE WAS REJECTED) ===
  {review_feedback}

  Please fix the issues mentioned by the Reviewer and resubmit the code."""
  else:
- user_prompt += """
-
- Please implement this plan step by step. Write clean, well-structured code that follows the plan."""
+ user_prompt += """Please implement this plan step by step.
+ Use the exploration results to understand the codebase structure.
+ Write clean, well-structured code that follows the plan.
+ Test your code in the sandbox before finishing."""

  self.conversation.add_message("user", user_prompt)

@@ -142,8 +175,8 @@ Please implement this plan step by step. Write clean, well-structured code that
  # Track which files were modified
  modified_files = {}

- # Run coding loop (agent reads code, writes changes)
- max_iterations = 15 # Coder might need more iterations than planner
+ # Run coding loop
+ max_iterations = 15
  for iteration in range(max_iterations):
  # Call LLM
  response = self.client.chat(
@@ -163,7 +196,6 @@ Please implement this plan step by step. Write clean, well-structured code that
  # Check if done
  if finish_reason == "stop":
- # Agent finished coding
  print(f"[CODER] Finished implementation")
  return modified_files

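The prompt assembly in the new `run()` can be sketched as a standalone function, mirroring the section markers shown in the diff (the argument values below are illustrative only):

```python
def build_coder_prompt(task, plan, exploration_context=None, review_feedback=None):
    # Mirrors the Coder's user-prompt layout: task, optional exploration
    # results, plan, then either reviewer feedback or the default instruction.
    prompt = f"=== ORIGINAL TASK ===\n{task}\n\n"
    if exploration_context:
        prompt += f"=== EXPLORATION RESULTS (from Explorer agent) ===\n{exploration_context}\n\n"
    prompt += f"=== IMPLEMENTATION PLAN (from Planner agent) ===\n{plan}\n\n"
    if review_feedback:
        prompt += f"=== REVIEWER FEEDBACK (CODE WAS REJECTED) ===\n{review_feedback}"
    else:
        prompt += "Please implement this plan step by step."
    return prompt

p = build_coder_prompt("Add logging", "1. Edit app.py", exploration_context="app.py: main()")
print("EXPLORATION RESULTS" in p)  # → True
```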
codepilot/agents/conversation.py CHANGED
@@ -1,6 +1,6 @@
  """
  Conversation Manager
- Handles conversation history in OpenAI's message format
+ Handles conversation history in standard LLM message format
  """

  from typing import List, Dict, Any
@@ -58,13 +58,13 @@ class ConversationManager:
  Add an assistant message with tool calls

  Args:
- tool_calls: List of tool call objects from OpenAI response
+ tool_calls: List of tool call objects from LLM response
  """
  # Extract tool call info for logging
  tool_names = [tc.function.name for tc in tool_calls]
  print(f"🔧 Assistant calling tools: {tool_names}")

- # OpenAI requires this specific format
+ # Standard tool call format (converted by client for Claude)
  self.messages.append({
  "role": "assistant",
  "content": None, # No text content when making tool calls
@@ -86,7 +86,7 @@ class ConversationManager:
  Add a tool execution result to the conversation

  Args:
- tool_call_id: The ID of the tool call (from OpenAI)
+ tool_call_id: The ID of the tool call (from LLM)
  tool_name: Name of the tool that was executed
  result: The result string from the tool
  """
@@ -109,6 +109,25 @@ class ConversationManager:
  """
  return self.messages

+ def trim_messages(self, keep_recent: int = 10):
+ """
+ Trim conversation history to prevent context overflow.
+ Keeps the system message and the most recent N messages.
+
+ Args:
+ keep_recent: Number of recent messages to keep (default: 10)
+ """
+ if len(self.messages) <= keep_recent + 1:
+ return # No need to trim
+
+ # Keep system message (first) + recent messages
+ system_msg = [self.messages[0]] if self.messages and self.messages[0].get("role") == "system" else []
+ recent_msgs = self.messages[-keep_recent:]
+
+ old_count = len(self.messages)
+ self.messages = system_msg + recent_msgs
+ print(f"✂️ Trimmed conversation: {old_count} → {len(self.messages)} messages")
+
  def clear(self):
  """Clear all messages from history"""
  self.messages = []
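The new `trim_messages` logic can be exercised in isolation. A minimal sketch of the same keep-system-plus-recent-N rule, with a synthetic history:

```python
def trim(messages, keep_recent=10):
    # Same rule as trim_messages: keep the leading system message
    # plus the most recent keep_recent messages.
    if len(messages) <= keep_recent + 1:
        return messages
    system = [messages[0]] if messages and messages[0].get("role") == "system" else []
    return system + messages[-keep_recent:]

history = [{"role": "system", "content": "prompt"}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(20)
]
trimmed = trim(history, keep_recent=8)
print(len(trimmed))  # → 9 (system message + 8 most recent)
```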
codepilot/agents/explorer_agent.py ADDED
@@ -0,0 +1,168 @@
+ """
+ Explorer Agent - Lightweight agent for search and exploration queries
+
+ The Explorer's job:
+ 1. Search the codebase to find relevant code
+ 2. Answer questions about the codebase
+ 3. Explain how code works
+
+ This agent is used for queries like:
+ - "Find the Flask class"
+ - "Where is the login function?"
+ - "Explain how routing works"
+
+ It does NOT write code - just explores and explains.
+ """
+
+ from codepilot.llm.client import OpenAIClient
+ from codepilot.llm.claude_client import ClaudeClient
+ from codepilot.tools.registry import get_tools, get_tool_function
+ from codepilot.agents.conversation import ConversationManager
+ import json
+
+
+ # Explorer's specialized system prompt - optimized for token efficiency
+ EXPLORER_SYSTEM_PROMPT = """You are a code exploration expert.
+
+ Your job is to search codebases and answer questions about code.
+ You do NOT write code or create plans - just find and explain.
+
+ === TOKEN-EFFICIENT WORKFLOW ===
+ 1. Use search_code or search_repository to find relevant files
+ 2. Use get_file_outline to see file structure (~50 tokens, NOT full code)
+ 3. Use get_code_chunk to read ONLY the specific function/class you need
+ 4. Provide a clear, concise answer
+
+ NEVER use read_file - it wastes tokens by reading entire files!
+
+ === TOOLS ===
+ - get_file_outline: See file structure WITHOUT code - USE THIS!
+ - get_code_chunk: Read ONE specific function/class - USE THIS!
+ - search_code: Grep for exact patterns (e.g., "^class Flask")
+ - search_repository: Semantic search (BM25 + embeddings)
+ - list_files: List directory contents
+
+ === RESPONSE FORMAT ===
+ After finding the answer, respond with:
+ 1. What you found (file path, line numbers)
+ 2. Brief explanation of how it works
+ 3. Key code snippets if relevant
+
+ Be concise. Answer the question directly.
+ """
+
+
+ class ExplorerAgent:
+ """
+ Explorer Agent - Lightweight agent for search/exploration queries.
+
+ This agent is specialized for exploration. It has:
+ - Minimal system prompt (token-efficient)
+ - Read-only tools (no write access)
+ - Fewer iterations (max 5)
+ - No read_file (forces use of efficient tools)
+ """
+
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
+ """
+ Initialize Explorer agent.
+
+ Args:
+ model: LLM model to use (default: Claude Sonnet 4.5)
+ """
+ # Use Claude client for Claude models, OpenAI client as fallback
+ if "claude" in model.lower():
+ self.client = ClaudeClient(model=model)
+ else:
+ self.client = OpenAIClient(model=model)
+
+ self.conversation = ConversationManager()
+
+ # Explorer only gets token-efficient read-only tools
+ # Intentionally excludes read_file to force efficient tool usage
+ self.allowed_tools = [
+ "get_file_outline", # File structure without code
+ "get_code_chunk", # Specific function/class only
+ "search_code", # Grep pattern matching
+ "search_repository", # Semantic search
+ "list_files" # Directory listing
+ ]
+
+ def run(self, query: str) -> str:
+ """
+ Explore the codebase to answer a query.
+
+ Args:
+ query: User's question (e.g., "Find the Flask class")
+
+ Returns:
+ Answer as a string
+ """
+ # Reset conversation
+ self.conversation = ConversationManager()
+
+ # Add system prompt
+ self.conversation.add_message("system", EXPLORER_SYSTEM_PROMPT)
+
+ # Add user query
+ self.conversation.add_message("user", query)
+
+ # Get only the tools this agent is allowed to use
+ all_tools = get_tools()
+ explorer_tools = [
+ tool for tool in all_tools
+ if tool['function']['name'] in self.allowed_tools
+ ]
+
+ # Run exploration loop (fewer iterations than other agents)
+ max_iterations = 5
+ for iteration in range(max_iterations):
+ # Call LLM
+ response = self.client.chat(
+ messages=self.conversation.get_messages(),
+ tools=explorer_tools
+ )
+
+ finish_reason = response.choices[0].finish_reason
+ message = response.choices[0].message
+
+ # Add assistant response to conversation
+ self.conversation.add_message(
+ role="assistant",
+ content=message.content,
+ tool_calls=message.tool_calls
+ )
+
+ # Check if done
+ if finish_reason == "stop":
+ # Agent finished exploring
+ return message.content
+
+ # Execute tool calls
+ if finish_reason == "tool_calls":
+ for tool_call in message.tool_calls:
+ tool_name = tool_call.function.name
+ tool_args = json.loads(tool_call.function.arguments)
+
+ print(f"[EXPLORER] Calling tool: {tool_name}({tool_args})")
+
+ # Execute tool
+ tool_func = get_tool_function(tool_name)
+ if tool_func:
+ result = tool_func(**tool_args)
+ else:
+ result = f"Error: Tool {tool_name} not found"
+
+ # Add tool result to conversation
+ self.conversation.add_tool_result(
+ tool_call_id=tool_call.id,
+ tool_name=tool_name,
+ result=str(result)
+ )
+
+ # If we hit max iterations, return what we have
+ return "I found some information but couldn't complete the search. Please try a more specific query."
+
+ def get_tool_access(self) -> list:
+ """Return list of tools this agent can access."""
+ return self.allowed_tools
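The per-agent tool whitelisting used in `run()` above is a simple filter over the shared registry's tool schemas. A self-contained sketch with a mocked registry (the tool list below is illustrative, not the real registry output):

```python
# Mocked registry output in the OpenAI-style tool schema shape the agents filter on.
all_tools = [
    {"function": {"name": "get_file_outline"}},
    {"function": {"name": "write_file"}},
    {"function": {"name": "search_code"}},
]

# Explorer's read-only whitelist from the code above.
allowed = ["get_file_outline", "get_code_chunk", "search_code", "search_repository", "list_files"]

explorer_tools = [t for t in all_tools if t["function"]["name"] in allowed]
print([t["function"]["name"] for t in explorer_tools])  # → ['get_file_outline', 'search_code']
```

Note that `write_file` is dropped, which is what keeps the Explorer read-only regardless of what the registry exposes.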
codepilot/agents/orchestrator.py CHANGED
@@ -8,16 +8,22 @@ The orchestrator is the "brain" that:
8
  4. Handles the overall task flow
9
  """
10
 
 
 
 
 
11
  from enum import Enum
12
  from typing import Dict, Any, Optional
13
  from dataclasses import dataclass
14
  from codepilot.agents.planner_agent import PlannerAgent
15
  from codepilot.agents.coder_agent import CoderAgent
16
  from codepilot.agents.reviewer_agent import ReviewerAgent
 
17
 
18
 
19
  class AgentState(Enum):
20
  """Possible states in the multi-agent workflow"""
 
21
  PLANNING = "planning"
22
  CODING = "coding"
23
  REVIEWING = "reviewing"
@@ -33,7 +39,8 @@ class TaskContext:
33
  Think of this as a clipboard that agents write to and read from.
34
  """
35
  task_description: str # Original task from user
36
- plan: Optional[str] = None # Created by Planner
 
37
  code_changes: Optional[Dict[str, str]] = None # Created by Coder
38
  review_feedback: Optional[str] = None # Created by Reviewer
39
  error_message: Optional[str] = None # Set if something fails
@@ -48,14 +55,16 @@ class Orchestrator:
48
  """
49
  Orchestrator manages the multi-agent workflow.
50
 
51
- Flow:
52
- 1. Start in PLANNING state
53
- 2. Call Planner agent → get plan
54
- 3. Transition to CODING state
55
- 4. Call Coder agent → get code
56
- 5. Transition to REVIEWING state
57
- 6. Call Reviewer agent → get feedback
58
- 7. If approved COMPLETE
 
 
59
  If rejected → back to CODING (loop)
60
  """
61
 
@@ -71,24 +80,178 @@ class Orchestrator:
71
  self.max_iterations = max_iterations
72
  self.context = None
73
 
74
- # Create agent instances
75
- self.planner = PlannerAgent()
76
- self.coder = CoderAgent()
77
- self.reviewer = ReviewerAgent()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
78
 
79
  def run(self, task: str) -> Dict[str, Any]:
80
  """
81
  Run the multi-agent workflow for a task.
82
 
 
 
 
 
83
  Args:
84
  task: User's task description (e.g., "Add a login feature")
85
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
86
  Returns:
87
  Result dict with status, changes, and messages
88
  """
89
  # Initialize context
90
  self.context = TaskContext(task_description=task)
91
- self.state = AgentState.PLANNING
92
 
93
  # Main state machine loop
94
  while self.state not in [AgentState.COMPLETE, AgentState.FAILED]:
@@ -99,7 +262,10 @@ class Orchestrator:
99
  break
100
 
101
  # Execute current state
102
- if self.state == AgentState.PLANNING:
 
 
 
103
  self._execute_planning()
104
 
105
  elif self.state == AgentState.CODING:
@@ -113,22 +279,49 @@ class Orchestrator:
113
  # Return final result
114
  return self._build_result()
115
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
116
  def _execute_planning(self):
117
  """
118
  Execute planning state: call Planner agent.
119
 
120
- Planner's job:
121
- - Understand the task
122
- - Search codebase for relevant files
123
- - Create step-by-step plan
124
 
125
  Transition: Always go to CODING next
126
  """
127
  print(f"\n[ORCHESTRATOR] State: PLANNING")
128
- print(f"[ORCHESTRATOR] Task: {self.context.task_description}")
129
 
130
- # Call the real Planner agent!
131
- self.context.plan = self.planner.run(self.context.task_description)
 
 
 
132
 
133
  # Transition to coding
134
  self.state = AgentState.CODING
@@ -138,10 +331,11 @@ class Orchestrator:
138
  """
139
  Execute coding state: call Coder agent.
140
 
141
- Coder's job:
142
- - Read the plan
143
- - Read relevant files
144
- - Write code changes
 
145
 
146
  Transition: Always go to REVIEWING next
147
  """
@@ -149,14 +343,15 @@ class Orchestrator:
149
 
150
  # Check if this is a rework (Reviewer rejected previous code)
151
  if self.context.review_feedback:
152
- print(f"[ORCHESTRATOR] Passing plan + REVIEWER FEEDBACK to Coder agent...")
153
  else:
154
- print(f"[ORCHESTRATOR] Passing plan to Coder agent...")
155
 
156
- # Call the real Coder agent (with review feedback if available)!
157
  self.context.code_changes = self.coder.run(
158
  plan=self.context.plan,
159
  task=self.context.task_description,
 
160
  review_feedback=self.context.review_feedback
161
  )
162
 
 
8
  4. Handles the overall task flow
9
  """
10
 
11
+ # VERSION CHECK - If you see this, new code is running!
12
+ ORCHESTRATOR_VERSION = "3.2.0-coder-no-list"
13
+ print(f"[ORCHESTRATOR] ========== LOADING VERSION {ORCHESTRATOR_VERSION} ==========")
14
+
15
  from enum import Enum
16
  from typing import Dict, Any, Optional
17
  from dataclasses import dataclass
18
  from codepilot.agents.planner_agent import PlannerAgent
19
  from codepilot.agents.coder_agent import CoderAgent
20
  from codepilot.agents.reviewer_agent import ReviewerAgent
21
+ from codepilot.agents.explorer_agent import ExplorerAgent
22
 
23
 
24
  class AgentState(Enum):
25
  """Possible states in the multi-agent workflow"""
26
+ EXPLORING = "exploring" # NEW - Explorer gathers context first
27
  PLANNING = "planning"
28
  CODING = "coding"
29
  REVIEWING = "reviewing"
 
39
  Think of this as a clipboard that agents write to and read from.
40
  """
41
  task_description: str # Original task from user
42
+ exploration_context: Optional[str] = None # NEW - Created by Explorer
43
+ plan: Optional[str] = None # Created by Planner (uses exploration_context)
44
  code_changes: Optional[Dict[str, str]] = None # Created by Coder
45
  review_feedback: Optional[str] = None # Created by Reviewer
46
  error_message: Optional[str] = None # Set if something fails
 
55
  """
56
  Orchestrator manages the multi-agent workflow.
57
 
58
+ Flow (v3.0 - Explorer First):
59
+ 1. Start in EXPLORING state
60
+ 2. Call Explorer agent → gather codebase context (token-efficient)
61
+ 3. Transition to PLANNING state
62
+ 4. Call Planner agent (no tools, pure LLM) → get plan based on exploration
63
+ 5. Transition to CODING state
64
+ 6. Call Coder agent → get code
65
+ 7. Transition to REVIEWING state
66
+ 8. Call Reviewer agent → get feedback
67
+ 9. If approved → COMPLETE
68
  If rejected → back to CODING (loop)
69
  """
70
 
 
80
  self.max_iterations = max_iterations
81
  self.context = None
82
 
83
+ # Create agent instances (Claude Sonnet 4.5, 200K context)
84
+ self.explorer = ExplorerAgent(model="claude-sonnet-4-5-20250929")  # Exploration via token-efficient tools
85
+ self.planner = PlannerAgent(model="claude-sonnet-4-5-20250929")
86
+ self.coder = CoderAgent(model="claude-sonnet-4-5-20250929")
87
+ self.reviewer = ReviewerAgent(model="claude-sonnet-4-5-20250929")
88
+
89
+ def classify_task(self, task: str) -> str:
90
+ """
91
+ Classify task as 'explore' or 'code'.
92
+
93
+ Exploration tasks: find, search, explain, what is, where is
94
+ Code tasks: add, create, implement, fix, modify
95
+
96
+ Args:
97
+ task: User's task description (may include context prefix)
98
+
99
+ Returns:
100
+ 'explore' or 'code'
101
+ """
102
+ print(f"[CLASSIFIER] ########## CLASSIFIER v2.0 START ##########")
103
+ print(f"[CLASSIFIER] Raw task length: {len(task)} chars")
104
+ print(f"[CLASSIFIER] Has [REPOSITORY CONTEXT]: {'[REPOSITORY CONTEXT]' in task}")
105
+
106
+ # Extract just the user's query (after any context sections)
107
+ task_to_check = task
108
+
109
+ # If task has repository context, extract just the user query
110
+ if "[REPOSITORY CONTEXT]" in task:
111
+ print(f"[CLASSIFIER] Extracting user query from context...")
112
+ # Split by double newline and take the last non-empty part
113
+ parts = task.split("\n\n")
114
+ print(f"[CLASSIFIER] Found {len(parts)} parts after splitting")
115
+
116
+ # Get the last substantial part (user's actual query)
117
+ for i, part in enumerate(reversed(parts)):
118
+ part = part.strip()
119
+ print(f"[CLASSIFIER] Checking part {i}: '{part[:50]}...' (len={len(part)})")
120
+ if part and not part.startswith("[") and not part.startswith("AVAILABLE"):
121
+ task_to_check = part
122
+ print(f"[CLASSIFIER] Selected user query: '{part[:80]}...'")
123
+ break
124
+ else:
125
+ print(f"[CLASSIFIER] No context prefix, using raw task")
126
+
127
+ task_lower = task_to_check.lower().strip()
128
+
129
+ # Get just the first few words to determine intent
130
+ first_words = task_lower.split()[:5]
131
+ first_part = ' '.join(first_words)
132
+
133
+ print(f"[CLASSIFIER] Final query: '{task_to_check[:100]}'")
134
+ print(f"[CLASSIFIER] First 5 words: '{first_part}'")
135
+
136
+ # EXPLORE patterns - check these FIRST (questions about code)
137
+ # These indicate the user wants to understand/find something, not change it
138
+ explore_starters = [
139
+ "find", "search", "where", "what", "how", "why",
140
+ "explain", "show", "describe", "look", "locate",
141
+ "understand", "tell", "list", "which", "does", "is there",
142
+ "can you find", "can you show", "can you explain",
143
+ "i want to know", "i want to understand", "i want to find",
144
+ "help me find", "help me understand"
145
+ ]
146
+
147
+ # Check if query STARTS with an explore pattern
148
+ for pattern in explore_starters:
149
+ if task_lower.startswith(pattern) or first_part.startswith(pattern):
150
+ print(f"[CLASSIFIER] >>>>>> RESULT: EXPLORE (starts with '{pattern}') <<<<<<")
151
+ return "explore"
152
+
153
+ # Also check for question words anywhere in short queries
154
+ if len(task_lower) < 150: # Short queries are usually questions
155
+ question_indicators = ["where is", "what is", "how does", "how do", "how is",
156
+ "what does", "which file", "which function", "which class",
157
+ "is there", "are there", "can you find", "can you show"]
158
+ for indicator in question_indicators:
159
+ if indicator in task_lower:
160
+ print(f"[CLASSIFIER] >>>>>> RESULT: EXPLORE (contains '{indicator}') <<<<<<")
161
+ return "explore"
162
+
163
+ # CODE patterns - these indicate the user wants to modify/create something
164
+ # Use word boundaries to avoid false matches (e.g., "implemented" shouldn't match "implement")
165
+ code_starters = [
166
+ "add", "create", "implement", "fix", "modify", "change",
167
+ "update", "refactor", "write", "build", "delete", "remove",
168
+ "make", "develop", "insert", "append", "edit", "replace"
169
+ ]
170
+
171
+ # Check if query STARTS with a code action word
172
+ for pattern in code_starters:
173
+ if task_lower.startswith(pattern + " ") or task_lower.startswith(pattern + "\n"):
174
+ print(f"[CLASSIFIER] >>>>>> RESULT: CODE (starts with '{pattern}') <<<<<<")
175
+ return "code"
176
+
177
+ # Check for action phrases that indicate coding intent
178
+ code_phrases = [
179
+ "i want to add", "i want to create", "i want to implement",
180
+ "i want to fix", "i want to modify", "i want to change",
181
+ "i need to add", "i need to create", "i need to implement",
182
+ "please add", "please create", "please implement", "please fix",
183
+ "can you add", "can you create", "can you implement", "can you fix"
184
+ ]
185
+
186
+ for phrase in code_phrases:
187
+ if phrase in task_lower:
188
+ print(f"[CLASSIFIER] >>>>>> RESULT: CODE (contains '{phrase}') <<<<<<")
189
+ return "code"
190
+
191
+ # Default: short queries without action words are likely exploration
192
+ if len(task_lower) < 100:
193
+ print(f"[CLASSIFIER] >>>>>> RESULT: EXPLORE (short query default) <<<<<<")
194
+ return "explore"
195
+
196
+ # Longer queries default to code (probably detailed requirements)
197
+ print(f"[CLASSIFIER] >>>>>> RESULT: CODE (long query default) <<<<<<")
198
+ return "code"
199
 
200
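Stripped of logging and the repository-context handling, the classifier above reduces to prefix checks plus a length heuristic. A simplified sketch (keyword lists abbreviated from the full sets in `classify_task`):

```python
# Simplified sketch of the explore/code classifier; omits the
# [REPOSITORY CONTEXT] extraction and the mid-string question checks.
EXPLORE_STARTERS = ("find", "search", "where", "what", "how", "explain")
CODE_STARTERS = ("add", "create", "implement", "fix", "modify", "update")

def classify(task: str) -> str:
    t = task.lower().strip()
    if t.startswith(EXPLORE_STARTERS):
        return "explore"
    # Require a trailing space so "implemented ..." doesn't match "implement"
    if any(t.startswith(p + " ") for p in CODE_STARTERS):
        return "code"
    # Short queries without action words default to exploration
    return "explore" if len(t) < 100 else "code"

print(classify("where is the login handler?"))  # explore
print(classify("add a logout endpoint"))        # code
```

Checking explore patterns first matters: "how do I add a field?" is a question about the codebase, not a request to modify it.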
  def run(self, task: str) -> Dict[str, Any]:
201
  """
202
  Run the multi-agent workflow for a task.
203
 
204
+ First classifies the task:
205
+ - 'explore' → Uses lightweight ExplorerAgent only
206
+ - 'code' → Uses full Explorer → Planner → Coder → Reviewer pipeline
207
+
208
  Args:
209
  task: User's task description (e.g., "Add a login feature")
210
 
211
+ Returns:
212
+ Result dict with status, changes, and messages
213
+ """
214
+ # Classify the task first
215
+ task_type = self.classify_task(task)
216
+
217
+ if task_type == "explore":
218
+ # Use lightweight Explorer agent for search/explain queries
219
+ print(f"\n[ORCHESTRATOR] Task type: EXPLORE (using Explorer agent)")
220
+ print(f"[ORCHESTRATOR] Query: {task}")
221
+
222
+ answer = self.explorer.run(task)
223
+
224
+ return {
225
+ 'status': 'complete',
226
+ 'success': True,
227
+ 'task': task,
228
+ 'plan': answer, # Explorer's answer goes in plan field
229
+ 'code_changes': None,
230
+ 'review_feedback': None,
231
+ 'error': None,
232
+ 'iterations': 1
233
+ }
234
+
235
+ # Full workflow for code tasks: Explorer → Planner → Coder → Reviewer
236
+ print(f"\n[ORCHESTRATOR] Task type: CODE (using full workflow)")
237
+ return self._run_full_workflow(task)
238
+
239
+ def _run_full_workflow(self, task: str) -> Dict[str, Any]:
240
+ """
241
+ Run the full Explorer → Planner → Coder → Reviewer workflow.
242
+
243
+ v3.0: Now starts with Explorer to gather context efficiently,
244
+ then Planner creates plan based on exploration (no tools).
245
+
246
+ Args:
247
+ task: User's task description
248
+
249
  Returns:
250
  Result dict with status, changes, and messages
251
  """
252
  # Initialize context
253
  self.context = TaskContext(task_description=task)
254
+ self.state = AgentState.EXPLORING # v3.0: Start with EXPLORING
255
 
256
  # Main state machine loop
257
  while self.state not in [AgentState.COMPLETE, AgentState.FAILED]:
 
262
  break
263
 
264
  # Execute current state
265
+ if self.state == AgentState.EXPLORING:
266
+ self._execute_exploring() # NEW - Explorer first
267
+
268
+ elif self.state == AgentState.PLANNING:
269
  self._execute_planning()
270
 
271
  elif self.state == AgentState.CODING:
 
279
  # Return final result
280
  return self._build_result()
281
 
282
+ def _execute_exploring(self):
283
+ """
284
+ Execute exploring state: call Explorer agent to gather context.
285
+
286
+ Explorer's job (v3.0):
287
+ - Search codebase efficiently using token-optimized tools
288
+ - Find relevant files, functions, and patterns
289
+ - Return context summary for Planner to use
290
+
291
+ Transition: Always go to PLANNING next
292
+ """
293
+ print(f"\n[ORCHESTRATOR] State: EXPLORING")
294
+ print(f"[ORCHESTRATOR] Running Explorer to gather codebase context...")
295
+
296
+ # Run Explorer to gather context (uses token-efficient tools)
297
+ exploration_result = self.explorer.run(self.context.task_description)
298
+
299
+ # Store exploration context for Planner to use
300
+ self.context.exploration_context = exploration_result
301
+
302
+ # Transition to planning
303
+ self.state = AgentState.PLANNING
304
+ print(f"[ORCHESTRATOR] Exploration complete. Transitioning to PLANNING")
305
+
306
  def _execute_planning(self):
307
  """
308
  Execute planning state: call Planner agent.
309
 
310
+ Planner's job (v3.0):
311
+ - Receive exploration context from Explorer
312
+ - Create step-by-step plan based on exploration (NO TOOLS)
313
+ - Pure LLM reasoning - no searching
314
 
315
  Transition: Always go to CODING next
316
  """
317
  print(f"\n[ORCHESTRATOR] State: PLANNING")
318
+ print(f"[ORCHESTRATOR] Using exploration context to create plan (no tools)...")
319
 
320
+ # Call the Planner with exploration context (v3.0: Planner has no tools)
321
+ self.context.plan = self.planner.run(
322
+ task=self.context.task_description,
323
+ exploration_context=self.context.exploration_context
324
+ )
325
 
326
  # Transition to coding
327
  self.state = AgentState.CODING
 
331
  """
332
  Execute coding state: call Coder agent.
333
 
334
+ Coder's job (v3.0):
335
+ - Receive exploration context and plan
336
+ - Read/write files to implement the plan
337
+ - Test in sandbox
338
+ - NO searching (Explorer already did that)
339
 
340
  Transition: Always go to REVIEWING next
341
  """
 
343
 
344
  # Check if this is a rework (Reviewer rejected previous code)
345
  if self.context.review_feedback:
346
+ print(f"[ORCHESTRATOR] Passing exploration + plan + REVIEWER FEEDBACK to Coder...")
347
  else:
348
+ print(f"[ORCHESTRATOR] Passing exploration context + plan to Coder (no search needed)...")
349
 
350
+ # Call the Coder with exploration context (v3.0: Coder doesn't search)
351
  self.context.code_changes = self.coder.run(
352
  plan=self.context.plan,
353
  task=self.context.task_description,
354
+ exploration_context=self.context.exploration_context,
355
  review_feedback=self.context.review_feedback
356
  )
357
 
codepilot/agents/planner_agent.py CHANGED
@@ -1,157 +1,128 @@
1
  """
2
- Planner Agent - Creates implementation plans
3
 
4
  The Planner's job:
5
- 1. Understand the task
6
- 2. Search the codebase to see what exists
7
- 3. Create a detailed, step-by-step plan
8
-
9
- Tools it has access to:
10
- - search_codebase (hybrid retrieval)
11
- - read_file (to understand existing code)
12
- - list_files (to explore structure)
 
13
  """
14
 
15
  from codepilot.llm.client import OpenAIClient
16
- from codepilot.tools.registry import get_tools, get_tool_function
17
  from codepilot.agents.conversation import ConversationManager
18
- from typing import Dict, Any
19
- import json
20
 
21
 
22
- # Planner's specialized system prompt
23
  PLANNER_SYSTEM_PROMPT = """You are a senior software architect and planning expert.
24
 
25
- Your ONLY job is to create detailed implementation plans. You do NOT write code.
26
 
27
- When given a task:
28
- 1. First, search the codebase to understand what already exists
29
- 2. Identify which files need to be modified or created
30
- 3. Break down the task into clear, specific steps
31
- 4. Consider dependencies and potential risks
32
 
33
- Your plan should be:
34
- - Specific (mention exact file names, function names)
35
- - Ordered (steps build on each other)
36
- - Complete (covers all aspects of the task)
37
- - Realistic (considers existing code structure)
 
38
 
39
- Output your plan as a numbered list of steps.
40
 
41
- Tools available to you:
42
- - search_codebase: Search for existing code (use this first!)
43
- - read_file: Read specific files to understand them
44
- - list_files: Explore directory structure
45
-
46
- You do NOT have write_file or run_command - you only plan, never execute.
47
  """
48
 
49
 
50
  class PlannerAgent:
51
  """
52
- Planner Agent - Creates implementation plans.
53
 
54
  This agent is specialized for planning. It has:
55
- - Custom system prompt (architect mindset)
56
- - Limited tools (read-only)
57
- - Single responsibility (planning only)
 
58
  """
59
 
60
- def __init__(self, model: str = "gpt-3.5-turbo"):
61
  """
62
  Initialize Planner agent.
63
 
64
  Args:
65
- model: LLM model to use
66
  """
67
- self.client = OpenAIClient(model=model)
68
- self.conversation = ConversationManager()
69
-
70
- # Planner only gets read-only tools
71
- self.allowed_tools = [
72
- "search_codebase",
73
- "read_file",
74
- "list_files"
75
- ]
76
 
77
- def run(self, task: str) -> str:
78
  """
79
- Create a plan for the given task.
 
 
80
 
81
  Args:
82
  task: Task description (e.g., "Add login feature")
 
83
 
84
  Returns:
85
  Detailed implementation plan as a string
86
  """
87
- # Reset conversation
88
- self.conversation = ConversationManager()
89
-
90
- # Add system prompt
91
- self.conversation.add_message("system", PLANNER_SYSTEM_PROMPT)
92
-
93
- # Add user task
94
- user_prompt = f"""Task: {task}
95
-
96
- Please create a detailed implementation plan. Start by searching the codebase to understand what exists."""
97
- self.conversation.add_message("user", user_prompt)
98
-
99
- # Get only the tools this agent is allowed to use
100
- all_tools = get_tools()
101
- planner_tools = [
102
- tool for tool in all_tools
103
- if tool['function']['name'] in self.allowed_tools
104
- ]
105
-
106
- # Run planning loop (agent explores codebase, then creates plan)
107
- max_iterations = 10
108
- for iteration in range(max_iterations):
109
- # Call LLM
110
- response = self.client.chat(
111
- messages=self.conversation.get_messages(),
112
- tools=planner_tools
113
- )
114
-
115
- finish_reason = response.choices[0].finish_reason
116
- message = response.choices[0].message
117
-
118
- # Add assistant response to conversation
119
- self.conversation.add_message(
120
- role="assistant",
121
- content=message.content,
122
- tool_calls=message.tool_calls
123
- )
124
-
125
- # Check if done
126
- if finish_reason == "stop":
127
- # Agent finished planning
128
- return message.content
129
-
130
- # Execute tool calls
131
- if finish_reason == "tool_calls":
132
- for tool_call in message.tool_calls:
133
- tool_name = tool_call.function.name
134
- tool_args = json.loads(tool_call.function.arguments)
135
-
136
- print(f"[PLANNER] Calling tool: {tool_name}({tool_args})")
137
-
138
- # Execute tool
139
- tool_func = get_tool_function(tool_name)
140
- if tool_func:
141
- result = tool_func(**tool_args)
142
- else:
143
- result = f"Error: Tool {tool_name} not found"
144
-
145
- # Add tool result to conversation
146
- self.conversation.add_tool_result(
147
- tool_call_id=tool_call.id,
148
- tool_name=tool_name,
149
- result=str(result)
150
- )
151
-
152
- # If we hit max iterations, return what we have
153
- return "Error: Planner exceeded max iterations"
154
 
155
  def get_tool_access(self) -> list:
156
  """Return list of tools this agent can access."""
157
- return self.allowed_tools
 
1
  """
2
+ Planner Agent - Creates implementation plans (v3.0 - Pure LLM, No Tools)
3
 
4
  The Planner's job:
5
+ 1. Receive exploration context from Explorer agent
6
+ 2. Create a detailed, step-by-step implementation plan
7
+ 3. NO searching - Explorer already did that
8
+
9
+ v3.0 Changes:
10
+ - Removed all tools (pure LLM reasoning)
11
+ - Receives exploration_context from Explorer
12
+ - Single LLM call instead of tool loop
13
+ - ~90% token reduction vs v2.0
14
  """
15
 
16
  from codepilot.llm.client import OpenAIClient
17
+ from codepilot.llm.claude_client import ClaudeClient
18
  from codepilot.agents.conversation import ConversationManager
19
+ from typing import Optional
 
20
 
21
 
22
+ # Planner's system prompt (v3.0 - no tools, just planning)
23
  PLANNER_SYSTEM_PROMPT = """You are a senior software architect and planning expert.
24
 
25
+ Your ONLY job is to create detailed implementation plans based on the exploration context provided.
26
 
27
+ You do NOT have any tools. The Explorer agent has already searched the codebase for you.
28
+ Use the EXPLORATION RESULTS to understand the codebase structure and create your plan.
 
 
 
29
 
30
+ === YOUR PLAN SHOULD INCLUDE ===
31
+ 1. OVERVIEW: Brief summary of what needs to be done
32
+ 2. FILES TO MODIFY: List each file with specific changes needed
33
+ 3. IMPLEMENTATION STEPS: Ordered steps with exact details:
34
+ - File path
35
+ - Function/class to modify or create
36
+ - What code to add/change
37
+ - Line numbers if provided in exploration
38
+ 4. TESTING: How to verify the changes work
39
 
40
+ === PLAN QUALITY REQUIREMENTS ===
41
+ - Be SPECIFIC: Include exact file names, function names, line numbers
42
+ - Be ORDERED: Steps should build on each other logically
43
+ - Be COMPLETE: Cover all aspects of the task
44
+ - Be CONCISE: Don't repeat information from exploration
45
 
46
+ You do NOT write code - just create the plan for the Coder agent to follow.
 
 
 
 
 
47
  """
48
 
49
 
50
  class PlannerAgent:
51
  """
52
+ Planner Agent - Creates implementation plans (v3.0).
53
 
54
  This agent is specialized for planning. It has:
55
+ - NO tools (pure LLM reasoning)
56
+ - Receives exploration context from Explorer
57
+ - Single LLM call (no iteration loop)
58
+ - Maximum token efficiency
59
  """
60
 
61
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
62
  """
63
  Initialize Planner agent.
64
 
65
  Args:
66
+ model: LLM model to use (default: Claude Sonnet 4.5)
67
  """
68
+ # Use Claude client for Claude models, OpenAI client as fallback
69
+ if "claude" in model.lower():
70
+ self.client = ClaudeClient(model=model)
71
+ else:
72
+ self.client = OpenAIClient(model=model)
 
 
 
 
73
 
74
+ def run(self, task: str, exploration_context: Optional[str] = None) -> str:
75
  """
76
+ Create a plan for the given task using exploration context.
77
+
78
+ v3.0: No tools - pure LLM reasoning based on Explorer's findings.
79
 
80
  Args:
81
  task: Task description (e.g., "Add login feature")
82
+ exploration_context: Context gathered by Explorer agent
83
 
84
  Returns:
85
  Detailed implementation plan as a string
86
  """
87
+ print(f"[PLANNER] Creating plan based on exploration context (no tools)")
88
+
89
+ # Build the prompt with exploration context
90
+ if exploration_context:
91
+ user_prompt = f"""=== EXPLORATION RESULTS ===
92
+ {exploration_context}
93
+
94
+ === TASK ===
95
+ {task}
96
+
97
+ Based on the exploration results above, create a detailed implementation plan.
98
+ Include specific file paths, function names, and step-by-step instructions for the Coder agent.
99
+ """
100
+ else:
101
+ # Fallback if no exploration context (shouldn't happen in v3.0)
102
+ user_prompt = f"""=== TASK ===
103
+ {task}
104
+
105
+ Create a detailed implementation plan for this task.
106
+ Note: No exploration context was provided, so make reasonable assumptions about the codebase structure.
107
+ """
108
+
109
+ # Create conversation with system prompt and user message
110
+ conversation = ConversationManager()
111
+ conversation.add_message("system", PLANNER_SYSTEM_PROMPT)
112
+ conversation.add_message("user", user_prompt)
113
+
114
+ # Single LLM call - no tools, no iteration loop
115
+ response = self.client.chat(
116
+ messages=conversation.get_messages(),
117
+ tools=None, # NO TOOLS - pure reasoning
118
+ max_tokens=2000 # Enough for a detailed plan
119
+ )
120
+
121
+ plan = response.choices[0].message.content
122
+ print(f"[PLANNER] Plan created successfully")
123
+
124
+ return plan
 
125
 
126
  def get_tool_access(self) -> list:
127
  """Return list of tools this agent can access."""
128
+ return [] # v3.0: No tools
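The v3.0 Planner's single-call pattern — prepend the Explorer's findings, one chat completion, no tool loop — comes down to prompt assembly. The `build_planner_prompt` helper below is hypothetical (the real code inlines this in `run`), but the prompt shape matches the diff above:

```python
def build_planner_prompt(task: str, exploration_context: str = None) -> str:
    """Assemble the Planner's user prompt from task + exploration context."""
    if exploration_context:
        return (
            "=== EXPLORATION RESULTS ===\n"
            f"{exploration_context}\n\n"
            "=== TASK ===\n"
            f"{task}\n\n"
            "Based on the exploration results above, create a detailed "
            "implementation plan."
        )
    # Fallback when no exploration context is available
    return (
        "=== TASK ===\n"
        f"{task}\n\n"
        "Create a detailed implementation plan for this task."
    )

prompt = build_planner_prompt("Add login", "auth.py defines class User")
```

Because the exploration context is a plain string rather than a tool loop, the Planner costs exactly one LLM call per task, which is where the claimed token reduction over v2.0 comes from.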
codepilot/agents/reviewer_agent.py CHANGED
@@ -13,6 +13,7 @@ Tools it has access to:
13
  """
14
 
15
  from codepilot.llm.client import OpenAIClient
 
16
  from codepilot.tools.registry import get_tools, get_tool_function
17
  from codepilot.agents.conversation import ConversationManager
18
  from typing import Dict, Any, Tuple
@@ -24,41 +25,37 @@ REVIEWER_SYSTEM_PROMPT = """You are a senior code reviewer and quality assurance
24
 
25
  Your ONLY job is to review code changes and provide feedback. You do NOT write code yourself.
26
 
27
  When given code changes:
28
- 1. Read each changed file carefully
29
- 2. Check for common issues:
30
- - Bugs and logic errors
31
- - Security vulnerabilities (SQL injection, XSS, etc.)
32
- - Missing error handling
33
- - Poor naming or unclear code
34
- - Code that doesn't match the plan
35
- 3. Decide: APPROVE or REJECT
36
- 4. If rejecting, provide specific, actionable feedback
37
-
38
- Your review should be:
39
- - Thorough (check all aspects of the code)
40
- - Specific (point to exact issues with line numbers if possible)
41
- - Constructive (explain WHY something is wrong and HOW to fix it)
42
- - Fair (don't reject for minor style issues)
43
 
44
  DECISION CRITERIA:
45
- ✅ APPROVE if:
46
- - Code works correctly
47
- - No security issues
48
- - Follows the plan
49
- - Has basic error handling
50
- - Is reasonably readable
51
-
52
- REJECT if:
53
- - Code has bugs
54
- - Security vulnerabilities exist
55
- - Doesn't implement the plan
56
- - Missing critical error handling
57
- - Code is unclear or confusing
58
-
59
- Tools available to you:
60
- - read_file: Read files to understand full context
61
- - search_codebase: Check for similar patterns in the codebase
62
 
63
  You do NOT have write_file - you only review, never modify code.
64
  """
@@ -74,20 +71,27 @@ class ReviewerAgent:
74
  - Single responsibility (review only)
75
  """
76
 
77
- def __init__(self, model: str = "gpt-3.5-turbo"):
78
  """
79
  Initialize Reviewer agent.
80
 
81
  Args:
82
- model: LLM model to use
83
  """
84
- self.client = OpenAIClient(model=model)
85
  self.conversation = ConversationManager()
86
 
87
  # Reviewer only gets read-only tools
88
  self.allowed_tools = [
89
- "read_file",
90
- "search_codebase"
 
 
91
  ]
92
 
93
  def run(self, code_changes: Dict[str, str], plan: str, task: str) -> Tuple[bool, str]:
 
13
  """
14
 
15
  from codepilot.llm.client import OpenAIClient
16
+ from codepilot.llm.claude_client import ClaudeClient
17
  from codepilot.tools.registry import get_tools, get_tool_function
18
  from codepilot.agents.conversation import ConversationManager
19
  from typing import Dict, Any, Tuple
 
25
 
26
  Your ONLY job is to review code changes and provide feedback. You do NOT write code yourself.
27
 
28
+ === CRITICAL: TOKEN-EFFICIENT FILE READING ===
29
+ 1. NEVER use read_file as your first choice!
30
+ 2. ALWAYS use get_file_outline FIRST to see file structure (~50 tokens vs ~2000 tokens)
31
+ 3. THEN use get_code_chunk to read ONLY the specific function/class you need to review
32
+ 4. ONLY use read_file if you absolutely need the ENTIRE file (rare!)
33
+
34
+ CORRECT workflow:
35
+ get_file_outline("file.py") → See structure
36
+ get_code_chunk("file.py", "my_func") → Review just that function
37
+
38
+ WRONG workflow:
39
+ read_file("file.py") → Wastes 2000+ tokens!
40
+
41
+ === REVIEW WORKFLOW ===
42
  When given code changes:
43
+ 1. The code changes are already provided in the prompt - review those first
44
+ 2. If you need more context, use get_file_outline then get_code_chunk
45
+ 3. Check for: bugs, security issues, missing error handling, plan compliance
46
+ 4. Decide: APPROVE or REJECT
 
47
 
48
  DECISION CRITERIA:
49
+ ✅ APPROVE if: Code works, no security issues, follows plan, has error handling
50
+ ❌ REJECT if: Has bugs, security issues, doesn't follow plan, unclear code
51
+
52
+ === TOOLS ===
53
+ - get_file_outline: Get file structure WITHOUT code - USE THIS FIRST!
54
+ - get_code_chunk: Extract ONE specific function/class - USE THIS SECOND!
55
+ - read_file: Read ENTIRE file - AVOID THIS!
56
+ - search_repository: Find similar patterns
57
+
58
+ End your review with: "DECISION: APPROVE" or "DECISION: REJECT"
 
59
 
60
  You do NOT have write_file - you only review, never modify code.
61
  """
 
71
  - Single responsibility (review only)
72
  """
73
 
74
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
75
  """
76
  Initialize Reviewer agent.
77
 
78
  Args:
79
+ model: LLM model to use (default: Claude Sonnet 4.5)
80
  """
81
+ # Use Claude client for Claude models, OpenAI client as fallback
82
+ if "claude" in model.lower():
83
+ self.client = ClaudeClient(model=model)
84
+ else:
85
+ self.client = OpenAIClient(model=model)
86
+
87
  self.conversation = ConversationManager()
88
 
89
  # Reviewer only gets read-only tools
90
  self.allowed_tools = [
91
+ "get_file_outline", # Get file structure without full code (token-efficient!)
92
+ "get_code_chunk", # Extract specific function/class by name
93
+ "read_file", # Full file contents (use sparingly)
94
+ "search_repository"
95
  ]
96
 
97
  def run(self, code_changes: Dict[str, str], plan: str, task: str) -> Tuple[bool, str]:
codepilot/context/indexer.py CHANGED
@@ -48,11 +48,15 @@ class CodebaseIndexer:
48
  # Skip unwanted directories (modify dirs in-place)
49
  dirs[:] = [d for d in dirs if d not in [
50
  '__pycache__', 'venv', 'node_modules', '.git',
51
- '.pytest_cache', '.mypy_cache'
52
  ]]
53
 
54
  # Process each file
55
  for file in files:
 
 
 
 
56
  # Check if file has matching extension
57
  if any(file.endswith(ext) for ext in file_extensions):
58
  file_path = os.path.join(root, file)
 
48
  # Skip unwanted directories (modify dirs in-place)
49
  dirs[:] = [d for d in dirs if d not in [
50
  '__pycache__', 'venv', 'node_modules', '.git',
51
+ '.pytest_cache', '.mypy_cache', 'tests', 'test'
52
  ]]
53
 
54
  # Process each file
55
  for file in files:
56
+ # Skip test files
57
+ if file.startswith('test_') or file.endswith('_test.py'):
58
+ continue
59
+
60
  # Check if file has matching extension
61
  if any(file.endswith(ext) for ext in file_extensions):
62
  file_path = os.path.join(root, file)
codepilot/llm/claude_client.py ADDED
@@ -0,0 +1,235 @@
1
+ """
2
+ Claude Client Wrapper
3
+ Handles all communication with Anthropic's Claude API
4
+ """
5
+
6
+ import os
7
+ import json
8
+ from dotenv import load_dotenv
9
+ from anthropic import Anthropic
10
+ from typing import List, Dict, Optional
11
+
12
+ load_dotenv()
13
+
14
+
15
+ class ClaudeClient:
16
+ """Wrapper for Anthropic Claude API calls"""
17
+
18
+ def __init__(self, model: str = "claude-sonnet-4-5-20250929"):
19
+ """
20
+ Initialize Claude client
21
+
22
+ Args:
23
+ model: Claude model to use (default: claude-sonnet-4-5-20250929)
24
+ """
25
+ self.api_key = os.getenv('ANTHROPIC_API_KEY')
26
+
27
+ if not self.api_key:
28
+ raise ValueError("ANTHROPIC_API_KEY not found in environment variables")
29
+
30
+ self.client = Anthropic(api_key=self.api_key)
31
+ self.model = model
32
+
33
+ print(f"✅ Claude Client initialized with model: {self.model}")
34
+
35
+ def chat(
36
+ self,
37
+ messages: List[Dict[str, str]],
38
+ tools: Optional[List[Dict]] = None,
39
+ temperature: float = 0.7,
40
+ max_tokens: int = 1000
41
+ ):
42
+ """
43
+ Send a chat completion request to Claude
44
+
45
+ Args:
46
+ messages: List of message dicts with 'role' and 'content'
47
+ tools: Optional list of tool definitions for function calling
48
+ temperature: Randomness (0-1, lower = more focused)
49
+ max_tokens: Maximum tokens in response
50
+
51
+ Returns:
52
+ Response object compatible with OpenAI format
53
+ """
54
+ try:
55
+ # Separate system message from conversation and convert tool messages
56
+ system_message = ""
57
+ conversation_messages = []
58
+ pending_tool_results = []
59
+
60
+ for msg in messages:
61
+ if msg.get("role") == "system":
62
+ system_message = msg.get("content", "")
63
+ elif msg.get("role") == "tool":
64
+ # Collect tool results to group them
65
+ pending_tool_results.append({
66
+ "type": "tool_result",
67
+ "tool_use_id": msg.get("tool_call_id"),
68
+ "content": msg.get("content", "")
69
+ })
70
+ elif msg.get("role") == "assistant" and msg.get("tool_calls"):
71
+ # Convert OpenAI tool_calls to Claude tool_use format
72
+ # Flush pending tool results first
73
+ if pending_tool_results:
74
+ conversation_messages.append({
75
+ "role": "user",
76
+ "content": pending_tool_results
77
+ })
78
+ pending_tool_results = []
79
+
80
+ # Convert tool_calls to content blocks with tool_use
81
+ content_blocks = []
82
+ if msg.get("content"):
83
+ content_blocks.append({"type": "text", "text": msg.get("content")})
84
+
85
+ for tc in msg.get("tool_calls", []):
86
+ # Handle both dict and object formats
87
+ if isinstance(tc, dict):
88
+ tool_id = tc.get("id")
89
+ func = tc.get("function", {})
90
+ func_name = func.get("name")
91
+ func_args = func.get("arguments", "{}")
92
+ else:
93
+ tool_id = tc.id
94
+ func_name = tc.function.name
95
+ func_args = tc.function.arguments
96
+
97
+ content_blocks.append({
98
+ "type": "tool_use",
99
+ "id": tool_id,
100
+ "name": func_name,
101
+ "input": json.loads(func_args) if isinstance(func_args, str) else func_args
102
+ })
103
+
104
+ conversation_messages.append({
105
+ "role": "assistant",
106
+ "content": content_blocks
107
+ })
108
+ else:
109
+ # Flush pending tool results before adding non-tool message
110
+ if pending_tool_results:
111
+ conversation_messages.append({
112
+ "role": "user",
113
+ "content": pending_tool_results
114
+ })
115
+ pending_tool_results = []
116
+ conversation_messages.append(msg)
117
+
118
+ # Flush any remaining tool results
119
+ if pending_tool_results:
120
+ conversation_messages.append({
121
+ "role": "user",
122
+ "content": pending_tool_results
123
+ })
124
+
125
+ # Build request parameters
126
+ request_params = {
127
+ "model": self.model,
128
+ "messages": conversation_messages,
129
+ "temperature": temperature,
130
+ "max_tokens": max_tokens
131
+ }
132
+
133
+ # Add system message if present
134
+ if system_message:
135
+ request_params["system"] = system_message
136
+
137
+ # Add tools if provided (convert from OpenAI format to Claude format)
138
+ if tools:
139
+ claude_tools = self._convert_tools_to_claude_format(tools)
140
+ request_params["tools"] = claude_tools
141
+
142
+ # Make API call
143
+ response = self.client.messages.create(**request_params)
144
+
145
+ # Convert Claude response to OpenAI-compatible format
146
+ openai_compatible_response = self._convert_to_openai_format(response)
147
+
148
+ # Print token usage for cost tracking
149
+ usage = response.usage
150
+ print(f"📊 Tokens: {usage.input_tokens} prompt + {usage.output_tokens} completion = {usage.input_tokens + usage.output_tokens} total")
151
+
152
+ return openai_compatible_response
153
+
154
+ except Exception as e:
155
+ print(f"❌ Claude API Error: {e}")
156
+ raise
157
+
158
+ def _convert_tools_to_claude_format(self, openai_tools: List[Dict]) -> List[Dict]:
159
+ """Convert OpenAI tool format to Claude tool format"""
160
+ claude_tools = []
161
+
162
+ for tool in openai_tools:
163
+ if tool.get("type") == "function":
164
+ func = tool.get("function", {})
165
+ claude_tool = {
166
+ "name": func.get("name"),
167
+ "description": func.get("description"),
168
+ "input_schema": func.get("parameters", {})
169
+ }
170
+ claude_tools.append(claude_tool)
171
+
172
+ return claude_tools
173
+
174
+ def _convert_to_openai_format(self, claude_response):
175
+ """Convert Claude response to OpenAI-compatible format"""
176
+ import json
177
+
178
+ # Create simple dict-based objects that provide attribute access
179
+ class DictObject(dict):
180
+ """Object that behaves like both dict and object (JSON serializable)"""
181
+ def __init__(self, **kwargs):
182
+ super().__init__(kwargs)
183
+ self.__dict__ = self
184
+
185
+ # Extract content and tool calls from Claude response
186
+ content_parts = []
187
+ tool_calls = []
188
+
189
+ for block in claude_response.content:
190
+ if block.type == "text":
191
+ content_parts.append(block.text)
192
+ elif block.type == "tool_use":
193
+ # Convert to OpenAI tool call format
194
+ tool_call = DictObject(
195
+ id=block.id,
196
+ type="function",
197
+ function=DictObject(
198
+ name=block.name,
199
+ arguments=json.dumps(block.input)
200
+ )
201
+ )
202
+ tool_calls.append(tool_call)
203
+
204
+ # Determine finish reason
205
+ finish_reason = "stop"
206
+ if claude_response.stop_reason == "tool_use":
207
+ finish_reason = "tool_calls"
208
+ elif claude_response.stop_reason == "max_tokens":
209
+ finish_reason = "length"
210
+
211
+ # Build message
212
+ message = DictObject(
213
+ role="assistant",
214
+ content="\n".join(content_parts) if content_parts else None,
215
+ tool_calls=tool_calls if tool_calls else None
216
+ )
217
+
218
+ # Build choice
219
+ choice = DictObject(
220
+ message=message,
221
+ finish_reason=finish_reason
222
+ )
223
+
224
+ # Build usage
225
+ usage = DictObject(
226
+ prompt_tokens=claude_response.usage.input_tokens,
227
+ completion_tokens=claude_response.usage.output_tokens,
228
+ total_tokens=claude_response.usage.input_tokens + claude_response.usage.output_tokens
229
+ )
230
+
231
+ # Build response
232
+ return DictObject(
233
+ choices=[choice],
234
+ usage=usage
235
+ )
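The OpenAI-to-Claude schema translation in `_convert_tools_to_claude_format` can be checked in isolation. The sketch below mirrors it as a free function; the sample `read_file` schema is illustrative, not taken from the repo:

```python
# Minimal sketch of the OpenAI -> Claude tool-schema conversion above:
# Claude expects "input_schema" where OpenAI nests "parameters" under "function".
def convert_tools_to_claude_format(openai_tools):
    claude_tools = []
    for tool in openai_tools:
        if tool.get("type") == "function":
            func = tool.get("function", {})
            claude_tools.append({
                "name": func.get("name"),
                "description": func.get("description"),
                "input_schema": func.get("parameters", {}),
            })
    return claude_tools

# Hypothetical OpenAI-style tool definition for demonstration
openai_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

claude_tool = convert_tools_to_claude_format([openai_tool])[0]
```

The flattening is lossless in the other direction too, which is what lets `_convert_to_openai_format` hand Claude responses to agent code written against the OpenAI shape.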
codepilot/llm/client.py CHANGED
@@ -36,7 +36,7 @@ class OpenAIClient:
         messages: List[Dict[str, str]],
         tools: Optional[List[Dict]] = None,
         temperature: float = 0.7,
-        max_tokens: int = 2000
+        max_tokens: int = 800
     ) -> openai.types.chat.ChatCompletion:
         """
         Send a chat completion request to OpenAI
codepilot/tools/context_tools.py CHANGED
@@ -86,7 +86,8 @@ def index_codebase(path: str = ".") -> str:
         })
 
     # Create and index hybrid retriever
-    _hybrid_retriever = HybridRetriever()
+    # Weights tuned for code search: heavily favor BM25 (exact matches) over embeddings (semantic)
+    _hybrid_retriever = HybridRetriever(bm25_weight=0.85, embedding_weight=0.15)
     retrieval_stats = _hybrid_retriever.index_documents(documents)
 
     # Return summary
@@ -141,3 +142,20 @@ def search_codebase(query: str, top_k: int = 5) -> str:
             output.append(f" {line}")
 
     return '\n'.join(output)
+
+
+def search_repository(query: str, top_k: int = 5) -> str:
+    """
+    Search the cloned GitHub repository using hybrid retrieval (BM25 + embeddings).
+
+    This is a wrapper that provides semantic search over cloned repositories.
+    Uses the same hybrid search as search_codebase.
+
+    Args:
+        query: What to search for (e.g., "Flask application class", "error handling")
+        top_k: Number of results to return (default: 5)
+
+    Returns:
+        Formatted search results with file paths, scores, and code snippets
+    """
+    return search_codebase(query, top_k)
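The `bm25_weight=0.85 / embedding_weight=0.15` split amounts to a weighted linear fusion of the two retrievers' scores. A minimal sketch, assuming scores are already normalized to [0, 1] (HybridRetriever's actual normalization and the `fuse_scores` name are assumptions, not from the repo):

```python
# Hedged sketch of weighted score fusion with the 0.85/0.15 split above.
# Documents missing from one ranking contribute a score of 0 for that retriever.
def fuse_scores(bm25_scores, embedding_scores,
                bm25_weight=0.85, embedding_weight=0.15):
    fused = {}
    for doc_id in set(bm25_scores) | set(embedding_scores):
        fused[doc_id] = (bm25_weight * bm25_scores.get(doc_id, 0.0)
                         + embedding_weight * embedding_scores.get(doc_id, 0.0))
    # Highest fused score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# An exact-match hit ("a.py") outranks a purely semantic hit ("c.py")
ranked = fuse_scores({"a.py": 1.0, "b.py": 0.2},
                     {"b.py": 1.0, "c.py": 0.9})
```

With this weighting, a document that only matches semantically can score at most 0.15, which is the intended bias: for code, exact identifier matches beat fuzzy similarity.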
codepilot/tools/file_tools.py CHANGED
@@ -6,20 +6,35 @@ import subprocess
 import os
 
 
-def read_file(path):
+def read_file(path, max_lines=200):
     """
     Reads and returns the contents of a file.
+    Truncates large files to prevent context overflow.
 
     Args:
         path: File path to read
+        max_lines: Maximum lines to return (default 200)
 
     Returns:
         str: File contents or error message
     """
     try:
         with open(path, 'r') as f:
-            content = f.read()
-        return f"Successfully read file '{path}':\n\n{content}"
+            lines = f.readlines()
+
+        total_lines = len(lines)
+
+        # Truncate if too large
+        if total_lines > max_lines:
+            # Keep first half and last half
+            keep = max_lines // 2
+            content = ''.join(lines[:keep])
+            content += f'\n... [truncated {total_lines - max_lines} lines] ...\n\n'
+            content += ''.join(lines[-keep:])
+            return f"Successfully read file '{path}' ({total_lines} lines, showing first/last {keep}):\n\n{content}"
+        else:
+            content = ''.join(lines)
+            return f"Successfully read file '{path}':\n\n{content}"
     except FileNotFoundError:
         return f"Error: File '{path}' not found."
     except PermissionError:
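The truncation keeps `max_lines // 2` lines from each end of the file, so headers/imports and trailing definitions both survive. A small sketch of just that slicing logic (the `truncate_lines` helper is illustrative, not a function in the repo):

```python
# Hedged sketch of read_file's first-half/last-half truncation:
# keep max_lines // 2 lines from each end, report how many were dropped.
def truncate_lines(lines, max_lines=200):
    total = len(lines)
    if total <= max_lines:
        return lines, 0
    keep = max_lines // 2
    return lines[:keep] + lines[-keep:], total - max_lines

lines = [f"line {i}\n" for i in range(10)]
kept, dropped = truncate_lines(lines, max_lines=4)
```

With 10 lines and `max_lines=4`, the first two and last two lines are kept and six are dropped, mirroring the `[truncated N lines]` marker the tool inserts.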
codepilot/tools/github_tools.py ADDED
@@ -0,0 +1,211 @@
+"""
+GitHub Repository Tools
+Handles cloning and managing public GitHub repositories for CodePilot sessions
+"""
+
+import os
+import re
+import shutil
+import subprocess
+import tempfile
+from typing import Optional, Tuple
+import uuid
+
+
+def extract_github_url(text: str) -> Optional[str]:
+    """
+    Extract a GitHub repository URL from text.
+
+    Supports formats:
+    - https://github.com/user/repo
+    - https://github.com/user/repo.git
+    - github.com/user/repo
+    - http://github.com/user/repo
+
+    Returns:
+        GitHub URL if found, None otherwise
+    """
+    # Pattern to match GitHub URLs
+    pattern = r'(?:https?://)?(?:www\.)?github\.com/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_.-]+)(?:\.git)?'
+    match = re.search(pattern, text)
+
+    if match:
+        user = match.group(1)
+        repo = match.group(2).rstrip('.git')
+        return f"https://github.com/{user}/{repo}.git"
+
+    return None
+
+
+def get_repo_name(github_url: str) -> str:
+    """Extract repository name from GitHub URL."""
+    # Remove .git suffix if present
+    url = github_url.rstrip('.git')
+    # Get the last part of the URL
+    return url.split('/')[-1]
+
+
+def clone_repository(github_url: str, base_dir: Optional[str] = None) -> Tuple[bool, str, str]:
+    """
+    Clone a public GitHub repository to a temporary directory.
+
+    Args:
+        github_url: The GitHub repository URL
+        base_dir: Optional base directory for cloning (default: system temp)
+
+    Returns:
+        Tuple of (success: bool, path_or_error: str, repo_name: str)
+    """
+    repo_name = get_repo_name(github_url)
+
+    # Create a unique session directory
+    session_id = str(uuid.uuid4())[:8]
+
+    if base_dir is None:
+        # Use /tmp for cloud environments (more space than tempfile default)
+        base_dir = "/tmp/codepilot_repos"
+
+    # Ensure base directory exists
+    os.makedirs(base_dir, exist_ok=True)
+
+    # Create session-specific directory
+    session_dir = os.path.join(base_dir, f"{repo_name}_{session_id}")
+
+    try:
+        # Clone with depth=1 for faster cloning (only latest commit)
+        result = subprocess.run(
+            ["git", "clone", "--depth", "1", github_url, session_dir],
+            capture_output=True,
+            text=True,
+            timeout=120  # 2 minute timeout
+        )
+
+        if result.returncode != 0:
+            error_msg = result.stderr or "Unknown error during clone"
+            # Clean up failed clone
+            if os.path.exists(session_dir):
+                shutil.rmtree(session_dir, ignore_errors=True)
+            return False, f"Clone failed: {error_msg}", repo_name
+
+        return True, session_dir, repo_name
+
+    except subprocess.TimeoutExpired:
+        # Clean up on timeout
+        if os.path.exists(session_dir):
+            shutil.rmtree(session_dir, ignore_errors=True)
+        return False, "Clone timed out (repository may be too large)", repo_name
+
+    except Exception as e:
+        # Clean up on any error
+        if os.path.exists(session_dir):
+            shutil.rmtree(session_dir, ignore_errors=True)
+        return False, f"Clone error: {str(e)}", repo_name
+
+
+def cleanup_repository(repo_path: str) -> bool:
+    """
+    Clean up a cloned repository.
+
+    Args:
+        repo_path: Path to the cloned repository
+
+    Returns:
+        True if cleanup successful, False otherwise
+    """
+    try:
+        if os.path.exists(repo_path):
+            shutil.rmtree(repo_path)
+        return True
+    except Exception:
+        return False
+
+
+def get_repo_info(repo_path: str) -> dict:
+    """
+    Get basic information about a cloned repository.
+
+    Args:
+        repo_path: Path to the cloned repository
+
+    Returns:
+        Dictionary with repo info
+    """
+    info = {
+        "path": repo_path,
+        "name": os.path.basename(repo_path).split('_')[0],  # Remove session ID
+        "files": [],
+        "total_files": 0,
+        "languages": set()
+    }
+
+    # File extension to language mapping
+    ext_to_lang = {
+        '.py': 'Python',
+        '.js': 'JavaScript',
+        '.ts': 'TypeScript',
+        '.tsx': 'TypeScript',
+        '.jsx': 'JavaScript',
+        '.java': 'Java',
+        '.go': 'Go',
+        '.rs': 'Rust',
+        '.cpp': 'C++',
+        '.c': 'C',
+        '.h': 'C/C++',
+        '.rb': 'Ruby',
+        '.php': 'PHP',
+        '.swift': 'Swift',
+        '.kt': 'Kotlin',
+        '.cs': 'C#',
+        '.html': 'HTML',
+        '.css': 'CSS',
+        '.scss': 'SCSS',
+        '.md': 'Markdown',
+        '.json': 'JSON',
+        '.yaml': 'YAML',
+        '.yml': 'YAML',
+    }
+
+    # Walk the repository
+    for root, dirs, files in os.walk(repo_path):
+        # Skip hidden directories and common non-code directories
+        dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ['node_modules', 'venv', '__pycache__', 'dist', 'build']]
+
+        for file in files:
+            if not file.startswith('.'):
+                info["total_files"] += 1
+                ext = os.path.splitext(file)[1].lower()
+                if ext in ext_to_lang:
+                    info["languages"].add(ext_to_lang[ext])
+
+                # Store relative path
+                rel_path = os.path.relpath(os.path.join(root, file), repo_path)
+                info["files"].append(rel_path)
+
+    info["languages"] = list(info["languages"])
+
+    return info
+
+
+def validate_github_url(url: str) -> Tuple[bool, str]:
+    """
+    Validate that a URL is a valid public GitHub repository.
+
+    Args:
+        url: The URL to validate
+
+    Returns:
+        Tuple of (is_valid: bool, message: str)
+    """
+    if not url:
+        return False, "No URL provided"
+
+    # Check if it's a GitHub URL
+    if 'github.com' not in url.lower():
+        return False, "Not a GitHub URL"
+
+    # Extract and validate format
+    extracted = extract_github_url(url)
+    if not extracted:
+        return False, "Invalid GitHub URL format. Expected: github.com/user/repo"
+
+    return True, extracted
+
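The URL normalization in `extract_github_url` can be exercised standalone. One caveat worth noting: the `rstrip('.git')` in the diff strips a *character set*, not a suffix, so repo names ending in `g`, `i`, `t`, or `.` would be mangled; the sketch below therefore uses `str.removesuffix` (Python 3.9+) for the same step:

```python
import re

# Hedged sketch of extract_github_url's normalization, using removesuffix
# instead of rstrip('.git') to strip the suffix safely.
def extract_github_url(text):
    pattern = r'(?:https?://)?(?:www\.)?github\.com/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_.-]+)(?:\.git)?'
    match = re.search(pattern, text)
    if not match:
        return None
    user = match.group(1)
    repo = match.group(2).removesuffix('.git')
    # Normalize every accepted input form to a canonical clone URL
    return f"https://github.com/{user}/{repo}.git"
```

Bare `github.com/user/repo` mentions, full `https://` URLs, and `.git` suffixes all normalize to the same clone URL, which is what `clone_repository` expects.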
codepilot/tools/registry.py CHANGED
@@ -5,6 +5,7 @@ Maps tool names to their implementations and schemas
 
 import os
 from codepilot.tools.file_tools import read_file, write_file, run_command, search_code, list_files, git_status
+from codepilot.context.parser import CodeParser
 from codepilot.sandbox.sandbox_tools import (
     create_sandbox,
     close_sandbox,
@@ -14,29 +15,120 @@ from codepilot.sandbox.sandbox_tools import (
 )
 from typing import Callable, List, Dict, Optional
 
-# Check if running in production BEFORE importing heavy ML dependencies
-# Detects: Render, HuggingFace Spaces, or any cloud with PORT env var
-_IS_PRODUCTION = os.getenv('RENDER_SERVICE_NAME') or os.getenv('RENDER') or os.getenv('SPACE_ID') or os.getenv('PORT')
-
-# Only import heavy context_tools (sentence-transformers, torch) in local development
-if not _IS_PRODUCTION:
-    from codepilot.tools.context_tools import search_codebase, index_codebase
-else:
-    # Provide stub functions for production to avoid import errors
-    def search_codebase(query: str, top_k: int = 5) -> str:
-        return "⚠️ Codebase search is disabled in cloud mode (resource constraints)"
-
-    def index_codebase(root_path: str) -> str:
-        return "⚠️ Codebase indexing is disabled in cloud mode (resource constraints)"
+# Full context tools with BM25 + embeddings (requires 16GB+ RAM)
+from codepilot.tools.context_tools import search_codebase, index_codebase, search_repository
+
+# Initialize parser for file outline tools
+_parser = CodeParser()
+
+
+def get_file_outline(path: str) -> str:
+    """
+    Get the structure/outline of a Python file without reading full contents.
+    Returns classes, functions, methods with their signatures and docstrings.
+    Much more token-efficient than read_file for understanding file structure.
+
+    Args:
+        path: Path to the Python file
+
+    Returns:
+        Formatted outline showing file structure
+    """
+    result = _parser.parse_file(path)
+
+    if result.get('parse_errors'):
+        return f"Error: {result['parse_errors'][0]}"
+
+    lines = [f"# {path} ({result.get('total_lines', 0)} lines)\n"]
+
+    # Show imports summary
+    imports = result.get('imports', [])
+    if imports:
+        modules = set()
+        for imp in imports:
+            if imp['type'] == 'import':
+                modules.add(imp['name'].split('.')[0])
+            else:
+                mod = imp.get('module', '')
+                if mod:
+                    modules.add(mod.split('.')[0])
+        lines.append(f"Imports: {', '.join(sorted(modules)[:10])}")
+        if len(modules) > 10:
+            lines.append(f"  ... and {len(modules) - 10} more")
+        lines.append("")
+
+    # Show classes with methods
+    for cls in result.get('classes', []):
+        bases = f"({', '.join(cls['bases'])})" if cls['bases'] else ""
+        decorators = '\n'.join(f"@{d}" for d in cls.get('decorators', []))
+        if decorators:
+            lines.append(decorators)
+        lines.append(f"class {cls['name']}{bases}:  # lines {cls['start_line']}-{cls['end_line']}")
+        if cls.get('docstring'):
+            # Truncate long docstrings
+            doc = cls['docstring'][:150].replace('\n', ' ')
+            if len(cls['docstring']) > 150:
+                doc += "..."
+            lines.append(f'    """{doc}"""')
+        for method in cls.get('methods', []):
+            async_prefix = "async " if method.get('is_async') else ""
+            lines.append(f"    {async_prefix}def {method['name']}()  # line {method['line']}")
+        lines.append("")
+
+    # Show standalone functions
+    standalone_funcs = [f for f in result.get('functions', [])
+                        if not any(f['start_line'] >= c['start_line'] and f['end_line'] <= c['end_line']
+                                   for c in result.get('classes', []))]
+
+    for func in standalone_funcs:
+        params = ', '.join(func.get('parameters', []))
+        async_prefix = "async " if func.get('is_async') else ""
+        decorators = '\n'.join(f"@{d}" for d in func.get('decorators', []))
+        if decorators:
+            lines.append(decorators)
+        lines.append(f"{async_prefix}def {func['name']}({params}):  # lines {func['start_line']}-{func['end_line']}")
+        if func.get('docstring'):
+            doc = func['docstring'][:100].replace('\n', ' ')
+            if len(func['docstring']) > 100:
+                doc += "..."
+            lines.append(f'    """{doc}"""')
+        lines.append("")
+
+    # Show globals
+    globals_list = result.get('globals', [])
+    if globals_list:
+        lines.append("# Global variables:")
+        for g in globals_list[:10]:
+            lines.append(f"    {g['name']}: {g.get('type', 'unknown')}  # line {g['line']}")
+        if len(globals_list) > 10:
+            lines.append(f"    ... and {len(globals_list) - 10} more")
+
+    return '\n'.join(lines)
+
+
+def get_code_chunk(path: str, name: str) -> str:
+    """
+    Extract a specific function or class from a file by name.
+    Use this when you need to see the implementation of a specific function/class
+    after using get_file_outline to identify what you need.
+
+    Args:
+        path: Path to the Python file
+        name: Name of the function or class to extract
+
+    Returns:
+        The complete code for the specified function/class with relevant imports
+    """
+    return _parser.extract_code_chunk(path, name)
 
 
-# Tool schemas for OpenAI function calling
+# Tool schemas for LLM function calling (compatible with Claude and OpenAI)
 TOOLS = [
     {
         "type": "function",
         "function": {
             "name": "read_file",
-            "description": "Reads the contents of a file at the specified path. Use this when you need to view or analyze file contents.",
+            "description": "WARNING: This reads the ENTIRE file which wastes tokens! PREFER get_file_outline (for structure) or get_code_chunk (for specific function/class) instead. Only use read_file when you absolutely need the complete file contents.",
             "parameters": {
                 "type": "object",
                 "properties": {
@@ -225,7 +317,67 @@ TOOLS = [
                 "required": ["code"]
             }
         }
-    }
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "search_repository",
+            "description": "Search the cloned GitHub repository using hybrid retrieval (BM25 + semantic embeddings). Use this to find functions, classes, or code patterns in the cloned repo. More powerful than search_code - finds both exact matches AND semantically related code.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {
+                        "type": "string",
+                        "description": "What to search for. Can be natural language (e.g., 'authentication logic', 'error handling') or specific terms (e.g., 'Flask class', 'route decorator')"
+                    },
+                    "top_k": {
+                        "type": "integer",
+                        "description": "Number of results to return (default: 5, max: 20)",
+                        "default": 5
+                    }
+                },
+                "required": ["query"]
+            }
+        }
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "get_file_outline",
+            "description": "Get the structure/outline of a Python file WITHOUT reading full contents. Returns classes, functions, methods with signatures and docstrings. Use this FIRST to understand a file's structure before using read_file or get_code_chunk. Much more token-efficient than read_file (~50 tokens vs ~2000 tokens for a typical file).",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "path": {
+                        "type": "string",
+                        "description": "Path to the Python file to outline"
+                    }
+                },
+                "required": ["path"]
+            }
+        }
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "get_code_chunk",
+            "description": "Extract a specific function or class from a file by name. Use this after get_file_outline to read just the code you need instead of the entire file. Returns the complete implementation with relevant imports.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "path": {
+                        "type": "string",
+                        "description": "Path to the Python file"
+                    },
+                    "name": {
+                        "type": "string",
+                        "description": "Name of the function or class to extract (e.g., 'MyClass', 'my_function')"
+                    }
+                },
+                "required": ["path", "name"]
+            }
+        }
+    },
 ]
@@ -241,7 +393,10 @@ TOOL_FUNCTIONS = {
     "index_codebase": index_codebase,
     "upload_to_sandbox": upload_to_sandbox,
     "execute_in_sandbox": execute_in_sandbox,
-    "run_command_in_sandbox": run_command_in_sandbox
+    "run_command_in_sandbox": run_command_in_sandbox,
+    "search_repository": search_repository,
+    "get_file_outline": get_file_outline,
+    "get_code_chunk": get_code_chunk
 }
@@ -250,7 +405,7 @@ def get_tools() -> List[Dict]:
     Get all available tool schemas
 
     Returns:
-        List of tool schema dictionaries for OpenAI
+        List of tool schema dictionaries for LLM function calling
     """
     return TOOLS
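The token-saving idea behind `get_file_outline` (structure with line ranges instead of full file contents) can be sketched with only the stdlib `ast` module; the repo's `CodeParser` is not shown in this diff, so the `outline` helper below is illustrative, not its actual implementation:

```python
import ast

# Hedged sketch of the get_file_outline idea using stdlib ast:
# emit class/function names with line ranges instead of full source.
def outline(source, path="<memory>"):
    tree = ast.parse(source)
    lines = [f"# {path}"]
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:  # lines {node.lineno}-{node.end_lineno}")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append(f"    def {item.name}()  # line {item.lineno}")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(f"def {node.name}():  # lines {node.lineno}-{node.end_lineno}")
    return "\n".join(lines)

sample = (
    "class Greeter:\n"
    "    def greet(self):\n"
    "        return 'hi'\n"
    "\n"
    "def main():\n"
    "    pass\n"
)
```

An agent reads this compact outline first, then calls `get_code_chunk(path, name)` for just the one definition it needs, which is where the claimed token savings over `read_file` come from.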
 
requirements.txt CHANGED
@@ -1,8 +1,9 @@
-# Cloud deployment requirements (lightweight - no PyTorch/sentence-transformers)
-# These are only the essential packages needed for HuggingFace Spaces
+# Full deployment requirements with embeddings support
+# For HuggingFace Spaces with 16GB+ RAM
 
 # Core
 openai>=1.0.0
+anthropic>=0.25.0
 python-dotenv>=1.2.0
 
 # E2B Sandbox
@@ -12,11 +13,16 @@ e2b-code-interpreter>=2.4.0
 langchain>=0.3.0
 langgraph>=0.2.0
 
-# Lightweight search (no embeddings in cloud mode)
+# Search - BM25 + Embeddings
 rank-bm25>=0.2.2
+sentence-transformers>=2.2.0
+chromadb>=0.4.0
 
 # Chainlit UI
 chainlit>=1.0.0
 
 # For dependency graphs
 networkx>=3.0
+
+# Tree-sitter for AST parsing
+tree-sitter>=0.20.0