ayushm98 committed on
Commit
6f39ef4
·
1 Parent(s): 94dfc0a

Improve UI: cleaner welcome, better progress display, simplified results

Files changed (3)
  1. CLAUDE.md +143 -100
  2. chainlit.md +21 -8
  3. chainlit_app.py +84 -192
CLAUDE.md CHANGED
@@ -6,146 +6,160 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
6
 
7
  **CodePilot** - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests autonomously.
8
 
9
- **Tech Stack:** Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph
10
- **Development Timeline:** 24-week phased implementation (currently in Phase 5: Chainlit UI - COMPLETE)
11
 
12
  ## Architecture
13
 
14
- This project follows a layered architecture (planned - see devon-project-plan.md for full roadmap):
 
 
15
 
16
  ```
17
- Multi-Agent System (Planner, Coder, Reviewer)
18
-
19
- Context Engine (Hybrid Retrieval, AST-aware chunking)
20
-
21
- Tool Layer (read_file, write_file, run_command, search_code)
22
-
23
- E2B Sandbox (Isolated code execution)
24
  ```
25
 
26
- ## Current Implementation Status
27
 
28
- **✅ COMPLETED PHASES:**
29
-
30
- **Phase 1: Foundation (Weeks 1-3)**
31
- - ✅ LLM client wrapper (`codepilot/llm/claude_client.py`) - Claude API with tool calling
32
- - ✅ Tool registry (`codepilot/tools/registry.py`) - Function calling infrastructure
33
- - ✅ Base agent (`codepilot/agents/base_agent.py`) - Core ReAct loop
34
- - ✅ Core tools: `read_file`, `write_file`, `run_command`, `search_codebase`, `list_files`
35
-
36
- **Phase 2: Context Engineering (Weeks 4-8)**
37
- - ✅ BM25 keyword search (`codepilot/context/bm25_search.py`)
38
- - ✅ Dense embeddings (`codepilot/context/embeddings.py`) - sentence-transformers
39
- - ✅ Hybrid retrieval (`codepilot/context/retrieval.py`) - Combined BM25 + semantic search
40
- - ✅ Code parser (`codepilot/context/parser.py`) - AST-aware chunking
41
- - ✅ Codebase indexer (`codepilot/context/indexer.py`) - Full codebase indexing
42
- - ✅ Context selector (`codepilot/context/selector.py`) - Smart context selection
43
- - ✅ Context tools: `index_codebase`, `search_codebase`, `get_relevant_context`
44
-
45
- **Phase 3: Multi-Agent Architecture (Weeks 9-12)**
46
- - ✅ Planner agent (`codepilot/agents/planner_agent.py`) - Creates implementation plans
47
- - ✅ Coder agent (`codepilot/agents/coder_agent.py`) - Writes and tests code
48
- - ✅ Reviewer agent (`codepilot/agents/reviewer_agent.py`) - Code review and approval
49
- - ✅ Orchestrator (`codepilot/agents/orchestrator.py`) - State machine coordination
50
-
51
- **Phase 4: E2B Sandbox Integration (Weeks 13-14)**
52
- - ✅ E2B sandbox manager (`codepilot/sandbox/e2b_sandbox.py`) - Isolated execution
53
- - ✅ Sandbox tools (`codepilot/sandbox/sandbox_tools.py`) - upload, execute, run commands
54
- - ✅ Integration with Coder agent - Automatic sandbox testing workflow
55
-
56
- **Phase 5: Chainlit UI (Weeks 15-16)**
57
- - ✅ Chainlit application (`chainlit_app.py`) - Interactive chat interface
58
- - ✅ Real-time workflow visualization with Chainlit Steps
59
- - ✅ Detailed agent progress tracking (Planner → Coder → Reviewer)
60
- - ✅ Code preview and test results display
61
- - ✅ User guide (`CHAINLIT_GUIDE.md`)
62
-
63
- **NEXT PHASES:**
64
-
65
- **Phase 6: GitHub Integration (Weeks 17-18)** - Not started
66
- - GitHub webhooks for issue tracking
67
- - Automated PR creation
68
- - Branch management
69
-
70
- **Phase 7: Evals & Benchmarks (Weeks 19-21)** - Not started
71
- - SWE-bench evaluation
72
- - Custom test suite
73
-
74
- **Phase 8: Production Hardening (Weeks 22-24)** - Not started
75
- - Error handling and retries
76
- - Logging and monitoring
77
- - Deployment configuration
78
 
79
  ## Development Commands
80
 
81
  **Setup:**
82
  ```bash
83
  python -m venv venv
84
- source venv/bin/activate # On Windows: venv\Scripts\activate
85
  pip install -r requirements.txt
86
  ```
87
 
88
- **Verify setup:**
89
  ```bash
90
- python test_setup.py # Checks that API keys are loaded correctly
91
  ```
92
 
93
- **Run Chainlit UI (Phase 5):**
94
  ```bash
95
  chainlit run chainlit_app.py
96
  # Opens at http://localhost:8000
97
- # See CHAINLIT_GUIDE.md for full usage guide
98
  ```
99
 
100
- **Test individual phases:**
101
  ```bash
102
- # Phase 2: Context Engineering
103
  python test_context.py
104
 
105
- # Phase 3: Multi-Agent Workflow
106
  python test_multi_agent.py
107
 
108
- # Phase 4: E2B Sandbox
109
  python test_sandbox.py
110
  python test_workflow_with_sandbox.py
111
  ```
112
 
113
- **Environment variables required in .env:**
114
  ```
115
  ANTHROPIC_API_KEY=sk-ant-...
116
  E2B_API_KEY=e2b_...
117
  ```
118
 
119
- ## Project Phases (from devon-project-plan.md)
120
 
121
- 1. **Phase 1 (Weeks 1-3):** Foundation - Basic agent loop, tool calling, LLM abstraction
122
- 2. **Phase 2 (Weeks 4-8):** Context Engineering - Hybrid retrieval (BM25 + dense), AST-aware chunking
123
- 3. **Phase 3 (Weeks 9-12):** Multi-Agent Architecture - Orchestrator with specialized agents
124
- 4. **Phase 4 (Weeks 13-14):** E2B Sandbox Integration
125
- 5. **Phase 5 (Weeks 15-16):** Chainlit UI
126
- 6. **Phase 6 (Weeks 17-18):** GitHub Integration (webhooks, PRs)
127
- 7. **Phase 7 (Weeks 19-21):** Evals & Benchmarks (SWE-bench)
128
- 8. **Phase 8 (Weeks 22-24):** Production Hardening
129
 
130
  ## Key Design Principles
131
 
132
- **From the project plan:**
133
- - **Focus on Context Engineering:** This is the differentiator, not UI/UX
134
- - **ReAct Pattern:** Reason about what to do, Act with tools, observe results, repeat
135
- - **AST-Aware Processing:** Parse code structurally, not as text (tree-sitter for multi-language support)
136
- - **Hybrid Retrieval:** Combine BM25 (exact matches) + dense embeddings (semantic search)
137
- - **Sandboxed Execution:** All code runs in E2B containers, never on host
138
- - **Multi-Agent Orchestration:** Specialized agents (Planner, Coder, Reviewer) coordinated by orchestrator
139
 
140
- ## Tool Schema Format
141
 
142
- Tools follow Claude/Anthropic function calling format:
 
143
  ```python
144
  {
145
  "type": "function",
146
  "function": {
147
  "name": "tool_name",
148
- "description": "Clear description for LLM to understand when to use",
149
  "parameters": {
150
  "type": "object",
151
  "properties": {...},
@@ -155,17 +169,46 @@ Tools follow Claude/Anthropic function calling format:
155
  }
156
  ```
157
 
158
- ## Implementation Notes
 
 
 
159
 
 
 
 
160
  - All tool functions return formatted strings (success messages or errors)
161
- - `write_file` auto-creates parent directories if needed
162
- - `run_command` has 30-second timeout to prevent hanging
163
- - Error handling uses specific exceptions (FileNotFoundError, PermissionError) before generic fallback
164
 
165
- ## Important Files
 
 
 
166
 
167
- - `devon-project-plan.md` - Complete 24-week implementation roadmap with architectural details
168
- - `codepilot/llm/claude_client.py` - Claude API wrapper with tool calling
169
- - `codepilot/agents/orchestrator.py` - Multi-agent state machine
170
- - `requirements.txt` - Python dependencies (anthropic, e2b-code-interpreter, langchain, langgraph)
171
- - `.env` - API keys (not committed, in .gitignore)
6
 
7
  **CodePilot** - An autonomous AI coding agent that takes GitHub issues, understands codebases, writes code in sandboxed environments, and creates pull requests autonomously.
8
 
9
+ **Tech Stack:** Python 3.11+, Claude Sonnet 4.5 (Anthropic API), E2B sandboxed execution, LangChain/LangGraph, Chainlit UI
10
+ **Current Phase:** Phase 5 Complete (Chainlit UI with multi-agent visualization)
11
 
12
  ## Architecture
13
 
14
+ ### Multi-Agent Workflow System
15
+
16
+ CodePilot uses a **dual-mode orchestrator** that routes tasks to different workflows:
17
 
18
  ```
19
+ ┌─────────────────────────────────────────────────────────┐
20
+ │ ORCHESTRATOR │
21
+ │ (Task Classification) │
22
+ └─────────────────────────────────────────────────────────┘
23
+
24
+ ┌───────────────┴────────────────┐
25
+ │ │
26
+ "explore" "code"
27
+ │ │
28
+ v v
29
+ ┌────────────────┐ ┌──────────────────────────────┐
30
+ │ ExplorerAgent │ │ Full Multi-Agent Pipeline │
31
+ │ (Direct) │ │ Explorer → Clarify → Plan │
32
+ └────────────────┘ │ ↓ │
33
+ │ Coder ⟷ Reviewer │
34
+ │ (iterative) │
35
+ └──────────────────────────────┘
36
  ```
37
 
38
+ **Task Classification Logic** (see `orchestrator.py:92-201`):
39
+ - **Explore tasks**: Questions starting with "find", "where", "what", "how", "explain" → Uses ExplorerAgent only
40
+ - **Code tasks**: Commands starting with "add", "create", "implement", "fix" → Full pipeline
41
+ - Short queries (<100 chars) default to explore; long queries default to code
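The classification rules above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the code in `orchestrator.py`: the name `classify_task`, the first-word matching, and the constants are hypothetical, and the real classifier may weigh additional signals.

```python
# Hypothetical sketch of the explore/code routing rules described above.
EXPLORE_PREFIXES = ("find", "where", "what", "how", "explain")
CODE_PREFIXES = ("add", "create", "implement", "fix")
SHORT_QUERY_CHARS = 100  # threshold taken from the "<100 chars" rule


def classify_task(task: str) -> str:
    """Return "explore" or "code" for a user request."""
    text = task.strip().lower()
    words = text.split()
    first_word = words[0] if words else ""

    # Leading keyword wins when one is present.
    if first_word in EXPLORE_PREFIXES:
        return "explore"
    if first_word in CODE_PREFIXES:
        return "code"

    # Otherwise fall back to length: short queries explore, long ones code.
    return "explore" if len(text) < SHORT_QUERY_CHARS else "code"
```

Matching on the first word rather than a raw string prefix avoids treating a word such as "however" as the keyword "how".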
42
+
43
+ **Full Pipeline Flow** (code tasks):
44
+ 1. **Explorer** - Gathers codebase context using token-efficient tools
45
+ 2. **Clarifier** - Planner generates questions, pauses for user answers (v3.3+)
46
+ 3. **Planner** - Creates implementation plan (NO tools, pure LLM reasoning)
47
+ 4. **Coder** - Implements code, tests in sandbox (NO search, uses Explorer's context)
48
+ 5. **Reviewer** - Reviews code, approves or sends back to Coder with feedback
49
+
50
+ ### Context Engineering (Hybrid Retrieval)
51
+
52
+ The core differentiator is **Reciprocal Rank Fusion (RRF)** combining two search methods:
53
+
54
+ ```
55
+ Query → ┌─ BM25 (keyword) ──────┐
56
+ │ │
57
+ ├─ Embeddings (semantic)┤ → RRF Fusion → Top K Results
58
+ │ (sentence-transformers)
59
+ └───────────────────────┘
60
+ ```
61
+
62
+ **Implementation**: `codepilot/context/hybrid_retriever.py`
63
+ - BM25: Exact matches (function names, variable names)
64
+ - Embeddings: Semantic matches (related concepts)
65
+ - RRF formula: `score = Σ(weight_i / (k + rank_i))` where k=60
66
+ - Default weights: 50% BM25, 50% embeddings
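As a worked illustration of the formula above, the sketch below fuses two ranked lists with weighted RRF. It is an assumption-based example, not the contents of `hybrid_retriever.py`: the function name `rrf_fuse`, its parameters, and the input shape (ranked lists of document ids, best first) are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights=None, k=60, top_k=5):
    """Fuse ranked id lists with weighted Reciprocal Rank Fusion.

    rankings: dict mapping a retriever name to its ids, best first.
    weights:  dict mapping a retriever name to its weight; equal by default.
    """
    weights = weights or {name: 1.0 / len(rankings) for name in rankings}
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        # Ranks are 1-based, so the top hit contributes weight / (k + 1).
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weights[name] / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


fused = rrf_fuse({
    "bm25": ["a", "b", "c"],
    "embeddings": ["b", "c", "a"],
})
```

Because only ranks enter the score, the BM25 and embedding scores never need to be normalized against each other, which is the usual reason to prefer RRF over a weighted sum of raw scores.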
67
+
68
+ ### Token-Efficient Tools
69
+
70
+ **Critical for cost management** - agents should prefer:
71
+ 1. `get_file_outline(path)` - Shows class/function signatures (~50 tokens vs ~2000 for full file)
72
+ 2. `get_code_chunk(path, name)` - Extracts specific function/class by name
73
+ 3. `search_repository(query)` - Hybrid search (use BEFORE reading files)
74
+
75
+ Only use `read_file` when you need complete file contents.
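To show why an outline costs so much less than a full read, here is a minimal sketch of an outline extractor. It is a stand-in under stated assumptions: it uses Python's standard `ast` module instead of tree-sitter, handles Python sources only, and `outline_source` is a hypothetical name rather than the project's `get_file_outline`.

```python
import ast


def outline_source(source: str) -> list[str]:
    """Return class and function signatures, indented by nesting depth."""
    lines: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                lines.append("  " * depth + f"class {child.name}")
                visit(child, depth + 1)  # descend to list the methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in child.args.args)
                lines.append("  " * depth + f"def {child.name}({args})")
                # Function bodies are deliberately skipped to keep output short.

    visit(ast.parse(source), 0)
    return lines
```

The output contains one line per definition regardless of how long each body is, so its size grows with the number of symbols in a file, not with the file's length.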
76
 
77
+ ### Agent Tool Access (v3.0+ separation)
78
+
79
+ Each agent has **restricted tool access** to prevent inefficiency:
80
+
81
+ - **ExplorerAgent**: `search_repository`, `get_file_outline`, `get_code_chunk`, `search_code`, `list_files`
82
+ - **PlannerAgent**: **NO TOOLS** (pure LLM reasoning, receives exploration context)
83
+ - **CoderAgent**: `write_file`, `get_code_chunk`, `read_file` (NO search tools)
84
+ - **ReviewerAgent**: `get_file_outline`, `get_code_chunk`, `read_file`
85
+
86
+ **Key insight**: v3.0 removed duplicate searching. Explorer searches once, all agents reuse that context.
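One way to express the per-agent restrictions above is a static allowlist that is checked before a tool call is dispatched. This is a hedged sketch: the tool names come from the list above, but `AGENT_TOOLS`, `tools_for`, and `check_tool_call` are hypothetical names, and the project may enforce access differently (for example by registering only a subset of schemas per agent).

```python
# Hypothetical allowlist mirroring the agent/tool table above.
AGENT_TOOLS = {
    "ExplorerAgent": frozenset({
        "search_repository", "get_file_outline", "get_code_chunk",
        "search_code", "list_files",
    }),
    "PlannerAgent": frozenset(),  # pure LLM reasoning, no tools
    "CoderAgent": frozenset({"write_file", "get_code_chunk", "read_file"}),
    "ReviewerAgent": frozenset({"get_file_outline", "get_code_chunk", "read_file"}),
}


def tools_for(agent_name: str) -> frozenset:
    """Return the set of tool names an agent may call."""
    return AGENT_TOOLS[agent_name]


def check_tool_call(agent_name: str, tool_name: str) -> None:
    """Raise PermissionError if the agent is not allowed to use the tool."""
    if tool_name not in tools_for(agent_name):
        raise PermissionError(f"{agent_name} may not call {tool_name}")
```

Keeping the table in one place makes the single-search rule easy to audit: no search tool appears in any set other than the Explorer's.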
87
 
88
  ## Development Commands
89
 
90
  **Setup:**
91
  ```bash
92
  python -m venv venv
93
+ source venv/bin/activate # Windows: venv\Scripts\activate
94
  pip install -r requirements.txt
95
  ```
96
 
97
+ **Verify installation:**
98
  ```bash
99
+ python test_setup.py # Checks API keys are loaded
100
  ```
101
 
102
+ **Run Chainlit UI (Primary interface):**
103
  ```bash
104
  chainlit run chainlit_app.py
105
  # Opens at http://localhost:8000
106
+ # Ctrl+C to stop, then pkill -f chainlit to clean up background processes
107
  ```
108
 
109
+ **Test individual components:**
110
  ```bash
111
+ # Context Engineering (Phase 2)
112
  python test_context.py
113
 
114
+ # Multi-Agent Workflow (Phase 3)
115
  python test_multi_agent.py
116
 
117
+ # E2B Sandbox (Phase 4)
118
  python test_sandbox.py
119
  python test_workflow_with_sandbox.py
120
  ```
121
 
122
+ **Environment variables** (create `.env` file):
123
  ```
124
  ANTHROPIC_API_KEY=sk-ant-...
125
  E2B_API_KEY=e2b_...
126
  ```
127
 
128
+ ## Current Implementation Status
129
+
130
+ **✅ COMPLETED (Phases 1-5):**
131
+ - Phase 1: LLM client, tool registry, base agent, core tools
132
+ - Phase 2: Hybrid retrieval (BM25 + embeddings), AST-aware parsing, codebase indexing
133
+ - Phase 3: Multi-agent architecture (Explorer, Planner, Coder, Reviewer, Orchestrator)
134
+ - Phase 4: E2B sandbox integration for isolated code execution
135
+ - Phase 5: Chainlit UI with real-time agent progress visualization
136
+
137
+ **🚧 NEXT PHASES:**
138
+ - Phase 6 (Weeks 17-18): GitHub Integration - webhooks, automated PR creation
139
+ - Phase 7 (Weeks 19-21): Evals & Benchmarks - SWE-bench evaluation
140
+ - Phase 8 (Weeks 22-24): Production Hardening - error handling, monitoring, deployment
141
 
142
+ See `devon-project-plan.md` for complete 24-week roadmap.
143
 
144
  ## Key Design Principles
145
 
146
+ 1. **Context Engineering is the Differentiator** - The value lies in hybrid retrieval and AST-aware chunking, not in UI/UX
147
+ 2. **ReAct Pattern** - All agents use: Reason → Act (with tools) → Observe → Repeat
148
+ 3. **AST-Aware Processing** - Parse code structurally using tree-sitter, not as text
149
+ 4. **Sandboxed Execution** - All code runs in E2B containers, never on host machine
150
+ 5. **Single-Search Architecture** - Explorer searches once, all downstream agents reuse context (v3.0+)
151
+ 6. **Clarification Before Action** - Planner asks questions before creating plan (v3.3+)
 
152
 
153
+ ## Important Implementation Details
154
 
155
+ ### Tool Schema Format
156
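+ All tools use the function-calling schema below. Note that the `type`/`function` wrapper shown here is the OpenAI-style layout; Anthropic's native tool definitions use `name`, `description`, and `input_schema`: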
157
  ```python
158
  {
159
  "type": "function",
160
  "function": {
161
  "name": "tool_name",
162
+ "description": "Clear description for LLM",
163
  "parameters": {
164
  "type": "object",
165
  "properties": {...},
 
169
  }
170
  ```
171
 
172
+ ### Path Handling (Critical for Coder)
173
+ - **Planner must provide FULL ABSOLUTE PATHS** (e.g., `/tmp/codepilot_repos/flask_abc123/examples/app.py`)
174
+ - **Coder uses paths EXACTLY as written** in the plan
175
+ - Repository path is injected in Chainlit context (see `chainlit_app.py:661-672`)
176
 
177
+ ### File Operations
178
+ - `write_file` auto-creates parent directories
179
+ - `run_command` has 30-second timeout
180
  - All tool functions return formatted strings (success messages or errors)
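The three conventions above (auto-created parent directories, a 30-second command timeout, and string return values) can be sketched as follows. This is an illustrative sketch only, not the project's tool code: the exact message formats are assumptions, and this version runs commands on the local machine without a shell, whereas CodePilot executes generated code inside E2B sandboxes.

```python
import shlex
import subprocess
from pathlib import Path

COMMAND_TIMEOUT_SECONDS = 30  # matches the documented run_command timeout


def write_file(path: str, content: str) -> str:
    """Write content to path, creating parent directories as needed."""
    try:
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        return f"Success: wrote {len(content)} characters to {path}"
    except PermissionError:
        # Specific exception first, generic fallback afterwards.
        return f"Error: permission denied: {path}"
    except OSError as exc:
        return f"Error: could not write {path}: {exc}"


def run_command(command: str, timeout: float = COMMAND_TIMEOUT_SECONDS) -> str:
    """Run a command and return its exit code and output as one string."""
    args = shlex.split(command)
    if not args:
        return "Error: empty command"
    try:
        completed = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"Error: command timed out after {timeout} seconds"
    except OSError as exc:
        return f"Error: could not start command: {exc}"
    return f"Exit code {completed.returncode}\n{completed.stdout}{completed.stderr}"
```

Returning strings instead of raising keeps every outcome, including failures, in a form that can be passed straight back to the model as a tool result.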
 
 
 
181
 
182
+ ### Version Tracking
183
+ Files include version constants for debugging hot-reload issues:
184
+ - `orchestrator.py:12` - `ORCHESTRATOR_VERSION`
185
+ - `chainlit_app.py:25-26` - `APP_VERSION`, `BUILD_ID`
186
 
187
+ ### Conversation Management
188
+ Agents use `ConversationManager` (`codepilot/agents/conversation.py`) to maintain message history in OpenAI/Anthropic format. This handles:
189
+ - System/user/assistant messages
190
+ - Tool calls and tool results
191
+ - Proper formatting for both Claude and OpenAI APIs
192
+
193
+ ## Critical Files
194
+
195
+ - `codepilot/agents/orchestrator.py` - Task classification and multi-agent state machine
196
+ - `codepilot/agents/planner_agent.py` - Pure LLM planning (no tools) + clarification questions
197
+ - `codepilot/agents/coder_agent.py` - Code implementation (no search tools)
198
+ - `codepilot/agents/explorer_agent.py` - Codebase exploration (search tools only)
199
+ - `codepilot/context/hybrid_retriever.py` - RRF fusion algorithm
200
+ - `codepilot/tools/registry.py` - Tool schemas and function mappings
201
+ - `chainlit_app.py` - Interactive UI with GitHub repo cloning and progress visualization
202
+ - `requirements.txt` - Python dependencies
203
+
204
+ ## Project Structure
205
+
206
+ ```
207
+ codepilot/
208
+ ├── llm/ # LLM client wrappers (Claude, OpenAI)
209
+ ├── agents/ # Multi-agent system (Orchestrator, Planner, Coder, Reviewer, Explorer)
210
+ ├── tools/ # Tool implementations (file ops, context search, GitHub)
211
+ ├── context/ # Hybrid retrieval (BM25, embeddings, parser, indexer)
212
+ └── sandbox/ # E2B sandbox integration
213
+ chainlit_app.py # Main UI application
214
+ ```
chainlit.md CHANGED
@@ -1,14 +1,27 @@
1
- # Welcome to Chainlit! 🚀🤖
2
 
3
- Hi there, Developer! 👋 We're excited to have you on board. Chainlit is a powerful tool designed to help you prototype, debug and share applications built on top of LLMs.
4
 
5
- ## Useful Links 🔗
6
 
7
- - **Documentation:** Get started with our comprehensive [Chainlit Documentation](https://docs.chainlit.io) 📚
8
- - **Discord Community:** Join our friendly [Chainlit Discord](https://discord.gg/k73SQ3FyUh) to ask questions, share your projects, and connect with other developers! 💬
 
 
9
 
10
- We can't wait to see what you create with Chainlit! Happy coding! 💻😊
11
 
12
- ## Welcome screen
 
 
 
13
 
14
- To modify the welcome screen, edit the `chainlit.md` file at the root of your project. If you do not want a welcome screen, just leave this file empty.
 
 
 
 
 
 
 
 
 
1
+ # 🤖 CodePilot
2
 
3
+ **Autonomous AI coding agent powered by Claude Sonnet 4.5**
4
 
5
+ ## What I Can Do
6
 
7
+ - 🔍 **Understand your codebase** - Hybrid search finds relevant code instantly
8
+ - 📋 **Plan implementations** - Break down tasks into clear steps
9
+ - ✍️ **Write production code** - Multi-agent system writes, tests, and reviews
10
+ - 🏖️ **Run safely in sandboxes** - All code tested in isolated E2B environments
11
 
12
+ ## How to Use
13
 
14
+ 1. **Paste a GitHub URL** (public repos only)
15
+ 2. **Describe what you want** (e.g., "add a health check endpoint")
16
+ 3. **Watch the agents work** - Explorer → Planner → Coder → Reviewer
17
+ 4. **Get production-ready code** - Tested and reviewed automatically
18
 
19
+ ## Example
20
+
21
+ ```
22
+ https://github.com/pallets/flask add a /health endpoint
23
+ ```
24
+
25
+ ---
26
+
27
+ **Ready!** Paste a GitHub URL and tell me what to build 🚀
chainlit_app.py CHANGED
@@ -22,8 +22,8 @@ from concurrent.futures import ThreadPoolExecutor
22
  # ============================================================
23
  # STARTUP VERSION CHECK - Change this to detect if rebuild worked
24
  # ============================================================
25
- APP_VERSION = "3.6.1-plan-fix"
26
- BUILD_ID = "2024-12-20-v6"
27
  print("=" * 60)
28
  print(f"[STARTUP] CodePilot Chainlit App")
29
  print(f"[STARTUP] APP_VERSION: {APP_VERSION}")
@@ -80,7 +80,13 @@ def format_code_output(code_changes: dict) -> str:
80
  if not code_changes:
81
  return "No code changes."
82
 
83
- output = []
 
 
 
 
 
 
84
  for file_path, content in code_changes.items():
85
  # Get just the filename for display
86
  filename = os.path.basename(file_path)
@@ -89,9 +95,9 @@ def format_code_output(code_changes: dict) -> str:
89
 
90
  # Use collapsible details/summary
91
  output.append(f"<details>")
92
- output.append(f"<summary><strong>{filename}</strong> ({line_count} lines) - Click to expand</summary>")
93
  output.append(f"")
94
- output.append(f"**Path:** `{file_path}`")
95
  output.append(f"```{lang}")
96
  output.append(content)
97
  output.append("```")
@@ -224,9 +230,9 @@ def format_progress_display(status: dict, total_cost: float) -> str:
224
  if done:
225
  return "✅"
226
  elif active:
227
- return ""
228
  else:
229
- return ""
230
 
231
  def get_activity(agent: str) -> str:
232
  """Get activity text for an agent."""
@@ -234,17 +240,17 @@ def format_progress_display(status: dict, total_cost: float) -> str:
234
 
235
  if agent == 'Explorer':
236
  if status['explorer_done']:
237
- return status.get('explorer_activity') or 'Done'
238
  elif current == 'Explorer':
239
  return status.get('explorer_activity') or 'Analyzing codebase...'
240
- return 'Waiting'
241
 
242
  elif agent == 'Planner':
243
  if status['planner_done']:
244
- return status.get('planner_activity') or 'Plan created'
245
  elif current == 'Planner':
246
  return status.get('planner_activity') or 'Creating plan...'
247
- return 'Waiting'
248
 
249
  elif agent == 'Coder':
250
  if status['coder_done']:
@@ -252,221 +258,114 @@ def format_progress_display(status: dict, total_cost: float) -> str:
252
  if activity:
253
  return activity
254
  files = status.get('files_written', 0)
255
- return f'Wrote {files} files' if files else 'Done'
256
  elif current == 'Coder':
257
  return status.get('coder_activity') or 'Writing code...'
258
- return 'Waiting'
259
 
260
  elif agent == 'Reviewer':
261
  if status['reviewer_done']:
262
  if status['approved']:
263
- return '**Approved**'
264
  else:
265
- return '**Rejected**'
266
  elif current == 'Reviewer':
267
  return status.get('reviewer_activity') or 'Reviewing...'
268
- return 'Waiting'
269
 
270
- return 'Waiting'
271
 
272
  current = status['current_agent']
273
 
274
- lines = ["## Progress\n"]
275
- lines.append("| Agent | Activity |")
276
- lines.append("|-------|----------|")
277
- lines.append(f"| Explorer | {icon(status['explorer_done'], current == 'Explorer')} {get_activity('Explorer')} |")
278
- lines.append(f"| Planner | {icon(status['planner_done'], current == 'Planner')} {get_activity('Planner')} |")
279
- lines.append(f"| Coder | {icon(status['coder_done'], current == 'Coder')} {get_activity('Coder')} |")
280
- lines.append(f"| Reviewer | {icon(status['reviewer_done'], current == 'Reviewer')} {get_activity('Reviewer')} |")
 
281
 
282
- lines.append(f"\n**Cost:** ${total_cost:.4f}")
 
 
 
 
 
 
283
 
284
  return "\n".join(lines)
285
 
286
 
287
  def format_final_result(result: dict, total_cost: float) -> str:
288
  """Format final result with detailed test checks."""
289
- lines = ["## Results\n"]
290
-
291
  success = result.get('success', False)
292
- has_plan = bool(result.get('plan'))
293
  code_changes = result.get('code_changes', {})
294
- has_code = bool(code_changes)
295
  file_count = len(code_changes) if code_changes else 0
296
  review_feedback = result.get('review_feedback', '')
297
 
298
- # Detailed checks table
299
- lines.append("| Test | Status |")
300
- lines.append("|------|--------|")
301
-
302
- # 1. Plan created
303
- lines.append(f"| Plan created | {'✅ Pass' if has_plan else '❌ Fail'} |")
304
 
305
- # 2. Files written
306
- if has_code:
307
- lines.append(f"| Files written | ✅ Pass ({file_count} files) |")
308
- else:
309
- lines.append("| Files written | ❌ Fail |")
310
-
311
- # 3. Valid syntax (infer from review - if approved, syntax is valid)
312
  if success:
313
- lines.append("| Valid syntax | ✅ Pass |")
314
- elif has_code and review_feedback:
315
- # Check if syntax error mentioned in feedback
316
- if 'syntax' in review_feedback.lower() or 'error' in review_feedback.lower():
317
- lines.append("| Valid syntax | ❌ Fail |")
318
- else:
319
- lines.append("| Valid syntax | ✅ Pass |")
320
- elif has_code:
321
- lines.append("| Valid syntax | ⬜ Pending |")
322
  else:
323
- lines.append("| Valid syntax | ⬜ N/A |")
 
 
324
 
325
- # 4. Follows patterns (infer from approval)
326
- if success:
327
- lines.append("| Follows patterns | ✅ Pass |")
328
- elif has_code and review_feedback:
329
- if 'pattern' in review_feedback.lower() or 'convention' in review_feedback.lower():
330
- lines.append("| Follows patterns | ❌ Fail |")
331
- else:
332
- lines.append("| Follows patterns | ✅ Pass |")
333
- elif has_code:
334
- lines.append("| Follows patterns | ⬜ Pending |")
335
- else:
336
- lines.append("| Follows patterns | ⬜ N/A |")
337
-
338
- # 5. Matches requirements (infer from approval)
339
- if success:
340
- lines.append("| Matches requirements | ✅ Pass |")
341
- elif has_code and review_feedback:
342
- if 'requirement' in review_feedback.lower() or 'missing' in review_feedback.lower():
343
- lines.append("| Matches requirements | ❌ Fail |")
344
- else:
345
- lines.append("| Matches requirements | ✅ Pass |")
346
- elif has_code:
347
- lines.append("| Matches requirements | ⬜ Pending |")
348
- else:
349
- lines.append("| Matches requirements | ⬜ N/A |")
350
-
351
- # 6. Code review
352
- if review_feedback:
353
- if success:
354
- lines.append("| Code review | ✅ Approved |")
355
- else:
356
- lines.append("| Code review | ❌ Rejected |")
357
- else:
358
- lines.append("| Code review | ⬜ Pending |")
359
-
360
- # Cost at bottom
361
- lines.append(f"\n**Cost:** ${total_cost:.4f}")
362
 
363
  return "\n".join(lines)
364
 
365
 
366
  def format_plan_display(plan: str) -> str:
367
- """Format plan as numbered implementation steps (7-8 max)."""
368
  if not plan:
369
  return ""
370
 
371
- lines = ["## Implementation Plan\n"]
 
 
 
372
  plan_lines = plan.split('\n')
373
  steps = []
374
 
375
- # Section headers to skip (template sections, not actual steps)
376
- skip_patterns = [
377
- 'title:', 'description:', 'overview:', 'summary:', 'objective:',
378
- 'installation:', 'running', 'testing', 'use case:', 'example:',
379
- 'note:', 'warning:', 'important:', 'files:', 'dependencies:',
380
- 'prerequisites:', 'requirements:', 'setup:', 'configuration:',
381
- 'readme', 'docstring', 'documentation'
382
- ]
383
-
384
- def is_section_header(text: str) -> bool:
385
- """Check if text is a section header, not an action step."""
386
- text_lower = text.lower().strip()
387
- # Skip if starts with common header patterns
388
- for pattern in skip_patterns:
389
- if text_lower.startswith(pattern):
390
- return True
391
- # Skip if it's just a label ending with colon and nothing else useful
392
- if text_lower.endswith(':') and len(text_lower) < 30:
393
- return True
394
- return False
395
-
396
- # Strategy 1: Look for existing numbered steps (1., 2., etc.)
397
  for line in plan_lines:
398
  stripped = line.strip()
399
- # Match numbered items like "1.", "1)", "1:"
400
- if stripped and len(stripped) > 2:
401
- import re
402
  match = re.match(r'^(\d+)[.)\]:]\s*(.+)', stripped)
403
  if match:
 
404
  step_text = match.group(2).strip()
405
- # Skip section headers, file paths, and too short items
406
- if (len(step_text) > 10 and
407
- not step_text.startswith('/') and
408
- not is_section_header(step_text)):
409
- steps.append(step_text)
410
-
411
- # Strategy 2: Look for bullet points if no numbered steps found
412
- if len(steps) < 3:
413
- steps = []
414
- for line in plan_lines:
415
- stripped = line.strip()
416
- # Match bullet points
417
- if stripped.startswith(('-', '*', '•')) and len(stripped) > 5:
418
- step_text = stripped.lstrip('-*• ').strip()
419
- # Skip headers, file paths, section headers
420
- if (len(step_text) > 15 and
421
- not step_text.startswith('#') and
422
- not step_text.startswith('/') and
423
- not is_section_header(step_text)):
424
- steps.append(step_text)
425
-
426
- # Strategy 3: Extract key sentences with action verbs
427
- if len(steps) < 3:
428
- steps = []
429
- action_verbs = ['create', 'add', 'implement', 'write', 'update', 'modify',
430
- 'define', 'set up', 'configure', 'import', 'export', 'build']
431
- for line in plan_lines:
432
- stripped = line.strip()
433
- stripped_lower = stripped.lower()
434
- # Skip section headers
435
- if is_section_header(stripped):
436
- continue
437
- for verb in action_verbs:
438
- if verb in stripped_lower and len(stripped) > 20:
439
- # Clean up the line
440
- clean = stripped.lstrip('-*• 0123456789.):]').strip()
441
- if clean and clean not in steps and not is_section_header(clean):
442
- steps.append(clean)
443
- break
444
-
445
- # Deduplicate and limit to 8 steps
446
- seen = set()
447
- unique_steps = []
448
- for step in steps:
449
- step_lower = step.lower()[:30] # Compare first 30 chars
450
- if step_lower not in seen:
451
- seen.add(step_lower)
452
- unique_steps.append(step)
453
- steps = unique_steps[:8]
454
-
455
- # Format as numbered list
456
- if steps:
457
- for i, step in enumerate(steps, 1):
458
- # Truncate long steps
459
- if len(step) > 80:
460
- step = step[:77] + '...'
461
- lines.append(f"{i}. {step}")
462
  else:
463
- # Fallback: show first meaningful line
464
- for line in plan_lines:
465
- if line.strip() and not line.startswith('#') and not is_section_header(line):
466
- lines.append(f"1. {line.strip()[:80]}")
467
- break
468
- if len(lines) == 1:
469
- lines.append("1. Implementation plan created")
 
470
 
471
  lines.append("")
472
  return "\n".join(lines)
@@ -479,16 +378,9 @@ async def start():
479
  print("[CHAINLIT] on_chat_start triggered")
480
 
481
  await cl.Message(
482
- content=f"# CodePilot - Autonomous AI Coding Agent\n\n"
483
- f"**Version:** `{APP_VERSION}` | **Build:** `{BUILD_ID}`\n\n"
484
- "I can help you write code, fix bugs, and implement features!\n\n"
485
- "**How to use:**\n"
486
- "1. Paste a **public GitHub URL** and I'll clone and analyze it\n"
487
- "2. Tell me what you want to build or fix\n"
488
- "3. Watch my agents (Explorer → Planner → Coder → Reviewer) work!\n\n"
489
- "**Example:**\n"
490
- "```\nhttps://github.com/pallets/flask add a health check endpoint example\n```\n\n"
491
- "**Ready!** Paste a GitHub URL with your task."
492
  ).send()
493
 
494
  print("[CHAINLIT] Welcome message sent")
@@ -623,7 +515,7 @@ async def main(message: cl.Message):
623
 
624
  # 2. Then show code
625
  if result.get('code_changes'):
626
- await cl.Message(content="## Generated Code\n\n" + format_code_output(result['code_changes'])).send()
627
 
628
  # 3. Finally show result table
629
  await cl.Message(content=format_final_result(result, total_cost)).send()
@@ -722,7 +614,7 @@ AVAILABLE TOOLS:
722
 
723
  # 2. Then show generated code
724
  if result.get('code_changes'):
725
- await cl.Message(content="## Generated Code\n\n" + format_code_output(result['code_changes'])).send()
726
 
727
  # 3. Finally show result table
728
  await cl.Message(content=format_final_result(result, total_cost)).send()
 
22
  # ============================================================
23
  # STARTUP VERSION CHECK - Change this to detect if rebuild worked
24
  # ============================================================
25
+ APP_VERSION = "3.7.0-clean-ui"
26
+ BUILD_ID = "2026-01-14-v1"
27
  print("=" * 60)
28
  print(f"[STARTUP] CodePilot Chainlit App")
29
  print(f"[STARTUP] APP_VERSION: {APP_VERSION}")
 
80
  if not code_changes:
81
  return "No code changes."
82
 
83
+ output = ["## 💻 Generated Code\n"]
84
+
85
+ # Summary
86
+ file_count = len(code_changes)
87
+ total_lines = sum(len(content.split('\n')) for content in code_changes.values())
88
+ output.append(f"**{file_count} file{'s' if file_count != 1 else ''} • {total_lines} lines**\n")
89
+
90
  for file_path, content in code_changes.items():
91
  # Get just the filename for display
92
  filename = os.path.basename(file_path)
 
95
 
96
  # Use collapsible details/summary
97
  output.append(f"<details>")
98
+ output.append(f"<summary>📄 <strong>{filename}</strong> ({line_count} lines)</summary>")
99
  output.append(f"")
100
+ output.append(f"**Path:** `{file_path}`\n")
101
  output.append(f"```{lang}")
102
  output.append(content)
103
  output.append("```")
 
230
  if done:
231
  return "✅"
232
  elif active:
233
+ return "🔄"
234
  else:
235
+ return "⏸️"
236
 
237
  def get_activity(agent: str) -> str:
238
  """Get activity text for an agent."""
 
240
 
241
  if agent == 'Explorer':
242
  if status['explorer_done']:
243
+ return status.get('explorer_activity') or 'Complete'
244
  elif current == 'Explorer':
245
  return status.get('explorer_activity') or 'Analyzing codebase...'
246
+ return ''
247
 
248
  elif agent == 'Planner':
249
  if status['planner_done']:
250
+ return 'Complete'
251
  elif current == 'Planner':
252
  return status.get('planner_activity') or 'Creating plan...'
253
+ return ''
254
 
255
  elif agent == 'Coder':
256
  if status['coder_done']:
 
                  if activity:
                      return activity
                  files = status.get('files_written', 0)
+                 return f'Complete ({files} files)' if files else 'Complete'
              elif current == 'Coder':
                  return status.get('coder_activity') or 'Writing code...'
+             return ''

          elif agent == 'Reviewer':
              if status['reviewer_done']:
                  if status['approved']:
+                     return '**Approved ✓**'
                  else:
+                     return '**Needs revision**'
              elif current == 'Reviewer':
                  return status.get('reviewer_activity') or 'Reviewing...'
+             return ''

+         return ''

      current = status['current_agent']

+     lines = []
+
+     # Progress bar
+     done_count = sum([status['explorer_done'], status['planner_done'],
+                       status['coder_done'], status['reviewer_done']])
+     progress_bar = "█" * done_count + "░" * (4 - done_count)
+     lines.append(f"**Progress:** {progress_bar} {done_count}/4 agents")
+     lines.append("")

+     # Agent status
+     lines.append(f"{icon(status['explorer_done'], current == 'Explorer')} **Explorer** {get_activity('Explorer')}")
+     lines.append(f"{icon(status['planner_done'], current == 'Planner')} **Planner** {get_activity('Planner')}")
+     lines.append(f"{icon(status['coder_done'], current == 'Coder')} **Coder** {get_activity('Coder')}")
+     lines.append(f"{icon(status['reviewer_done'], current == 'Reviewer')} **Reviewer** {get_activity('Reviewer')}")
+
+     lines.append(f"\n💰 **Cost:** ${total_cost:.4f}")

      return "\n".join(lines)
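The progress line above counts finished agents and draws one block per agent. A small standalone sketch of that calculation, assuming the status dict uses the `*_done` keys seen in the diff; `AGENTS`, `status_icon`, and `progress_line` are hypothetical names introduced for illustration:

```python
AGENTS = ("explorer", "planner", "coder", "reviewer")


def status_icon(done: bool, active: bool) -> str:
    """Pick an icon: finished, currently running, or still waiting."""
    if done:
        return "✅"
    return "🔄" if active else "⏸️"


def progress_line(status: dict) -> str:
    """Build a text progress bar with one cell per agent."""
    # Missing keys are treated as "not done" (an assumption of this sketch).
    done_count = sum(bool(status.get(f"{name}_done")) for name in AGENTS)
    bar = "█" * done_count + "░" * (len(AGENTS) - done_count)
    return f"**Progress:** {bar} {done_count}/{len(AGENTS)} agents"
```

Deriving the bar width from `len(AGENTS)` rather than a literal `4` keeps the bar and the counter in sync if an agent is ever added or removed.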
  def format_final_result(result: dict, total_cost: float) -> str:
      """Format final result with detailed test checks."""
      success = result.get('success', False)
      code_changes = result.get('code_changes', {})
      file_count = len(code_changes) if code_changes else 0
      review_feedback = result.get('review_feedback', '')

+     lines = []

+     # Overall status
      if success:
+         lines.append("## Task Complete!\n")
+         lines.append(f"**Files changed:** {file_count}")
+         lines.append(f"**Review:** Approved")
+     elif code_changes:
+         lines.append("## ⚠️ Code Written (Needs Revision)\n")
+         lines.append(f"**Files changed:** {file_count}")
+         lines.append(f"**Review:** Needs changes")
+         if review_feedback:
+             lines.append(f"\n**Feedback:**\n{review_feedback}")
      else:
+         lines.append("## Task Failed\n")
+         error = result.get('error', 'Unknown error')
+         lines.append(f"**Error:** {error}")

+     lines.append(f"\n💰 **Cost:** ${total_cost:.4f}")

      return "\n".join(lines)
  def format_plan_display(plan: str) -> str:
+     """Format plan cleanly with a simple summary."""
      if not plan:
          return ""

+     lines = ["## 📋 Implementation Plan\n"]
+
+     # Simple approach: just show the plan in a clean format
+     # Extract key steps if numbered, otherwise show abbreviated version
      plan_lines = plan.split('\n')
      steps = []

+     import re
      for line in plan_lines:
          stripped = line.strip()
+         # Match numbered items like "1.", "2.", etc.
+         if stripped:
              match = re.match(r'^(\d+)[.)\]:]\s*(.+)', stripped)
              if match:
+                 step_num = match.group(1)
                  step_text = match.group(2).strip()
+                 if len(step_text) > 10 and not step_text.startswith('/'):
+                     # Truncate long steps
+                     if len(step_text) > 100:
+                         step_text = step_text[:97] + '...'
+                     steps.append(f"{step_num}. {step_text}")
+
+     if steps and len(steps) <= 10:
+         # Show numbered steps if we found them
+         lines.extend(steps)
      else:
+         # Otherwise just show first few lines of the plan
+         preview_lines = [l.strip() for l in plan_lines[:8] if l.strip() and not l.strip().startswith('#')]
+         if preview_lines:
+             lines.append('\n'.join(preview_lines[:5]))
+             if len(preview_lines) > 5:
+                 lines.append("\n*...plan continues...*")
+         else:
+             lines.append("Plan created successfully")

      lines.append("")
      return "\n".join(lines)
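The step extraction above can be isolated into a helper that is easy to test on its own. This is a sketch, assuming plans mark steps with `1.`, `1)`, `1]` or `1:` prefixes as the regex in the diff implies; `extract_steps`, `STEP_PATTERN`, and `max_len` are hypothetical names, not part of the commit.

```python
import re

# Same pattern as in the diff: digits, one of . ) ] : and then the step text.
STEP_PATTERN = re.compile(r"^(\d+)[.)\]:]\s*(.+)")


def extract_steps(plan: str, max_len: int = 100) -> list[str]:
    """Collect numbered steps from a plan, skipping short or path-like items."""
    steps = []
    for line in plan.split("\n"):
        match = STEP_PATTERN.match(line.strip())
        if not match:
            continue
        step_num, step_text = match.group(1), match.group(2).strip()
        # Skip very short fragments and lines that look like file paths.
        if len(step_text) <= 10 or step_text.startswith("/"):
            continue
        # Truncate long steps so the result is exactly max_len characters.
        if len(step_text) > max_len:
            step_text = step_text[: max_len - 3] + "..."
        steps.append(f"{step_num}. {step_text}")
    return steps
```

Compiling the pattern once at module level also removes the need for the function-local `import re` shown in the diff.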
 
      print("[CHAINLIT] on_chat_start triggered")

      await cl.Message(
+         content="👋 **CodePilot ready!**\n\n"
+                 "Paste a GitHub URL + your task to get started.\n\n"
+                 "*The welcome screen above explains everything you need to know.*"
      ).send()

      print("[CHAINLIT] Welcome message sent")
 
      # 2. Then show code
      if result.get('code_changes'):
+         await cl.Message(content=format_code_output(result['code_changes'])).send()

      # 3. Finally show result table
      await cl.Message(content=format_final_result(result, total_cost)).send()
 
      # 2. Then show generated code
      if result.get('code_changes'):
+         await cl.Message(content=format_code_output(result['code_changes'])).send()

      # 3. Finally show result table
      await cl.Message(content=format_final_result(result, total_cost)).send()