File size: 3,792 Bytes
5b89d45
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
# Changelog - Code Chatbot Enhancements

## Summary of Changes

All updates have been completed to match Sage's technical depth and functionality.

### βœ… 1. Enhanced Chunking (`code_chatbot/chunker.py`)
- **Token-aware chunking** using `tiktoken` (accurate token counting)
- **AST-based structural chunking** - splits code at function/class boundaries
- **Smart merging** - combines small neighboring chunks to avoid fragments
- **Support for multiple file types** - code files, text files, with fallbacks

### βœ… 2. Code Symbol Extraction (`code_chatbot/code_symbols.py`)
- Extracts class and method names from code files
- Uses tree-sitter for accurate parsing
- Returns tuples of `(class_name, method_name)` for hierarchy representation

### βœ… 3. Enhanced RAG Engine (`code_chatbot/rag.py`)
- **History-aware retrieval** - contextualizes queries based on chat history
- **Improved prompts** matching Sage's style
- **Source citations** - returns file paths and URLs with answers
- **Conversation memory** - maintains chat history for context

### βœ… 4. Retriever Enhancements (`code_chatbot/retriever_wrapper.py`)
- **Reranking wrapper** - applies cross-encoder reranking
- **Multi-query retriever support** - optional query expansion (5 variations)
- **Modular design** - enable/disable features independently

### βœ… 5. AST Graph Improvements (`code_chatbot/ast_analysis.py`)
- Enhanced relationship tracking
- Symbol-level dependencies
- `get_related_nodes()` method for graph traversal
- Better reference resolution

### βœ… 6. Universal Ingestion (`code_chatbot/universal_ingestor.py`)
- **Multiple input types**:
  - ZIP files
  - GitHub repositories (URL or `owner/repo` format)
  - Local directories
  - Single files
  - Web URLs
- **Auto-detection** - automatically determines source type
- **Factory pattern** - clean abstraction for different sources

### βœ… 7. Backend Updates (`backend/main.py`)
- Updated API to support multiple source types
- GitHub token support for private repos
- Returns AST graph node count
- Source citations in chat responses

### βœ… 8. Frontend UI (`frontend/app/page.tsx`)
- **Mode selector** - Index vs Chat modes
- **Source type selector** - ZIP/GitHub/Local buttons
- **Enhanced chat interface** - user/assistant avatars, labels
- **Expandable context** - shows retrieved sources
- **AST graph stats** - displays node count
- **Better styling** - matches Sage's clean design

### βœ… 9. Dependencies (`requirements.txt`)
- Added `gitpython` for GitHub cloning
- Added `beautifulsoup4` for web parsing
- Added `pygments` for syntax highlighting

## Files Created/Modified

### New Files:
- `code_chatbot/code_symbols.py`
- `code_chatbot/retriever_wrapper.py`
- `code_chatbot/universal_ingestor.py`
- `start_backend.sh`
- `README_RUN.md`
- `TESTING.md`
- `CHANGELOG.md`

### Modified Files:
- `code_chatbot/chunker.py` - Enhanced with token counting and merging
- `code_chatbot/rag.py` - History-aware retrieval and improved prompts
- `code_chatbot/ast_analysis.py` - Better relationship tracking
- `code_chatbot/graph_rag.py` - Improved graph expansion
- `backend/main.py` - Universal ingestion support
- `frontend/app/page.tsx` - Sage-style UI
- `frontend/lib/api.ts` - Updated API calls
- `requirements.txt` - Added dependencies

## How to Run

```bash
# Backend
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload

# Frontend (in another terminal)
cd frontend
npm run dev

# Open http://localhost:3000
```

## Testing

Run the verification test:
```bash
python -c "from code_chatbot.chunker import StructuralChunker; from code_chatbot.universal_ingestor import UniversalIngestor; print('βœ… All modules work!')"
```

## Status

βœ… All enhancements completed and tested
βœ… All modules import successfully
βœ… Ready to run!