Spaces:
Running
Running
File size: 3,792 Bytes
5b89d45 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 | # Changelog - Code Chatbot Enhancements
## Summary of Changes
All updates have been completed to match Sage's technical depth and functionality.
### β
1. Enhanced Chunking (`code_chatbot/chunker.py`)
- **Token-aware chunking** using `tiktoken` (accurate token counting)
- **AST-based structural chunking** - splits code at function/class boundaries
- **Smart merging** - combines small neighboring chunks to avoid fragments
- **Support for multiple file types** - code files, text files, with fallbacks
### β
2. Code Symbol Extraction (`code_chatbot/code_symbols.py`)
- Extracts class and method names from code files
- Uses tree-sitter for accurate parsing
- Returns tuples of `(class_name, method_name)` for hierarchy representation
### β
3. Enhanced RAG Engine (`code_chatbot/rag.py`)
- **History-aware retrieval** - contextualizes queries based on chat history
- **Improved prompts** matching Sage's style
- **Source citations** - returns file paths and URLs with answers
- **Conversation memory** - maintains chat history for context
### β
4. Retriever Enhancements (`code_chatbot/retriever_wrapper.py`)
- **Reranking wrapper** - applies cross-encoder reranking
- **Multi-query retriever support** - optional query expansion (5 variations)
- **Modular design** - enable/disable features independently
### β
5. AST Graph Improvements (`code_chatbot/ast_analysis.py`)
- Enhanced relationship tracking
- Symbol-level dependencies
- `get_related_nodes()` method for graph traversal
- Better reference resolution
### β
6. Universal Ingestion (`code_chatbot/universal_ingestor.py`)
- **Multiple input types**:
- ZIP files
- GitHub repositories (URL or `owner/repo` format)
- Local directories
- Single files
- Web URLs
- **Auto-detection** - automatically determines source type
- **Factory pattern** - clean abstraction for different sources
### β
7. Backend Updates (`backend/main.py`)
- Updated API to support multiple source types
- GitHub token support for private repos
- Returns AST graph node count
- Source citations in chat responses
### β
8. Frontend UI (`frontend/app/page.tsx`)
- **Mode selector** - Index vs Chat modes
- **Source type selector** - ZIP/GitHub/Local buttons
- **Enhanced chat interface** - user/assistant avatars, labels
- **Expandable context** - shows retrieved sources
- **AST graph stats** - displays node count
- **Better styling** - matches Sage's clean design
### β
9. Dependencies (`requirements.txt`)
- Added `gitpython` for GitHub cloning
- Added `beautifulsoup4` for web parsing
- Added `pygments` for syntax highlighting
## Files Created/Modified
### New Files:
- `code_chatbot/code_symbols.py`
- `code_chatbot/retriever_wrapper.py`
- `code_chatbot/universal_ingestor.py`
- `start_backend.sh`
- `README_RUN.md`
- `TESTING.md`
- `CHANGELOG.md`
### Modified Files:
- `code_chatbot/chunker.py` - Enhanced with token counting and merging
- `code_chatbot/rag.py` - History-aware retrieval and improved prompts
- `code_chatbot/ast_analysis.py` - Better relationship tracking
- `code_chatbot/graph_rag.py` - Improved graph expansion
- `backend/main.py` - Universal ingestion support
- `frontend/app/page.tsx` - Sage-style UI
- `frontend/lib/api.ts` - Updated API calls
- `requirements.txt` - Added dependencies
## How to Run
```bash
# Backend
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload
# Frontend (in another terminal)
cd frontend
npm run dev
# Open http://localhost:3000
```
## Testing
Run the verification test:
```bash
python -c "from code_chatbot.chunker import StructuralChunker; from code_chatbot.universal_ingestor import UniversalIngestor; print('β
All modules work!')"
```
## Status
β
All enhancements completed and tested
β
All modules import successfully
β
Ready to run!
|