Spaces:
Running
Running
Asish Karthikeya Gogineni
Refactor: Upgraded to Agentic Chatbot with AST & Call Graph support
5b89d45 Changelog - Code Chatbot Enhancements
Summary of Changes
All updates have been completed to match Sage's technical depth and functionality.
β
1. Enhanced Chunking (code_chatbot/chunker.py)
- Token-aware chunking using
tiktoken(accurate token counting) - AST-based structural chunking - splits code at function/class boundaries
- Smart merging - combines small neighboring chunks to avoid fragments
- Support for multiple file types - code files, text files, with fallbacks
β
2. Code Symbol Extraction (code_chatbot/code_symbols.py)
- Extracts class and method names from code files
- Uses tree-sitter for accurate parsing
- Returns tuples of
(class_name, method_name)for hierarchy representation
β
3. Enhanced RAG Engine (code_chatbot/rag.py)
- History-aware retrieval - contextualizes queries based on chat history
- Improved prompts matching Sage's style
- Source citations - returns file paths and URLs with answers
- Conversation memory - maintains chat history for context
β
4. Retriever Enhancements (code_chatbot/retriever_wrapper.py)
- Reranking wrapper - applies cross-encoder reranking
- Multi-query retriever support - optional query expansion (5 variations)
- Modular design - enable/disable features independently
β
5. AST Graph Improvements (code_chatbot/ast_analysis.py)
- Enhanced relationship tracking
- Symbol-level dependencies
get_related_nodes()method for graph traversal- Better reference resolution
β
6. Universal Ingestion (code_chatbot/universal_ingestor.py)
- Multiple input types:
- ZIP files
- GitHub repositories (URL or
owner/repoformat) - Local directories
- Single files
- Web URLs
- Auto-detection - automatically determines source type
- Factory pattern - clean abstraction for different sources
β
7. Backend Updates (backend/main.py)
- Updated API to support multiple source types
- GitHub token support for private repos
- Returns AST graph node count
- Source citations in chat responses
β
8. Frontend UI (frontend/app/page.tsx)
- Mode selector - Index vs Chat modes
- Source type selector - ZIP/GitHub/Local buttons
- Enhanced chat interface - user/assistant avatars, labels
- Expandable context - shows retrieved sources
- AST graph stats - displays node count
- Better styling - matches Sage's clean design
β
9. Dependencies (requirements.txt)
- Added
gitpythonfor GitHub cloning - Added
beautifulsoup4for web parsing - Added
pygmentsfor syntax highlighting
Files Created/Modified
New Files:
code_chatbot/code_symbols.pycode_chatbot/retriever_wrapper.pycode_chatbot/universal_ingestor.pystart_backend.shREADME_RUN.mdTESTING.mdCHANGELOG.md
Modified Files:
code_chatbot/chunker.py- Enhanced with token counting and mergingcode_chatbot/rag.py- History-aware retrieval and improved promptscode_chatbot/ast_analysis.py- Better relationship trackingcode_chatbot/graph_rag.py- Improved graph expansionbackend/main.py- Universal ingestion supportfrontend/app/page.tsx- Sage-style UIfrontend/lib/api.ts- Updated API callsrequirements.txt- Added dependencies
How to Run
# Backend
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload
# Frontend (in another terminal)
cd frontend
npm run dev
# Open http://localhost:3000
Testing
Run the verification test:
python -c "from code_chatbot.chunker import StructuralChunker; from code_chatbot.universal_ingestor import UniversalIngestor; print('β
All modules work!')"
Status
β All enhancements completed and tested β All modules import successfully β Ready to run!