Asish Karthikeya Gogineni committed on
Commit
23a9623
·
1 Parent(s): 7dec411

docs: Organize project with clear README and clean exports

Files changed (2)
  1. README.md +125 -52
  2. code_chatbot/__init__.py +35 -0
README.md CHANGED
@@ -1,71 +1,144 @@
- # Codebase Agent 🕷️

- **Codebase Agent** is an intelligent, local-first code analysis tool that helps you understand, navigate, and query your codebase using advanced AI agents.

- Think of it as a private, super-powered developer assistant that knows your code inside out.

- ![Screenshot](assets/logo.png)

- ## ✨ Key Features

- **🛡️ 100% Local option**: Run with Ollama + local embeddings for complete privacy.
- **🧠 Agentic Reasoning**: Uses AST (Abstract Syntax Tree) analysis and Call Graphs to trace execution and dependencies.
- **🕸️ Call Graph Navigation**: Ask questions like "Who calls `database.connect`?" or "What acts as the entry point?".
- **⚡ Multiple Providers**: Support for **Google Gemini** (1M+ context), **Groq** (fast inference), and standard OpenAI-compatible APIs.
- **📂 Universal Ingestion**: Upload ZIP files or point to GitHub repositories.

- ## 🚀 Advanced Features (Cursor-Inspired)
-
- **🔄 Incremental Indexing**: Merkle tree-based change detection for 10-100x faster re-indexing
- **🔒 Privacy-Preserving**: Optional HMAC-based path obfuscation for sensitive codebases
- **🧩 Semantic Chunking**: AST-based code splitting that respects function/class boundaries
- **📊 Rich Metadata**: Automatic extraction of symbols, imports, and cyclomatic complexity
- **🎯 Hybrid Search**: Combines semantic similarity with keyword matching
- **⚙️ Highly Configurable**: Fine-tune chunking, retrieval, and privacy settings
-
- **[📖 Read the Technical Deep-Dive](docs/RAG_PIPELINE.md)** to understand how our RAG pipeline works.

  ## 🚀 Quick Start

- 1. **Clone the repository**:
- ```bash
- git clone https://github.com/Asishkarthikeya/Codebase_Agent.git
- cd Codebase_Agent
- ```

- 2. **Install dependencies**:
- ```bash
- pip install -r requirements.txt
- ```

- 3. **Run the application**:
- ```bash
- streamlit run app.py
- ```

- 4. **Upload & Chat**:
- - Open `http://localhost:8501`
- - Enter your API Key (e.g., Gemini or Groq) in the sidebar
- - Upload a `.zip` of your code or provide a GitHub URL
- - Start chatting!

- ## 🔧 Configuration

- The agent creates a `.env` file for your configuration, but you can also set these environment variables manually:

- - `GOOGLE_API_KEY`: For Gemini models
- - `GROQ_API_KEY`: For Groq models
- - `QDRANT_API_KEY`: For Qdrant vector DB

- ## 🤖 Agent Credentials

- This project uses:
- - **Streamlit** for the UI
- - **LangChain** for orchestration
- - **ChromaDB** for vector storage
- - **NetworkX** for code graph analysis
- - **Tree-sitter** for robust parsing

- ## License

- MIT License. See [LICENSE](LICENSE) for details.
 
+ # 🕷️ Code Crawler - Intelligent Codebase Agent

+ An AI-powered codebase assistant that understands your code and helps you navigate, analyze, and modify it. Built with RAG (Retrieval-Augmented Generation), MCP (Model Context Protocol), and CrewAI multi-agent workflows.

+ ## ✨ Features

+ ### 💬 Chat Mode
+ - Ask questions about your codebase
+ - Get explanations of functions, modules, and workflows
+ - Understand code architecture and data flow

+ ### 🔍 Search Mode (MCP-Powered)
+ - Regex pattern matching across your entire codebase
+ - Context-aware search results with surrounding code
+ - File pattern filtering (glob)

+ ### 🔧 Refactor Mode (MCP-Powered)
+ - Automated search-and-replace refactorings
+ - Dry-run preview before applying changes
+ - Common refactoring patterns built-in

+ ### ✨ Generate Mode (AI-Powered)
+ - Generate complete features from descriptions
+ - Follows your codebase's existing patterns
+ - Includes tests and documentation

  ## 🚀 Quick Start

+ ### 1. Install Dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Set Environment Variables
+ ```bash
+ export GOOGLE_API_KEY="your-api-key"
+ ```
+ Or create a `.env` file:
+ ```
+ GOOGLE_API_KEY=your-api-key
+ ```
+
+ ### 3. Run the App
+ ```bash
+ streamlit run app.py
+ ```
+
+ ### 4. Use the App
+ 1. Upload a ZIP file of your codebase
+ 2. Click "Process & Index"
+ 3. Start chatting or switch modes!
+
+ ## 📁 Project Structure
+
+ ```
+ Codebase_Agent/
+ ├── app.py                        # Main Streamlit application
+ │
+ ├── code_chatbot/                 # Core library
+ │   │
+ │   │── Core RAG Engine
+ │   ├── rag.py                    # Chat engine with RAG
+ │   ├── prompts.py                # System prompts
+ │   ├── config.py                 # Centralized configuration
+ │   │
+ │   │── Indexing & Chunking
+ │   ├── indexer.py                # Vector database indexing
+ │   ├── chunker.py                # AST-aware code chunking
+ │   ├── merkle_tree.py            # Incremental change detection
+ │   ├── incremental_indexing.py   # Incremental indexing logic
+ │   ├── indexing_progress.py      # Progress tracking UI
+ │   ├── path_obfuscator.py        # Privacy-preserving paths
+ │   │
+ │   │── Retrieval
+ │   ├── retriever_wrapper.py      # Enhanced retriever
+ │   ├── llm_retriever.py          # LLM-based retrieval
+ │   ├── reranker.py               # Result reranking
+ │   ├── graph_rag.py              # Graph-enhanced RAG
+ │   │
+ │   │── Code Analysis
+ │   ├── ast_analysis.py           # AST parsing & call graphs
+ │   ├── code_symbols.py           # Symbol extraction
+ │   │
+ │   │── MCP Tools
+ │   ├── mcp_server.py             # MCP server (search, refactor)
+ │   ├── mcp_client.py             # MCP client interface
+ │   │
+ │   │── Multi-Agent (CrewAI)
+ │   ├── agents/                   # Agent definitions
+ │   ├── crews/                    # Crew workflows
+ │   ├── agent_workflow.py         # Agent orchestration
+ │   ├── tools.py                  # Agent tools
+ │   │
+ │   │── Utilities
+ │   ├── universal_ingestor.py     # File ingestion (ZIP, GitHub, Web)
+ │   └── rate_limiter.py           # API rate limiting
+ │
+ ├── components/                   # Streamlit UI components
+ │   └── multi_mode.py             # Mode selector & interfaces
+ │
+ ├── api/                          # FastAPI REST endpoints
+ │   ├── main.py                   # API entry point
+ │   ├── routes/                   # Route handlers
+ │   └── schemas.py                # Pydantic models
+ │
+ ├── docs/                         # Documentation
+ │   └── RAG_PIPELINE.md           # Technical documentation
+ │
+ ├── tests/                        # Test suite
+ │
+ └── assets/                       # Static assets (logo, etc.)
+ ```

+ ## 🔧 Configuration

+ All configuration is centralized in `code_chatbot/config.py`:

+ ```python
+ from code_chatbot.config import get_default_config

+ config = get_default_config()
+ print(config.chunking.max_chunk_size) # 1000
+ print(config.retrieval.top_k) # 10
+ ```

+ ## 🛠️ Technology Stack

+ | Component | Technology |
+ |-----------|------------|
+ | **UI** | Streamlit |
+ | **LLM** | Google Gemini |
+ | **Embeddings** | gemini-embedding-001 |
+ | **Vector DB** | ChromaDB / FAISS / Qdrant |
+ | **RAG** | LangChain |
+ | **Agents** | CrewAI |
+ | **Code Tools** | MCP (Model Context Protocol) |

+ ## 📖 Documentation

+ - [RAG Pipeline](docs/RAG_PIPELINE.md) - Technical deep-dive

+ ## 📄 License

+ Apache 2.0 - See [LICENSE](LICENSE)
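
The Search and Refactor modes listed in the new README (regex matching, surrounding context, glob filtering, dry-run preview) can be illustrated with a small standard-library sketch. This is not the code in `mcp_server.py`, which this diff does not show. The function names `search_codebase` and `refactor` and their parameters are hypothetical and chosen only for illustration.

```python
import re
import tempfile
from pathlib import Path


def search_codebase(root, pattern, file_glob="*.py", context=1):
    """Yield (path, line_number, snippet) for each line matching the regex.

    Hypothetical helper: `context` is the number of lines kept on each side.
    """
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob(file_glob)):
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                lo, hi = max(0, i - context), min(len(lines), i + context + 1)
                yield path, i + 1, "\n".join(lines[lo:hi])


def refactor(root, pattern, replacement, file_glob="*.py", dry_run=True):
    """Regex search-and-replace returning {path: substitution_count}.

    Files are rewritten only when dry_run is False.
    """
    regex = re.compile(pattern)
    changed = {}
    for path in sorted(Path(root).rglob(file_glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        new_text, count = regex.subn(replacement, text)
        if count:
            changed[path] = count
            if not dry_run:
                path.write_text(new_text, encoding="utf-8")
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "db.py").write_text(
            "def connect():\n    return old_name()\n", encoding="utf-8"
        )
        for path, lineno, snippet in search_codebase(tmp, r"old_name"):
            print(f"{path.name}:{lineno}\n{snippet}")
        # Dry run: reports what would change without touching the file.
        print(refactor(tmp, r"\bold_name\b", "new_name"))
```

Returning the per-file substitution counts from the dry run gives the caller something to show as a preview before calling again with `dry_run=False`.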
code_chatbot/__init__.py CHANGED
@@ -0,0 +1,35 @@
+ """
+ Code Chatbot - AI-powered codebase assistant.
+
+ Core modules:
+ - rag: Chat engine with RAG
+ - indexer: Vector database indexing
+ - chunker: AST-aware code chunking
+ - merkle_tree: Incremental change detection
+ - mcp_server/mcp_client: Code search & refactoring tools
+ - agents/crews: Multi-agent workflows
+ """
+
+ # Core
+ from .rag import ChatEngine
+ from .config import get_default_config
+
+ # Indexing
+ from .indexer import Indexer
+ from .chunker import CodeChunker
+
+ # Tools
+ from .mcp_client import MCPClient
+
+ __all__ = [
+     # Core
+     'ChatEngine',
+     'get_default_config',
+     # Indexing
+     'Indexer',
+     'CodeChunker',
+     # Tools
+     'MCPClient',
+ ]
+
+ __version__ = "2.0.0"
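
The docstring above describes `merkle_tree` as "Incremental change detection", but the module itself is not part of this diff. As a rough illustration of the general technique only, the sketch below hashes a directory into a Merkle-style tree and compares two snapshots, skipping any subtree whose hash is unchanged. The names `build_tree` and `changed_paths` are hypothetical and are not taken from the project.

```python
import hashlib
from pathlib import Path


def build_tree(path):
    """Return a node {'hash': str, 'children': {name: node}}.

    Files hash their bytes; directories hash their children's names and hashes.
    Assumes a plain tree of regular files and directories (no symlink handling).
    """
    path = Path(path)
    if path.is_file():
        return {"hash": hashlib.sha256(path.read_bytes()).hexdigest(), "children": {}}
    children = {p.name: build_tree(p) for p in sorted(path.iterdir())}
    digest = hashlib.sha256()
    for name, node in children.items():
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(node["hash"].encode("ascii") + b"\0")
    return {"hash": digest.hexdigest(), "children": children}


def changed_paths(old, new, prefix=""):
    """List relative paths added, removed, or modified between two trees."""
    if old is not None and new is not None and old["hash"] == new["hash"]:
        return []  # identical subtree, nothing below it needs re-indexing
    old_children = old["children"] if old else {}
    new_children = new["children"] if new else {}
    if not old_children and not new_children:
        return [prefix]  # a differing leaf (file or empty directory)
    changed = []
    for name in sorted(set(old_children) | set(new_children)):
        child = f"{prefix}/{name}" if prefix else name
        changed += changed_paths(old_children.get(name), new_children.get(name), child)
    return changed
```

An indexer built this way would persist the previous tree, rebuild it on the next run, and re-embed only the paths returned by `changed_paths`.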