Asish Karthikeya Gogineni committed
Commit a360b5d · 1 Parent(s): 6d5c110

Deploy to Hugging Face Spaces

Files changed (1)
  1. README.md +21 -133
README.md CHANGED
@@ -1,144 +1,32 @@
- # 🕷️ Code Crawler - Intelligent Codebase Agent

- An AI-powered codebase assistant that understands your code and helps you navigate, analyze, and modify it. Built with RAG (Retrieval-Augmented Generation), MCP (Model Context Protocol), and CrewAI multi-agent workflows.

- ## ✨ Features
 
- ### 💬 Chat Mode
- - Ask questions about your codebase
- - Get explanations of functions, modules, and workflows
- - Understand code architecture and data flow

- ### 🔍 Search Mode (MCP-Powered)
- - Regex pattern matching across your entire codebase
- - Context-aware search results with surrounding code
- - File pattern filtering (glob)

- ### 🔧 Refactor Mode (MCP-Powered)
- - Automated search-and-replace refactorings
- - Dry-run preview before applying changes
- - Common refactoring patterns built-in

- ### ✨ Generate Mode (AI-Powered)
- - Generate complete features from descriptions
- - Follows your codebase's existing patterns
- - Includes tests and documentation
-
- ## 🚀 Quick Start
-
- ### 1. Install Dependencies
- ```bash
- pip install -r requirements.txt
- ```
-
- ### 2. Set Environment Variables
- ```bash
- export GOOGLE_API_KEY="your-api-key"
- ```
- Or create a `.env` file:
- ```
- GOOGLE_API_KEY=your-api-key
- ```
-
- ### 3. Run the App
- ```bash
- streamlit run app.py
- ```
-
- ### 4. Use the App
  1. Upload a ZIP file of your codebase
  2. Click "Process & Index"
  3. Start chatting or switch modes!
 
- ## 📁 Project Structure
-
- ```
- Codebase_Agent/
- ├── app.py                        # Main Streamlit application
- │
- ├── code_chatbot/                 # Core library
- │   │
- │   │── Core RAG Engine
- │   ├── rag.py                    # Chat engine with RAG
- │   ├── prompts.py                # System prompts
- │   ├── config.py                 # Centralized configuration
- │   │
- │   │── Indexing & Chunking
- │   ├── indexer.py                # Vector database indexing
- │   ├── chunker.py                # AST-aware code chunking
- │   ├── merkle_tree.py            # Incremental change detection
- │   ├── incremental_indexing.py   # Incremental indexing logic
- │   ├── indexing_progress.py      # Progress tracking UI
- │   ├── path_obfuscator.py        # Privacy-preserving paths
- │   │
- │   │── Retrieval
- │   ├── retriever_wrapper.py      # Enhanced retriever
- │   ├── llm_retriever.py          # LLM-based retrieval
- │   ├── reranker.py               # Result reranking
- │   ├── graph_rag.py              # Graph-enhanced RAG
- │   │
- │   │── Code Analysis
- │   ├── ast_analysis.py           # AST parsing & call graphs
- │   ├── code_symbols.py           # Symbol extraction
- │   │
- │   │── MCP Tools
- │   ├── mcp_server.py             # MCP server (search, refactor)
- │   ├── mcp_client.py             # MCP client interface
- │   │
- │   │── Multi-Agent (CrewAI)
- │   ├── agents/                   # Agent definitions
- │   ├── crews/                    # Crew workflows
- │   ├── agent_workflow.py         # Agent orchestration
- │   ├── tools.py                  # Agent tools
- │   │
- │   │── Utilities
- │   ├── universal_ingestor.py     # File ingestion (ZIP, GitHub, Web)
- │   └── rate_limiter.py           # API rate limiting
- │
- ├── components/                   # Streamlit UI components
- │   └── multi_mode.py             # Mode selector & interfaces
- │
- ├── api/                          # FastAPI REST endpoints
- │   ├── main.py                   # API entry point
- │   ├── routes/                   # Route handlers
- │   └── schemas.py                # Pydantic models
- │
- ├── docs/                         # Documentation
- │   └── RAG_PIPELINE.md           # Technical documentation
- │
- ├── tests/                        # Test suite
- │
- └── assets/                       # Static assets (logo, etc.)
- ```
-
- ## 🔧 Configuration
-
- All configuration is centralized in `code_chatbot/config.py`:
-
- ```python
- from code_chatbot.config import get_default_config
-
- config = get_default_config()
- print(config.chunking.max_chunk_size)  # 1000
- print(config.retrieval.top_k)  # 10
- ```
-
- ## 🛠️ Technology Stack
-
- | Component | Technology |
- |-----------|------------|
- | **UI** | Streamlit |
- | **LLM** | Google Gemini |
- | **Embeddings** | gemini-embedding-001 |
- | **Vector DB** | ChromaDB / FAISS / Qdrant |
- | **RAG** | LangChain |
- | **Agents** | CrewAI |
- | **Code Tools** | MCP (Model Context Protocol) |
-
- ## 📖 Documentation
-
- - [RAG Pipeline](docs/RAG_PIPELINE.md) - Technical deep-dive
-
- ## 📄 License

- Apache 2.0 - See [LICENSE](LICENSE)
 
+ ---
+ title: Code Crawler
+ emoji: 🕷️
+ colorFrom: blue
+ colorTo: purple
+ sdk: streamlit
+ sdk_version: 1.32.0
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ ---
 
+ # 🕷️ Code Crawler - AI Codebase Agent

+ An AI-powered codebase assistant that understands your code and helps you navigate, analyze, and modify it.

+ ## Features

+ - 💬 **Chat Mode** - Ask questions about your codebase
+ - 🔍 **Search Mode** - Find patterns with regex
+ - 🔧 **Refactor Mode** - Automated code refactoring
+ - ✨ **Generate Mode** - Create new features
 
+ ## Usage

  1. Upload a ZIP file of your codebase
  2. Click "Process & Index"
  3. Start chatting or switch modes!
 
+ ## Requirements

+ Set your `GOOGLE_API_KEY` in the Secrets section.
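
The new Requirements section relies on `GOOGLE_API_KEY` being supplied as a Space secret. Hugging Face Spaces exposes secrets to the running app as environment variables, so a startup check along the following lines could surface a missing key early. This is only a sketch: `get_google_api_key` is a hypothetical helper, not something shown in this commit, and the real `app.py` may read the key differently.

```python
import os


def get_google_api_key(env=None):
    """Return the GOOGLE_API_KEY value, or raise a clear error if it is missing.

    Hypothetical helper for illustration only; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    # Treat an empty or whitespace-only value the same as a missing one.
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set. Add it in the Space's Secrets section, "
            "or export it locally before running the app."
        )
    return key
```

Calling the helper once at startup, before any indexing or chat request is made, would turn a missing secret into a single readable error instead of a failure deep inside an API call.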