Asish Karthikeya Gogineni committed on
Commit a360b5d · 1 Parent(s): 6d5c110

Deploy to Hugging Face Spaces

Files changed (1)
  1. README.md +21 -133
README.md CHANGED
@@ -1,144 +1,32 @@
- # 🕷️ Code Crawler - Intelligent Codebase Agent

- An AI-powered codebase assistant that understands your code and helps you navigate, analyze, and modify it. Built with RAG (Retrieval-Augmented Generation), MCP (Model Context Protocol), and CrewAI multi-agent workflows.

- ## ✨ Features

- ### 💬 Chat Mode
- - Ask questions about your codebase
- - Get explanations of functions, modules, and workflows
- - Understand code architecture and data flow

- ### 🔍 Search Mode (MCP-Powered)
- - Regex pattern matching across your entire codebase
- - Context-aware search results with surrounding code
- - File pattern filtering (glob)

- ### 🔧 Refactor Mode (MCP-Powered)
- - Automated search-and-replace refactorings
- - Dry-run preview before applying changes
- - Common refactoring patterns built-in

- ### ✨ Generate Mode (AI-Powered)
- - Generate complete features from descriptions
- - Follows your codebase's existing patterns
- - Includes tests and documentation
-
- ## 🚀 Quick Start
-
- ### 1. Install Dependencies
- ```bash
- pip install -r requirements.txt
- ```
-
- ### 2. Set Environment Variables
- ```bash
- export GOOGLE_API_KEY="your-api-key"
- ```
- Or create a `.env` file:
- ```
- GOOGLE_API_KEY=your-api-key
- ```
-
- ### 3. Run the App
- ```bash
- streamlit run app.py
- ```
-
- ### 4. Use the App
  1. Upload a ZIP file of your codebase
  2. Click "Process & Index"
  3. Start chatting or switch modes!

- ## 📁 Project Structure
-
- ```
- Codebase_Agent/
- ├── app.py # Main Streamlit application
- │
- ├── code_chatbot/ # Core library
- │ │
- │ │── Core RAG Engine
- │ ├── rag.py # Chat engine with RAG
- │ ├── prompts.py # System prompts
- │ ├── config.py # Centralized configuration
- │ │
- │ │── Indexing & Chunking
- │ ├── indexer.py # Vector database indexing
- │ ├── chunker.py # AST-aware code chunking
- │ ├── merkle_tree.py # Incremental change detection
- │ ├── incremental_indexing.py # Incremental indexing logic
- │ ├── indexing_progress.py # Progress tracking UI
- │ ├── path_obfuscator.py # Privacy-preserving paths
- │ │
- │ │── Retrieval
- │ ├── retriever_wrapper.py # Enhanced retriever
- │ ├── llm_retriever.py # LLM-based retrieval
- │ ├── reranker.py # Result reranking
- │ ├── graph_rag.py # Graph-enhanced RAG
- │ │
- │ │── Code Analysis
- │ ├── ast_analysis.py # AST parsing & call graphs
- │ ├── code_symbols.py # Symbol extraction
- │ │
- │ │── MCP Tools
- │ ├── mcp_server.py # MCP server (search, refactor)
- │ ├── mcp_client.py # MCP client interface
- │ │
- │ │── Multi-Agent (CrewAI)
- │ ├── agents/ # Agent definitions
- │ ├── crews/ # Crew workflows
- │ ├── agent_workflow.py # Agent orchestration
- │ ├── tools.py # Agent tools
- │ │
- │ │── Utilities
- │ ├── universal_ingestor.py # File ingestion (ZIP, GitHub, Web)
- │ └── rate_limiter.py # API rate limiting
- │
- ├── components/ # Streamlit UI components
- │ └── multi_mode.py # Mode selector & interfaces
- │
- ├── api/ # FastAPI REST endpoints
- │ ├── main.py # API entry point
- │ ├── routes/ # Route handlers
- │ └── schemas.py # Pydantic models
- │
- ├── docs/ # Documentation
- │ └── RAG_PIPELINE.md # Technical documentation
- │
- ├── tests/ # Test suite
- │
- └── assets/ # Static assets (logo, etc.)
- ```
-
- ## 🔧 Configuration
-
- All configuration is centralized in `code_chatbot/config.py`:
-
- ```python
- from code_chatbot.config import get_default_config
-
- config = get_default_config()
- print(config.chunking.max_chunk_size) # 1000
- print(config.retrieval.top_k) # 10
- ```
-
- ## 🛠️ Technology Stack
-
- | Component | Technology |
- |-----------|------------|
- | **UI** | Streamlit |
- | **LLM** | Google Gemini |
- | **Embeddings** | gemini-embedding-001 |
- | **Vector DB** | ChromaDB / FAISS / Qdrant |
- | **RAG** | LangChain |
- | **Agents** | CrewAI |
- | **Code Tools** | MCP (Model Context Protocol) |
-
- ## 📖 Documentation
-
- - [RAG Pipeline](docs/RAG_PIPELINE.md) - Technical deep-dive
-
- ## 📄 License

- Apache 2.0 - See [LICENSE](LICENSE)
 
+ ---
+ title: Code Crawler
+ emoji: 🕷️
+ colorFrom: blue
+ colorTo: purple
+ sdk: streamlit
+ sdk_version: 1.32.0
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ ---

+ # 🕷️ Code Crawler - AI Codebase Agent

+ An AI-powered codebase assistant that understands your code and helps you navigate, analyze, and modify it.

+ ## Features

+ - 💬 **Chat Mode** - Ask questions about your codebase
+ - 🔍 **Search Mode** - Find patterns with regex
+ - 🔧 **Refactor Mode** - Automated code refactoring
+ - ✨ **Generate Mode** - Create new features

+ ## Usage

  1. Upload a ZIP file of your codebase
  2. Click "Process & Index"
  3. Start chatting or switch modes!

+ ## Requirements

+ Set your `GOOGLE_API_KEY` in the Secrets section.
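This commit drops the `export` / `.env` setup steps and replaces them with a single pointer to the Space's Secrets. Hugging Face Spaces exposes secrets to the running app as environment variables, so one lookup can cover both the Space and a local checkout. The sketch below assumes the app resolves the key itself; the helper names `load_dotenv_file` and `get_google_api_key` are hypothetical and not taken from `app.py`, and the `.env` parser handles only simple `KEY=VALUE` lines (no `export` prefixes or multi-line values).

```python
import os
from pathlib import Path
from typing import Dict, Optional

# Hypothetical helpers: illustrative only, not taken from the Code Crawler source.


def load_dotenv_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file; return {} if the file is missing."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. "your-api-key".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        values[key.strip()] = value
    return values


def get_google_api_key(dotenv_path: str = ".env") -> Optional[str]:
    """Return GOOGLE_API_KEY from the environment (Space secret or `export`), else from .env."""
    key = os.environ.get("GOOGLE_API_KEY")
    if key:
        return key
    return load_dotenv_file(dotenv_path).get("GOOGLE_API_KEY")
```

With this lookup order, a value provided as a Space secret or via `export` takes precedence over a stale `.env` entry; an app would typically call `get_google_api_key()` once at startup and stop with a clear error if it returns `None`.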