| # code-compass | |
| An AI-powered tool for analyzing code repositories using hierarchical chunking and semantic search with Pinecone vector database. | |
| ## Features | |
| - **Multiple Input Methods**: GitHub URLs or ZIP file uploads | |
| - **Hierarchical Chunking**: Smart code parsing at multiple levels (file → class → function → block) | |
| - **Semantic Search**: AI-powered natural language queries using Pinecone vector database | |
| - **Intelligent Analysis**: Local LLM integration with Qwen2.5-Coder-7B-Instruct | |
| - **Conversation History**: Maintains context across multiple queries | |
| - **Repository Analytics**: Comprehensive statistics and structure analysis | |
| - **Pinecone Integration**: Scalable vector database with automatic embedding generation | |
| - **Optimized Performance**: Quantized models for efficient local inference | |
| ## Setup | |
| ### Prerequisites | |
| 1. **Python 3.8+** | |
| 2. **Pinecone Account**: Create a free account at [Pinecone.io](https://www.pinecone.io/) | |
| 3. **System Requirements** for LLM: | |
| - **RAM**: 8GB minimum (16GB+ recommended) | |
| - **Storage**: 5-8GB free space for model | |
| - **CPU**: Multi-core processor (supports GPU acceleration if available) | |
| ### Installation | |
| 1. **Clone or download this project** | |
| ```bash | |
| git clone https://github.com/shahzeb171/code-compass.git | |
| cd code-compass | |
| ``` | |
| 2. **Install dependencies** | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 3. **Download the LLM model** | |
| ```bash | |
| wget https://huggingface.co/bartowski/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf | |
| ``` | |
| **Recommended**: The Q4_K_M quantization downloaded above gives a good balance of quality and performance. | |
| 4. **Set up Pinecone API Key** | |
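| Create a `config.py` file: | |
| ```python | |
| PINECONE_API_KEY = "your-pinecone-api-key-here" | |
| # Index name of your choice; Pinecone accepts lowercase letters, digits, and hyphens | |
| PINECONE_INDEX_NAME = "code-compass-index" | |
| # Embedding model hosted by Pinecone (check the Pinecone docs for more models) | |
| PINECONE_EMBEDDING_MODEL = "llama-text-embed-v2" | |
| MODEL_PATH = "path/to/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf" | |
| ``` | |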
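
Most startup failures come from a missing or malformed setting, so it can help to check the four values before launching the app. The sketch below is not part of the project: `check_settings` is a hypothetical helper, and the index-name rule it enforces (lowercase letters, digits, and hyphens only) is an assumption based on Pinecone's naming restrictions.

```python
import re
from pathlib import Path

# Assumption: index names may contain only lowercase letters, digits, and
# hyphens, and must start and end with a letter or digit.
INDEX_NAME_RE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")

REQUIRED_KEYS = (
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
    "PINECONE_EMBEDDING_MODEL",
    "MODEL_PATH",
)


def check_settings(settings: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in REQUIRED_KEYS:
        if not str(settings.get(key, "")).strip():
            problems.append(f"{key} is missing or empty")

    name = settings.get("PINECONE_INDEX_NAME", "")
    if name and not INDEX_NAME_RE.fullmatch(name):
        problems.append(
            "PINECONE_INDEX_NAME should use only lowercase letters, digits, and hyphens"
        )

    model_path = settings.get("MODEL_PATH", "")
    if model_path and not Path(model_path).is_file():
        problems.append(f"MODEL_PATH does not point to a file: {model_path}")
    return problems
```

Calling `check_settings` with the values from `config.py` before `python main.py` surfaces problems such as an underscore in the index name or a wrong model path without waiting for the model to load.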
| ### Getting Your Pinecone API Key | |
| 1. Go to [Pinecone.io](https://www.pinecone.io/) and sign up for a free account | |
| 2. Navigate to the "API Keys" section in your dashboard | |
| 3. Create a new API key or copy an existing one | |
| 4. The free tier includes: | |
| - 1 index | |
| - 5M vector dimensions | |
| This is enough for most code analysis projects. | |
| ## Usage | |
| 1. **Start the application** | |
| ```bash | |
| python main.py | |
| ``` | |
| 2. **Open your browser** to `http://localhost:7860` | |
| 3. **Load a repository** | |
| - Enter a GitHub URL (e.g., `https://github.com/pallets/flask`) | |
| - Or upload a ZIP file of your code | |
| - Click "Load Repository" | |
| 4. **Process the repository** | |
| - Click "Process Repository" to analyze and chunk your code | |
| - This creates hierarchical chunks and stores them in Pinecone with automatic embedding generation | |
| - Wait for processing to complete (may take 1-5 minutes depending on repo size) | |
| 5. **Initialize the AI model** (Optional but recommended) | |
| - Click "Initialize LLM" to start loading the local AI model | |
| - This will load Qwen2.5-Coder-7B-Instruct for intelligent code analysis | |
| - Initial loading takes 1-3 minutes | |
| 6. **Query your code** | |
| - Ask natural language questions like: | |
| - "What does this repository do?" | |
| - "Show me authentication functions" | |
| - "How is error handling implemented?" | |
| - "What are the main classes?" | |
| - Toggle "Use AI Analysis" to get LLM-generated answers instead of basic search results | |
| - The AI maintains conversation context for follow-up questions | |
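
Keeping conversation context usually means replaying the most recent turns in each new prompt. The following is a minimal sketch of that idea, not the project's implementation: the `ConversationHistory` class, the `max_turns` limit, and the prompt layout are all assumptions made for illustration.

```python
from collections import deque


class ConversationHistory:
    """Keeps the most recent question/answer turns for follow-up queries."""

    def __init__(self, max_turns: int = 5):
        # A bounded deque drops the oldest turn automatically once it is full.
        self.turns = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def build_prompt(self, question: str, code_context: str) -> str:
        """Combine retrieved code, earlier turns, and the new question."""
        lines = [
            "You are a code analysis assistant.",
            "",
            "Relevant code:",
            code_context,
            "",
        ]
        for past_question, past_answer in self.turns:
            lines += [f"User: {past_question}", f"Assistant: {past_answer}"]
        lines += [f"User: {question}", "Assistant:"]
        return "\n".join(lines)
```

Bounding the history keeps the prompt within the model's context window, at the cost of forgetting older turns.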
| ## How It Works | |
| ### Hierarchical Chunking Strategy | |
| The system creates multiple levels of code chunks: | |
| **Level 1: File Context** | |
| - Complete file overview with imports and purpose | |
| - Metadata: file path, language, total lines | |
| **Level 2: Class Chunks** | |
| - Full class definitions with inheritance and methods | |
| - Metadata: class name, methods list, relationships | |
| **Level 3: Function Chunks** | |
| - Individual function implementations with signatures | |
| - Metadata: function name, arguments, complexity score | |
| **Level 4: Code Block Chunks** | |
| - Sub-chunks for complex functions (loops, conditionals, error handling) | |
| - Metadata: block type, purpose, parent function | |
| ### Vector Search Process | |
| 1. **Embedding Generation**: Code chunks are converted to vector embeddings by the model set in `PINECONE_EMBEDDING_MODEL` | |
| 2. **Vector Storage**: Embeddings are stored in Pinecone with rich metadata | |
| 3. **Semantic Search**: User queries are embedded and matched against stored vectors | |
| 4. **Hybrid Filtering**: Results are filtered by chunk type, file path, repository, etc. | |
| 5. **Ranked Results**: The most relevant code sections are returned with similarity scores | |
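
The five steps above can be illustrated without a Pinecone account. The sketch below is a toy stand-in, not the real pipeline: `toy_embed` is a normalized bag-of-words vector instead of a learned embedding, an in-memory list replaces the Pinecone index, and every function name here is an assumption.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> dict:
    """Toy embedding: unit-length bag-of-words vector keyed by token."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def index_chunks(chunks: list) -> list:
    """Steps 1-2: embed each chunk and keep it with its metadata."""
    return [
        {"id": c["id"], "vector": toy_embed(c["text"]), "metadata": c["metadata"]}
        for c in chunks
    ]


def search(query: str, records: list, top_k: int = 3, where: dict = None) -> list:
    """Steps 3-5: embed the query, filter by metadata, rank by similarity."""
    query_vector = toy_embed(query)
    hits = []
    for record in records:
        if where and any(record["metadata"].get(k) != v for k, v in where.items()):
            continue  # metadata filter, e.g. {"chunk_type": "class"}
        hits.append((record["id"], cosine(query_vector, record["vector"])))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_k]
```

A call such as `search("login password", records, where={"chunk_type": "function"})` restricts the ranking to function chunks, which mirrors filtering by chunk type in step 4.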
| ## Configuration Options | |
| ### Supported Languages | |
| Currently optimized for Python with basic support for: | |
| - JavaScript/TypeScript | |
| - Java | |
| - C/C++ | |
| - Go | |
| - Rust | |
| - PHP | |
| - Ruby | |
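
Before processing a repository, it can be useful to see how many of its files fall into the languages listed above. The helper below is only an illustration: `count_supported_files` is hypothetical, and the extension table is an assumption, not the mapping the project actually uses.

```python
from collections import Counter
from pathlib import Path

# Assumption: illustrative extension table, not the project's own mapping.
LANGUAGE_BY_EXTENSION = {
    ".py": "Python",
    ".js": "JavaScript", ".ts": "TypeScript",
    ".java": "Java",
    ".c": "C", ".h": "C", ".cpp": "C++", ".hpp": "C++",
    ".go": "Go",
    ".rs": "Rust",
    ".php": "PHP",
    ".rb": "Ruby",
}


def count_supported_files(repo_root: str) -> Counter:
    """Count files per supported language under repo_root (recursively)."""
    counts = Counter()
    for path in Path(repo_root).rglob("*"):
        if path.is_file():
            language = LANGUAGE_BY_EXTENSION.get(path.suffix.lower())
            if language:
                counts[language] += 1
    return counts
```

An empty result for an extracted ZIP or cloned repository is a quick hint that processing would produce no chunks.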
| ## Example Repositories | |
| Try these public repositories: | |
| - **Flask**: `https://github.com/pallets/flask` - Web framework | |
| - **Requests**: `https://github.com/psf/requests` - HTTP library | |
| - **FastAPI**: `https://github.com/fastapi/fastapi` - Modern web framework | |
| - **Black**: `https://github.com/psf/black` - Code formatter | |
| ## Example Queries | |
| ### General Repository Understanding | |
| - "What is the main purpose of this repository?" | |
| - "What are the core components and how do they interact?" | |
| - "Show me the project architecture overview" | |
| ### Function & Class Discovery | |
| - "What are the main classes and their responsibilities?" | |
| - "Show me all authentication-related functions" | |
| - "Find functions that handle file operations" | |
| - "What utility functions are available?" | |
| ### Implementation Analysis | |
| - "How is error handling implemented?" | |
| - "Show me configuration management code" | |
| - "Find database-related functions" | |
| - "How does logging work in this project?" | |
| ### Code Patterns | |
| - "Show me decorator implementations" | |
| - "Find async/await usage patterns" | |
| - "What design patterns are used?" | |
| - "How are tests structured?" | |
| ## Troubleshooting | |
| ### Common Issues | |
| **"Pinecone API key is required"** | |
| - Make sure `PINECONE_API_KEY` is set, either in `config.py` or as an environment variable | |
| - Or enter it in the Advanced Options section | |
| **"Error downloading repository"** | |
| - Check that the GitHub URL is correct and public | |
| - Ensure you have an internet connection | |
| - Large repositories may time out; try smaller repos first | |
| **"No chunks generated"** | |
| - Make sure the repository contains supported code files | |
| - Check that ZIP files aren't corrupted | |
| - Python files work best currently | |
| **"Vector store initialization failed"** | |
| - Verify your Pinecone API key is valid | |
| - Check your Pinecone account hasn't exceeded free tier limits | |
| - Try a different environment region if needed | |
| ### Performance Tips | |
| - Start with smaller repositories (< 100 files) to test | |
| - Python repositories work best currently | |
| - Processing time scales with repository size | |
| - Queries are fast once processing is complete | |
| ## Future Enhancements | |
| - **More Language Support**: Better parsing for JavaScript, Java, etc. | |
| - **Code Generation**: AI-powered code completion and generation | |
| - **Diff Analysis**: Compare changes between repository versions | |
| - **Team Collaboration**: Share analyzed repositories | |
| - **Custom Embeddings**: Fine-tuned models for specific domains | |
| - **API Integration**: REST API for programmatic access | |
| ## Contributing | |
| Contributions welcome! Please open issues or submit pull requests. | |
| ## Support | |
| For issues or questions: | |
| 1. Check the troubleshooting section above | |
| 2. Open a GitHub issue with detailed error messages | |
| 3. Include your Python version and OS information |