# Chat with GitHub Repo - GenAI Project

## Project Overview

A GenAI application that lets developers paste a GitHub repository URL and ask natural-language questions about the codebase. The system clones the repo, processes and embeds the code files, then uses RAG (Retrieval-Augmented Generation) to answer questions about the code.

## Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │   Backend API   │    │    Vector DB    │
│  (React/Next)   │───▶│    (FastAPI)    │───▶│   (Pinecone/    │
│                 │    │                 │    │     Chroma)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   GitHub API    │
                       │   + Git Clone   │
                       └─────────────────┘
```

## Tech Stack

### Backend

- **FastAPI** - High-performance Python web framework
- **LangChain** - LLM orchestration and RAG implementation
- **Free LLM Options** - Ollama (local), Groq (fast inference), or Hugging Face
- **Free Embeddings** - Sentence Transformers (local) or Hugging Face
- **Chroma** - Vector database for embeddings
- **GitPython** - For cloning and processing repositories
- **Celery + Redis** - For background processing (recommended for production; the code below uses FastAPI `BackgroundTasks` and in-memory status tracking)

### Frontend

- **Next.js** - React framework
- **TypeScript** - Type safety
- **Tailwind CSS** - Styling
- **Socket.io** - Real-time updates

## Backend Implementation

### 1. Project Structure

```
backend/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py
│   ├── services/
│   │   ├── __init__.py
│   │   ├── github_service.py
│   │   ├── embedding_service.py
│   │   └── chat_service.py
│   ├── utils/
│   │   ├── __init__.py
│   │   └── file_processor.py
│   └── config.py
├── requirements.txt
├── docker-compose.yml
└── Dockerfile
```

### 2. Core Dependencies (requirements.txt)

```txt
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
pydantic==2.5.0
# Required by config.py (BaseSettings lives in a separate package in Pydantic v2)
pydantic-settings==2.1.0
langchain==0.1.0
langchain-community==0.0.10
chromadb==0.4.18
GitPython==3.1.40
python-dotenv==1.0.0
celery==5.3.4
redis==5.0.1
# python-socketio provides the `socketio` module
python-socketio==5.10.0
aiofiles==23.2.1
# Used by the Groq and Hugging Face clients in chat_service.py
requests==2.31.0

# Free LLM & Embedding Options
sentence-transformers==2.2.2
transformers==4.36.0
torch==2.1.0
ollama==0.1.7
groq==0.4.1
huggingface-hub==0.19.4
```

### 3. Configuration (config.py)

```python
from enum import Enum
from typing import List

from pydantic_settings import BaseSettings, SettingsConfigDict


class LLMProvider(str, Enum):
    OLLAMA = "ollama"
    GROQ = "groq"
    HUGGINGFACE = "huggingface"


class EmbeddingProvider(str, Enum):
    SENTENCE_TRANSFORMERS = "sentence_transformers"
    HUGGINGFACE = "huggingface"


class Settings(BaseSettings):
    # Read overrides from .env; ignore keys this class does not declare
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # LLM Configuration
    llm_provider: LLMProvider = LLMProvider.OLLAMA
    ollama_base_url: str = "http://localhost:11434"
    ollama_model: str = "llama2"  # or codellama, mistral, etc.
    groq_api_key: str = ""
    groq_model: str = "mixtral-8x7b-32768"
    huggingface_api_key: str = ""
    huggingface_model: str = "microsoft/DialoGPT-medium"

    # Embedding Configuration
    embedding_provider: EmbeddingProvider = EmbeddingProvider.SENTENCE_TRANSFORMERS
    sentence_transformer_model: str = "all-MiniLM-L6-v2"  # Fast and good
    # Alternative: "all-mpnet-base-v2" (better quality, slower)

    # Other settings
    github_token: str = ""
    redis_url: str = "redis://localhost:6379"
    vector_db_path: str = "./chroma_db"
    max_file_size: int = 1024 * 1024  # 1MB
    supported_extensions: List[str] = [
        ".py", ".js", ".ts", ".jsx", ".tsx", ".java", ".cpp", ".c",
        ".cs", ".go", ".rs", ".php", ".rb", ".swift", ".kt", ".scala",
        ".md", ".txt", ".json", ".yaml", ".yml", ".toml"
    ]


settings = Settings()
```

### 4. Data Models (models/schemas.py)

```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, HttpUrl


class ProcessingStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


class RepoProcessRequest(BaseModel):
    repo_url: HttpUrl
    branch: Optional[str] = "main"


class ChatMessage(BaseModel):
    message: str
    repo_id: str


class ChatResponse(BaseModel):
    response: str
    sources: List[dict]
    repo_id: str


class RepoStatus(BaseModel):
    repo_id: str
    status: ProcessingStatus
    progress: int
    message: str
    total_files: Optional[int] = None
    processed_files: Optional[int] = None
```

### 5. GitHub Service (services/github_service.py)

```python
import asyncio
import hashlib
import os
import shutil
import tempfile
from typing import Tuple
from urllib.parse import urlparse

from git import Repo


class GitHubService:
    def __init__(self, github_token: str = ""):
        self.github_token = github_token

    def generate_repo_id(self, repo_url: str) -> str:
        """Generate a stable ID for the repository (MD5 is a lookup key here, not a security measure)"""
        return hashlib.md5(repo_url.encode()).hexdigest()

    def parse_github_url(self, url: str) -> Tuple[str, str]:
        """Extract owner and repo name from GitHub URL"""
        parsed = urlparse(url)
        path_parts = parsed.path.strip('/').split('/')
        if len(path_parts) >= 2:
            return path_parts[0], path_parts[1]
        raise ValueError("Invalid GitHub URL format")

    async def clone_repository(self, repo_url: str, branch: str = "main") -> str:
        """Shallow-clone the repository into a temporary directory"""
        temp_dir = tempfile.mkdtemp()
        clone_url = repo_url
        if self.github_token:
            # Use token for private repos or higher rate limits
            clone_url = repo_url.replace("https://", f"https://{self.github_token}@")
        try:
            # Repo.clone_from blocks, so run it in a worker thread to keep the event loop free
            await asyncio.to_thread(
                Repo.clone_from, clone_url, temp_dir, branch=branch, depth=1
            )
            return temp_dir
        except Exception as e:
            shutil.rmtree(temp_dir, ignore_errors=True)
            # Mask the token so it does not leak into status messages
            detail = str(e)
            if self.github_token:
                detail = detail.replace(self.github_token, "***")
            raise RuntimeError(f"Failed to clone repository: {detail}") from e

    def cleanup_repo(self, repo_path: str):
        """Clean up cloned repository"""
        if os.path.exists(repo_path):
            shutil.rmtree(repo_path, ignore_errors=True)
```

### 6. File Processor (utils/file_processor.py)

```python
import os
from pathlib import Path
from typing import Dict, Generator, List


class FileProcessor:
    def __init__(self, supported_extensions: List[str], max_file_size: int):
        self.supported_extensions = supported_extensions
        self.max_file_size = max_file_size
        self.ignore_dirs = {
            '.git', '__pycache__', 'node_modules', '.pytest_cache',
            'venv', 'env', '.venv', 'build', 'dist', '.next',
            'coverage', '.coverage', 'logs', 'log'
        }
        self.ignore_files = {
            '.gitignore', '.env', '.env.local', '.DS_Store',
            'package-lock.json', 'yarn.lock', 'poetry.lock'
        }

    def should_process_file(self, file_path: str, repo_path: str = "") -> bool:
        """Check if file should be processed"""
        path = Path(file_path)

        # Check only directories inside the repository, so a temp-dir prefix
        # such as /tmp/build/... cannot cause every file to be skipped
        rel = path.relative_to(repo_path) if repo_path else path
        if any(part in self.ignore_dirs for part in rel.parts[:-1]):
            return False

        # Check file name
        if path.name in self.ignore_files:
            return False

        # Check extension
        if path.suffix.lower() not in self.supported_extensions:
            return False

        # Check file size
        try:
            if os.path.getsize(file_path) > self.max_file_size:
                return False
        except OSError:
            return False

        return True

    def extract_files(self, repo_path: str) -> Generator[Dict, None, None]:
        """Extract and yield file information"""
        for root, dirs, files in os.walk(repo_path):
            # Filter out ignored directories in place so os.walk does not descend into them
            dirs[:] = [d for d in dirs if d not in self.ignore_dirs]

            for file in files:
                file_path = os.path.join(root, file)
                relative_path = os.path.relpath(file_path, repo_path)

                if not self.should_process_file(file_path, repo_path):
                    continue

                try:
                    with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                        content = f.read()

                    yield {
                        'path': relative_path,
                        'content': content,
                        'extension': Path(file_path).suffix.lower(),
                        'size': len(content)  # length in characters
                    }
                except Exception as e:
                    print(f"Error reading file {relative_path}: {e}")
                    continue
```

### 7. Free Embedding Service (services/embedding_service.py)

```python
import asyncio
from typing import Dict, List

from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings, SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from sentence_transformers import SentenceTransformer


class FreeEmbeddingService:
    def __init__(self, embedding_provider: str, vector_db_path: str,
                 model_name: str = "all-MiniLM-L6-v2"):
        self.vector_db_path = vector_db_path
        self.embedding_provider = embedding_provider

        # Initialize embedding function based on provider
        if embedding_provider == "sentence_transformers":
            self.embeddings = SentenceTransformerEmbeddings(
                model_name=model_name,
                cache_folder="./models"  # Cache models locally
            )
        elif embedding_provider == "huggingface":
            self.embeddings = HuggingFaceEmbeddings(
                model_name=model_name,
                cache_folder="./models"
            )
        else:
            raise ValueError(f"Unsupported embedding provider: {embedding_provider}")

        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            separators=["\n\n", "\n", " ", ""]
        )

    def create_documents(self, files: List[Dict], repo_id: str) -> List[Document]:
        """Create documents from file contents"""
        documents = []
        for file_info in files:
            # Create document with metadata
            doc = Document(
                page_content=file_info['content'],
                metadata={
                    'path': file_info['path'],
                    'extension': file_info['extension'],
                    'repo_id': repo_id,
                    'size': file_info['size']
                }
            )
            documents.append(doc)
        return documents

    def split_documents(self, documents: List[Document]) -> List[Document]:
        """Split documents into chunks"""
        return self.text_splitter.split_documents(documents)

    async def create_embeddings(self, files: List[Dict], repo_id: str):
        """Create and store embeddings for repository files"""
        # Create documents
        documents = self.create_documents(files, repo_id)

        # Split into chunks
        chunks = self.split_documents(documents)

        # Create vector store
        collection_name = f"repo_{repo_id}"
        vectorstore = Chroma(
            collection_name=collection_name,
            embedding_function=self.embeddings,
            persist_directory=self.vector_db_path
        )

        # Add documents in batches; embedding is CPU/GPU-bound and blocking,
        # so run each batch in a worker thread to keep the event loop responsive
        batch_size = 100
        for i in range(0, len(chunks), batch_size):
            batch = chunks[i:i + batch_size]
            await asyncio.to_thread(vectorstore.add_documents, batch)

        return vectorstore

    def get_vectorstore(self, repo_id: str):
        """Get existing vector store for repository"""
        collection_name = f"repo_{repo_id}"
        return Chroma(
            collection_name=collection_name,
            embedding_function=self.embeddings,
            persist_directory=self.vector_db_path
        )


# Alternative: Direct SentenceTransformers implementation for more control
class DirectEmbeddingService:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name, cache_folder="./models")
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            separators=["\n\n", "\n", " ", ""]
        )

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for texts"""
        return self.model.encode(texts, convert_to_numpy=True).tolist()

    def embed_query(self, query: str) -> List[float]:
        """Generate embedding for a single query"""
        return self.model.encode([query], convert_to_numpy=True)[0].tolist()
```

### 8. Free Chat Service (services/chat_service.py)

```python
from typing import Dict

import requests
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.llms import Ollama


# For Groq (free tier available)
class GroqLLM:
    def __init__(self, api_key: str, model: str = "mixtral-8x7b-32768"):
        self.api_key = api_key
        self.model = model
        self.base_url = "https://api.groq.com/openai/v1"

    def __call__(self, prompt: str) -> str:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }
        data = {
            "model": self.model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.1,
            "max_tokens": 1024
        }
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=headers,
            json=data,
            timeout=60  # fail instead of hanging on a stalled connection
        )
        if response.status_code == 200:
            return response.json()["choices"][0]["message"]["content"]
        else:
            raise RuntimeError(f"Groq API error: {response.text}")


# For Hugging Face Inference API
class HuggingFaceLLM:
    def __init__(self, api_key: str, model: str = "microsoft/DialoGPT-medium"):
        self.api_key = api_key
        self.model = model
        self.base_url = f"https://api-inference.huggingface.co/models/{model}"

    def __call__(self, prompt: str) -> str:
        headers = {"Authorization": f"Bearer {self.api_key}"}
        data = {"inputs": prompt, "parameters": {"max_length": 1000, "temperature": 0.1}}
        response = requests.post(self.base_url, headers=headers, json=data, timeout=60)
        if response.status_code == 200:
            result = response.json()
            if isinstance(result, list) and len(result) > 0:
                # The API may echo the prompt; strip it from the generated text
                return result[0].get("generated_text", "").replace(prompt, "").strip()
            return str(result)
        else:
            raise RuntimeError(f"HuggingFace API error: {response.text}")


class FreeChatService:
    def __init__(self, llm_provider: str, **kwargs):
        self.llm_provider = llm_provider

        if llm_provider == "ollama":
            self.llm = Ollama(
                model=kwargs.get("model", "llama2"),
                base_url=kwargs.get("base_url", "http://localhost:11434"),
                temperature=0.1
            )
        elif llm_provider == "groq":
            self.llm = GroqLLM(
                api_key=kwargs.get("api_key"),
                model=kwargs.get("model", "mixtral-8x7b-32768")
            )
        elif llm_provider == "huggingface":
            self.llm = HuggingFaceLLM(
                api_key=kwargs.get("api_key"),
                model=kwargs.get("model", "microsoft/DialoGPT-medium")
            )
        else:
            raise ValueError(f"Unsupported LLM provider: {llm_provider}")

        self.prompt_template = PromptTemplate(
            input_variables=["context", "question"],
            template="""
You are a helpful AI assistant that analyzes code repositories.
Use the following code snippets to answer the user's question about the repository.

Context from repository:
{context}

Question: {question}

Please provide a detailed answer based on the code context provided.
If you reference specific files or functions, mention their file paths.
If the question cannot be fully answered from the provided context, say so clearly.

Answer:"""
        )

    async def answer_question(self, question: str, vectorstore, repo_id: str) -> Dict:
        """Answer question using RAG with free LLM"""
        try:
            if self.llm_provider == "ollama":
                # Use LangChain's RetrievalQA for Ollama
                qa_chain = RetrievalQA.from_chain_type(
                    llm=self.llm,
                    chain_type="stuff",
                    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
                    chain_type_kwargs={"prompt": self.prompt_template},
                    return_source_documents=True
                )
                result = qa_chain.invoke({"query": question})
                answer = result["result"]
                source_docs = result.get("source_documents", [])
            else:
                # Manual RAG for other providers
                docs = vectorstore.similarity_search(question, k=5)
                # Prefix each chunk with its file path so the model can cite it
                context = "\n\n".join(
                    f"File: {doc.metadata.get('path', 'Unknown')}\n{doc.page_content}"
                    for doc in docs
                )
                prompt = self.prompt_template.format(
                    context=context,
                    question=question
                )
                answer = self.llm(prompt)
                source_docs = docs

            # Format sources
            sources = []
            for doc in source_docs:
                text = doc.page_content
                sources.append({
                    "path": doc.metadata.get("path", "Unknown"),
                    "content_preview": (text[:200] + "...") if len(text) > 200 else text
                })

            return {
                "response": answer,
                "sources": sources,
                "repo_id": repo_id
            }
        except Exception as e:
            return {
                "response": f"Error processing question: {str(e)}",
                "sources": [],
                "repo_id": repo_id
            }
```

### 9. Main FastAPI Application (main.py)

```python
from datetime import datetime, timezone
from typing import Dict

import uvicorn
from fastapi import BackgroundTasks, FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware

# Absolute imports match the `app.main:app` entry point used in the Dockerfile
from app.config import settings
from app.models.schemas import (
    ChatMessage, ChatResponse, ProcessingStatus, RepoProcessRequest, RepoStatus
)
from app.services.chat_service import FreeChatService
from app.services.embedding_service import FreeEmbeddingService
from app.services.github_service import GitHubService
from app.utils.file_processor import FileProcessor

app = FastAPI(title="Chat with GitHub Repo (Free Version)", version="1.0.0")

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Configure properly for production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Initialize services based on configuration
github_service = GitHubService(settings.github_token)
embedding_service = FreeEmbeddingService(
    embedding_provider=settings.embedding_provider.value,
    vector_db_path=settings.vector_db_path,
    model_name=settings.sentence_transformer_model
)

# Initialize chat service based on provider
chat_kwargs = {}
if settings.llm_provider.value == "ollama":
    chat_kwargs = {
        "model": settings.ollama_model,
        "base_url": settings.ollama_base_url
    }
elif settings.llm_provider.value == "groq":
    chat_kwargs = {
        "api_key": settings.groq_api_key,
        "model": settings.groq_model
    }
elif settings.llm_provider.value == "huggingface":
    chat_kwargs = {
        "api_key": settings.huggingface_api_key,
        "model": settings.huggingface_model
    }

chat_service = FreeChatService(
    llm_provider=settings.llm_provider.value,
    **chat_kwargs
)

file_processor = FileProcessor(settings.supported_extensions, settings.max_file_size)

# In-memory status tracking (use Redis in production)
repo_status: Dict[str, RepoStatus] = {}


async def process_repository(repo_url: str, branch: str, repo_id: str):
    """Background task to process repository"""
    repo_path = None
    try:
        repo_status[repo_id] = RepoStatus(
            repo_id=repo_id,
            status=ProcessingStatus.PROCESSING,
            progress=10,
            message="Cloning repository..."
        )

        # Clone repository
        repo_path = await github_service.clone_repository(repo_url, branch)

        repo_status[repo_id].progress = 30
        repo_status[repo_id].message = "Processing files..."

        # Extract files
        files = list(file_processor.extract_files(repo_path))
        repo_status[repo_id].total_files = len(files)

        repo_status[repo_id].progress = 50
        repo_status[repo_id].message = "Creating embeddings (this may take a while for large repos)..."

        # Create embeddings
        await embedding_service.create_embeddings(files, repo_id)

        repo_status[repo_id].processed_files = len(files)
        repo_status[repo_id].status = ProcessingStatus.COMPLETED
        repo_status[repo_id].progress = 100
        repo_status[repo_id].message = (
            f"Repository processed successfully! Using {settings.llm_provider.value} for chat."
        )
    except Exception as e:
        repo_status[repo_id].status = ProcessingStatus.FAILED
        repo_status[repo_id].message = f"Error: {str(e)}"
    finally:
        # Always remove the temporary clone, even when processing fails
        if repo_path:
            github_service.cleanup_repo(repo_path)


@app.post("/api/process-repo")
async def process_repo(request: RepoProcessRequest, background_tasks: BackgroundTasks):
    """Process a GitHub repository"""
    repo_id = github_service.generate_repo_id(str(request.repo_url))

    # Check if already processed
    if repo_id in repo_status and repo_status[repo_id].status == ProcessingStatus.COMPLETED:
        return {"repo_id": repo_id, "message": "Repository already processed"}

    # Start processing
    repo_status[repo_id] = RepoStatus(
        repo_id=repo_id,
        status=ProcessingStatus.PENDING,
        progress=0,
        message="Starting processing..."
    )

    background_tasks.add_task(process_repository, str(request.repo_url), request.branch, repo_id)

    return {"repo_id": repo_id, "message": "Processing started"}


@app.get("/api/status/{repo_id}", response_model=RepoStatus)
async def get_repo_status(repo_id: str):
    """Get repository processing status"""
    if repo_id not in repo_status:
        raise HTTPException(status_code=404, detail="Repository not found")
    return repo_status[repo_id]


@app.post("/api/chat", response_model=ChatResponse)
async def chat_with_repo(message: ChatMessage):
    """Chat with repository"""
    repo_id = message.repo_id

    # Check if repo is processed
    if repo_id not in repo_status or repo_status[repo_id].status != ProcessingStatus.COMPLETED:
        raise HTTPException(status_code=400, detail="Repository not processed")

    try:
        # Get vector store
        vectorstore = embedding_service.get_vectorstore(repo_id)

        # Get answer
        result = await chat_service.answer_question(message.message, vectorstore, repo_id)
        return ChatResponse(**result)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/api/health")
async def health_check():
    return {
        "status": "healthy",
        "timestamp": datetime.now(timezone.utc),
        "llm_provider": settings.llm_provider.value,
        "embedding_provider": settings.embedding_provider.value
    }


@app.get("/api/config")
async def get_config():
    """Get current configuration"""
    return {
        "llm_provider": settings.llm_provider.value,
        "embedding_provider": settings.embedding_provider.value,
        "embedding_model": settings.sentence_transformer_model,
        "supported_extensions": settings.supported_extensions
    }


if __name__ == "__main__":
    # reload=True requires the app as an import string; run with `python -m app.main` from backend/
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)
```

### 10. Environment Configuration (.env)

```env
# LLM Provider Configuration
# LLM_PROVIDER options: ollama, groq, huggingface
LLM_PROVIDER=ollama
# EMBEDDING_PROVIDER options: sentence_transformers, huggingface
EMBEDDING_PROVIDER=sentence_transformers

# Ollama Configuration (Local LLM - Free)
OLLAMA_BASE_URL=http://localhost:11434
# OLLAMA_MODEL options: llama2, codellama, mistral, phi, etc.
OLLAMA_MODEL=llama2

# Groq Configuration (Fast inference - Free tier available)
GROQ_API_KEY=your_groq_api_key_here
# GROQ_MODEL options: mixtral-8x7b-32768, llama2-70b-4096
# (hosted model availability changes over time; check Groq's current model list)
GROQ_MODEL=mixtral-8x7b-32768

# Hugging Face Configuration (Free inference API)
HUGGINGFACE_API_KEY=your_hf_api_key_here
HUGGINGFACE_MODEL=microsoft/DialoGPT-medium

# Embedding Model Configuration
# all-MiniLM-L6-v2 is fast with good quality
SENTENCE_TRANSFORMER_MODEL=all-MiniLM-L6-v2

# Other Configuration
GITHUB_TOKEN=your_github_token_here
REDIS_URL=redis://localhost:6379
VECTOR_DB_PATH=./chroma_db
```

### 11. Docker Configuration

**Dockerfile:**

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Pre-download embedding models
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2', cache_folder='./models')"

COPY . .

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

**docker-compose.yml:**

```yaml
version: '3.8'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - LLM_PROVIDER=${LLM_PROVIDER}
      - EMBEDDING_PROVIDER=${EMBEDDING_PROVIDER}
      - OLLAMA_BASE_URL=http://ollama:11434
      - OLLAMA_MODEL=${OLLAMA_MODEL}
      - GROQ_API_KEY=${GROQ_API_KEY}
      - GROQ_MODEL=${GROQ_MODEL}
      - HUGGINGFACE_API_KEY=${HUGGINGFACE_API_KEY}
      - HUGGINGFACE_MODEL=${HUGGINGFACE_MODEL}
      - SENTENCE_TRANSFORMER_MODEL=${SENTENCE_TRANSFORMER_MODEL}
      - GITHUB_TOKEN=${GITHUB_TOKEN}
      - REDIS_URL=redis://redis:6379
    volumes:
      - ./chroma_db:/app/chroma_db
      # Cache for models. This bind mount hides the copy baked into the image
      # at /app/models, so the model is downloaded again on first run.
      - ./models:/app/models
    depends_on:
      - redis
      - ollama

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  ollama_data:
```

**Setup script for Ollama models (setup_ollama.sh):**

```bash
#!/bin/bash
# Pull required models into the running `ollama` service.
# Addressing the service by name avoids hard-coding the generated container name.
docker compose exec ollama ollama pull llama2
docker compose exec ollama ollama pull codellama
docker compose exec ollama ollama pull mistral
```

## Performance Comparison

| Provider | Speed | Quality | Cost | Setup Difficulty |
|----------|-------|---------|------|------------------|
| Ollama + Llama2 | Medium | Good | Free | Easy |
| Ollama + CodeLlama | Medium | Excellent (Code) | Free | Easy |
| Groq + Mixtral | Very Fast | Excellent | Free Tier | Very Easy |
| HuggingFace | Slow | Variable | Free | Very Easy |
| Local Transformers | Slow-Medium | Good | Free | Medium |

## Recommended Configurations

### For Development/Testing:

```env
LLM_PROVIDER=groq
GROQ_MODEL=mixtral-8x7b-32768
EMBEDDING_PROVIDER=sentence_transformers
SENTENCE_TRANSFORMER_MODEL=all-MiniLM-L6-v2
```

### For Production (Local):

```env
LLM_PROVIDER=ollama
OLLAMA_MODEL=codellama
EMBEDDING_PROVIDER=sentence_transformers
SENTENCE_TRANSFORMER_MODEL=all-mpnet-base-v2
```

### For Minimal Resources:

```env
LLM_PROVIDER=ollama
OLLAMA_MODEL=phi
EMBEDDING_PROVIDER=sentence_transformers
SENTENCE_TRANSFORMER_MODEL=all-MiniLM-L6-v2
```

## Troubleshooting

### Ollama Issues:

```bash
# Check if Ollama is running
curl http://localhost:11434/api/version

# List available models
ollama list

# Check logs (Ollama has no `logs` subcommand)
docker compose logs ollama   # when running via docker-compose
journalctl -u ollama         # when installed as a systemd service on Linux
```

### Memory Issues:

- Use smaller models (`phi` instead of `llama2`)
- Reduce batch size in embedding service
- Use quantized models
- Process repos in smaller chunks

### Performance Optimization:

```python
from typing import Dict, Iterator, List

# For faster embeddings
embedding_service = FreeEmbeddingService(
    embedding_provider="sentence_transformers",
    vector_db_path="./chroma_db",
    model_name="all-MiniLM-L6-v2"  # Faster than all-mpnet-base-v2
)


# Batch processing: feed files to create_embeddings in smaller groups
def iter_batches(files: List[Dict], batch_size: int = 50) -> Iterator[List[Dict]]:
    for i in range(0, len(files), batch_size):
        yield files[i:i + batch_size]
```

## Frontend Implementation

### Project Structure

```
frontend/
├── src/
│   ├── app/
│   │   ├── page.tsx
│   │   ├── layout.tsx
│   │   └── globals.css
│   ├── components/
│   │   ├── RepoInput.tsx
│   │   ├── ChatInterface.tsx
│   │   ├── ProcessingStatus.tsx
│   │   └── SourceDisplay.tsx
│   └── lib/
│       └── api.ts
├── package.json
├── tailwind.config.js
└── next.config.js
```

### Key Features

- Repository URL input with validation
- Real-time processing status updates
- Chat interface with message history
- Source code display with syntax highlighting
- Responsive design

## Free Alternatives to OpenAI

### 1. **Ollama (Recommended for Local Deployment)**

**Pros:**

- 100% free and private
- Runs locally, no external API calls
- Supports many models (Llama2, CodeLlama, Mistral, Phi, etc.)
- Good performance on decent hardware

**Setup:**

```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull llama2      # General purpose
ollama pull codellama   # Better for code
ollama pull mistral     # Good balance
ollama pull phi         # Lightweight

# Start Ollama server (skip if the installer already started it as a service)
ollama serve
```

**Requirements:** 8GB+ RAM, preferably with GPU

### 2. **Groq (Fast Inference - Free Tier)**

**Pros:**

- Extremely fast inference
- Free tier with rate limits
- High-quality hosted open models (Mixtral, Llama2)
- Simple API

**Setup:**

1. Sign up at [groq.com](https://groq.com)
2. Get API key from console
3. Set `GROQ_API_KEY` in environment

**Free Limits:** Per-minute and per-day request limits apply and vary by model; check the Groq console for current values.

### 3. **Hugging Face Inference API (Free)**

**Pros:**

- Free tier available
- Access to thousands of models
- No local setup required
- Good for experimentation

**Setup:**

1. Sign up at [huggingface.co](https://huggingface.co)
2. Get API token from settings
3. Set `HUGGINGFACE_API_KEY` in environment

**Note:** Can be slower due to cold starts

### 4. **Local Transformers (Completely Free)**

For maximum control, you can run models directly:

```python
# services/local_llm_service.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


class LocalLLMService:
    def __init__(self, model_name="microsoft/DialoGPT-small"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.generator = pipeline(
            "text-generation",
            model=self.model,
            tokenizer=self.tokenizer,
            device=0 if torch.cuda.is_available() else -1  # first GPU if present, else CPU
        )

    def generate_response(self, prompt: str, max_length: int = 512):
        response = self.generator(
            prompt,
            max_length=max_length,
            temperature=0.7,
            do_sample=True,
            pad_token_id=self.tokenizer.eos_token_id
        )
        # The pipeline returns prompt + completion; keep only the completion
        return response[0]["generated_text"].replace(prompt, "").strip()
```

## Embedding Models Comparison

### 1. **Sentence Transformers (Recommended)**

```python
# General-purpose models that work reasonably well for code search:
"all-MiniLM-L6-v2"            # Fast, good quality
"all-mpnet-base-v2"           # Better quality, slower
"multi-qa-MiniLM-L6-cos-v1"   # Good for Q&A
```

### 2. **Hugging Face Embeddings**

```python
# Popular models:
"sentence-transformers/all-MiniLM-L6-v2"
"sentence-transformers/paraphrase-MiniLM-L6-v2"
```

## Quick Start Guide

All options start the API from the `backend/` directory, so that the `app` package is importable.

### Option 1: Ollama (Local)

```bash
# 1. Install and start Ollama (run `ollama serve` in a separate terminal)
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve

# 2. Pull a model
ollama pull llama2

# 3. Set environment variables
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=llama2
export EMBEDDING_PROVIDER=sentence_transformers

# 4. Start the application
uvicorn app.main:app --reload
```

### Option 2: Groq (Cloud)

```bash
# 1. Get API key from groq.com
export GROQ_API_KEY=your_api_key_here
export LLM_PROVIDER=groq
export GROQ_MODEL=mixtral-8x7b-32768
export EMBEDDING_PROVIDER=sentence_transformers

# 2. Start the application
uvicorn app.main:app --reload
```

### Option 3: Hugging Face (Cloud)

```bash
# 1. Get token from huggingface.co
export HUGGINGFACE_API_KEY=your_token_here
export LLM_PROVIDER=huggingface
export HUGGINGFACE_MODEL=microsoft/DialoGPT-medium
export EMBEDDING_PROVIDER=sentence_transformers

# 2. Start the application
uvicorn app.main:app --reload
```

## Deployment & Scaling

### Production Considerations

1. **Vector Database**: Use Pinecone for better scalability
2. **Background Processing**: Implement with Celery + Redis
3. **Caching**: Add Redis caching for frequent queries
4. **Rate Limiting**: Implement API rate limiting
5. **Authentication**: Add user authentication
6. **Monitoring**: Add logging and monitoring (Sentry, DataDog)

### Performance Optimizations

1. **Chunking Strategy**: Optimize chunk size and overlap
2. **Embedding Model**: Consider using smaller models for faster processing
3. **Retrieval**: Implement hybrid search (dense + sparse)
4. **Caching**: Cache embeddings and frequently asked questions

## Usage Examples

### Sample Questions Users Can Ask:

- "How is authentication handled in this project?"
- "Explain the UserService class and its methods"
- "What testing framework is used and where are the tests?"
- "How does the database connection work?"
- "What are the main API endpoints?"
- "Show me how error handling is implemented"

## Next Steps & Enhancements

1. **Multi-language Support**: Better handling of different programming languages
2. **Code Analysis**: Add static code analysis features
3. **Visualization**: Generate architecture diagrams
4. **Collaboration**: Multi-user support with shared repositories
5. **Integration**: GitHub webhook integration for auto-updates
6. **AI Features**: Code suggestions and improvements

This project provides a foundation for building a production-ready "Chat with GitHub Repo" application that helps developers understand and navigate unfamiliar codebases.
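
## Appendix: Trying the API from a Script

To exercise the backend without the frontend, the three endpoints defined in main.py (`/api/process-repo`, `/api/status/{repo_id}`, `/api/chat`) can be driven from a small client. The following is a minimal sketch, not part of the original project: `BASE_URL`, `http_json`, `ask_repo`, and the polling parameters are hypothetical names chosen for illustration, and it assumes the API is running locally on port 8000. The transport is passed in as a parameter so the flow can be checked without a live server.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: the API from main.py is listening on the host/port used in this guide.
BASE_URL = "http://localhost:8000"


def http_json(method: str, url: str, payload: Optional[dict] = None) -> dict:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode())


def ask_repo(
    repo_url: str,
    question: str,
    branch: str = "main",
    transport: Callable[..., dict] = http_json,
    poll_seconds: float = 2.0,
    max_polls: int = 150,
) -> dict:
    """Process a repository, wait until it is ready, then ask one question."""
    # 1. Start (or reuse) processing; the response carries the repo_id.
    started = transport(
        "POST",
        f"{BASE_URL}/api/process-repo",
        {"repo_url": repo_url, "branch": branch},
    )
    repo_id = started["repo_id"]

    # 2. Poll the status endpoint until processing completes or fails.
    for _ in range(max_polls):
        status = transport("GET", f"{BASE_URL}/api/status/{repo_id}")
        if status["status"] == "completed":
            break
        if status["status"] == "failed":
            raise RuntimeError(f"Processing failed: {status['message']}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("Repository was not processed within the polling budget")

    # 3. Ask the question; the reply has "response", "sources", and "repo_id".
    return transport(
        "POST",
        f"{BASE_URL}/api/chat",
        {"message": question, "repo_id": repo_id},
    )


# Example call (requires the API to be running):
# answer = ask_repo("https://github.com/owner/repo", "What are the main API endpoints?")
# print(answer["response"])
```

Polling keeps the client simple; a frontend that needs push-style updates could replace step 2 with the Socket.io channel listed in the tech stack.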