# GetGit - Local LLM Usage Guide

This guide explains the new local LLM features in GetGit and how to use them.

## Overview

GetGit now supports running a local coding-optimized LLM (Qwen/Qwen2.5-Coder-7B) directly on your machine, with automatic fallback to Google Gemini if needed.

## Key Features

### 1. Local LLM (Primary)

- **Model**: Qwen/Qwen2.5-Coder-7B from Hugging Face
- **First Run**: Automatically downloads (~14GB) and caches in `./models/`
- **Subsequent Runs**: Uses cached model (fully offline)
- **Optimized For**: Code understanding, generation, and analysis
- **No API Key Required**: Completely free and private

### 2. Gemini Fallback (Automatic)

- **Trigger**: Only if the local model fails to load or generate
- **Model**: gemini-2.5-flash
- **Requires**: `GEMINI_API_KEY` environment variable
- **Use Case**: Backup for systems without sufficient resources

### 3. Repository Persistence

- **Tracking**: Current repository URL stored in `data/source_repo.txt`
- **Change Detection**: Automatically detects when a different repo is requested
- **Smart Cleanup**: Removes old data only when necessary
- **Efficiency**: Reuses existing data for the same repository

## Quick Start

### Using Docker (Recommended)

1. **Build the image:**

   ```bash
   docker build -t getgit .
   ```

2. **Run without Gemini (local model only):**

   ```bash
   docker run -p 5001:5001 getgit
   ```

   The local model will download on first run (~10-15 minutes depending on connection).

3. **Run with Gemini fallback (optional):**

   ```bash
   docker run -p 5001:5001 \
     -e GEMINI_API_KEY="your_api_key_here" \
     getgit
   ```

4. **Access the web UI:**

   ```
   http://localhost:5001
   ```

### Running Locally

1. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

2. **Start the server:**

   ```bash
   python server.py
   ```

3. **Access the web UI:**

   ```
   http://localhost:5001
   ```

## Model Download

On first run, the local model will be downloaded automatically:

```
INFO - Loading local model: Qwen/Qwen2.5-Coder-7B
INFO - This may take a few minutes on first run...
INFO - Successfully loaded local model
```

- **Download Size**: ~14GB
- **Cache Location**: `./models/`
- **Reusable**: Yes, persists across restarts

## System Requirements

### Minimum (CPU Mode)

- **RAM**: 16GB
- **Storage**: 20GB free
- **CPU**: Multi-core processor

### Recommended (GPU Mode)

- **RAM**: 16GB
- **GPU**: NVIDIA GPU with 8GB+ VRAM
- **Storage**: 20GB free
- **CUDA**: 11.7 or higher

## LLM Selection Logic

The system automatically selects the best available LLM:

```
1. Attempt local Hugging Face model
   ├─ Success → Use local model
   └─ Failure → Try Gemini fallback
      ├─ API key available → Use Gemini
      └─ No API key → Error
```

**Note**: The fallback is automatic and transparent to the user. A code sketch of this flow appears after the next section.

## Repository Management

### How It Works

1. **First Repository**:

   ```
   POST /initialize {"repo_url": "https://github.com/user/repo1.git"}
   → Clones repo1
   → Stores URL in data/source_repo.txt
   → Indexes content
   ```

2. **Same Repository Again**:

   ```
   POST /initialize {"repo_url": "https://github.com/user/repo1.git"}
   → Detects same URL
   → Reuses existing clone and index
   → Fast startup
   ```

3. **Different Repository**:

   ```
   POST /initialize {"repo_url": "https://github.com/user/repo2.git"}
   → Detects URL change
   → Deletes source_repo/ directory
   → Deletes .rag_cache/ directory
   → Updates data/source_repo.txt
   → Clones repo2
   → Re-indexes from scratch
   ```
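As a usage illustration, the `/initialize` flow above can be driven from a short script. This is a sketch, not part of GetGit itself: it assumes the server is running on the default port and that the `requests` package is available, and the shape of the JSON response is an assumption, so the final `print` is only for inspection.

```python
# Hypothetical client sketch for POST /initialize (not part of GetGit).
# Assumes the server is up on the default port 5001.
import requests

resp = requests.post(
    "http://localhost:5001/initialize",
    json={"repo_url": "https://github.com/user/repo1.git"},
    timeout=600,  # a first-time clone plus indexing can take a while
)
resp.raise_for_status()  # raise if the server reported an error
print(resp.json())       # response shape is an assumption; inspect it here
```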
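The change detection itself reduces to a little bookkeeping around `data/source_repo.txt`. The sketch below is illustrative only, assuming the file and directory names from this guide; `prepare_repo_dirs` is a hypothetical name, not GetGit's actual function.

```python
# Hypothetical sketch of the repository-persistence check.
# Paths come from this guide; the function name is illustrative.
import shutil
from pathlib import Path

TRACK_FILE = Path("data/source_repo.txt")
STALE_PATHS = [Path("source_repo"), Path(".rag_cache")]

def prepare_repo_dirs(repo_url: str) -> bool:
    """Return True if stale data was cleared (first run or URL change)."""
    previous = TRACK_FILE.read_text().strip() if TRACK_FILE.exists() else None
    if previous == repo_url:
        return False  # same repo: reuse the existing clone and index
    for path in STALE_PATHS:  # different repo: drop old clone and RAG cache
        shutil.rmtree(path, ignore_errors=True)
    TRACK_FILE.parent.mkdir(parents=True, exist_ok=True)
    TRACK_FILE.write_text(repo_url)  # record the new repository URL
    return True  # caller should clone and re-index from scratch
```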
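And the selection flow promised in the LLM Selection Logic section might look roughly like the following. The `transformers` calls mirror the model name and cache location from this guide; everything else (the function name, the tuple return values, deferring the actual Gemini client construction) is an assumption for illustration.

```python
# Hypothetical sketch of local-first LLM selection with Gemini fallback.
# Mirrors the decision tree in "LLM Selection Logic"; not GetGit's real code.
import os

def select_llm():
    try:
        from transformers import AutoModelForCausalLM, AutoTokenizer
        tok = AutoTokenizer.from_pretrained(
            "Qwen/Qwen2.5-Coder-7B", cache_dir="./models/"
        )
        model = AutoModelForCausalLM.from_pretrained(
            "Qwen/Qwen2.5-Coder-7B",
            cache_dir="./models/",
            device_map="auto",  # uses the GPU when available (needs accelerate)
        )
        return ("local", tok, model)
    except Exception as exc:
        print(f"Local model unavailable, falling back to Gemini... ({exc})")
        api_key = os.environ.get("GEMINI_API_KEY")
        if not api_key:
            raise RuntimeError("No local model and no GEMINI_API_KEY set") from exc
        # Constructing the actual Gemini client is omitted here.
        return ("gemini", "gemini-2.5-flash", api_key)
```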
## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GEMINI_API_KEY` | No | - | Fallback API key for Gemini |
| `PORT` | No | 5001 | Server port |
| `FLASK_ENV` | No | production | Flask environment |

## Troubleshooting

### Local Model Won't Load

**Symptom**: "Local model unavailable, falling back to Gemini..."

**Solutions**:
1. Check available RAM (need 16GB+)
2. Check available storage (need 20GB+)
3. Verify transformers/torch are installed
4. Check the logs for the specific error message

### Out of Memory

**Symptom**: Process killed or memory error during model load

**Solutions**:
1. Close other applications
2. Use a smaller model (requires code changes)
3. Use the Gemini fallback instead
4. Add more RAM or swap space

### Model Download Fails

**Symptom**: Connection errors during first run

**Solutions**:
1. Check internet connection
2. Check firewall settings
3. Retry (downloads resume automatically)
4. Download the model manually and place it in `./models/` (see the sketch at the end of this guide)

### Repository Not Updating

**Symptom**: Old repository content shown for a new URL

**Solutions**:
1. Delete `data/source_repo.txt`
2. Delete the `source_repo/` directory
3. Delete the `.rag_cache/` directory
4. Restart the application

## Performance Tips

1. **First Run**: Expect a 10-15 minute model download
2. **Subsequent Runs**: Model loads in ~30-60 seconds
3. **GPU Usage**: Automatically detected and used if available
4. **CPU Usage**: Works, but ~5-10x slower than GPU
5. **Memory**: Keep 16GB+ free for optimal performance

## Security

- **Local Model**: No data sent externally
- **Gemini Fallback**: Only used if explicitly configured
- **API Keys**: Never logged or stored in code
- **Privacy**: Local mode is completely offline

## Limitations

1. **Model Size**: 7B parameters (large but manageable)
2. **Context Length**: 4096 tokens max
3. **GPU Memory**: Requires 8GB+ VRAM for best performance
4. **First Run**: Requires internet access for the model download

## Support

For issues or questions:
1. Check the logs for error messages
2. Review the troubleshooting section above
3. Open an issue on GitHub
4. Include system specs and error logs
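Finally, for the manual-download workaround under "Model Download Fails": one way to pre-fetch the weights is `huggingface_hub`'s `snapshot_download`. Treating `./models/` as a Hugging Face cache directory is an assumption based on the cache location this guide names; adjust if GetGit expects a different layout.

```python
# Hypothetical pre-download for the "Model Download Fails" workaround.
# Assumes ./models/ is used as a Hugging Face cache directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2.5-Coder-7B",
    cache_dir="./models/",  # matches the cache location in this guide
)
```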