# GetGit - Local LLM Usage Guide
This guide explains the new local LLM features in GetGit and how to use them.
## Overview
GetGit now supports running a local coding-optimized LLM (Qwen/Qwen2.5-Coder-7B) directly on your machine, with automatic fallback to Google Gemini if needed.
## Key Features
### 1. Local LLM (Primary)
- **Model**: Qwen/Qwen2.5-Coder-7B from Hugging Face
- **First Run**: Automatically downloads (~14GB) and caches in `./models/`
- **Subsequent Runs**: Uses cached model (fully offline)
- **Optimized For**: Code understanding, generation, and analysis
- **No API Key Required**: Completely free and private
### 2. Gemini Fallback (Automatic)
- **Trigger**: Only if local model fails to load or generate
- **Model**: gemini-2.5-flash
- **Requires**: `GEMINI_API_KEY` environment variable
- **Use Case**: Backup for systems without sufficient resources
### 3. Repository Persistence
- **Tracking**: Current repository URL stored in `data/source_repo.txt`
- **Change Detection**: Automatically detects when a different repo is requested
- **Smart Cleanup**: Removes old data only when necessary
- **Efficiency**: Reuses existing data for the same repository
## Quick Start
### Using Docker (Recommended)
1. **Build the image:**
```bash
docker build -t getgit .
```
2. **Run without Gemini (local model only):**
```bash
docker run -p 5001:5001 getgit
```
The local model will download on first run (~10-15 minutes depending on connection).
3. **Run with Gemini fallback (optional):**
```bash
docker run -p 5001:5001 \
-e GEMINI_API_KEY="your_api_key_here" \
getgit
```
4. **Access the web UI:**
```
http://localhost:5001
```
### Running Locally
1. **Install dependencies:**
```bash
pip install -r requirements.txt
```
2. **Start the server:**
```bash
python server.py
```
3. **Access the web UI:**
```
http://localhost:5001
```
## Model Download
On first run, the local model will be downloaded automatically:
```
INFO - Loading local model: Qwen/Qwen2.5-Coder-7B
INFO - This may take a few minutes on first run...
INFO - Successfully loaded local model
```
**Download Size**: ~14GB
**Cache Location**: `./models/`
**Reusable**: Yes, persists across restarts
## System Requirements
### Minimum (CPU Mode)
- **RAM**: 16GB
- **Storage**: 20GB free
- **CPU**: Multi-core processor
### Recommended (GPU Mode)
- **RAM**: 16GB
- **GPU**: NVIDIA GPU with 8GB+ VRAM
- **Storage**: 20GB free
- **CUDA**: 11.7 or higher
## LLM Selection Logic
The system automatically selects the best available LLM:
```
1. Attempt local Hugging Face model
   ├─ Success → Use local model
   └─ Failure → Try Gemini fallback
      ├─ API key available → Use Gemini
      └─ No API key → Error
```
**Note**: The fallback is automatic and transparent to the user.
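The selection flow above can be sketched as a small function. This is a minimal illustration, not GetGit's implementation; `pick_llm` and the injected loader callables are hypothetical names:

```python
import os

def pick_llm(load_local, load_gemini):
    """Try the local model first; fall back to Gemini only if a key is set.

    `load_local` / `load_gemini` are injected loader callables (hypothetical
    names; the real app wires in its own model loaders).
    """
    try:
        return "local", load_local()
    except Exception:
        # Local load failed (OOM, missing deps, ...): fall back if configured
        if os.environ.get("GEMINI_API_KEY"):
            return "gemini", load_gemini()
        raise RuntimeError("Local model failed and no GEMINI_API_KEY is set")
```

The key property is that Gemini is never contacted while the local model works, which is what makes the fallback transparent.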
## Repository Management
### How It Works
1. **First Repository**:
```
POST /initialize {"repo_url": "https://github.com/user/repo1.git"}
→ Clones repo1
→ Stores URL in data/source_repo.txt
→ Indexes content
```
2. **Same Repository Again**:
```
POST /initialize {"repo_url": "https://github.com/user/repo1.git"}
→ Detects same URL
→ Reuses existing clone and index
→ Fast startup
```
3. **Different Repository**:
```
POST /initialize {"repo_url": "https://github.com/user/repo2.git"}
→ Detects URL change
→ Deletes source_repo/ directory
→ Deletes .rag_cache/ directory
→ Updates data/source_repo.txt
→ Clones repo2
→ Re-indexes from scratch
```
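The three cases above reduce to one check-and-cleanup step. Here is a minimal sketch under the guide's directory names; the `switch_repo` function itself is an assumption, not the app's actual code:

```python
import shutil
from pathlib import Path

def switch_repo(new_url: str, root: Path = Path(".")) -> bool:
    """Wipe per-repo state when the URL changes; reuse it otherwise.

    Returns True when the caller should clone and re-index. Directory and
    file names follow this guide; the function is a hypothetical sketch.
    """
    track = root / "data" / "source_repo.txt"
    current = track.read_text().strip() if track.exists() else None
    if current == new_url:
        return False  # same repo: keep existing clone and index
    # URL changed (or first run): remove stale clone and RAG cache
    for stale in (root / "source_repo", root / ".rag_cache"):
        shutil.rmtree(stale, ignore_errors=True)
    track.parent.mkdir(parents=True, exist_ok=True)
    track.write_text(new_url)
    return True
```

Deleting `.rag_cache/` alongside the clone is what prevents answers from the old repository leaking into the new one.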
## Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GEMINI_API_KEY` | No | - | Fallback API key for Gemini |
| `PORT` | No | 5001 | Server port |
| `FLASK_ENV` | No | production | Flask environment |
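Reading these variables with their documented defaults might look like the following; `load_config` is an illustrative name, not a function in the codebase:

```python
import os

def load_config() -> dict:
    """Read the guide's environment variables, applying documented defaults."""
    return {
        "gemini_api_key": os.environ.get("GEMINI_API_KEY"),  # optional fallback
        "port": int(os.environ.get("PORT", "5001")),
        "flask_env": os.environ.get("FLASK_ENV", "production"),
    }
```

Leaving `GEMINI_API_KEY` unset keeps the app fully local; setting it only changes behavior when the local model fails.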
## Troubleshooting
### Local Model Won't Load
**Symptom**: "Local model unavailable, falling back to Gemini..."
**Solutions**:
1. Check available RAM (need 16GB+)
2. Check available storage (need 20GB+)
3. Verify transformers/torch are installed
4. Check logs for specific error message
### Out of Memory
**Symptom**: Process killed or memory error during model load
**Solutions**:
1. Close other applications
2. Use smaller model (requires code changes)
3. Use Gemini fallback instead
4. Add more RAM or swap space
### Model Download Fails
**Symptom**: Connection errors during first run
**Solutions**:
1. Check internet connection
2. Check firewall settings
3. Retry (downloads resume automatically)
4. Use manual download and place in `./models/`
### Repository Not Updating
**Symptom**: Old repository content shown for new URL
**Solutions**:
1. Delete `data/source_repo.txt`
2. Delete `source_repo/` directory
3. Delete `.rag_cache/` directory
4. Restart application
## Performance Tips
1. **First Run**: Expect 10-15 minute model download
2. **Subsequent Runs**: Model loads in ~30-60 seconds
3. **GPU Usage**: Automatically detected and used if available
4. **CPU Usage**: Works but slower (~5-10x slower than GPU)
5. **Memory**: Keep 16GB+ free for optimal performance
## Security
- **Local Model**: No data sent externally
- **Gemini Fallback**: Only used if explicitly configured
- **API Keys**: Never logged or stored in code
- **Privacy**: Local mode is completely offline
## Limitations
1. **Model Size**: 7B parameters (strong for code tasks, but weaker than larger hosted models on complex reasoning)
2. **Context Length**: 4096 tokens max
3. **GPU Memory**: Requires 8GB+ VRAM for best performance
4. **First Run**: Requires internet for model download
## Support
For issues or questions:
1. Check logs for error messages
2. Review troubleshooting section above
3. Open an issue on GitHub
4. Include system specs and error logs