# GetGit - Local LLM Usage Guide
This guide explains the new local LLM features in GetGit and how to use them.
## Overview
GetGit now supports running a local coding-optimized LLM (Qwen/Qwen2.5-Coder-7B) directly on your machine, with automatic fallback to Google Gemini if needed.
## Key Features

### 1. Local LLM (Primary)

- Model: Qwen/Qwen2.5-Coder-7B from Hugging Face
- First Run: Automatically downloads (~14GB) and caches in `./models/`
- Subsequent Runs: Uses cached model (fully offline)
- Optimized For: Code understanding, generation, and analysis
- No API Key Required: Completely free and private
### 2. Gemini Fallback (Automatic)

- Trigger: Only if the local model fails to load or generate
- Model: gemini-2.5-flash
- Requires: `GEMINI_API_KEY` environment variable
- Use Case: Backup for systems without sufficient resources
### 3. Repository Persistence

- Tracking: Current repository URL stored in `data/source_repo.txt`
- Change Detection: Automatically detects when a different repo is requested
- Smart Cleanup: Removes old data only when necessary
- Efficiency: Reuses existing data for the same repository
## Quick Start

### Using Docker (Recommended)

Build the image:

```bash
docker build -t getgit .
```

Run without Gemini (local model only):

```bash
docker run -p 5001:5001 getgit
```

The local model will download on first run (~10-15 minutes depending on connection).

Run with Gemini fallback (optional):

```bash
docker run -p 5001:5001 \
  -e GEMINI_API_KEY="your_api_key_here" \
  getgit
```

Access the web UI: http://localhost:5001
### Running Locally

Install dependencies:

```bash
pip install -r requirements.txt
```

Start the server:

```bash
python server.py
```

Access the web UI: http://localhost:5001
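Besides the web UI, the server's `/initialize` endpoint (described under Repository Management below) can be called programmatically. This is a minimal standard-library sketch; the endpoint path and JSON payload shape come from this guide, while the helper name and everything else is illustrative:

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default PORT from this guide

def build_initialize_request(repo_url: str, base: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /initialize request asking GetGit to clone and index a repo."""
    payload = json.dumps({"repo_url": repo_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/initialize",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Sending it (requires the server to be running):
# with urllib.request.urlopen(build_initialize_request("https://github.com/user/repo1.git")) as resp:
#     print(resp.status)
```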
## Model Download

On first run, the local model will be downloaded automatically:

```
INFO - Loading local model: Qwen/Qwen2.5-Coder-7B
INFO - This may take a few minutes on first run...
INFO - Successfully loaded local model
```

- Download Size: ~14GB
- Cache Location: `./models/`
- Reusable: Yes, persists across restarts
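Since the cache persists across restarts, a quick pre-flight check can tell you whether the ~14GB download will happen. The `./models/` path comes from this guide; the helper itself is a hypothetical sketch:

```python
from pathlib import Path

def model_is_cached(cache_dir: str = "./models") -> bool:
    """Heuristic: treat a non-empty cache directory as an already-downloaded model."""
    path = Path(cache_dir)
    return path.is_dir() and any(path.iterdir())
```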
## System Requirements

### Minimum (CPU Mode)

- RAM: 16GB
- Storage: 20GB free
- CPU: Multi-core processor
### Recommended (GPU Mode)

- RAM: 16GB
- GPU: NVIDIA GPU with 8GB+ VRAM
- Storage: 20GB free
- CUDA: 11.7 or higher
## LLM Selection Logic

The system automatically selects the best available LLM:

```
1. Attempt local Hugging Face model
   ├─ Success → Use local model
   └─ Failure → Try Gemini fallback
      ├─ API key available → Use Gemini
      └─ No API key → Error
```
Note: The fallback is automatic and transparent to the user.
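The selection flow above can be sketched as plain Python. The `load_local` and `load_gemini` callables are hypothetical stand-ins for GetGit's actual model loaders; only the fallback order and the `GEMINI_API_KEY` check come from this guide:

```python
import os

def select_llm(load_local, load_gemini):
    """Pick the local model first; fall back to Gemini only if a key is set.

    `load_local` / `load_gemini` are hypothetical callables standing in for
    the real loaders; each returns a model object or raises on failure.
    """
    try:
        return "local", load_local()
    except Exception as err:
        print(f"Local model unavailable, falling back to Gemini... ({err})")
    if os.environ.get("GEMINI_API_KEY"):
        return "gemini", load_gemini()
    raise RuntimeError(
        "No LLM available: local load failed and GEMINI_API_KEY is not set"
    )
```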
## Repository Management

### How It Works

First Repository:

```
POST /initialize {"repo_url": "https://github.com/user/repo1.git"}
→ Clones repo1
→ Stores URL in data/source_repo.txt
→ Indexes content
```

Same Repository Again:

```
POST /initialize {"repo_url": "https://github.com/user/repo1.git"}
→ Detects same URL
→ Reuses existing clone and index
→ Fast startup
```

Different Repository:

```
POST /initialize {"repo_url": "https://github.com/user/repo2.git"}
→ Detects URL change
→ Deletes source_repo/ directory
→ Deletes .rag_cache/ directory
→ Updates data/source_repo.txt
→ Clones repo2
→ Re-indexes from scratch
```
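The change-detection step can be sketched like this. The `data/source_repo.txt`, `source_repo/`, and `.rag_cache/` paths come from this guide; the function itself is a hypothetical stand-in for GetGit's internals:

```python
import shutil
from pathlib import Path

def prepare_repo(repo_url: str,
                 data_dir: str = "data",
                 clone_dir: str = "source_repo",
                 cache_dir: str = ".rag_cache") -> bool:
    """Clear stale data when the requested repo changes.

    Returns True when old data was removed (URL changed or first run),
    False when the existing clone and index can be reused.
    """
    marker = Path(data_dir) / "source_repo.txt"
    previous = marker.read_text().strip() if marker.exists() else None
    if previous == repo_url:
        return False  # same repo: reuse existing clone and index
    # Different (or first) repo: drop the stale clone and RAG cache
    shutil.rmtree(clone_dir, ignore_errors=True)
    shutil.rmtree(cache_dir, ignore_errors=True)
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(repo_url)
    return True
```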
## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `GEMINI_API_KEY` | No | - | Fallback API key for Gemini |
| `PORT` | No | 5001 | Server port |
| `FLASK_ENV` | No | production | Flask environment |
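The table above maps directly onto standard environment lookups. A minimal sketch of how the server might read them; the variable names and defaults are from the table, the `load_config` helper is illustrative:

```python
import os

def load_config(env=os.environ):
    """Read GetGit's settings from the environment, applying documented defaults."""
    return {
        "gemini_api_key": env.get("GEMINI_API_KEY"),  # None disables the fallback
        "port": int(env.get("PORT", "5001")),
        "flask_env": env.get("FLASK_ENV", "production"),
    }
```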
## Troubleshooting

### Local Model Won't Load
Symptom: "Local model unavailable, falling back to Gemini..."
Solutions:
- Check available RAM (need 16GB+)
- Check available storage (need 20GB+)
- Verify transformers/torch are installed
- Check logs for specific error message
### Out of Memory
Symptom: Process killed or memory error during model load
Solutions:
- Close other applications
- Use smaller model (requires code changes)
- Use Gemini fallback instead
- Add more RAM or swap space
### Model Download Fails
Symptom: Connection errors during first run
Solutions:
- Check internet connection
- Check firewall settings
- Retry (downloads resume automatically)
- Download the model manually and place it in `./models/`
### Repository Not Updating
Symptom: Old repository content shown for new URL
Solutions:
- Delete `data/source_repo.txt`
- Delete the `source_repo/` directory
- Delete the `.rag_cache/` directory
- Restart the application
## Performance Tips
- First Run: Expect 10-15 minute model download
- Subsequent Runs: Model loads in ~30-60 seconds
- GPU Usage: Automatically detected and used if available
- CPU Usage: Works but slower (~5-10x slower than GPU)
- Memory: Keep 16GB+ free for optimal performance
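Automatic GPU detection as described above usually boils down to a check like the following. `torch` is assumed because this guide lists it as a dependency under Troubleshooting; the helper is a sketch that degrades to CPU when torch or a GPU is absent:

```python
def pick_device() -> str:
    """Return "cuda" when an NVIDIA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # optional: only needed for GPU detection
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```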
## Security
- Local Model: No data sent externally
- Gemini Fallback: Only used if explicitly configured
- API Keys: Never logged or stored in code
- Privacy: Local mode is completely offline
## Limitations
- Model Size: 7B parameters (large but manageable)
- Context Length: 4096 tokens max
- GPU Memory: Requires 8GB+ VRAM for best performance
- First Run: Requires internet for model download
## Support
For issues or questions:
- Check logs for error messages
- Review troubleshooting section above
- Open an issue on GitHub
- Include system specs and error logs