Spaces:

samarthnaikk
/

getgitspace

Runtime error

App Files Files Community

getgitspace / LOCAL_LLM_GUIDE.md

Samarth Naik

hf p1

0c87788 28 days ago

preview code

raw

history blame contribute delete

5.81 kB

GetGit - Local LLM Usage Guide

This guide explains the new local LLM features in GetGit and how to use them.

Overview

GetGit now supports running a local coding-optimized LLM (Qwen/Qwen2.5-Coder-7B) directly on your machine, with automatic fallback to Google Gemini if needed.

Key Features

1. Local LLM (Primary)

Model: Qwen/Qwen2.5-Coder-7B from Hugging Face
First Run: Automatically downloads (~14GB) and caches in ./models/
Subsequent Runs: Uses cached model (fully offline)
Optimized For: Code understanding, generation, and analysis
No API Key Required: Completely free and private

2. Gemini Fallback (Automatic)

Trigger: Only if local model fails to load or generate
Model: gemini-2.5-flash
Requires: GEMINI_API_KEY environment variable
Use Case: Backup for systems without sufficient resources

3. Repository Persistence

Tracking: Current repository URL stored in data/source_repo.txt
Change Detection: Automatically detects when a different repo is requested
Smart Cleanup: Removes old data only when necessary
Efficiency: Reuses existing data for the same repository

Quick Start

Using Docker (Recommended)

Build the image:
```
docker build -t getgit .
```
Run without Gemini (local model only):
```
docker run -p 5001:5001 getgit
```
The local model will download on first run (~10-15 minutes depending on connection).

Run with Gemini fallback (optional):

docker run -p 5001:5001 \
  -e GEMINI_API_KEY="your_api_key_here" \
  getgit

Access the web UI:
```
http://localhost:5001
```

Running Locally

Install dependencies:
```
pip install -r requirements.txt
```
Start the server:
```
python server.py
```
Access the web UI:
```
http://localhost:5001
```

Model Download

On first run, the local model will be downloaded automatically:

INFO - Loading local model: Qwen/Qwen2.5-Coder-7B
INFO - This may take a few minutes on first run...
INFO - Successfully loaded local model

Download Size: ~14GB
Cache Location: ./models/
Reusable: Yes, persists across restarts

System Requirements

Minimum (CPU Mode)

RAM: 16GB
Storage: 20GB free
CPU: Multi-core processor

Recommended (GPU Mode)

RAM: 16GB
GPU: NVIDIA GPU with 8GB+ VRAM
Storage: 20GB free
CUDA: 11.7 or higher

LLM Selection Logic

The system automatically selects the best available LLM:

1. Attempt local Hugging Face model
   ├─ Success → Use local model
   └─ Failure → Try Gemini fallback
       ├─ API key available → Use Gemini
       └─ No API key → Error

Note: The fallback is automatic and transparent to the user.

Repository Management

How It Works

First Repository:

POST /initialize {"repo_url": "https://github.com/user/repo1.git"}
→ Clones repo1
→ Stores URL in data/source_repo.txt
→ Indexes content

Same Repository Again:

POST /initialize {"repo_url": "https://github.com/user/repo1.git"}
→ Detects same URL
→ Reuses existing clone and index
→ Fast startup

Different Repository:

POST /initialize {"repo_url": "https://github.com/user/repo2.git"}
→ Detects URL change
→ Deletes source_repo/ directory
→ Deletes .rag_cache/ directory
→ Updates data/source_repo.txt
→ Clones repo2
→ Re-indexes from scratch

Environment Variables

Variable	Required	Default	Description
`GEMINI_API_KEY`	No	-	Fallback API key for Gemini
`PORT`	No	5001	Server port
`FLASK_ENV`	No	production	Flask environment

Troubleshooting

Local Model Won't Load

Symptom: "Local model unavailable, falling back to Gemini..."

Solutions:

Check available RAM (need 16GB+)
Check available storage (need 20GB+)
Verify transformers/torch are installed
Check logs for specific error message

Out of Memory

Symptom: Process killed or memory error during model load

Solutions:

Close other applications
Use smaller model (requires code changes)
Use Gemini fallback instead
Add more RAM or swap space

Model Download Fails

Symptom: Connection errors during first run

Solutions:

Check internet connection
Check firewall settings
Retry (downloads resume automatically)
Use manual download and place in ./models/

Repository Not Updating

Symptom: Old repository content shown for new URL

Solutions:

Delete data/source_repo.txt
Delete source_repo/ directory
Delete .rag_cache/ directory
Restart application

Performance Tips

First Run: Expect 10-15 minute model download
Subsequent Runs: Model loads in ~30-60 seconds
GPU Usage: Automatically detected and used if available
CPU Usage: Works but slower (~5-10x slower than GPU)
Memory: Keep 16GB+ free for optimal performance

Security

Local Model: No data sent externally
Gemini Fallback: Only used if explicitly configured
API Keys: Never logged or stored in code
Privacy: Local mode is completely offline

Limitations

Model Size: 7B parameters (large but manageable)
Context Length: 4096 tokens max
GPU Memory: Requires 8GB+ VRAM for best performance
First Run: Requires internet for model download

Support

For issues or questions:

Check logs for error messages
Review troubleshooting section above
Open an issue on GitHub
Include system specs and error logs