Samarth Naik

GetGit - Local LLM Usage Guide

This guide explains the new local LLM features in GetGit and how to use them.

Overview

GetGit can run a coding-optimized LLM (Qwen/Qwen2.5-Coder-7B) locally on your machine. If the local model fails to load or generate, it falls back automatically to Google Gemini, provided a Gemini API key is configured.

Key Features

1. Local LLM (Primary)

  • Model: Qwen/Qwen2.5-Coder-7B from Hugging Face
  • First Run: Automatically downloads (~14GB) and caches in ./models/
  • Subsequent Runs: Uses cached model (fully offline)
  • Optimized For: Code understanding, generation, and analysis
  • No API Key Required: Completely free and private

2. Gemini Fallback (Automatic)

  • Trigger: Only if local model fails to load or generate
  • Model: gemini-2.5-flash
  • Requires: GEMINI_API_KEY environment variable
  • Use Case: Backup for systems without sufficient resources

3. Repository Persistence

  • Tracking: Current repository URL stored in data/source_repo.txt
  • Change Detection: Automatically detects when a different repo is requested
  • Smart Cleanup: Removes old data only when necessary
  • Efficiency: Reuses existing data for the same repository

Quick Start

Using Docker (Recommended)

  1. Build the image:

    docker build -t getgit .
    
  2. Run without Gemini (local model only):

    docker run -p 5001:5001 getgit
    

    The local model will download on first run (~10-15 minutes depending on connection).

  3. Run with Gemini fallback (optional):

    docker run -p 5001:5001 \
      -e GEMINI_API_KEY="your_api_key_here" \
      getgit
    
  4. Access the web UI:

    http://localhost:5001
    

Running Locally

  1. Install dependencies:

    pip install -r requirements.txt
    
  2. Start the server:

    python server.py
    
  3. Access the web UI:

    http://localhost:5001
    

Model Download

On first run, the local model will be downloaded automatically:

INFO - Loading local model: Qwen/Qwen2.5-Coder-7B
INFO - This may take a few minutes on first run...
INFO - Successfully loaded local model

Download Size: ~14GB
Cache Location: ./models/
Reusable: Yes, persists across restarts
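
The download-then-cache behaviour can be reproduced with the Hugging Face `transformers` library. The following is a minimal sketch, not GetGit's actual loader: `load_local_model` is a hypothetical helper, and it assumes the application passes `./models/` as the `cache_dir`.

```python
from pathlib import Path

MODEL_ID = "Qwen/Qwen2.5-Coder-7B"  # model named in this guide
CACHE_DIR = Path("./models")         # cache location named in this guide


def load_local_model(model_id: str = MODEL_ID, cache_dir: Path = CACHE_DIR):
    """Download the model on the first call, then reuse the files in cache_dir.

    Hypothetical helper: GetGit's real loading code may differ.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    cache_dir.mkdir(parents=True, exist_ok=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=str(cache_dir))
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        cache_dir=str(cache_dir),
        torch_dtype="auto",  # keep the dtype stored in the checkpoint
    )
    return tokenizer, model
```

Calling `load_local_model()` a second time should find the files already under `./models/` and skip the download.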

System Requirements

Minimum (CPU Mode)

  • RAM: 16GB
  • Storage: 20GB free
  • CPU: Multi-core processor

Recommended (GPU Mode)

  • RAM: 16GB
  • GPU: NVIDIA GPU with 8GB+ VRAM
  • Storage: 20GB free
  • CUDA: 11.7 or higher

LLM Selection Logic

The system automatically selects the best available LLM:

1. Attempt local Hugging Face model
   ├─ Success → Use local model
   └─ Failure → Try Gemini fallback
       ├─ API key available → Use Gemini
       └─ No API key → Error

Note: The fallback is automatic and transparent to the user.
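
The decision tree above can be sketched in a few lines of Python. This is an illustration only: `select_llm` and its two loader callables are hypothetical names, and only the `GEMINI_API_KEY` variable and the log message come from this guide.

```python
import os
from typing import Callable, Tuple


def select_llm(
    load_local: Callable[[], object],
    load_gemini: Callable[[str], object],
) -> Tuple[str, object]:
    """Try the local model first, then Gemini if an API key is set.

    Both loaders are hypothetical callables supplied by the caller.
    """
    try:
        return "local", load_local()
    except Exception as exc:  # any load failure triggers the fallback
        print(f"Local model unavailable, falling back to Gemini... ({exc})")

    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("Local model failed and GEMINI_API_KEY is not set")
    return "gemini", load_gemini(api_key)
```

Passing the loaders in as arguments keeps the selection logic independent of either backend, which also makes it easy to exercise without downloading a model.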

Repository Management

How It Works

  1. First Repository:

    POST /initialize {"repo_url": "https://github.com/user/repo1.git"}
    → Clones repo1
    → Stores URL in data/source_repo.txt
    → Indexes content
    
  2. Same Repository Again:

    POST /initialize {"repo_url": "https://github.com/user/repo1.git"}
    → Detects same URL
    → Reuses existing clone and index
    → Fast startup
    
  3. Different Repository:

    POST /initialize {"repo_url": "https://github.com/user/repo2.git"}
    → Detects URL change
    → Deletes source_repo/ directory
    → Deletes .rag_cache/ directory
    → Updates data/source_repo.txt
    → Clones repo2
    → Re-indexes from scratch
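
The three cases above reduce to comparing the requested URL with the one stored in `data/source_repo.txt`. A minimal sketch of that comparison follows; `prepare_workspace` and its `base` parameter are hypothetical, and cloning and indexing are left to the caller. Only the file and directory names come from this guide.

```python
import shutil
from pathlib import Path


def prepare_workspace(repo_url: str, base: Path = Path(".")) -> bool:
    """Return True if existing data can be reused, False if a fresh clone is needed.

    Hypothetical helper: it only handles tracking and cleanup, not cloning.
    """
    marker = base / "data" / "source_repo.txt"
    previous = marker.read_text().strip() if marker.exists() else None
    if previous == repo_url:
        return True  # same repository: keep the clone and the index

    # First or different repository: drop any stale clone and index.
    for stale in (base / "source_repo", base / ".rag_cache"):
        shutil.rmtree(stale, ignore_errors=True)  # no-op if the directory is absent
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(repo_url)
    return False
```

A caller would clone and re-index only when the function returns `False`.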
    

Environment Variables

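| Variable       | Required | Default    | Description                 |
|----------------|----------|------------|-----------------------------|
| GEMINI_API_KEY | No       | -          | Fallback API key for Gemini |
| PORT           | No       | 5001       | Server port                 |
| FLASK_ENV      | No       | production | Flask environment           |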
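
As an illustration of how these variables and their defaults might be read, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical and not taken from GetGit's code; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: Optional[str]  # None means the Gemini fallback is unavailable
    port: int
    flask_env: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    return Settings(
        gemini_api_key=env.get("GEMINI_API_KEY") or None,  # treat "" as unset
        port=int(env.get("PORT", "5001")),
        flask_env=env.get("FLASK_ENV", "production"),
    )
```

For example, `load_settings({"PORT": "8080"})` yields port 8080 with the other two fields at their defaults.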

Troubleshooting

Local Model Won't Load

Symptom: "Local model unavailable, falling back to Gemini..."

Solutions:

  1. Check available RAM (need 16GB+)
  2. Check available storage (need 20GB+)
  3. Verify transformers/torch are installed
  4. Check logs for specific error message
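
The first three checks can be automated with the standard library. The sketch below is a hypothetical preflight script, not part of GetGit. It compares total (not free) RAM against the threshold, and it skips the RAM check on platforms where `os.sysconf` is unavailable, such as Windows.

```python
import importlib.util
import os
import shutil
from typing import List

GIB = 1024 ** 3


def preflight(path: str = ".", min_ram_gib: int = 16, min_disk_gib: int = 20) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Check 3: required packages are importable.
    for pkg in ("transformers", "torch"):
        if importlib.util.find_spec(pkg) is None:
            problems.append(f"package not installed: {pkg}")

    # Check 2: free disk space at the given path.
    free = shutil.disk_usage(path).free
    if free < min_disk_gib * GIB:
        problems.append(f"low disk space: {free / GIB:.1f} GiB free, need {min_disk_gib}")

    # Check 1: total physical RAM, where the platform exposes it.
    try:
        total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        total_ram = None  # not available on this platform
    if total_ram is not None and total_ram < min_ram_gib * GIB:
        problems.append(f"low RAM: {total_ram / GIB:.1f} GiB total, need {min_ram_gib}")

    return problems
```

Running `print(preflight())` before starting the server gives a quick summary of anything likely to block the local model.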

Out of Memory

Symptom: Process killed or memory error during model load

Solutions:

  1. Close other applications
  2. Use a smaller model (requires code changes)
  3. Use Gemini fallback instead
  4. Add more RAM or swap space

Model Download Fails

Symptom: Connection errors during first run

Solutions:

  1. Check internet connection
  2. Check firewall settings
  3. Retry (downloads resume automatically)
  4. Download the model manually and place it in ./models/
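
Steps 3 and 4 can be combined into a small retry script. This is a sketch under assumptions: it uses `snapshot_download` from the `huggingface_hub` package, and it assumes GetGit reads the model from `./models/` in the standard Hugging Face cache layout. `download_with_retry` and `fetch_qwen` are hypothetical helper names.

```python
import time
from typing import Callable


def download_with_retry(fetch: Callable[[], str], attempts: int = 3, delay_s: float = 5.0) -> str:
    """Call fetch() until it succeeds, re-raising the error after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # network errors vary by HTTP backend
            if attempt == attempts:
                raise
            print(f"Download failed ({exc}); retry {attempt}/{attempts - 1} in {delay_s}s")
            time.sleep(delay_s)
    raise ValueError("attempts must be at least 1")


def fetch_qwen() -> str:
    """Download the model into ./models/ and return the local snapshot path."""
    # Imported lazily so the retry helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id="Qwen/Qwen2.5-Coder-7B", cache_dir="./models")
```

Usage: `download_with_retry(fetch_qwen)`. Files that finished downloading in an earlier attempt are not fetched again.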

Repository Not Updating

Symptom: Old repository content shown for new URL

Solutions:

  1. Delete data/source_repo.txt
  2. Delete source_repo/ directory
  3. Delete .rag_cache/ directory
  4. Restart application

Performance Tips

  1. First Run: Expect 10-15 minute model download
  2. Subsequent Runs: Model loads in ~30-60 seconds
  3. GPU Usage: Automatically detected and used if available
  4. CPU Usage: Works, but roughly 5-10x slower than GPU
  5. Memory: Keep 16GB+ free for optimal performance
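
To see which mode a machine would run in, the GPU check from tip 3 can be reproduced by hand. This is a minimal sketch assuming PyTorch is the backend (this guide lists torch as a dependency); `pick_device` is a hypothetical helper, not GetGit's own detection code.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without torch there is nothing to run on a GPU
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"GetGit would likely run on: {pick_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed torch build or the CUDA driver is the usual place to look.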

Security

  • Local Model: No data sent externally
  • Gemini Fallback: Only used when GEMINI_API_KEY is set and the local model fails
  • API Keys: Never logged or stored in code
  • Privacy: Local mode runs fully offline after the initial model download

Limitations

  1. Model Size: 7B parameters (~14GB download)
  2. Context Length: 4096 tokens max
  3. GPU Memory: Requires 8GB+ VRAM for best performance
  4. First Run: Requires internet for model download

Support

For issues or questions:

  1. Check logs for error messages
  2. Review troubleshooting section above
  3. Open an issue on GitHub
  4. Include system specs and error logs