# HuggingFace Spaces GPU Setup Guide

🚀 This guide will help you enable GPU acceleration for GRDN AI on HuggingFace Spaces with your Nvidia T4 grant.

## Prerequisites

- HuggingFace Space with GPU enabled (Nvidia T4 small: 4 vCPU, 15GB RAM, 16GB GPU)
- Model files uploaded to your Space

## Setup Steps

### 1. Enable GPU in Space Settings

1. Go to your Space settings on HuggingFace
2. Navigate to the "Hardware" section
3. Select "T4 small" (or your granted GPU tier)
4. Save changes

### 2. Upload Model Files

Your Space needs the GGUF model files in the `src/models/` directory:

- `llama-2-7b-chat.Q4_K_M.gguf` (for Llama2)
- `decilm-7b-uniform-gqa-q8_0.gguf` (for DeciLM)

You can upload these via:

- the HuggingFace web interface (Files tab)
- Git LFS (recommended for large files)
- the HuggingFace Hub CLI

### 3. Install Dependencies

Make sure your Space has the updated `requirements.txt`, which includes:

```
torch>=2.0.0
```

### 4. Verify GPU Detection

Once your Space restarts, check the sidebar in the app for:

- 🚀 **GPU Acceleration: ENABLED** - GPU is working!
- ⚠️ **GPU Acceleration: DISABLED** - something's wrong

You should also see in the logs:

```
🤗 Running on HuggingFace Spaces
🚀 GPU detected: Tesla T4 with 15.xx GB memory
🚀 Will offload all layers to GPU (n_gpu_layers=-1)
✅ GPU acceleration ENABLED with -1 layers
```

## How It Works

The app now automatically:

1. **Detects the HuggingFace Spaces environment** via the `SPACE_ID` or `SPACE_AUTHOR_NAME` environment variables
2. **Checks for GPU availability** using PyTorch's `torch.cuda.is_available()`
3. **Configures LlamaCPP** to use the GPU with `n_gpu_layers=-1` (all layers on GPU)
4. **Shows status** in the sidebar UI

### GPU Configuration

- **CPU Mode**: `n_gpu_layers=0` - all computation on the CPU (slow)
- **GPU Mode**: `n_gpu_layers=-1` - all model layers offloaded to the GPU (fast)

The two sketches below illustrate this detection-and-configuration flow.
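First, a minimal sketch of the detection step. The function name `detect_gpu_and_environment()` matches the code changes summarized later in this guide, but the body and the returned dict shown here are illustrative assumptions, not the app's exact implementation:

```python
import os


def detect_gpu_and_environment():
    """Sketch: detect HF Spaces and choose an n_gpu_layers value."""
    # HuggingFace Spaces sets SPACE_ID / SPACE_AUTHOR_NAME in the container
    on_hf_spaces = bool(os.getenv("SPACE_ID") or os.getenv("SPACE_AUTHOR_NAME"))
    if on_hf_spaces:
        print("🤗 Running on HuggingFace Spaces")

    n_gpu_layers = 0  # CPU fallback: keep every layer on the CPU
    try:
        import torch

        if torch.cuda.is_available():
            name = torch.cuda.get_device_name(0)
            mem_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
            print(f"🚀 GPU detected: {name} with {mem_gb:.2f} GB memory")
            n_gpu_layers = -1  # offload all model layers to the GPU
        else:
            print("⚠️ No GPU detected via torch.cuda")
    except ImportError:
        print("⚠️ torch not installed; defaulting to CPU")

    return {"on_hf_spaces": on_hf_spaces, "n_gpu_layers": n_gpu_layers}
```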
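Second, `init_llm()` can forward the chosen layer count to the model constructor. This sketch assumes the llama-cpp-python `Llama` API; if the app instead uses a higher-level wrapper (e.g. llama-index's `LlamaCPP`), `n_gpu_layers` is typically passed via its `model_kwargs` argument instead:

```python
from llama_cpp import Llama  # pip install llama-cpp-python (built with CUDA support)

env = detect_gpu_and_environment()

llm = Llama(
    model_path="src/models/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=env["n_gpu_layers"],  # -1 = all layers on GPU, 0 = CPU only
    n_ctx=2048,  # context window size; adjust to the app's needs
)
```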
## Performance Expectations

With GPU acceleration on an Nvidia T4:

- **Response time**: ~2-5 seconds (vs. 30-60+ seconds on CPU)
- **Token generation**: ~20-50 tokens/sec (vs. 1-3 tokens/sec on CPU)
- **Memory**: the model fits comfortably in 16GB VRAM

## Troubleshooting

### GPU Not Detected

1. **Check Space hardware**: ensure T4 is selected in settings
2. **Check logs**: look for the GPU detection messages
3. **Verify torch installation**: `torch.cuda.is_available()` should return `True` (see the sanity-check snippet at the end of this guide)
4. **Try restarting**: a Space restart is sometimes required after a hardware change

### Model File Not Found

If you see: `⚠️ Model not found at src/models/...`

- Upload the model files to the correct path
- Check that the file names match exactly
- Ensure the files weren't corrupted during upload

### Out of Memory Errors

If the GPU runs out of memory:

- The quantized models (Q4_K_M, q8_0) are designed to fit in 16GB
- Try restarting the Space
- Check whether other processes are using GPU memory

### Still Slow After GPU Setup

1. Verify the GPU is actually being used (check the logs)
2. Ensure `n_gpu_layers=-1` is set (check the initialization logs)
3. Check that the HuggingFace Space isn't in "Sleeping" mode
4. Verify the model is fully loaded before making requests

## Code Changes Summary

The following changes enable automatic GPU detection:

1. **`src/backend/chatbot.py`**:
   - Added the `detect_gpu_and_environment()` function
   - Modified `init_llm()` to use dynamic GPU configuration
   - Automatic path detection for HF Spaces vs. local
2. **`app.py`**:
   - Added a GPU status indicator in the sidebar
   - Shows real-time GPU availability
3. **`src/requirements.txt`**:
   - Added `torch>=2.0.0` for GPU detection

## Testing Locally

To test GPU detection locally (if you have an Nvidia GPU):

```bash
# Install CUDA-enabled PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Run the app
streamlit run app.py
```

Without a GPU locally, you'll see:

```
⚠️ No GPU detected via torch.cuda
⚠️ Running on CPU (no GPU detected)
```

## Additional Resources

- [HuggingFace Spaces Hardware Documentation](https://huggingface.co/docs/hub/spaces-gpus)
- [LlamaCPP GPU Acceleration Guide](https://github.com/ggerganov/llama.cpp#cublas)
- [PyTorch CUDA Setup](https://pytorch.org/get-started/locally/)

---

**Note**: This GPU setup is backward compatible - the app will still work on CPU if no GPU is available!
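As the sanity check referenced in the troubleshooting section above, this standalone snippet (not part of the app) confirms that PyTorch can see the GPU; run it in a Python shell, locally or inside the Space:

```python
# Standalone CUDA sanity check; requires only torch
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    props = torch.cuda.get_device_properties(0)
    print(f"VRAM: {props.total_memory / 1024**3:.2f} GB")
```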