# 🔧 Model Configuration Guide

The backend now supports **configurable models via environment variables**, making it easy to switch between different AI models without code changes.

## 📋 Environment Variables

### **Primary Configuration**

```bash
# Main AI model for text generation (required)
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

# Vision model for image processing (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"

# HuggingFace token for private models (optional)
export HF_TOKEN="your_huggingface_token_here"
```

---

## 🚀 Usage Examples

### **1. Use DeepSeek-R1 (Default)**

```bash
# Uses your originally requested model
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
./gradio_env/bin/python backend_service.py
```

### **2. Use DialoGPT (Faster, Smaller)**

```bash
# Switch to a lighter model for development/testing
export AI_MODEL="microsoft/DialoGPT-medium"
./gradio_env/bin/python backend_service.py
```

### **3. Use Unsloth 4-bit Quantized Models**

```bash
# Use the Unsloth 4-bit Mistral model (memory efficient)
export AI_MODEL="unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"
./gradio_env/bin/python backend_service.py

# Use other Unsloth models
export AI_MODEL="unsloth/llama-3-8b-Instruct-bnb-4bit"
./gradio_env/bin/python backend_service.py
```

### **4. Use Other Popular Models**

```bash
# Use the Zephyr chat model
export AI_MODEL="HuggingFaceH4/zephyr-7b-beta"
./gradio_env/bin/python backend_service.py

# Use CodeLlama for code generation
export AI_MODEL="codellama/CodeLlama-7b-Instruct-hf"
./gradio_env/bin/python backend_service.py

# Use Mistral
export AI_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
./gradio_env/bin/python backend_service.py
```

### **5. Use a Different Vision Model**

```bash
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="nlpconnect/vit-gpt2-image-captioning"
./gradio_env/bin/python backend_service.py
```

---

## 📝 Startup Script Examples

### **Development Mode (Fast Startup)**

```bash
#!/bin/bash
# dev_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```

### **Production Mode (Your Preferred Model)**

```bash
#!/bin/bash
# production_mode.sh
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="$YOUR_HF_TOKEN"
./gradio_env/bin/python backend_service.py
```

### **Testing Mode (Lightweight)**

```bash
#!/bin/bash
# test_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```

---

## 🔍 Model Verification

After starting the backend, check which model is loaded:

```bash
curl http://localhost:8000/health
```

The response shows the active model:

```json
{
  "status": "healthy",
  "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
  "version": "1.0.0"
}
```

---

## 📊 Model Comparison

| Model | Size | Speed | Quality | Use Case |
| --------------------------------------------- | ------ | --------- | ------------ | ------------------- |
| `microsoft/DialoGPT-medium` | ~355MB | ⚡ Fast | Good | Development/Testing |
| `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` | ~16GB | 🐌 Slow | ⭐ Excellent | Production |
| `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit` | ~7GB | 🚀 Medium | ⭐ Excellent | Production (4-bit) |
| `HuggingFaceH4/zephyr-7b-beta` | ~14GB | 🐌 Slow | ⭐ Excellent | Chat/Conversation |
| `codellama/CodeLlama-7b-Instruct-hf` | ~13GB | 🐌 Slow | ⭐ Good | Code Generation |

---

## 🛠️ Troubleshooting

### **Model Not Found**

```bash
# Verify the model exists on HuggingFace
./gradio_env/bin/python -c "
from huggingface_hub import HfApi
api = HfApi()
try:
    info = api.model_info('your-model-name')
    print(f'✅ Model exists: {info.id}')
except Exception:
    print('❌ Model not found')
"
```

### **Memory Issues**

```bash
# Use a smaller model for limited RAM
export AI_MODEL="microsoft/DialoGPT-medium"  # ~355MB
# or
export AI_MODEL="distilgpt2"  # ~82MB
```

### **Authentication Issues**

```bash
# Set a HuggingFace token for private models
export HF_TOKEN="hf_your_token_here"
```

---

## 🎯 Quick Switch Commands

```bash
# Quick switch to development mode
export AI_MODEL="microsoft/DialoGPT-medium" && ./gradio_env/bin/python backend_service.py

# Quick switch to production mode
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" && ./gradio_env/bin/python backend_service.py

# Quick switch with a custom vision model
export AI_MODEL="microsoft/DialoGPT-medium" VISION_MODEL="nlpconnect/vit-gpt2-image-captioning" && ./gradio_env/bin/python backend_service.py
```

---

## ✅ Summary

- **Environment Variable**: `AI_MODEL` controls the main text-generation model
- **Default**: `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` (your original preference)
- **Alternative**: `microsoft/DialoGPT-medium` (faster for development)
- **Vision Model**: `VISION_MODEL` controls the image-processing model
- **No Code Changes**: Switch models by changing environment variables only

**Your original DeepSeek-R1 model is still the default** - it is simply configurable now, so you can switch whenever needed!
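For reference, here is a minimal sketch of how a backend like `backend_service.py` could resolve these variables, with the defaults described in this guide. The `load_model_config` helper name is an illustrative assumption, not the service's actual API:

```python
import os

# Hedged sketch (not the real backend code): resolve the model
# configuration from environment variables, falling back to the
# defaults documented in this guide.
def load_model_config(env=os.environ):
    return {
        "ai_model": env.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"),
        "vision_model": env.get("VISION_MODEL", "Salesforce/blip-image-captioning-base"),
        "hf_token": env.get("HF_TOKEN"),  # optional; None means public models only
    }
```

Taking the environment mapping as an explicit parameter keeps the resolution testable (e.g. `load_model_config({"AI_MODEL": "microsoft/DialoGPT-medium"})`) without mutating the real process environment.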