# Model Configuration Guide
The backend now supports **configurable models via environment variables**, making it easy to switch between different AI models without code changes.
## Environment Variables
### **Primary Configuration**
```bash
# Main AI model for text generation (required)
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
# Vision model for image processing (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"
# HuggingFace token for private models (optional)
export HF_TOKEN="your_huggingface_token_here"
```
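For reference, here is a minimal sketch of how a service like `backend_service.py` might read these variables. The variable names and defaults come from this guide; the structure is an assumption, not the actual implementation:
```python
import os

# Defaults assumed from this guide; the real backend_service.py may differ.
AI_MODEL = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
VISION_MODEL = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")
HF_TOKEN = os.environ.get("HF_TOKEN")  # None if unset; only needed for private/gated models

print(f"Text model:   {AI_MODEL}")
print(f"Vision model: {VISION_MODEL}")
print(f"HF token set: {HF_TOKEN is not None}")
```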
---
## Usage Examples
### **1. Use DeepSeek-R1 (Default)**
```bash
# Uses your originally requested model
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
./gradio_env/bin/python backend_service.py
```
### **2. Use DialoGPT (Faster, smaller)**
```bash
# Switch to lighter model for development/testing
export AI_MODEL="microsoft/DialoGPT-medium"
./gradio_env/bin/python backend_service.py
```
### **3. Use Unsloth 4-bit Quantized Models**
```bash
# Use Unsloth 4-bit Mistral model (memory efficient)
export AI_MODEL="unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"
./gradio_env/bin/python backend_service.py
# Use other Unsloth models
export AI_MODEL="unsloth/llama-3-8b-Instruct-bnb-4bit"
./gradio_env/bin/python backend_service.py
```
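**Note:** bnb-4bit checkpoints are pre-quantized with bitsandbytes, so loading them typically requires the `bitsandbytes` package (and a CUDA-capable GPU) in addition to `transformers`.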
### **4. Use Other Popular Models**
```bash
# Use Zephyr chat model
export AI_MODEL="HuggingFaceH4/zephyr-7b-beta"
./gradio_env/bin/python backend_service.py
# Use CodeLlama for code generation
export AI_MODEL="codellama/CodeLlama-7b-Instruct-hf"
./gradio_env/bin/python backend_service.py
# Use Mistral
export AI_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
./gradio_env/bin/python backend_service.py
```
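**Note:** some of these repositories (for example `codellama/CodeLlama-7b-Instruct-hf`) may be gated on HuggingFace. If a download fails with an authorization error, accept the model's license on its HuggingFace page and set `HF_TOKEN` (see Troubleshooting below).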
### **5. Use Different Vision Model**
```bash
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="nlpconnect/vit-gpt2-image-captioning"
./gradio_env/bin/python backend_service.py
```
---
## Startup Script Examples
### **Development Mode (Fast startup)**
```bash
#!/bin/bash
# dev_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```
### **Production Mode (Your preferred model)**
```bash
#!/bin/bash
# production_mode.sh
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="$YOUR_HF_TOKEN"
./gradio_env/bin/python backend_service.py
```
### **Testing Mode (Lightweight)**
```bash
#!/bin/bash
# test_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```
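Save a script, make it executable, and run it, e.g. `chmod +x dev_mode.sh && ./dev_mode.sh`.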
---
## Model Verification
After starting the backend, check which model is loaded:
```bash
curl http://localhost:8000/health
```
The response shows the loaded model:
```json
{
"status": "healthy",
"model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
"version": "1.0.0"
}
```
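To verify the model programmatically (for example in a CI check), here is a small sketch using only the Python standard library. It assumes the `/health` endpoint and JSON shape shown above:
```python
import json
import os
import urllib.request

# The expected model defaults to AI_MODEL so the check matches your environment.
expected = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")

with urllib.request.urlopen("http://localhost:8000/health") as resp:
    health = json.load(resp)

assert health["status"] == "healthy", f"Backend unhealthy: {health}"
assert health["model"] == expected, f"Loaded {health['model']}, expected {expected}"
print(f"OK: {health['model']} (v{health['version']})")
```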
---
## Model Comparison
| Model | Size | Speed | Quality | Use Case |
| --------------------------------------------- | ------ | --------- | ------------ | ------------------- |
| `microsoft/DialoGPT-medium`                   | ~355MB | Fast   | Good      | Development/Testing |
| `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B`       | ~16GB  | Slow   | Excellent | Production          |
| `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit` | ~7GB   | Medium | Excellent | Production (4-bit)  |
| `HuggingFaceH4/zephyr-7b-beta`                | ~14GB  | Slow   | Excellent | Chat/Conversation   |
| `codellama/CodeLlama-7b-Instruct-hf`          | ~13GB  | Slow   | Good      | Code Generation     |
---
## Troubleshooting
### **Model Not Found**
```bash
# Verify model exists on HuggingFace
./gradio_env/bin/python -c "
from huggingface_hub import HfApi
api = HfApi()
try:
    info = api.model_info('your-model-name')
    print(f'Model exists: {info.id}')
except Exception as e:
    print(f'Model not found: {e}')
"
```
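Replace `your-model-name` with the repository id you intend to set as `AI_MODEL`.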
### **Memory Issues**
```bash
# Use smaller model for limited RAM
export AI_MODEL="microsoft/DialoGPT-medium" # ~355MB
# or
export AI_MODEL="distilgpt2" # ~82MB
```
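If you are unsure whether a model will fit, here is a rough pre-flight sketch. It is Linux-only (it relies on `os.sysconf`), and the size estimates are the approximate figures from the comparison table above:
```python
import os

# Approximate model sizes from the comparison table above, in GB.
MODEL_SIZES_GB = {
    "microsoft/DialoGPT-medium": 0.355,
    "distilgpt2": 0.082,
    "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B": 16,
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit": 7,
}

model = os.environ.get("AI_MODEL", "microsoft/DialoGPT-medium")
needed = MODEL_SIZES_GB.get(model)

# Available physical memory in GB (Linux-only sysconf names).
avail_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_AVPHYS_PAGES") / 1e9

if needed is None:
    print(f"No size estimate for {model}; available RAM: {avail_gb:.1f} GB")
elif avail_gb < needed:
    print(f"Warning: {model} needs ~{needed} GB but only {avail_gb:.1f} GB is free")
else:
    print(f"{model} (~{needed} GB) should fit in {avail_gb:.1f} GB of free RAM")
```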
### **Authentication Issues**
```bash
# Set HuggingFace token for private models
export HF_TOKEN="hf_your_token_here"
```
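To confirm the token is valid before starting the backend, you can use `whoami` from the `huggingface_hub` package:
```python
import os
from huggingface_hub import whoami

try:
    user = whoami(token=os.environ["HF_TOKEN"])  # KeyError if HF_TOKEN is unset
    print(f"Authenticated as: {user['name']}")
except Exception as e:
    print(f"Token check failed: {e}")
```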
---
## Quick Switch Commands
```bash
# Quick switch to development mode
export AI_MODEL="microsoft/DialoGPT-medium" && ./gradio_env/bin/python backend_service.py
# Quick switch to production mode
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" && ./gradio_env/bin/python backend_service.py
# Quick switch with custom vision model
export AI_MODEL="microsoft/DialoGPT-medium" AI_VISION="nlpconnect/vit-gpt2-image-captioning" && ./gradio_env/bin/python backend_service.py
```
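**Note:** `export VAR=... && command` leaves the variable set in your shell after the run. To scope it to a single invocation, prefix the assignment instead: `AI_MODEL="microsoft/DialoGPT-medium" ./gradio_env/bin/python backend_service.py`.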
---
## Summary
- **Environment Variable**: `AI_MODEL` controls the main text generation model
- **Default**: `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` (your original preference)
- **Alternative**: `microsoft/DialoGPT-medium` (faster for development)
- **Vision Model**: `VISION_MODEL` controls image processing model
- **No Code Changes**: Switch models by changing environment variables only
**Your original DeepSeek-R1 model is still the default** - I simply made it configurable so you can easily switch when needed!