# 🔧 Model Configuration Guide

The backend now supports **configurable models via environment variables**, making it easy to switch between different AI models without code changes.

## 📋 Environment Variables

### **Primary Configuration**

```bash
# Main AI model for text generation (required)
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

# Vision model for image processing (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"

# HuggingFace token for private models (optional)
export HF_TOKEN="your_huggingface_token_here"
```
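As a rough sketch, `backend_service.py` can pick these variables up with `os.environ.get`, falling back to the documented defaults when they are unset (the exact handling inside the service may differ):

```python
# Hypothetical sketch of env-var handling in backend_service.py;
# the defaults match the ones documented in this guide.
import os

AI_MODEL = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
VISION_MODEL = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")
HF_TOKEN = os.environ.get("HF_TOKEN")  # None unless a token is exported

print(f"Loading text model: {AI_MODEL}")
print(f"Loading vision model: {VISION_MODEL}")
```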

---

## 🚀 Usage Examples

### **1. Use DeepSeek-R1 (Default)**

```bash
# Uses your originally requested model
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
./gradio_env/bin/python backend_service.py
```

### **2. Use DialoGPT (Faster, smaller)**

```bash
# Switch to lighter model for development/testing
export AI_MODEL="microsoft/DialoGPT-medium"
./gradio_env/bin/python backend_service.py
```

### **3. Use Unsloth 4-bit Quantized Models**

```bash
# Use Unsloth 4-bit Mistral model (memory efficient)
export AI_MODEL="unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"
./gradio_env/bin/python backend_service.py

# Use other Unsloth models
export AI_MODEL="unsloth/llama-3-8b-Instruct-bnb-4bit"
./gradio_env/bin/python backend_service.py
```

### **4. Use Other Popular Models**

```bash
# Use Zephyr chat model
export AI_MODEL="HuggingFaceH4/zephyr-7b-beta"
./gradio_env/bin/python backend_service.py

# Use CodeLlama for code generation
export AI_MODEL="codellama/CodeLlama-7b-Instruct-hf"
./gradio_env/bin/python backend_service.py

# Use Mistral
export AI_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
./gradio_env/bin/python backend_service.py
```

### **5. Use Different Vision Model**

```bash
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="nlpconnect/vit-gpt2-image-captioning"
./gradio_env/bin/python backend_service.py
```

---

## πŸ“ Startup Script Examples

### **Development Mode (Fast startup)**

```bash
#!/bin/bash
# dev_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```

### **Production Mode (Your preferred model)**

```bash
#!/bin/bash
# production_mode.sh
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="$YOUR_HF_TOKEN"
./gradio_env/bin/python backend_service.py
```

### **Testing Mode (Lightweight)**

```bash
#!/bin/bash
# test_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```
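The three mode scripts above differ only in the variables they export, so a small launcher can parametrize them. This is a hypothetical sketch (the `launch.py` name and mode table are assumptions, not part of the backend):

```python
# launch.py -- hypothetical launcher consolidating the mode scripts above.
import os
import subprocess
import sys

# Mode -> (AI_MODEL, VISION_MODEL), taken from the scripts above.
MODES = {
    "dev":  ("microsoft/DialoGPT-medium", "Salesforce/blip-image-captioning-base"),
    "prod": ("deepseek-ai/DeepSeek-R1-0528-Qwen3-8B", "Salesforce/blip-image-captioning-base"),
    "test": ("microsoft/DialoGPT-medium", "Salesforce/blip-image-captioning-base"),
}

def env_for(mode: str) -> dict:
    """Build the child-process environment for the chosen mode."""
    ai_model, vision_model = MODES[mode]
    return {**os.environ, "AI_MODEL": ai_model, "VISION_MODEL": vision_model}

def main() -> None:
    mode = sys.argv[1] if len(sys.argv) > 1 else "dev"
    subprocess.run(["./gradio_env/bin/python", "backend_service.py"], env=env_for(mode))
```

Usage: `python launch.py prod` (falls back to `dev` when no mode is given).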

---

## πŸ” Model Verification

After starting the backend, check which model is loaded:

```bash
curl http://localhost:8000/health
```

A healthy response reports the loaded model, for example:

```json
{
  "status": "healthy",
  "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
  "version": "1.0.0"
}
```

---

## 📊 Model Comparison

| Model                                         | Size   | Speed     | Quality      | Use Case            |
| --------------------------------------------- | ------ | --------- | ------------ | ------------------- |
| `microsoft/DialoGPT-medium`                   | ~355MB | ⚡ Fast   | Good         | Development/Testing |
| `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B`       | ~16GB  | 🐌 Slow   | ⭐ Excellent | Production          |
| `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit` | ~7GB   | 🚀 Medium | ⭐ Excellent | Production (4-bit)  |
| `HuggingFaceH4/zephyr-7b-beta`                | ~14GB  | 🐌 Slow   | ⭐ Excellent | Chat/Conversation   |
| `codellama/CodeLlama-7b-Instruct-hf`          | ~13GB  | 🐌 Slow   | ⭐ Good      | Code Generation     |

---

## 🛠️ Troubleshooting

### **Model Not Found**

```bash
# Verify model exists on HuggingFace
./gradio_env/bin/python -c "
from huggingface_hub import HfApi
api = HfApi()
try:
    info = api.model_info('your-model-name')
    print(f'✅ Model exists: {info.id}')
except Exception as e:
    print(f'❌ Model not found: {e}')
"
```

### **Memory Issues**

```bash
# Use smaller model for limited RAM
export AI_MODEL="microsoft/DialoGPT-medium"  # ~355MB
# or
export AI_MODEL="distilgpt2"  # ~82MB
```
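The sizes in the comparison table follow a rough rule of thumb: weight memory ≈ parameter count × bytes per parameter (2 bytes for fp16, ~0.5 bytes for 4-bit quantization), plus overhead for activations and the KV cache. A quick sanity check:

```python
def approx_weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Rough weight-only memory estimate in GB (excludes activations/KV cache)."""
    # params_billion * 1e9 params * bytes_per_param / 1e9 bytes-per-GB
    return params_billion * bytes_per_param

print(approx_weight_gb(8, 2.0))    # 8B model in fp16  -> ~16 GB
print(approx_weight_gb(12, 0.5))   # 12B model in 4-bit -> ~6 GB before overhead
```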

### **Authentication Issues**

```bash
# Set HuggingFace token for private models
export HF_TOKEN="hf_your_token_here"
```

---

## 🎯 Quick Switch Commands

```bash
# Quick switch to development mode
export AI_MODEL="microsoft/DialoGPT-medium" && ./gradio_env/bin/python backend_service.py

# Quick switch to production mode
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" && ./gradio_env/bin/python backend_service.py

# Quick switch with custom vision model
export AI_MODEL="microsoft/DialoGPT-medium" VISION_MODEL="nlpconnect/vit-gpt2-image-captioning" && ./gradio_env/bin/python backend_service.py
```

---

## ✅ Summary

- **Environment Variable**: `AI_MODEL` controls the main text generation model
- **Default**: `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` (your original preference)
- **Alternative**: `microsoft/DialoGPT-medium` (faster for development)
- **Vision Model**: `VISION_MODEL` controls image processing model
- **No Code Changes**: Switch models by changing environment variables only

**Your original DeepSeek-R1 model is still the default** - I simply made it configurable so you can easily switch when needed!