# πŸ”§ Model Configuration Guide
The backend now supports **configurable models via environment variables**, making it easy to switch between different AI models without code changes.
## πŸ“‹ Environment Variables
### **Primary Configuration**
```bash
# Main AI model for text generation (optional; defaults to DeepSeek-R1 if unset)
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
# Vision model for image processing (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"
# HuggingFace token for private models (optional)
export HF_TOKEN="your_huggingface_token_here"
```
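Inside the backend, these variables can be picked up with standard `os.environ` lookups. A minimal sketch of that pattern (the defaults shown are assumptions based on this guide, not the service's exact code):

```python
import os

# Read model configuration from the environment, falling back to the
# documented defaults when a variable is unset.
AI_MODEL = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
VISION_MODEL = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")
HF_TOKEN = os.environ.get("HF_TOKEN")  # None means no private-model access

print(f"Text model:   {AI_MODEL}")
print(f"Vision model: {VISION_MODEL}")
```

Because the values are resolved at startup, exporting a different `AI_MODEL` before launching the service is all it takes to switch models.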
---
## πŸš€ Usage Examples
### **1. Use DeepSeek-R1 (Default)**
```bash
# Uses your originally requested model
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
./gradio_env/bin/python backend_service.py
```
### **2. Use DialoGPT (Faster, smaller)**
```bash
# Switch to lighter model for development/testing
export AI_MODEL="microsoft/DialoGPT-medium"
./gradio_env/bin/python backend_service.py
```
### **3. Use Unsloth 4-bit Quantized Models**
```bash
# Use Unsloth 4-bit Mistral model (memory efficient)
export AI_MODEL="unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"
./gradio_env/bin/python backend_service.py
# Use other Unsloth models
export AI_MODEL="unsloth/llama-3-8b-Instruct-bnb-4bit"
./gradio_env/bin/python backend_service.py
```
### **4. Use Other Popular Models**
```bash
# Use Zephyr chat model
export AI_MODEL="HuggingFaceH4/zephyr-7b-beta"
./gradio_env/bin/python backend_service.py
# Use CodeLlama for code generation
export AI_MODEL="codellama/CodeLlama-7b-Instruct-hf"
./gradio_env/bin/python backend_service.py
# Use Mistral
export AI_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
./gradio_env/bin/python backend_service.py
```
### **5. Use Different Vision Model**
```bash
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="nlpconnect/vit-gpt2-image-captioning"
./gradio_env/bin/python backend_service.py
```
---
## πŸ“ Startup Script Examples
### **Development Mode (Fast startup)**
```bash
#!/bin/bash
# dev_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```
### **Production Mode (Your preferred model)**
```bash
#!/bin/bash
# production_mode.sh
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="$YOUR_HF_TOKEN"
./gradio_env/bin/python backend_service.py
```
### **Testing Mode (Lightweight)**
```bash
#!/bin/bash
# test_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```
---
## πŸ” Model Verification
After starting the backend, check which model is loaded:
```bash
curl http://localhost:8000/health
```
The response shows the currently loaded model, e.g.:
```json
{
  "status": "healthy",
  "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
  "version": "1.0.0"
}
```
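In a startup or CI script, you can parse that response and fail fast if the wrong model is loaded. A standard-library sketch (the endpoint and `model` field are as shown above; the `loaded_model` helper is illustrative, not part of the backend):

```python
import json


def loaded_model(health_json: str) -> str:
    """Extract the model name from a /health response body."""
    return json.loads(health_json)["model"]


# Example with the response shown above; in a live check you would fetch
# the body first, e.g. with urllib:
#   from urllib.request import urlopen
#   body = urlopen("http://localhost:8000/health").read().decode()
body = '{"status": "healthy", "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B", "version": "1.0.0"}'
assert loaded_model(body) == "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
```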
---
## πŸ“Š Model Comparison
| Model | Size | Speed | Quality | Use Case |
| --------------------------------------------- | ------ | --------- | ------------ | ------------------- |
| `microsoft/DialoGPT-medium` | ~355MB | ⚑ Fast | Good | Development/Testing |
| `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` | ~16GB | 🐌 Slow | ⭐ Excellent | Production |
| `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit` | ~7GB | πŸš€ Medium | ⭐ Excellent | Production (4-bit) |
| `HuggingFaceH4/zephyr-7b-beta` | ~14GB | 🐌 Slow | ⭐ Excellent | Chat/Conversation |
| `codellama/CodeLlama-7b-Instruct-hf` | ~13GB | 🐌 Slow | ⭐ Good | Code Generation |
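The table above can be turned into a small lookup for scripts that select a model by use case. This mapping is a hypothetical convenience, not part of the backend's API:

```python
# Illustrative mapping derived from the comparison table above.
MODELS_BY_USE_CASE = {
    "development": "microsoft/DialoGPT-medium",
    "production": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    "production-4bit": "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "chat": "HuggingFaceH4/zephyr-7b-beta",
    "code": "codellama/CodeLlama-7b-Instruct-hf",
}


def model_for(use_case: str) -> str:
    """Return the suggested model for a use case, or raise on an unknown one."""
    try:
        return MODELS_BY_USE_CASE[use_case]
    except KeyError:
        raise ValueError(f"Unknown use case: {use_case!r}") from None


print(model_for("development"))  # microsoft/DialoGPT-medium
```

A script could then do `os.environ["AI_MODEL"] = model_for("development")` before launching the service.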
---
## πŸ› οΈ Troubleshooting
### **Model Not Found**
```bash
# Verify model exists on HuggingFace
./gradio_env/bin/python -c "
from huggingface_hub import HfApi
api = HfApi()
try:
    info = api.model_info('your-model-name')
    print(f'βœ… Model exists: {info.id}')
except Exception as e:
    print(f'❌ Model not found: {e}')
"
```
### **Memory Issues**
```bash
# Use smaller model for limited RAM
export AI_MODEL="microsoft/DialoGPT-medium" # ~355MB
# or
export AI_MODEL="distilgpt2" # ~82MB
```
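The RAM-based choice can also be automated. The thresholds below are rough assumptions derived from the sizes in the comparison table, not measured requirements:

```python
def pick_model(available_ram_gb: float) -> str:
    """Pick a model that plausibly fits in the given amount of RAM.

    Thresholds are rough assumptions based on the model sizes listed
    in the comparison table above.
    """
    if available_ram_gb >= 20:
        return "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"        # ~16GB
    if available_ram_gb >= 10:
        return "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"  # ~7GB (4-bit)
    if available_ram_gb >= 1:
        return "microsoft/DialoGPT-medium"                    # ~355MB
    return "distilgpt2"                                       # ~82MB


print(pick_model(8))  # microsoft/DialoGPT-medium
```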
### **Authentication Issues**
```bash
# Set HuggingFace token for private models
export HF_TOKEN="hf_your_token_here"
```
---
## 🎯 Quick Switch Commands
```bash
# Quick switch to development mode
export AI_MODEL="microsoft/DialoGPT-medium" && ./gradio_env/bin/python backend_service.py
# Quick switch to production mode
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" && ./gradio_env/bin/python backend_service.py
# Quick switch with custom vision model
export AI_MODEL="microsoft/DialoGPT-medium" VISION_MODEL="nlpconnect/vit-gpt2-image-captioning" && ./gradio_env/bin/python backend_service.py
```
---
## βœ… Summary
- **Environment Variable**: `AI_MODEL` controls the main text generation model
- **Default**: `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` (your original preference)
- **Alternative**: `microsoft/DialoGPT-medium` (faster for development)
- **Vision Model**: `VISION_MODEL` controls image processing model
- **No Code Changes**: Switch models by changing environment variables only
**The original DeepSeek-R1 model is still the default** - it is simply configurable now, so you can switch whenever needed!