# πŸ”§ Model Configuration Guide
The backend now supports **configurable models via environment variables**, making it easy to switch between different AI models without code changes.
## πŸ“‹ Environment Variables
### **Primary Configuration**
```bash
# Main AI model for text generation (optional; defaults to DeepSeek-R1 if unset)
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
# Vision model for image processing (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"
# HuggingFace token for private models (optional)
export HF_TOKEN="your_huggingface_token_here"
```
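Inside the backend, these variables can be picked up with standard `os.environ` lookups. A minimal sketch of that pattern (the defaults shown are assumptions based on this guide, not the service's exact code):

```python
import os

# Read model configuration from the environment, falling back to the
# documented defaults when a variable is unset.
AI_MODEL = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
VISION_MODEL = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")
HF_TOKEN = os.environ.get("HF_TOKEN")  # None means no private-model access

print(f"Text model:   {AI_MODEL}")
print(f"Vision model: {VISION_MODEL}")
```

Because the values are resolved at startup, exporting a different `AI_MODEL` before launching the service is all it takes to switch models.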
---
## πŸš€ Usage Examples
### **1. Use DeepSeek-R1 (Default)**
```bash
# Uses your originally requested model
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
./gradio_env/bin/python backend_service.py
```
### **2. Use DialoGPT (Faster, smaller)**
```bash
# Switch to lighter model for development/testing
export AI_MODEL="microsoft/DialoGPT-medium"
./gradio_env/bin/python backend_service.py
```
### **3. Use Unsloth 4-bit Quantized Models**
```bash
# Use Unsloth 4-bit Mistral model (memory efficient)
export AI_MODEL="unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"
./gradio_env/bin/python backend_service.py
# Use other Unsloth models
export AI_MODEL="unsloth/llama-3-8b-Instruct-bnb-4bit"
./gradio_env/bin/python backend_service.py
```
### **4. Use Other Popular Models**
```bash
# Use Zephyr chat model
export AI_MODEL="HuggingFaceH4/zephyr-7b-beta"
./gradio_env/bin/python backend_service.py
# Use CodeLlama for code generation
export AI_MODEL="codellama/CodeLlama-7b-Instruct-hf"
./gradio_env/bin/python backend_service.py
# Use Mistral
export AI_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
./gradio_env/bin/python backend_service.py
```
### **5. Use Different Vision Model**
```bash
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="nlpconnect/vit-gpt2-image-captioning"
./gradio_env/bin/python backend_service.py
```
---
## πŸ“ Startup Script Examples
### **Development Mode (Fast startup)**
```bash
#!/bin/bash
# dev_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```
### **Production Mode (Your preferred model)**
```bash
#!/bin/bash
# production_mode.sh
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="$YOUR_HF_TOKEN"
./gradio_env/bin/python backend_service.py
```
### **Testing Mode (Lightweight)**
```bash
#!/bin/bash
# test_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```
---
## πŸ” Model Verification
After starting the backend, check which model is loaded:
```bash
curl http://localhost:8000/health
```
The response shows the currently loaded model, e.g.:
```json
{
  "status": "healthy",
  "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
  "version": "1.0.0"
}
```
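In a startup or CI script, you can parse that response and fail fast if the wrong model is loaded. A standard-library sketch (the endpoint and `model` field are as shown above; the `loaded_model` helper is illustrative, not part of the backend):

```python
import json


def loaded_model(health_json: str) -> str:
    """Extract the model name from a /health response body."""
    return json.loads(health_json)["model"]


# Example with the response shown above; in a live check you would fetch
# the body first, e.g. with urllib:
#   from urllib.request import urlopen
#   body = urlopen("http://localhost:8000/health").read().decode()
body = '{"status": "healthy", "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B", "version": "1.0.0"}'
assert loaded_model(body) == "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
```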
---
## πŸ“Š Model Comparison
| Model | Size | Speed | Quality | Use Case |
| --------------------------------------------- | ------ | --------- | ------------ | ------------------- |
| `microsoft/DialoGPT-medium` | ~355MB | ⚑ Fast | Good | Development/Testing |
| `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` | ~16GB | 🐌 Slow | ⭐ Excellent | Production |
| `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit` | ~7GB | πŸš€ Medium | ⭐ Excellent | Production (4-bit) |
| `HuggingFaceH4/zephyr-7b-beta` | ~14GB | 🐌 Slow | ⭐ Excellent | Chat/Conversation |
| `codellama/CodeLlama-7b-Instruct-hf` | ~13GB | 🐌 Slow | ⭐ Good | Code Generation |
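The table above can be turned into a small lookup for scripts that select a model by use case. This mapping is a hypothetical convenience, not part of the backend's API:

```python
# Illustrative mapping derived from the comparison table above.
MODELS_BY_USE_CASE = {
    "development": "microsoft/DialoGPT-medium",
    "production": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    "production-4bit": "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "chat": "HuggingFaceH4/zephyr-7b-beta",
    "code": "codellama/CodeLlama-7b-Instruct-hf",
}


def model_for(use_case: str) -> str:
    """Return the suggested model for a use case, or raise on an unknown one."""
    try:
        return MODELS_BY_USE_CASE[use_case]
    except KeyError:
        raise ValueError(f"Unknown use case: {use_case!r}") from None


print(model_for("development"))  # microsoft/DialoGPT-medium
```

A script could then do `os.environ["AI_MODEL"] = model_for("development")` before launching the service.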
---
## πŸ› οΈ Troubleshooting
### **Model Not Found**
```bash
# Verify model exists on HuggingFace
./gradio_env/bin/python -c "
from huggingface_hub import HfApi
api = HfApi()
try:
    info = api.model_info('your-model-name')
    print(f'βœ… Model exists: {info.id}')
except Exception as e:
    print(f'❌ Model not found: {e}')
"
```
### **Memory Issues**
```bash
# Use smaller model for limited RAM
export AI_MODEL="microsoft/DialoGPT-medium" # ~355MB
# or
export AI_MODEL="distilgpt2" # ~82MB
```
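The RAM-based choice can also be automated. The thresholds below are rough assumptions derived from the sizes in the comparison table, not measured requirements:

```python
def pick_model(available_ram_gb: float) -> str:
    """Pick a model that plausibly fits in the given amount of RAM.

    Thresholds are rough assumptions based on the model sizes listed
    in the comparison table above.
    """
    if available_ram_gb >= 20:
        return "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"        # ~16GB
    if available_ram_gb >= 10:
        return "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"  # ~7GB (4-bit)
    if available_ram_gb >= 1:
        return "microsoft/DialoGPT-medium"                    # ~355MB
    return "distilgpt2"                                       # ~82MB


print(pick_model(8))  # microsoft/DialoGPT-medium
```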
### **Authentication Issues**
```bash
# Set HuggingFace token for private models
export HF_TOKEN="hf_your_token_here"
```
---
## 🎯 Quick Switch Commands
```bash
# Quick switch to development mode
export AI_MODEL="microsoft/DialoGPT-medium" && ./gradio_env/bin/python backend_service.py
# Quick switch to production mode
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" && ./gradio_env/bin/python backend_service.py
# Quick switch with custom vision model
export AI_MODEL="microsoft/DialoGPT-medium" VISION_MODEL="nlpconnect/vit-gpt2-image-captioning" && ./gradio_env/bin/python backend_service.py
```
---
## βœ… Summary
- **Environment Variable**: `AI_MODEL` controls the main text generation model
- **Default**: `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` (your original preference)
- **Alternative**: `microsoft/DialoGPT-medium` (faster for development)
- **Vision Model**: `VISION_MODEL` controls image processing model
- **No Code Changes**: Switch models by changing environment variables only
**The original DeepSeek-R1 model is still the default** - it is simply configurable now, so you can switch whenever needed!