# Model Configuration Guide

The backend now supports **configurable models via environment variables**, making it easy to switch between different AI models without code changes.

## Environment Variables

### **Primary Configuration**

```bash
# Main AI model for text generation (required)
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

# Vision model for image processing (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"

# HuggingFace token for private models (optional)
export HF_TOKEN="your_huggingface_token_here"
```
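On the backend side, these variables would typically be read with `os.environ`. A minimal sketch, assuming the defaults documented above (the actual logic in `backend_service.py` may differ):

```python
import os

# Hypothetical sketch of how backend_service.py might resolve its models.
# Defaults mirror the documented ones; the real code may differ.
AI_MODEL = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
VISION_MODEL = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")
HF_TOKEN = os.environ.get("HF_TOKEN")  # None when unset; needed only for private models

print(f"Text model:   {AI_MODEL}")
print(f"Vision model: {VISION_MODEL}")
```

Because the lookup falls back to a default, the service still starts when no variables are set.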
---

## Usage Examples
### **1. Use DeepSeek-R1 (Default)**

```bash
# Uses your originally requested model
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
./gradio_env/bin/python backend_service.py
```

### **2. Use DialoGPT (Faster, smaller)**

```bash
# Switch to a lighter model for development/testing
export AI_MODEL="microsoft/DialoGPT-medium"
./gradio_env/bin/python backend_service.py
```

### **3. Use Unsloth 4-bit Quantized Models**

```bash
# Use the Unsloth 4-bit Mistral model (memory efficient)
export AI_MODEL="unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit"
./gradio_env/bin/python backend_service.py

# Use other Unsloth models
export AI_MODEL="unsloth/llama-3-8b-Instruct-bnb-4bit"
./gradio_env/bin/python backend_service.py
```
### **4. Use Other Popular Models**

```bash
# Use the Zephyr chat model
export AI_MODEL="HuggingFaceH4/zephyr-7b-beta"
./gradio_env/bin/python backend_service.py

# Use CodeLlama for code generation
export AI_MODEL="codellama/CodeLlama-7b-Instruct-hf"
./gradio_env/bin/python backend_service.py

# Use Mistral
export AI_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
./gradio_env/bin/python backend_service.py
```

### **5. Use a Different Vision Model**

```bash
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="nlpconnect/vit-gpt2-image-captioning"
./gradio_env/bin/python backend_service.py
```
---

## Startup Script Examples

### **Development Mode (Fast startup)**

```bash
#!/bin/bash
# dev_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```

### **Production Mode (Your preferred model)**

```bash
#!/bin/bash
# production_mode.sh
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="$YOUR_HF_TOKEN"
./gradio_env/bin/python backend_service.py
```

### **Testing Mode (Lightweight)**

```bash
#!/bin/bash
# test_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```

---
## Model Verification

After starting the backend, check which model is loaded:

```bash
curl http://localhost:8000/health
```

The response reports the loaded model, for example:

```json
{
  "status": "healthy",
  "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
  "version": "1.0.0"
}
```
---

## Model Comparison

| Model | Size | Speed | Quality | Use Case |
| --- | --- | --- | --- | --- |
| `microsoft/DialoGPT-medium` | ~355 MB | Fast | Good | Development/Testing |
| `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` | ~16 GB | Slow | Excellent | Production |
| `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit` | ~7 GB | Medium | Excellent | Production (4-bit) |
| `HuggingFaceH4/zephyr-7b-beta` | ~14 GB | Slow | Excellent | Chat/Conversation |
| `codellama/CodeLlama-7b-Instruct-hf` | ~13 GB | Slow | Good | Code Generation |

---
## Troubleshooting

### **Model Not Found**

```bash
# Verify the model exists on HuggingFace
./gradio_env/bin/python -c "
from huggingface_hub import HfApi

api = HfApi()
try:
    info = api.model_info('your-model-name')
    print(f'Model exists: {info.id}')
except Exception as exc:
    print(f'Model not found: {exc}')
"
```
### **Memory Issues**

```bash
# Use a smaller model for limited RAM
export AI_MODEL="microsoft/DialoGPT-medium"  # ~355 MB
# or
export AI_MODEL="distilgpt2"  # ~82 MB
```
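As a rough sanity check for the sizes in the comparison table: the weight footprint is approximately parameter count times bits per parameter divided by 8. A sketch (figures are approximate; activations and KV cache add more on top):

```python
def approx_weights_gb(n_params_billion: float, bits_per_param: int) -> float:
    """Rough size of model weights alone, in decimal GB.

    Ignores activations, KV cache, and framework overhead.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# fp16 8B model: ~16 GB, matching the DeepSeek-R1-0528-Qwen3-8B row above
print(approx_weights_gb(8, 16))   # -> 16.0

# 4-bit quantized 12B model (e.g. Mistral-Nemo): ~6 GB
print(approx_weights_gb(12, 4))   # -> 6.0
```

This is why 4-bit quantized variants fit in roughly a quarter of the RAM of their fp16 counterparts.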
### **Authentication Issues**

```bash
# Set a HuggingFace token for private models
export HF_TOKEN="hf_your_token_here"
```

---
## Quick Switch Commands

```bash
# Quick switch to development mode
export AI_MODEL="microsoft/DialoGPT-medium" && ./gradio_env/bin/python backend_service.py

# Quick switch to production mode
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" && ./gradio_env/bin/python backend_service.py

# Quick switch with a custom vision model
export AI_MODEL="microsoft/DialoGPT-medium" VISION_MODEL="nlpconnect/vit-gpt2-image-captioning" && ./gradio_env/bin/python backend_service.py
```
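These one-liners could also be wrapped in a tiny helper script. A hypothetical sketch (`switch_model.sh` and its mode names are illustrative, not part of the repo):

```shell
#!/bin/bash
# switch_model.sh -- map a short mode name to the models documented above.
# Usage: source switch_model.sh <dev|prod>   (then start the backend yourself)

model_for_mode() {
  case "$1" in
    dev)  echo "microsoft/DialoGPT-medium" ;;
    prod) echo "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" ;;
    *)    echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

AI_MODEL="$(model_for_mode "${1:-dev}")" && export AI_MODEL
echo "AI_MODEL=$AI_MODEL"
```

Sourcing (rather than executing) the script keeps the exported `AI_MODEL` in your current shell.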
---

## Summary

- **Environment Variable**: `AI_MODEL` controls the main text-generation model
- **Default**: `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` (your original preference)
- **Alternative**: `microsoft/DialoGPT-medium` (faster for development)
- **Vision Model**: `VISION_MODEL` controls the image-processing model
- **No Code Changes**: Switch models by changing environment variables only

**Your original DeepSeek-R1 model is still the default** - I simply made it configurable so you can easily switch when needed!