# ๐Ÿ”ง Sema Chat API Configuration Guide ## ๐ŸŽฏ **MiniMax Integration** ### Configuration ```bash MODEL_TYPE=minimax MODEL_NAME=MiniMax-M1 MINIMAX_API_KEY=your_minimax_api_key MINIMAX_API_URL=https://api.minimax.chat/v1/text/chatcompletion_v2 MINIMAX_MODEL_VERSION=abab6.5s-chat ``` ### Features - โœ… **Reasoning Capabilities**: Shows model's thinking process - โœ… **Streaming Support**: Real-time response generation - โœ… **Custom API Integration**: Direct integration with MiniMax API - โœ… **Reasoning Content**: Displays both reasoning and final response ### Example Usage ```bash curl -X POST "http://localhost:7860/api/v1/chat" \ -H "Content-Type: application/json" \ -d '{ "message": "Solve this math problem: 2x + 5 = 15", "session_id": "minimax-test" }' ``` **Response includes reasoning:** ```json { "message": "[Reasoning: I need to solve for x. First, subtract 5 from both sides: 2x = 10. Then divide by 2: x = 5]\n\nTo solve 2x + 5 = 15:\n1. Subtract 5 from both sides: 2x = 10\n2. Divide by 2: x = 5\n\nTherefore, x = 5.", "session_id": "minimax-test", "model_name": "MiniMax-M1" } ``` --- ## ๐Ÿ”ฅ **Gemma Integration** ### Option 1: Local Gemma (Free Tier) ```bash MODEL_TYPE=local MODEL_NAME=google/gemma-2b-it DEVICE=auto ``` ### Option 2: Gemma via HuggingFace API ```bash MODEL_TYPE=hf_api MODEL_NAME=google/gemma-2b-it HF_API_TOKEN=your_hf_token ``` ### Option 3: Gemma via Google AI Studio ```bash MODEL_TYPE=google MODEL_NAME=gemma-2-9b-it GOOGLE_API_KEY=your_google_api_key ``` ### Available Gemma Models - **gemma-2-2b-it** (2B parameters, instruction-tuned) - **gemma-2-9b-it** (9B parameters, instruction-tuned) - **gemma-2-27b-it** (27B parameters, instruction-tuned) - **gemini-1.5-flash** (Fast, efficient) - **gemini-1.5-pro** (Most capable) ### Example Usage ```bash curl -X POST "http://localhost:7860/api/v1/chat" \ -H "Content-Type: application/json" \ -d '{ "message": "Explain quantum computing in simple terms", "session_id": "gemma-test", "temperature": 0.7 }' ``` --- ## ๐Ÿš€ **Complete Backend Comparison** | Backend | Cost | Setup | Streaming | Special Features | |---------|------|-------|-----------|------------------| | **Local** | Free | Medium | โœ… | Offline, Private | | **HF API** | Free/Paid | Easy | โœ… | Many models | | **OpenAI** | Paid | Easy | โœ… | High quality | | **Anthropic** | Paid | Easy | โœ… | Long context | | **MiniMax** | Paid | Easy | โœ… | Reasoning | | **Google** | Free/Paid | Easy | โœ… | Multimodal | --- ## ๐Ÿ”ง **Configuration Examples** ### Free Tier Setup (HuggingFace Spaces) ```bash # Best for free deployment MODEL_TYPE=local MODEL_NAME=TinyLlama/TinyLlama-1.1B-Chat-v1.0 DEVICE=cpu MAX_NEW_TOKENS=256 TEMPERATURE=0.7 ``` ### Production Setup (API-based) ```bash # Best for production with fallbacks MODEL_TYPE=openai MODEL_NAME=gpt-3.5-turbo OPENAI_API_KEY=your_key # Fallback configuration FALLBACK_MODEL_TYPE=hf_api FALLBACK_MODEL_NAME=microsoft/DialoGPT-medium HF_API_TOKEN=your_token ``` ### Research Setup (Multiple Models) ```bash # Primary: Latest Gemini MODEL_TYPE=google MODEL_NAME=gemini-1.5-pro GOOGLE_API_KEY=your_key # For reasoning tasks REASONING_MODEL_TYPE=minimax REASONING_MODEL_NAME=MiniMax-M1 MINIMAX_API_KEY=your_key ``` --- ## ๐ŸŽฏ **Model Selection Guide** ### For **Free Deployment** (HuggingFace Spaces): 1. **TinyLlama/TinyLlama-1.1B-Chat-v1.0** - Smallest, fastest 2. **microsoft/DialoGPT-medium** - Better conversations 3. **Qwen/Qwen2.5-0.5B-Instruct** - Good instruction following ### For **Reasoning Tasks**: 1. **MiniMax M1** - Shows thinking process 2. **Claude-3 Opus** - Deep reasoning 3. **GPT-4** - Complex problem solving ### For **Conversations**: 1. **Claude-3 Haiku** - Natural, fast 2. **GPT-3.5-turbo** - Balanced cost/quality 3. **Gemini-1.5-flash** - Fast, capable ### For **Multilingual**: 1. **Gemma-2-9b-it** - Good multilingual 2. **GPT-4** - Excellent multilingual 3. **Local models** - Depends on training --- ## ๐Ÿ”„ **Dynamic Model Switching** The API supports runtime model switching: ```python # Switch to MiniMax for reasoning POST /api/v1/model/switch { "model_type": "minimax", "model_name": "MiniMax-M1" } # Switch back to fast model POST /api/v1/model/switch { "model_type": "google", "model_name": "gemini-1.5-flash" } ``` --- ## ๐Ÿงช **Testing Your Setup** ### Test All Backends ```bash python examples/test_backends.py ``` ### Test Specific Backend ```bash # Test MiniMax MINIMAX_API_KEY=your_key python -c " import asyncio from app.services.model_backends.minimax_api import MiniMaxAPIBackend from app.models.schemas import ChatMessage async def test(): backend = MiniMaxAPIBackend('MiniMax-M1', api_key='your_key', api_url='your_url') await backend.load_model() messages = [ChatMessage(role='user', content='Hello')] response = await backend.generate_response(messages) print(response.message) asyncio.run(test()) " ``` ### Test Gemma ```bash # Test local Gemma MODEL_TYPE=local MODEL_NAME=google/gemma-2b-it python tests/test_api.py # Test Gemma via Google AI MODEL_TYPE=google MODEL_NAME=gemma-2-9b-it GOOGLE_API_KEY=your_key python tests/test_api.py ``` --- ## ๐Ÿš€ **Deployment Examples** ### HuggingFace Spaces (Free) ```yaml # In your Space settings MODEL_TYPE: local MODEL_NAME: TinyLlama/TinyLlama-1.1B-Chat-v1.0 DEVICE: cpu ``` ### HuggingFace Spaces (With API) ```yaml # In your Space settings MODEL_TYPE: google MODEL_NAME: gemma-2-9b-it GOOGLE_API_KEY: your_secret_key ``` ### Docker Deployment ```bash docker run -e MODEL_TYPE=minimax \ -e MINIMAX_API_KEY=your_key \ -e MINIMAX_API_URL=your_url \ -p 8000:7860 \ sema-chat-api ``` --- ## ๐Ÿ’ก **Pro Tips** 1. **Start Small**: Begin with TinyLlama for testing 2. **Use APIs for Production**: More reliable than local models 3. **Enable Streaming**: Better user experience 4. **Monitor Usage**: Track API costs and limits 5. **Have Fallbacks**: Configure multiple backends 6. **Test Thoroughly**: Use the provided test scripts --- ## ๐Ÿ”— **Getting API Keys** - **HuggingFace**: https://huggingface.co/settings/tokens - **OpenAI**: https://platform.openai.com/api-keys - **Anthropic**: https://console.anthropic.com/ - **Google AI**: https://aistudio.google.com/ - **MiniMax**: Contact MiniMax for API access --- **Your architecture is now ready for both MiniMax and Gemma! ๐ŸŽ‰**