---
title: Multi-Model Replicate OpenAI API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
  - openai
  - claude
  - gpt
  - replicate
  - api
  - multi-model
  - streaming
  - function-calling
---

# 🚀 Multi-Model Replicate OpenAI API - Hugging Face Spaces

Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.

## 🤖 Supported Models

### Anthropic Claude Models

- `claude-4-sonnet` - Latest Claude 4 Sonnet (Most Capable)
- `claude-3.7-sonnet` - Claude 3.7 Sonnet
- `claude-3.5-sonnet` - Claude 3.5 Sonnet (Balanced)
- `claude-3.5-haiku` - Claude 3.5 Haiku (Fastest)

### OpenAI GPT Models

- `gpt-4.1` - Latest GPT-4.1
- `gpt-4.1-mini` - GPT-4.1 Mini (Cost-Effective)
- `gpt-4.1-nano` - GPT-4.1 Nano (Ultra-Fast)

## ✨ Features

- 🎯 **100% OpenAI Compatible** - Drop-in replacement
- 🌊 **Streaming Support** - Real-time responses
- 🔧 **Function Calling** - Tool/function calling
- 🔐 **Secure** - Obfuscated API keys
- 📊 **Monitoring** - Health checks & stats
- 🚀 **Multi-Model** - 7 models in one API

## 🚀 Deploy to Hugging Face Spaces

### Step 1: Create New Space

1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose:
   - **Name**: `replicate-multi-model-api`
   - **SDK**: **Docker** ⚠️ (Important!)
   - **Hardware**: CPU Basic (free tier)
   - **Visibility**: Public

### Step 2: Upload Files

Upload these files to your Space:

```
📁 Your Hugging Face Space:
├── app.py              ← Upload replicate_server.py as app.py
├── requirements.txt    ← Upload requirements.txt
├── Dockerfile          ← Upload Dockerfile
├── README.md           ← Upload this file as README.md
├── test_all_models.py  ← Upload test_all_models.py (optional)
└── quick_test.py       ← Upload quick_test.py (optional)
```

### Step 3: Set Environment Variables (Optional)

In your Space settings, you can set:

- `REPLICATE_API_TOKEN` - Your Replicate API token (if you want to use your own)

**Note**: The app includes an obfuscated token, so this is optional.

### Step 4: Deploy

- Hugging Face will automatically build and deploy the Space
- Wait 5-10 minutes for the build to complete
- Your API will be live!

## 🎯 Your API Endpoints

Once deployed at `https://your-username-replicate-multi-model-api.hf.space`:

### Main Endpoints

- `POST /v1/chat/completions` - Chat completions (all models)
- `GET /v1/models` - List all 7 models
- `GET /health` - Health check

### Alternative Endpoints

- `POST /chat/completions` - Alternative chat endpoint
- `GET /models` - Alternative models endpoint

## 🧪 Test Your Deployment

### 1. Health Check

```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```

### 2. List Models

```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```

### 3. Test Claude 4 Sonnet

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```
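If you prefer Python over curl, the same request can be sketched with only the standard library. The helper names below (`build_chat_request`, `post_chat`) are hypothetical, and the Space URL is a placeholder:

```python
import json
import urllib.request

def build_chat_request(model: str, content: str, **params) -> dict:
    """Assemble an OpenAI-style chat completion payload (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    payload.update(params)  # e.g. max_tokens, temperature, stream
    return payload

def post_chat(base_url: str, payload: dict) -> dict:
    """POST the payload to the deployed Space and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Same request as the curl example above:
payload = build_chat_request("claude-4-sonnet", "Write a haiku about AI", max_tokens=100)
# post_chat("https://your-username-replicate-multi-model-api.hf.space", payload)
```

The response follows the standard OpenAI shape, so the reply text is at `response["choices"][0]["message"]["content"]`.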
### 4. Test GPT-4.1 Mini

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```

### 5. Test Streaming

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```

## 🔌 OpenAI SDK Compatibility

Your deployed API works with the OpenAI SDK:

```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy"  # Not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)

print(completion.choices[0].message.content)
```

## 📊 Model Selection Guide

### For Different Use Cases

**🧠 Complex Reasoning & Analysis**
- `claude-4-sonnet` - Best for complex tasks, analysis, coding

**⚡ Speed & Quick Responses**
- `claude-3.5-haiku` - Fastest Claude model
- `gpt-4.1-nano` - Ultra-fast GPT model

**💰 Cost-Effective**
- `gpt-4.1-mini` - Good balance of cost and capability

**🎯 General Purpose**
- `claude-3.5-sonnet` - Excellent all-around model
- `gpt-4.1` - Latest GPT capabilities

**📝 Writing & Creative Tasks**
- `claude-3.7-sonnet` - Great for creative writing
- `claude-3.5-sonnet` - Balanced creativity and logic

## 🔧 Configuration

### Environment Variables

- `PORT` - Server port (default: 7860 for HF)
- `HOST` - Server host (default: 0.0.0.0)
- `REPLICATE_API_TOKEN` - Your Replicate token (optional)

### Request Parameters

All models support:

- `max_tokens` - Maximum response tokens
- `temperature` - Creativity (0.0-2.0)
- `top_p` - Nucleus sampling
- `stream` - Enable streaming
- `tools` - Function calling tools

## 📈 Expected Performance

### Response Times (approximate)

- **Claude 3.5 Haiku**: ~2-5 seconds
- **GPT-4.1 Nano**: ~2-4 seconds
- **GPT-4.1 Mini**: ~3-6 seconds
- **Claude 3.5 Sonnet**: ~4-8 seconds
- **Claude 3.7 Sonnet**: ~5-10 seconds
- **GPT-4.1**: ~6-12 seconds
- **Claude 4 Sonnet**: ~8-15 seconds

### Context Lengths

- **Claude models**: 200,000 tokens
- **GPT models**: 128,000 tokens

## 🆘 Troubleshooting

### Build Issues

1. **Docker build fails**: Check Dockerfile syntax
2. **Dependencies fail**: Verify requirements.txt
3. **Port issues**: Ensure the app listens on port 7860

### Runtime Issues

1. **Health check fails**: Check the server logs in your Space
2. **Models not working**: Verify Replicate API access
3. **Slow responses**: Try faster models (haiku, nano)

### API Issues

1. **Model not found**: Check model name spelling
2. **Streaming broken**: Verify SSE support
3. **Function calling fails**: Check the tool definition format

## ✅ Success Checklist

- [ ] Space created with Docker SDK
- [ ] All files uploaded correctly
- [ ] Build completes without errors
- [ ] Health endpoint returns 200
- [ ] Models endpoint lists 7 models
- [ ] At least one model responds correctly
- [ ] Streaming works
- [ ] OpenAI SDK compatibility verified

## 🎉 You're Live!

Once deployed, your API provides:

- ✅ **7 AI Models** in one endpoint
- ✅ **OpenAI Compatibility** for easy integration
- ✅ **Streaming Support** for real-time responses
- ✅ **Function Calling** for tool integration
- ✅ **Global Access** via Hugging Face
- ✅ **Free Hosting** on HF Spaces

## 📞 Support

For issues:

1. Check your Hugging Face Space logs
2. Test locally first: `python replicate_server.py`
3. Verify model names match the supported list
4. Check Replicate API status

## 🚀 Example Applications

Your deployed API can power:

- **Chatbots** with multiple personality models
- **Code Assistants** using Claude for analysis
- **Writing Tools** with model selection
- **Research Tools** with different reasoning models
- **Customer Support** with fast response models

**Your Multi-Model API URL**: `https://your-username-replicate-multi-model-api.hf.space`

🎊 **Congratulations! You now have 7 AI models in one OpenAI-compatible API!** 🎊
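## 🌊 Bonus: Consuming a Stream in Python

As a closing sketch, streamed responses can be consumed with the OpenAI SDK: each chunk carries a content fragment in `choices[0].delta.content`, and the final chunk's delta is typically empty, so `None` fragments are skipped. The `join_stream_deltas` helper is hypothetical, and the URL is a placeholder:

```python
def join_stream_deltas(deltas):
    """Concatenate streamed content fragments, skipping empty (None) deltas."""
    return "".join(d for d in deltas if d)

# Usage against your deployed Space (requires `pip install openai`):
# import openai
# client = openai.OpenAI(
#     base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
#     api_key="dummy",
# )
# stream = client.chat.completions.create(
#     model="claude-3.5-haiku",
#     messages=[{"role": "user", "content": "Count from 1 to 10"}],
#     stream=True,
# )
# text = join_stream_deltas(chunk.choices[0].delta.content for chunk in stream)
# print(text)
```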