---
title: Multi-Model Replicate OpenAI API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
  - openai
  - claude
  - gpt
  - replicate
  - api
  - multi-model
  - streaming
  - function-calling
---
# 🚀 Multi-Model Replicate OpenAI API - Hugging Face Spaces

Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.
## 🤖 Supported Models

### Anthropic Claude Models

- `claude-4-sonnet` - Latest Claude 4 Sonnet (Most Capable)
- `claude-3.7-sonnet` - Claude 3.7 Sonnet
- `claude-3.5-sonnet` - Claude 3.5 Sonnet (Balanced)
- `claude-3.5-haiku` - Claude 3.5 Haiku (Fastest)

### OpenAI GPT Models

- `gpt-4.1` - Latest GPT-4.1
- `gpt-4.1-mini` - GPT-4.1 Mini (Cost-Effective)
- `gpt-4.1-nano` - GPT-4.1 Nano (Ultra-Fast)
## ✨ Features

- 🎯 **100% OpenAI Compatible** - Drop-in replacement for the OpenAI API
- 🌊 **Streaming Support** - Real-time responses
- 🔧 **Function Calling** - OpenAI-style tool/function calling
- 🔒 **Secure** - Obfuscated API keys
- 📊 **Monitoring** - Health checks & stats
- 🤖 **Multi-Model** - 7 models behind one API
## 🚀 Deploy to Hugging Face Spaces

### Step 1: Create a New Space

1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose:
   - **Name**: `replicate-multi-model-api`
   - **SDK**: **Docker** ⚠️ (Important!)
   - **Hardware**: CPU Basic (free tier)
   - **Visibility**: Public
### Step 2: Upload Files

Upload these files to your Space:

```
📁 Your Hugging Face Space:
├── app.py              ← Upload replicate_server.py as app.py
├── requirements.txt    ← Upload requirements.txt
├── Dockerfile          ← Upload Dockerfile
├── README.md           ← Upload this file as README.md
├── test_all_models.py  ← Upload test_all_models.py (optional)
└── quick_test.py       ← Upload quick_test.py (optional)
```
### Step 3: Set Environment Variables (Optional)

In your Space settings, you can set:

- `REPLICATE_API_TOKEN` - Your Replicate API token (if you want to use your own)

**Note**: The app includes an obfuscated token, so this is optional.

### Step 4: Deploy

- Hugging Face builds and deploys the Space automatically
- Wait 5-10 minutes for the build to complete
- Your API will be live!
## 🎯 Your API Endpoints

Once deployed at `https://your-username-replicate-multi-model-api.hf.space`:

### Main Endpoints

- `POST /v1/chat/completions` - Chat completions (all models)
- `GET /v1/models` - List all 7 models
- `GET /health` - Health check

### Alternative Endpoints

- `POST /chat/completions` - Alternative chat endpoint
- `GET /models` - Alternative models endpoint
## 🧪 Test Your Deployment

### 1. Health Check

```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```

### 2. List Models

```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```

### 3. Test Claude 4 Sonnet

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```

### 4. Test GPT-4.1 Mini

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```

### 5. Test Streaming

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```
## 🐍 OpenAI SDK Compatibility

Your deployed API works with the OpenAI SDK:

```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy"  # Not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)

print(completion.choices[0].message.content)
```
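Streaming through the SDK works the same way: pass `stream=True` and iterate over the returned chunks. A minimal sketch of collecting streamed deltas; `collect_stream` and `fake_chunk` are illustrative helpers (not part of the server), and the fake objects only mimic the SDK's chunk shape so the example runs offline:

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Concatenate the delta content from a sequence of streamed chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # the final chunk's content is typically None
            parts.append(delta.content)
    return "".join(parts)

# Against the live API this would be:
#   stream = client.chat.completions.create(
#       model="claude-3.5-haiku", messages=[...], stream=True)
#   print(collect_stream(stream))

# Offline stand-in mimicking the SDK's chunk structure:
def fake_chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

print(collect_stream([fake_chunk("Hello, "), fake_chunk("world!"),
                      fake_chunk(None)]))  # Hello, world!
```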
## 📊 Model Selection Guide

### For Different Use Cases

**🧠 Complex Reasoning & Analysis**
- `claude-4-sonnet` - Best for complex tasks, analysis, and coding

**⚡ Speed & Quick Responses**
- `claude-3.5-haiku` - Fastest Claude model
- `gpt-4.1-nano` - Ultra-fast GPT model

**💰 Cost-Effective**
- `gpt-4.1-mini` - Good balance of cost and capability

**🎯 General Purpose**
- `claude-3.5-sonnet` - Excellent all-around model
- `gpt-4.1` - Latest GPT capabilities

**📝 Writing & Creative Tasks**
- `claude-3.7-sonnet` - Great for creative writing
- `claude-3.5-sonnet` - Balanced creativity and logic
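The guide above can be encoded as a small client-side routing table. This is a hypothetical helper (`pick_model` is not part of the deployed app), shown only as one way to map a use case to a supported model id:

```python
# Hypothetical routing table based on the selection guide above.
MODEL_BY_USE_CASE = {
    "reasoning": "claude-4-sonnet",
    "speed": "claude-3.5-haiku",
    "cost": "gpt-4.1-mini",
    "general": "claude-3.5-sonnet",
    "writing": "claude-3.7-sonnet",
}

def pick_model(use_case: str) -> str:
    """Return a supported model id, defaulting to the general-purpose model."""
    return MODEL_BY_USE_CASE.get(use_case, "claude-3.5-sonnet")

print(pick_model("speed"))    # claude-3.5-haiku
print(pick_model("unknown"))  # claude-3.5-sonnet
```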
## 🔧 Configuration

### Environment Variables

- `PORT` - Server port (default: 7860 for HF)
- `HOST` - Server host (default: 0.0.0.0)
- `REPLICATE_API_TOKEN` - Your Replicate token (optional)

### Request Parameters

All models support:

- `max_tokens` - Maximum response tokens
- `temperature` - Creativity (0.0-2.0)
- `top_p` - Nucleus sampling
- `stream` - Enable streaming
- `tools` - Function-calling tool definitions
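The `tools` parameter follows the OpenAI tool-definition format. A minimal sketch of the client-side pattern: define a tool schema, and when the model responds with a tool call, run the matching local function. The `get_weather` tool and `dispatch_tool_call` helper are hypothetical examples, not part of the server:

```python
import json

# OpenAI-style tool definition (the tool itself is illustrative).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch_tool_call(name, arguments_json):
    """Run the local function a tool call refers to and return its result."""
    args = json.loads(arguments_json)  # tool-call arguments arrive as a JSON string
    if name == "get_weather":
        return {"city": args["city"], "forecast": "sunny"}  # stubbed result
    raise ValueError(f"unknown tool: {name}")

# A request would pass tools=TOOLS; when the model returns a tool call,
# execute it locally and send the result back as a "tool" role message.
print(dispatch_tool_call("get_weather", '{"city": "Paris"}'))
```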
## 📈 Expected Performance

### Response Times (approximate)

- **Claude 3.5 Haiku**: ~2-5 seconds
- **GPT-4.1 Nano**: ~2-4 seconds
- **GPT-4.1 Mini**: ~3-6 seconds
- **Claude 3.5 Sonnet**: ~4-8 seconds
- **Claude 3.7 Sonnet**: ~5-10 seconds
- **GPT-4.1**: ~6-12 seconds
- **Claude 4 Sonnet**: ~8-15 seconds

### Context Lengths

- **Claude Models**: 200,000 tokens
- **GPT Models**: 128,000 tokens
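When prompts get long, it can help to cap `max_tokens` so the prompt plus completion stays inside the context window. A hypothetical budgeting helper using the limits from the table above (`clamp_max_tokens` is not part of the server):

```python
# Context limits per model family, from the table above.
CONTEXT_LIMIT = {"claude": 200_000, "gpt": 128_000}

def clamp_max_tokens(model: str, prompt_tokens: int, requested: int) -> int:
    """Cap the requested completion budget to the remaining context window."""
    family = "claude" if model.startswith("claude") else "gpt"
    remaining = CONTEXT_LIMIT[family] - prompt_tokens
    return max(0, min(requested, remaining))

print(clamp_max_tokens("gpt-4.1-mini", 127_500, 1000))  # 500
```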
## 🐛 Troubleshooting

### Build Issues

1. **Docker build fails**: Check the Dockerfile syntax
2. **Dependencies fail**: Verify requirements.txt
3. **Port issues**: Ensure the server listens on port 7860

### Runtime Issues

1. **Health check fails**: Check the server logs in your HF Space
2. **Models not working**: Verify Replicate API access
3. **Slow responses**: Try the faster models (haiku, nano)

### API Issues

1. **Model not found**: Check the model name spelling
2. **Streaming broken**: Verify SSE (server-sent events) support
3. **Function calling fails**: Check the tool definition format
## ✅ Success Checklist

- [ ] Space created with the Docker SDK
- [ ] All files uploaded correctly
- [ ] Build completes without errors
- [ ] Health endpoint returns 200
- [ ] Models endpoint lists 7 models
- [ ] At least one model responds correctly
- [ ] Streaming works
- [ ] OpenAI SDK compatibility verified
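The "7 models listed" check can be automated. A minimal sketch that validates a parsed `/v1/models` payload; `missing_models` and `EXPECTED_MODELS` are illustrative names, and the sample payload stands in for a real HTTP response:

```python
# The 7 model ids this deployment is expected to serve.
EXPECTED_MODELS = {
    "claude-4-sonnet", "claude-3.7-sonnet", "claude-3.5-sonnet",
    "claude-3.5-haiku", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
}

def missing_models(models_response: dict) -> set:
    """Given a parsed /v1/models payload, return any expected ids not listed."""
    listed = {m["id"] for m in models_response.get("data", [])}
    return EXPECTED_MODELS - listed

# In practice: resp = requests.get(f"{base_url}/v1/models").json()
sample = {"data": [{"id": m} for m in EXPECTED_MODELS]}
print(missing_models(sample))  # set()
```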
## 🎉 You're Live!

Once deployed, your API provides:

✅ **7 AI Models** behind one endpoint
✅ **OpenAI Compatibility** for easy integration
✅ **Streaming Support** for real-time responses
✅ **Function Calling** for tool integration
✅ **Global Access** via Hugging Face
✅ **Free Hosting** on HF Spaces

## 🆘 Support

For issues:

1. Check the Hugging Face Space logs
2. Test locally first: `python replicate_server.py`
3. Verify model names match the supported list
4. Check the Replicate API status
## 🌟 Example Applications

Your deployed API can power:

- **Chatbots** with multiple personality models
- **Code Assistants** using Claude for analysis
- **Writing Tools** with model selection
- **Research Tools** with different reasoning models
- **Customer Support** with fast response models

**Your Multi-Model API URL**:
`https://your-username-replicate-multi-model-api.hf.space`

🎉 **Congratulations! You now have 7 AI models in one OpenAI-compatible API!** 🎉