---
title: Multi-Model Replicate OpenAI API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
  - openai
  - claude
  - gpt
  - replicate
  - api
  - multi-model
  - streaming
  - function-calling
---
# 🚀 Multi-Model Replicate OpenAI API - Hugging Face Spaces

Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.
## 🤖 Supported Models

### Anthropic Claude Models

- `claude-4-sonnet` - Latest Claude 4 Sonnet (Most Capable)
- `claude-3.7-sonnet` - Claude 3.7 Sonnet
- `claude-3.5-sonnet` - Claude 3.5 Sonnet (Balanced)
- `claude-3.5-haiku` - Claude 3.5 Haiku (Fastest)

### OpenAI GPT Models

- `gpt-4.1` - Latest GPT-4.1
- `gpt-4.1-mini` - GPT-4.1 Mini (Cost-Effective)
- `gpt-4.1-nano` - GPT-4.1 Nano (Ultra-Fast)
## ✨ Features

- 🎯 **100% OpenAI Compatible** - Drop-in replacement
- 🌊 **Streaming Support** - Real-time responses
- 🔧 **Function Calling** - Tool/function calling
- 🔒 **Secure** - Obfuscated API keys
- 📊 **Monitoring** - Health checks & stats
- 🤖 **Multi-Model** - 7 models in one API
## 🚀 Deploy to Hugging Face Spaces

### Step 1: Create New Space

1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Choose:
   - **Name:** `replicate-multi-model-api`
   - **SDK:** Docker ⚠️ (Important!)
   - **Hardware:** CPU Basic (free tier)
   - **Visibility:** Public
### Step 2: Upload Files

Upload these files to your Space:

```
📁 Your Hugging Face Space:
├── app.py              ← Upload replicate_server.py as app.py
├── requirements.txt    ← Upload requirements.txt
├── Dockerfile          ← Upload Dockerfile
├── README.md           ← Upload this file as README.md
├── test_all_models.py  ← Upload test_all_models.py (optional)
└── quick_test.py       ← Upload quick_test.py (optional)
```
### Step 3: Set Environment Variables (Optional)

In your Space settings, you can set:

- `REPLICATE_API_TOKEN` - Your Replicate API token (if you want to use your own)

**Note:** The app includes an obfuscated token, so this is optional.
### Step 4: Deploy

1. Hugging Face will automatically build and deploy
2. Wait 5-10 minutes for build completion
3. Your API will be live!
## 🎯 Your API Endpoints

Once deployed at `https://your-username-replicate-multi-model-api.hf.space`:

### Main Endpoints

- `POST /v1/chat/completions` - Chat completions (all models)
- `GET /v1/models` - List all 7 models
- `GET /health` - Health check

### Alternative Endpoints

- `POST /chat/completions` - Alternative chat endpoint
- `GET /models` - Alternative models endpoint
## 🧪 Test Your Deployment

### 1. Health Check

```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```

### 2. List Models

```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```
### 3. Test Claude 4 Sonnet

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```
### 4. Test GPT-4.1 Mini

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```
### 5. Test Streaming

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```
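Streaming responses arrive as Server-Sent Events, where each `data:` line carries an OpenAI-style chunk. A minimal sketch of pulling the text deltas out of such lines and reassembling the reply (this assumes the standard OpenAI chunk shape; the helper name is illustrative):

```python
import json

def extract_delta(sse_line: str):
    """Return the text delta from one OpenAI-style SSE line.

    Returns None for non-data lines, the terminal [DONE] marker,
    or chunks without content.
    """
    if not sse_line.startswith("data: "):
        return None
    payload = sse_line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# Example: reassemble a reply from raw SSE lines
lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(d for line in lines if (d := extract_delta(line)))
print(text)  # Hello
```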
## 🔌 OpenAI SDK Compatibility

Your deployed API works with the OpenAI SDK:

```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy"  # Not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)

print(completion.choices[0].message.content)
```
## 📊 Model Selection Guide

### For Different Use Cases:

#### 🧠 Complex Reasoning & Analysis

- `claude-4-sonnet` - Best for complex tasks, analysis, coding

#### ⚡ Speed & Quick Responses

- `claude-3.5-haiku` - Fastest Claude model
- `gpt-4.1-nano` - Ultra-fast GPT model

#### 💰 Cost-Effective

- `gpt-4.1-mini` - Good balance of cost and capability

#### 🎯 General Purpose

- `claude-3.5-sonnet` - Excellent all-around model
- `gpt-4.1` - Latest GPT capabilities

#### 📝 Writing & Creative Tasks

- `claude-3.7-sonnet` - Great for creative writing
- `claude-3.5-sonnet` - Balanced creativity and logic
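The selection guide can be encoded as a small lookup for routing requests by task type. This is a sketch: the mapping and the `pick_model` helper are illustrative, not part of the API.

```python
# Illustrative mapping derived from the selection guide
MODEL_FOR_TASK = {
    "reasoning": "claude-4-sonnet",
    "speed": "claude-3.5-haiku",
    "cost": "gpt-4.1-mini",
    "general": "claude-3.5-sonnet",
    "writing": "claude-3.7-sonnet",
}

def pick_model(task: str) -> str:
    """Return a model id for a task type, defaulting to general purpose."""
    return MODEL_FOR_TASK.get(task, "claude-3.5-sonnet")

print(pick_model("speed"))    # claude-3.5-haiku
print(pick_model("unknown"))  # claude-3.5-sonnet
```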
## 🔧 Configuration

### Environment Variables

- `PORT` - Server port (default: 7860 for HF)
- `HOST` - Server host (default: 0.0.0.0)
- `REPLICATE_API_TOKEN` - Your Replicate token (optional)

### Request Parameters

All models support:

- `max_tokens` - Maximum response tokens
- `temperature` - Creativity (0.0-2.0)
- `top_p` - Nucleus sampling
- `stream` - Enable streaming
- `tools` - Function calling tools
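A sketch of assembling a request body with these parameters, including a bounds check on `temperature`; the `build_chat_payload` helper name is illustrative and not part of the API:

```python
def build_chat_payload(model, messages, max_tokens=None, temperature=None,
                       top_p=None, stream=False, tools=None):
    """Build a chat-completions request body, validating temperature bounds."""
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    payload = {"model": model, "messages": messages, "stream": stream}
    # Only include optional fields the caller actually set
    for key, value in [("max_tokens", max_tokens), ("temperature", temperature),
                       ("top_p", top_p), ("tools", tools)]:
        if value is not None:
            payload[key] = value
    return payload

body = build_chat_payload("gpt-4.1-mini",
                          [{"role": "user", "content": "Hi"}],
                          max_tokens=50, temperature=0.7)
print(body["model"])  # gpt-4.1-mini
```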
## 📈 Expected Performance

### Response Times (approximate):

- Claude 3.5 Haiku: ~2-5 seconds
- GPT-4.1 Nano: ~2-4 seconds
- GPT-4.1 Mini: ~3-6 seconds
- Claude 3.5 Sonnet: ~4-8 seconds
- Claude 3.7 Sonnet: ~5-10 seconds
- GPT-4.1: ~6-12 seconds
- Claude 4 Sonnet: ~8-15 seconds

### Context Lengths:

- Claude Models: 200,000 tokens
- GPT Models: 128,000 tokens
## 🐛 Troubleshooting

### Build Issues

- **Docker build fails:** Check Dockerfile syntax
- **Dependencies fail:** Verify requirements.txt
- **Port issues:** Ensure using port 7860

### Runtime Issues

- **Health check fails:** Check server logs in HF
- **Models not working:** Verify Replicate API access
- **Slow responses:** Try faster models (haiku, nano)

### API Issues

- **Model not found:** Check model name spelling
- **Streaming broken:** Verify SSE support
- **Function calling fails:** Check tool definition format
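For reference when debugging function calling, a tool definition in the standard OpenAI `tools` format looks like this (the `get_weather` function is a made-up example, not something the API ships):

```python
# Hypothetical tool definition in the OpenAI tools format
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # made-up example function
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Passed via the "tools" request parameter, e.g. "tools": [weather_tool]
print(weather_tool["function"]["name"])  # get_weather
```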
## ✅ Success Checklist

- [ ] Space created with Docker SDK
- [ ] All files uploaded correctly
- [ ] Build completes without errors
- [ ] Health endpoint returns 200
- [ ] Models endpoint lists 7 models
- [ ] At least one model responds correctly
- [ ] Streaming works
- [ ] OpenAI SDK compatibility verified
## 🎉 You're Live!

Once deployed, your API provides:

- ✅ **7 AI Models** in one endpoint
- ✅ **OpenAI Compatibility** for easy integration
- ✅ **Streaming Support** for real-time responses
- ✅ **Function Calling** for tool integration
- ✅ **Global Access** via Hugging Face
- ✅ **Free Hosting** on HF Spaces
## 📞 Support

For issues:

- Check Hugging Face Space logs
- Test locally first: `python replicate_server.py`
- Verify model names match supported list
- Check Replicate API status
## 🌟 Example Applications

Your deployed API can power:

- **Chatbots** with multiple personality models
- **Code Assistants** using Claude for analysis
- **Writing Tools** with model selection
- **Research Tools** with different reasoning models
- **Customer Support** with fast response models

**Your Multi-Model API URL:**

```
https://your-username-replicate-multi-model-api.hf.space
```

🎉 **Congratulations!** You now have 7 AI models in one OpenAI-compatible API! 🚀