---
title: Multi-Model Replicate OpenAI API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
  - openai
  - claude
  - gpt
  - replicate
  - api
  - multi-model
  - streaming
  - function-calling
---

# 🚀 Multi-Model Replicate OpenAI API - Hugging Face Spaces

Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.

## 🤖 Supported Models
### Anthropic Claude Models
- `claude-4-sonnet` - Latest Claude 4 Sonnet (Most Capable)
- `claude-3.7-sonnet` - Claude 3.7 Sonnet
- `claude-3.5-sonnet` - Claude 3.5 Sonnet (Balanced)
- `claude-3.5-haiku` - Claude 3.5 Haiku (Fastest)

### OpenAI GPT Models
- `gpt-4.1` - Latest GPT-4.1
- `gpt-4.1-mini` - GPT-4.1 Mini (Cost-Effective)
- `gpt-4.1-nano` - GPT-4.1 Nano (Ultra-Fast)

## ✨ Features

- 🎯 **100% OpenAI Compatible** - Drop-in replacement for the OpenAI API
- 🌊 **Streaming Support** - Real-time, token-by-token responses
- 🔧 **Function Calling** - OpenAI-style tool/function calling
- 🔒 **Secure** - Obfuscated API keys
- 📊 **Monitoring** - Health checks & stats
- 🤖 **Multi-Model** - 7 models behind one API

## 🚀 Deploy to Hugging Face Spaces

### Step 1: Create New Space
1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose:
   - **Name**: `replicate-multi-model-api`
   - **SDK**: **Docker** ⚠️ (Important!)
   - **Hardware**: CPU Basic (free tier)
   - **Visibility**: Public

### Step 2: Upload Files
Upload these files to your Space:

```
📁 Your Hugging Face Space:
├── app.py               ← Upload replicate_server.py as app.py
├── requirements.txt     ← Upload requirements.txt
├── Dockerfile           ← Upload Dockerfile
├── README.md            ← Upload this file as README.md
├── test_all_models.py   ← Upload test_all_models.py (optional)
└── quick_test.py        ← Upload quick_test.py (optional)
```
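If you prefer to script the upload instead of using the web UI, the `huggingface_hub` client can push the same files. This is a sketch, assuming `pip install huggingface_hub` and a prior `huggingface-cli login`; the repo id below is a placeholder for your own Space:

```python
# Sketch: upload the Space files programmatically.
# Keys are local filenames, values are the names they get in the Space.
FILES = {
    "replicate_server.py": "app.py",  # renamed on upload, as in the tree above
    "requirements.txt": "requirements.txt",
    "Dockerfile": "Dockerfile",
    "README.md": "README.md",
}

def upload_space_files(repo_id: str) -> None:
    # Imported inside the function so the sketch stays optional to run.
    from huggingface_hub import HfApi

    api = HfApi()
    for local_path, path_in_repo in FILES.items():
        api.upload_file(
            path_or_fileobj=local_path,
            path_in_repo=path_in_repo,
            repo_id=repo_id,
            repo_type="space",
        )

if __name__ == "__main__":
    upload_space_files("your-username/replicate-multi-model-api")
```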
### Step 3: Set Environment Variables (Optional)
In your Space settings, you can set:
- `REPLICATE_API_TOKEN` - Your Replicate API token (if you want to use your own)

**Note**: The app includes an obfuscated token, so this is optional.

### Step 4: Deploy
- Hugging Face builds and deploys the Space automatically
- Wait 5-10 minutes for the build to complete
- Your API will be live!

## 🎯 Your API Endpoints

Once deployed at `https://your-username-replicate-multi-model-api.hf.space`:

### Main Endpoints
- `POST /v1/chat/completions` - Chat completions (all models)
- `GET /v1/models` - List all 7 models
- `GET /health` - Health check

### Alternative Endpoints
- `POST /chat/completions` - Alternative chat endpoint
- `GET /models` - Alternative models endpoint

## 🧪 Test Your Deployment

### 1. Health Check
```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```

### 2. List Models
```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```

### 3. Test Claude 4 Sonnet
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```

### 4. Test GPT-4.1 Mini
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```

### 5. Test Streaming
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```
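The streamed response arrives as server-sent events: each line is `data: ` followed by an OpenAI-style JSON chunk, and the stream ends with `data: [DONE]`. A minimal parsing sketch (the helper names here are our own, not part of the API):

```python
import json

def parse_sse_line(line: str):
    """Return the decoded chunk dict, or None for blanks and the [DONE] marker."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    return json.loads(payload)

def extract_text(chunk: dict) -> str:
    """Pull the token text out of an OpenAI-style streaming chunk."""
    return chunk["choices"][0]["delta"].get("content", "")
```

Feed each line of the HTTP response body through `parse_sse_line`, skip the `None` results, and concatenate the `extract_text` outputs to rebuild the full reply.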
## 🔌 OpenAI SDK Compatibility

Your deployed API works with the OpenAI SDK:

```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy"  # Not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)

print(completion.choices[0].message.content)
```
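Streaming works through the SDK as well. A sketch, assuming `pip install openai`; the base URL is again a placeholder for your own Space:

```python
def stream_chat(base_url: str, model: str, prompt: str) -> str:
    """Stream a completion and print tokens as they arrive; returns the full text."""
    import openai  # requires `pip install openai`

    client = openai.OpenAI(base_url=base_url, api_key="dummy")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    pieces = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            pieces.append(delta)
    return "".join(pieces)

if __name__ == "__main__":
    stream_chat(
        "https://your-username-replicate-multi-model-api.hf.space/v1",
        "claude-3.5-haiku",
        "Count from 1 to 10",
    )
```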
## 📋 Model Selection Guide

### For Different Use Cases:

**🧠 Complex Reasoning & Analysis**
- `claude-4-sonnet` - Best for complex tasks, analysis, coding

**⚡ Speed & Quick Responses**
- `claude-3.5-haiku` - Fastest Claude model
- `gpt-4.1-nano` - Ultra-fast GPT model

**💰 Cost-Effective**
- `gpt-4.1-mini` - Good balance of cost and capability

**🎯 General Purpose**
- `claude-3.5-sonnet` - Excellent all-around model
- `gpt-4.1` - Latest GPT capabilities

**📝 Writing & Creative Tasks**
- `claude-3.7-sonnet` - Great for creative writing
- `claude-3.5-sonnet` - Balanced creativity and logic
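The guide above can be captured as a small lookup table in client code, so the model choice lives in one place (the helper and use-case labels are our own, not part of the API):

```python
# Map use cases from the selection guide to model ids the API accepts.
BEST_MODEL_FOR = {
    "reasoning": "claude-4-sonnet",
    "speed": "claude-3.5-haiku",
    "cost": "gpt-4.1-mini",
    "general": "claude-3.5-sonnet",
    "writing": "claude-3.7-sonnet",
}

def pick_model(use_case: str) -> str:
    """Fall back to the general-purpose model for unknown use cases."""
    return BEST_MODEL_FOR.get(use_case, BEST_MODEL_FOR["general"])
```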
## 🔧 Configuration

### Environment Variables
- `PORT` - Server port (default: 7860 for HF)
- `HOST` - Server host (default: 0.0.0.0)
- `REPLICATE_API_TOKEN` - Your Replicate token (optional)

### Request Parameters
All models support:
- `max_tokens` - Maximum response tokens
- `temperature` - Sampling temperature (0.0-2.0; higher is more random)
- `top_p` - Nucleus sampling
- `stream` - Enable streaming
- `tools` - Function calling tools
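The `tools` parameter uses the OpenAI function-calling format. A sketch of a request body with one tool; `get_weather` is a made-up example tool, not something the API provides:

```python
import json

def build_tool_request(model: str, user_message: str) -> dict:
    """Assemble a chat completion body with one OpenAI-style tool definition."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get the current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "city": {"type": "string"},
                        },
                        "required": ["city"],
                    },
                },
            }
        ],
    }

body = build_tool_request("claude-3.5-sonnet", "What's the weather in Paris?")
print(json.dumps(body, indent=2))
```

POST this body to `/v1/chat/completions`; if the model decides to call the tool, the response contains `tool_calls` instead of plain text.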
## 📊 Expected Performance

### Response Times (approximate):
- **Claude 3.5 Haiku**: ~2-5 seconds
- **GPT-4.1 Nano**: ~2-4 seconds
- **GPT-4.1 Mini**: ~3-6 seconds
- **Claude 3.5 Sonnet**: ~4-8 seconds
- **Claude 3.7 Sonnet**: ~5-10 seconds
- **GPT-4.1**: ~6-12 seconds
- **Claude 4 Sonnet**: ~8-15 seconds

### Context Lengths:
- **Claude Models**: 200,000 tokens
- **GPT Models**: 128,000 tokens
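These numbers are approximations; you can measure your own deployment with a stdlib-only sketch like this (the URL is a placeholder, and the network call is kept behind `__main__` so nothing runs on import):

```python
import json
import time
import urllib.request

def time_completion(base_url: str, model: str) -> float:
    """Send one tiny completion request and return the round-trip time in seconds."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "Say hi"}],
        "max_tokens": 10,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.monotonic() - start

if __name__ == "__main__":
    elapsed = time_completion(
        "https://your-username-replicate-multi-model-api.hf.space",
        "claude-3.5-haiku",
    )
    print(f"round trip: {elapsed:.1f}s")
```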
## 🐛 Troubleshooting

### Build Issues
1. **Docker build fails**: Check the Dockerfile syntax
2. **Dependencies fail**: Verify requirements.txt
3. **Port issues**: Make sure the server listens on port 7860

### Runtime Issues
1. **Health check fails**: Check the server logs in your Space
2. **Models not working**: Verify Replicate API access
3. **Slow responses**: Try the faster models (haiku, nano)

### API Issues
1. **Model not found**: Check the model name spelling
2. **Streaming broken**: Verify SSE support
3. **Function calling fails**: Check the tool definition format

## ✅ Success Checklist

- [ ] Space created with Docker SDK
- [ ] All files uploaded correctly
- [ ] Build completes without errors
- [ ] Health endpoint returns 200
- [ ] Models endpoint lists 7 models
- [ ] At least one model responds correctly
- [ ] Streaming works
- [ ] OpenAI SDK compatibility verified
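The health and model checks above can be automated with a stdlib-only sketch (replace the placeholder URL with your Space's; the network call is kept behind `__main__`):

```python
import json
import urllib.request

def check_space(base_url: str) -> bool:
    """True if /health answers 200 and /v1/models lists all 7 models."""
    with urllib.request.urlopen(f"{base_url}/health") as resp:
        if resp.status != 200:
            return False
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        models = json.loads(resp.read())
    return len(models.get("data", [])) == 7

if __name__ == "__main__":
    ok = check_space("https://your-username-replicate-multi-model-api.hf.space")
    print("all good" if ok else "something is off")
```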
## 🎉 You're Live!

Once deployed, your API provides:

✅ **7 AI Models** in one endpoint

✅ **OpenAI Compatibility** for easy integration

✅ **Streaming Support** for real-time responses

✅ **Function Calling** for tool integration

✅ **Global Access** via Hugging Face

✅ **Free Hosting** on HF Spaces

## 📞 Support

For issues:
1. Check the Hugging Face Space logs
2. Test locally first: `python replicate_server.py`
3. Verify model names match the supported list
4. Check Replicate API status

## 🌟 Example Applications

Your deployed API can power:
- **Chatbots** with multiple personality models
- **Code Assistants** using Claude for analysis
- **Writing Tools** with model selection
- **Research Tools** with different reasoning models
- **Customer Support** with fast response models

**Your Multi-Model API URL**:
`https://your-username-replicate-multi-model-api.hf.space`

🎉 **Congratulations! You now have 7 AI models in one OpenAI-compatible API!** 🎉