---
title: Multi-Model Replicate OpenAI API
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
- openai
- claude
- gpt
- replicate
- api
- multi-model
- streaming
- function-calling
---
# πŸš€ Multi-Model Replicate OpenAI API - Hugging Face Spaces
Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.
## πŸ€– Supported Models
### Anthropic Claude Models
- `claude-4-sonnet` - Latest Claude 4 Sonnet (Most Capable)
- `claude-3.7-sonnet` - Claude 3.7 Sonnet
- `claude-3.5-sonnet` - Claude 3.5 Sonnet (Balanced)
- `claude-3.5-haiku` - Claude 3.5 Haiku (Fastest)
### OpenAI GPT Models
- `gpt-4.1` - Latest GPT-4.1
- `gpt-4.1-mini` - GPT-4.1 Mini (Cost-Effective)
- `gpt-4.1-nano` - GPT-4.1 Nano (Ultra-Fast)
## ✨ Features
- 🎯 **100% OpenAI Compatible** - Drop-in replacement
- 🌊 **Streaming Support** - Real-time responses
- πŸ”§ **Function Calling** - OpenAI-style tool use
- πŸ” **Secure** - Obfuscated API keys
- πŸ“Š **Monitoring** - Health checks & stats
- πŸš€ **Multi-Model** - 7 models in one API
## πŸš€ Deploy to Hugging Face Spaces
### Step 1: Create New Space
1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose:
- **Name**: `replicate-multi-model-api`
- **SDK**: **Docker** ⚠️ (Important!)
- **Hardware**: CPU Basic (free tier)
- **Visibility**: Public
### Step 2: Upload Files
Upload these files to your Space:
```
πŸ“ Your Hugging Face Space:
β”œβ”€β”€ app.py ← Upload replicate_server.py as app.py
β”œβ”€β”€ requirements.txt ← Upload requirements.txt
β”œβ”€β”€ Dockerfile ← Upload Dockerfile
β”œβ”€β”€ README.md ← Upload this file as README.md
β”œβ”€β”€ test_all_models.py ← Upload test_all_models.py (optional)
└── quick_test.py ← Upload quick_test.py (optional)
```
### Step 3: Set Environment Variables (Optional)
In your Space settings, you can set:
- `REPLICATE_API_TOKEN` - Your Replicate API token (if you want to use your own)
**Note**: The app includes an obfuscated token, so this is optional.
### Step 4: Deploy
- Hugging Face will automatically build and deploy
- Wait 5-10 minutes for build completion
- Your API will be live!
## 🎯 Your API Endpoints
Once deployed at `https://your-username-replicate-multi-model-api.hf.space`:
### Main Endpoints
- `POST /v1/chat/completions` - Chat completions (all models)
- `GET /v1/models` - List all 7 models
- `GET /health` - Health check
### Alternative Endpoints
- `POST /chat/completions` - Alternative chat endpoint
- `GET /models` - Alternative models endpoint
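If you call these endpoints from client code, the paths can be kept in one place; a minimal sketch, where `BASE_URL` is a placeholder for your own Space's hostname:

```python
# BASE_URL is a placeholder -- substitute your deployed Space's hostname.
BASE_URL = "https://your-username-replicate-multi-model-api.hf.space"

# Main endpoint paths from the list above
ENDPOINTS = {
    "chat": "/v1/chat/completions",
    "models": "/v1/models",
    "health": "/health",
}

def endpoint_url(name: str) -> str:
    """Return the absolute URL for a named endpoint."""
    return BASE_URL + ENDPOINTS[name]

print(endpoint_url("models"))  # -> .../v1/models
```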
## πŸ§ͺ Test Your Deployment
### 1. Health Check
```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```
### 2. List Models
```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```
### 3. Test Claude 4 Sonnet
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```
### 4. Test GPT-4.1 Mini
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```
### 5. Test Streaming
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```
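When `stream` is true, each response line is an OpenAI-style SSE chunk (`data: {...}`) carrying a content delta. A minimal parser sketch; the sample stream below is illustrative, not captured from the server:

```python
import json

def extract_deltas(sse_text: str) -> str:
    """Accumulate content deltas from an OpenAI-style SSE stream."""
    parts = []
    for line in sse_text.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Illustrative sample of a streamed response
sample = "\n".join([
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "1, 2, "}}]}',
    'data: {"choices": [{"delta": {"content": "3"}}]}',
    "data: [DONE]",
])
print(extract_deltas(sample))  # -> 1, 2, 3
```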
## πŸ”Œ OpenAI SDK Compatibility
Your deployed API works with the OpenAI SDK:
```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy",  # not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ],
)

print(completion.choices[0].message.content)
```
## πŸ“Š Model Selection Guide
### For Different Use Cases:
**🧠 Complex Reasoning & Analysis**
- `claude-4-sonnet` - Best for complex tasks, analysis, coding
**⚑ Speed & Quick Responses**
- `claude-3.5-haiku` - Fastest Claude model
- `gpt-4.1-nano` - Ultra-fast GPT model
**πŸ’° Cost-Effective**
- `gpt-4.1-mini` - Good balance of cost and capability
**🎯 General Purpose**
- `claude-3.5-sonnet` - Excellent all-around model
- `gpt-4.1` - Latest GPT capabilities
**πŸ“ Writing & Creative Tasks**
- `claude-3.7-sonnet` - Great for creative writing
- `claude-3.5-sonnet` - Balanced creativity and logic
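If you route requests programmatically, the guide above collapses into a small lookup table; a sketch, where the use-case labels are our own (any of the 7 model IDs works in requests):

```python
# Use-case labels (illustrative) mapped to model IDs from the guide above
MODEL_BY_USE_CASE = {
    "reasoning": "claude-4-sonnet",
    "speed": "claude-3.5-haiku",
    "cost": "gpt-4.1-mini",
    "general": "claude-3.5-sonnet",
    "writing": "claude-3.7-sonnet",
}

def pick_model(use_case: str) -> str:
    """Return a model ID for a use case, falling back to general purpose."""
    return MODEL_BY_USE_CASE.get(use_case, "claude-3.5-sonnet")

print(pick_model("speed"))  # -> claude-3.5-haiku
```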
## πŸ”§ Configuration
### Environment Variables
- `PORT` - Server port (default: 7860 for HF)
- `HOST` - Server host (default: 0.0.0.0)
- `REPLICATE_API_TOKEN` - Your Replicate token (optional)
### Request Parameters
All models support:
- `max_tokens` - Maximum tokens to generate in the response
- `temperature` - Sampling temperature (0.0-2.0; higher = more creative)
- `top_p` - Nucleus sampling
- `stream` - Enable streaming
- `tools` - Function calling tools
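Since the API accepts OpenAI-style `tools`, a function-calling request body combines these parameters as sketched below; the `get_weather` tool is hypothetical:

```python
import json

# Request body with a hypothetical tool in the OpenAI function-calling schema
payload = {
    "model": "claude-4-sonnet",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "max_tokens": 200,
    "temperature": 0.2,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload)  # send as the POST body to /v1/chat/completions
```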
## πŸ“ˆ Expected Performance
### Response Times (approximate):
- **Claude 3.5 Haiku**: ~2-5 seconds
- **GPT-4.1 Nano**: ~2-4 seconds
- **GPT-4.1 Mini**: ~3-6 seconds
- **Claude 3.5 Sonnet**: ~4-8 seconds
- **Claude 3.7 Sonnet**: ~5-10 seconds
- **GPT-4.1**: ~6-12 seconds
- **Claude 4 Sonnet**: ~8-15 seconds
### Context Lengths:
- **Claude Models**: 200,000 tokens
- **GPT Models**: 128,000 tokens
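A rough way to check whether a prompt fits a model's window is the common ~4-characters-per-token heuristic; this is an approximation, not a real tokenizer:

```python
# Context windows from the list above, in tokens
CONTEXT_TOKENS = {"claude": 200_000, "gpt": 128_000}

def fits_context(prompt: str, model: str, max_tokens: int = 0) -> bool:
    """Heuristic check: ~4 characters per token, plus room for the reply."""
    family = "claude" if model.startswith("claude") else "gpt"
    estimated_tokens = len(prompt) // 4 + max_tokens
    return estimated_tokens <= CONTEXT_TOKENS[family]

print(fits_context("hello " * 10, "claude-3.5-haiku"))  # small prompt fits
```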
## πŸ†˜ Troubleshooting
### Build Issues
1. **Docker build fails**: Check Dockerfile syntax
2. **Dependencies fail**: Verify requirements.txt
3. **Port issues**: Ensure using port 7860
### Runtime Issues
1. **Health check fails**: Check server logs in HF
2. **Models not working**: Verify Replicate API access
3. **Slow responses**: Try faster models (haiku, nano)
### API Issues
1. **Model not found**: Check model name spelling
2. **Streaming broken**: Verify SSE support
3. **Function calling fails**: Check tool definition format
## βœ… Success Checklist
- [ ] Space created with Docker SDK
- [ ] All files uploaded correctly
- [ ] Build completes without errors
- [ ] Health endpoint returns 200
- [ ] Models endpoint lists 7 models
- [ ] At least one model responds correctly
- [ ] Streaming works
- [ ] OpenAI SDK compatibility verified
## πŸŽ‰ You're Live!
Once deployed, your API provides:
βœ… **7 AI Models** in one endpoint
βœ… **OpenAI Compatibility** for easy integration
βœ… **Streaming Support** for real-time responses
βœ… **Function Calling** for tool integration
βœ… **Global Access** via Hugging Face
βœ… **Free Hosting** on HF Spaces
## πŸ“ž Support
For issues:
1. Check Hugging Face Space logs
2. Test locally first: `python replicate_server.py`
3. Verify model names match supported list
4. Check Replicate API status
## πŸš€ Example Applications
Your deployed API can power:
- **Chatbots** with multiple personality models
- **Code Assistants** using Claude for analysis
- **Writing Tools** with model selection
- **Research Tools** with different reasoning models
- **Customer Support** with fast response models
**Your Multi-Model API URL**:
`https://your-username-replicate-multi-model-api.hf.space`
🎊 **Congratulations! You now have 7 AI models in one OpenAI-compatible API!** 🎊