---
title: Multi-Model Replicate OpenAI API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
  - openai
  - claude
  - gpt
  - replicate
  - api
  - multi-model
  - streaming
  - function-calling
---

# 🚀 Multi-Model Replicate OpenAI API - Hugging Face Spaces

Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.

## 🤖 Supported Models

### Anthropic Claude Models

- `claude-4-sonnet` - Latest Claude 4 Sonnet (Most Capable)
- `claude-3.7-sonnet` - Claude 3.7 Sonnet
- `claude-3.5-sonnet` - Claude 3.5 Sonnet (Balanced)
- `claude-3.5-haiku` - Claude 3.5 Haiku (Fastest)

### OpenAI GPT Models

- `gpt-4.1` - Latest GPT-4.1
- `gpt-4.1-mini` - GPT-4.1 Mini (Cost-Effective)
- `gpt-4.1-nano` - GPT-4.1 Nano (Ultra-Fast)

## ✨ Features

- 🎯 **100% OpenAI Compatible** - Drop-in replacement
- 🌊 **Streaming Support** - Real-time responses
- 🔧 **Function Calling** - Tool/function calling
- 🔐 **Secure** - Obfuscated API keys
- 📊 **Monitoring** - Health checks & stats
- 🚀 **Multi-Model** - 7 models in one API

## 🚀 Deploy to Hugging Face Spaces

### Step 1: Create New Space

1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose:
   - **Name**: `replicate-multi-model-api`
   - **SDK**: **Docker** ⚠️ (Important!)
   - **Hardware**: CPU Basic (free tier)
   - **Visibility**: Public

### Step 2: Upload Files

Upload these files to your Space:

```
📁 Your Hugging Face Space:
├── app.py              ← Upload replicate_server.py as app.py
├── requirements.txt    ← Upload requirements.txt
├── Dockerfile          ← Upload Dockerfile
├── README.md           ← Upload this file as README.md
├── test_all_models.py  ← Upload test_all_models.py (optional)
└── quick_test.py       ← Upload quick_test.py (optional)
```

### Step 3: Set Environment Variables (Optional)

In your Space settings, you can set:

- `REPLICATE_API_TOKEN` - Your Replicate API token (if you want to use your own)

**Note**: The app includes an obfuscated token, so this is optional.

### Step 4: Deploy

- Hugging Face will automatically build and deploy the Space
- Wait 5-10 minutes for the build to complete
- Your API will be live!

## 🎯 Your API Endpoints

Once deployed at `https://your-username-replicate-multi-model-api.hf.space`:

### Main Endpoints

- `POST /v1/chat/completions` - Chat completions (all models)
- `GET /v1/models` - List all 7 models
- `GET /health` - Health check

### Alternative Endpoints

- `POST /chat/completions` - Alternative chat endpoint
- `GET /models` - Alternative models endpoint

## 🧪 Test Your Deployment

### 1. Health Check

```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```

### 2. List Models

```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```

### 3. Test Claude 4 Sonnet

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```
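If you prefer Python over curl, the same request can be sketched with only the standard library. The helper names below (`build_chat_request`, `post_chat`) are hypothetical, and the Space URL is a placeholder:

```python
import json
import urllib.request

def build_chat_request(model: str, content: str, **params) -> dict:
    """Assemble an OpenAI-style chat completion payload (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    payload.update(params)  # e.g. max_tokens, temperature, stream
    return payload

def post_chat(base_url: str, payload: dict) -> dict:
    """POST the payload to the deployed Space and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Same request as the curl example above:
payload = build_chat_request("claude-4-sonnet", "Write a haiku about AI", max_tokens=100)
# post_chat("https://your-username-replicate-multi-model-api.hf.space", payload)
```

The response follows the standard OpenAI shape, so the reply text is at `response["choices"][0]["message"]["content"]`.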
### 4. Test GPT-4.1 Mini

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```

### 5. Test Streaming

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```

## 🔌 OpenAI SDK Compatibility

Your deployed API works with the OpenAI SDK:

```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy"  # Not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)

print(completion.choices[0].message.content)
```

## 📊 Model Selection Guide

### For Different Use Cases

**🧠 Complex Reasoning & Analysis**
- `claude-4-sonnet` - Best for complex tasks, analysis, coding

**⚡ Speed & Quick Responses**
- `claude-3.5-haiku` - Fastest Claude model
- `gpt-4.1-nano` - Ultra-fast GPT model

**💰 Cost-Effective**
- `gpt-4.1-mini` - Good balance of cost and capability

**🎯 General Purpose**
- `claude-3.5-sonnet` - Excellent all-around model
- `gpt-4.1` - Latest GPT capabilities

**📝 Writing & Creative Tasks**
- `claude-3.7-sonnet` - Great for creative writing
- `claude-3.5-sonnet` - Balanced creativity and logic

## 🔧 Configuration

### Environment Variables

- `PORT` - Server port (default: 7860 for HF)
- `HOST` - Server host (default: 0.0.0.0)
- `REPLICATE_API_TOKEN` - Your Replicate token (optional)

### Request Parameters

All models support:

- `max_tokens` - Maximum response tokens
- `temperature` - Creativity (0.0-2.0)
- `top_p` - Nucleus sampling
- `stream` - Enable streaming
- `tools` - Function calling tools

## 📈 Expected Performance

### Response Times (approximate)

- **Claude 3.5 Haiku**: ~2-5 seconds
- **GPT-4.1 Nano**: ~2-4 seconds
- **GPT-4.1 Mini**: ~3-6 seconds
- **Claude 3.5 Sonnet**: ~4-8 seconds
- **Claude 3.7 Sonnet**: ~5-10 seconds
- **GPT-4.1**: ~6-12 seconds
- **Claude 4 Sonnet**: ~8-15 seconds

### Context Lengths

- **Claude models**: 200,000 tokens
- **GPT models**: 128,000 tokens

## 🆘 Troubleshooting

### Build Issues

1. **Docker build fails**: Check Dockerfile syntax
2. **Dependencies fail**: Verify requirements.txt
3. **Port issues**: Ensure the app listens on port 7860

### Runtime Issues

1. **Health check fails**: Check the server logs in your Space
2. **Models not working**: Verify Replicate API access
3. **Slow responses**: Try faster models (haiku, nano)

### API Issues

1. **Model not found**: Check model name spelling
2. **Streaming broken**: Verify SSE support
3. **Function calling fails**: Check the tool definition format

## ✅ Success Checklist

- [ ] Space created with Docker SDK
- [ ] All files uploaded correctly
- [ ] Build completes without errors
- [ ] Health endpoint returns 200
- [ ] Models endpoint lists 7 models
- [ ] At least one model responds correctly
- [ ] Streaming works
- [ ] OpenAI SDK compatibility verified

## 🎉 You're Live!

Once deployed, your API provides:

- ✅ **7 AI Models** in one endpoint
- ✅ **OpenAI Compatibility** for easy integration
- ✅ **Streaming Support** for real-time responses
- ✅ **Function Calling** for tool integration
- ✅ **Global Access** via Hugging Face
- ✅ **Free Hosting** on HF Spaces

## 📞 Support

For issues:

1. Check your Hugging Face Space logs
2. Test locally first: `python replicate_server.py`
3. Verify model names match the supported list
4. Check Replicate API status

## 🚀 Example Applications

Your deployed API can power:

- **Chatbots** with multiple personality models
- **Code Assistants** using Claude for analysis
- **Writing Tools** with model selection
- **Research Tools** with different reasoning models
- **Customer Support** with fast response models

**Your Multi-Model API URL**: `https://your-username-replicate-multi-model-api.hf.space`

🎊 **Congratulations! You now have 7 AI models in one OpenAI-compatible API!** 🎊
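## 🌊 Bonus: Consuming a Stream in Python

As a closing sketch, streamed responses can be consumed with the OpenAI SDK: each chunk carries a content fragment in `choices[0].delta.content`, and the final chunk's delta is typically empty, so `None` fragments are skipped. The `join_stream_deltas` helper is hypothetical, and the URL is a placeholder:

```python
def join_stream_deltas(deltas):
    """Concatenate streamed content fragments, skipping empty (None) deltas."""
    return "".join(d for d in deltas if d)

# Usage against your deployed Space (requires `pip install openai`):
# import openai
# client = openai.OpenAI(
#     base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
#     api_key="dummy",
# )
# stream = client.chat.completions.create(
#     model="claude-3.5-haiku",
#     messages=[{"role": "user", "content": "Count from 1 to 10"}],
#     stream=True,
# )
# text = join_stream_deltas(chunk.choices[0].delta.content for chunk in stream)
# print(text)
```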