---

title: Multi-Model Replicate OpenAI API
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
  - openai
  - claude
  - gpt
  - replicate
  - api
  - multi-model
  - streaming
  - function-calling
---
# πŸš€ Multi-Model Replicate OpenAI API - Hugging Face Spaces

Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.

## πŸ€– Supported Models

### Anthropic Claude Models
- `claude-4-sonnet` - Latest Claude 4 Sonnet (Most Capable)
- `claude-3.7-sonnet` - Claude 3.7 Sonnet
- `claude-3.5-sonnet` - Claude 3.5 Sonnet (Balanced)
- `claude-3.5-haiku` - Claude 3.5 Haiku (Fastest)

### OpenAI GPT Models  
- `gpt-4.1` - Latest GPT-4.1
- `gpt-4.1-mini` - GPT-4.1 Mini (Cost-Effective)
- `gpt-4.1-nano` - GPT-4.1 Nano (Ultra-Fast)

## ✨ Features

- 🎯 **100% OpenAI Compatible** - Drop-in replacement for the OpenAI API
- 🌊 **Streaming Support** - Real-time responses via SSE
- πŸ”§ **Function Calling** - OpenAI-style tool/function calling
- πŸ” **Secure** - Obfuscated API keys
- πŸ“Š **Monitoring** - Health check & stats endpoints
- πŸš€ **Multi-Model** - 7 models behind one API

## πŸš€ Deploy to Hugging Face Spaces

### Step 1: Create New Space
1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose:
   - **Name**: `replicate-multi-model-api`
   - **SDK**: **Docker** ⚠️ (Important!)
   - **Hardware**: CPU Basic (free tier)
   - **Visibility**: Public

### Step 2: Upload Files
Upload these files to your Space:

```
πŸ“ Your Hugging Face Space:
β”œβ”€β”€ app.py                 ← Upload replicate_server.py as app.py
β”œβ”€β”€ requirements.txt       ← Upload requirements.txt
β”œβ”€β”€ Dockerfile             ← Upload Dockerfile
β”œβ”€β”€ README.md              ← Upload this file as README.md
β”œβ”€β”€ test_all_models.py     ← Upload test_all_models.py (optional)
└── quick_test.py          ← Upload quick_test.py (optional)
```

### Step 3: Set Environment Variables (Optional)
In your Space settings, you can set:
- `REPLICATE_API_TOKEN` - Your Replicate API token (if you want to use your own)

**Note**: The app includes an obfuscated token, so this is optional.

### Step 4: Deploy
- Hugging Face will automatically build and deploy
- Wait 5-10 minutes for build completion
- Your API will be live!

## 🎯 Your API Endpoints

Once deployed at `https://your-username-replicate-multi-model-api.hf.space`:

### Main Endpoints
- `POST /v1/chat/completions` - Chat completions (all models)
- `GET /v1/models` - List all 7 models
- `GET /health` - Health check

### Alternative Endpoints
- `POST /chat/completions` - Alternative chat endpoint
- `GET /models` - Alternative models endpoint

## πŸ§ͺ Test Your Deployment

### 1. Health Check
```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```

### 2. List Models
```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```
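
The models endpoint returns an OpenAI-style list object. A sketch of pulling the model IDs out of it (the sample payload below is an assumption based on the OpenAI list format, not captured from this server):

```python
import json

# Assumed response shape, following the OpenAI /v1/models list format.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "claude-4-sonnet", "object": "model"},
    {"id": "gpt-4.1-mini", "object": "model"}
  ]
}
""")

model_ids = [m["id"] for m in sample["data"]]
print(model_ids)
```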

### 3. Test Claude 4 Sonnet
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```

### 4. Test GPT-4.1 Mini
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```

### 5. Test Streaming
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```
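
With `"stream": true`, each event arrives as an SSE `data:` line carrying a chat-completion chunk. A minimal parsing sketch (the chunk shape is assumed to follow the OpenAI streaming format):

```python
import json

def parse_sse_chunk(line: str):
    """Extract the delta text from one SSE 'data:' line, or None."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":  # OpenAI-style stream terminator
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# Example SSE line, as it might appear in the curl output above
line = 'data: {"choices":[{"delta":{"content":"1, 2,"}}]}'
print(parse_sse_chunk(line))
```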

## πŸ”Œ OpenAI SDK Compatibility

Your deployed API works with the OpenAI SDK:

```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy"  # Not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)

print(completion.choices[0].message.content)
```

## πŸ“Š Model Selection Guide

### For Different Use Cases:

**🧠 Complex Reasoning & Analysis**
- `claude-4-sonnet` - Best for complex tasks, analysis, coding

**⚑ Speed & Quick Responses**  
- `claude-3.5-haiku` - Fastest Claude model
- `gpt-4.1-nano` - Ultra-fast GPT model

**πŸ’° Cost-Effective**
- `gpt-4.1-mini` - Good balance of cost and capability

**🎯 General Purpose**
- `claude-3.5-sonnet` - Excellent all-around model
- `gpt-4.1` - Latest GPT capabilities

**πŸ“ Writing & Creative Tasks**
- `claude-3.7-sonnet` - Great for creative writing
- `claude-3.5-sonnet` - Balanced creativity and logic
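
The guide above folds naturally into a small lookup helper (a sketch; the mapping simply mirrors the recommendations listed here):

```python
# Default model per use case, mirroring the selection guide above.
USE_CASE_MODELS = {
    "reasoning": "claude-4-sonnet",
    "speed": "claude-3.5-haiku",
    "cost": "gpt-4.1-mini",
    "general": "claude-3.5-sonnet",
    "writing": "claude-3.7-sonnet",
}

def pick_model(use_case: str) -> str:
    """Return a recommended model name, defaulting to the general-purpose pick."""
    return USE_CASE_MODELS.get(use_case, "claude-3.5-sonnet")

print(pick_model("speed"))  # claude-3.5-haiku
```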

## πŸ”§ Configuration

### Environment Variables
- `PORT` - Server port (default: 7860 for HF)
- `HOST` - Server host (default: 0.0.0.0)
- `REPLICATE_API_TOKEN` - Your Replicate token (optional)

### Request Parameters
All models support:
- `max_tokens` - Maximum number of tokens in the response
- `temperature` - Sampling temperature (0.0-2.0; higher is more random)
- `top_p` - Nucleus sampling threshold
- `stream` - Enable SSE streaming
- `tools` - Function-calling tool definitions
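
A request body exercising these parameters might look like the following (`get_weather` is a hypothetical example tool; the `tools` schema follows the OpenAI function-calling format):

```python
import json

request_body = {
    "model": "claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "max_tokens": 200,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": False,
    # OpenAI-style tool definition; get_weather is a made-up example
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

print(json.dumps(request_body, indent=2))
```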

## πŸ“ˆ Expected Performance

### Response Times (approximate):
- **Claude 3.5 Haiku**: ~2-5 seconds
- **GPT-4.1 Nano**: ~2-4 seconds  
- **GPT-4.1 Mini**: ~3-6 seconds
- **Claude 3.5 Sonnet**: ~4-8 seconds
- **Claude 3.7 Sonnet**: ~5-10 seconds
- **GPT-4.1**: ~6-12 seconds
- **Claude 4 Sonnet**: ~8-15 seconds

### Context Lengths:
- **Claude Models**: 200,000 tokens
- **GPT Models**: 128,000 tokens
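
These limits can be enforced client-side before sending a request. A small sketch (the prefix mapping just encodes the two limits above):

```python
# Context windows from the table above, keyed by model-name prefix.
CONTEXT_LENGTHS = {"claude": 200_000, "gpt": 128_000}

def clamp_max_tokens(model: str, prompt_tokens: int, requested: int) -> int:
    """Cap max_tokens so prompt + response fit within the model's context window."""
    limit = CONTEXT_LENGTHS["claude" if model.startswith("claude") else "gpt"]
    return max(0, min(requested, limit - prompt_tokens))

print(clamp_max_tokens("gpt-4.1", 127_900, 500))  # 100
```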

## πŸ†˜ Troubleshooting

### Build Issues
1. **Docker build fails**: Check Dockerfile syntax
2. **Dependencies fail**: Verify requirements.txt
3. **Port issues**: Ensure using port 7860

### Runtime Issues
1. **Health check fails**: Check server logs in HF
2. **Models not working**: Verify Replicate API access
3. **Slow responses**: Try faster models (haiku, nano)

### API Issues
1. **Model not found**: Check model name spelling
2. **Streaming broken**: Verify SSE support
3. **Function calling fails**: Check tool definition format

## βœ… Success Checklist

- [ ] Space created with Docker SDK
- [ ] All files uploaded correctly
- [ ] Build completes without errors
- [ ] Health endpoint returns 200
- [ ] Models endpoint lists 7 models
- [ ] At least one model responds correctly
- [ ] Streaming works
- [ ] OpenAI SDK compatibility verified

## πŸŽ‰ You're Live!

Once deployed, your API provides:

βœ… **7 AI Models** in one endpoint
βœ… **OpenAI Compatibility** for easy integration  
βœ… **Streaming Support** for real-time responses
βœ… **Function Calling** for tool integration
βœ… **Global Access** via Hugging Face
βœ… **Free Hosting** on HF Spaces

## πŸ“ž Support

For issues:
1. Check Hugging Face Space logs
2. Test locally first: `python replicate_server.py`
3. Verify model names match supported list
4. Check Replicate API status

## πŸš€ Example Applications

Your deployed API can power:
- **Chatbots** with multiple personality models
- **Code Assistants** using Claude for analysis
- **Writing Tools** with model selection
- **Research Tools** with different reasoning models
- **Customer Support** with fast response models

**Your Multi-Model API URL**: 
`https://your-username-replicate-multi-model-api.hf.space`

🎊 **Congratulations! You now have 7 AI models in one OpenAI-compatible API!** 🎊