---
title: Multi-Model Replicate OpenAI API
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
- openai
- claude
- gpt
- replicate
- api
- multi-model
- streaming
- function-calling
---
# πŸš€ Multi-Model Replicate OpenAI API - Hugging Face Spaces
Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.
## πŸ€– Supported Models
### Anthropic Claude Models
- `claude-4-sonnet` - Latest Claude 4 Sonnet (Most Capable)
- `claude-3.7-sonnet` - Claude 3.7 Sonnet
- `claude-3.5-sonnet` - Claude 3.5 Sonnet (Balanced)
- `claude-3.5-haiku` - Claude 3.5 Haiku (Fastest)
### OpenAI GPT Models
- `gpt-4.1` - Latest GPT-4.1
- `gpt-4.1-mini` - GPT-4.1 Mini (Cost-Effective)
- `gpt-4.1-nano` - GPT-4.1 Nano (Ultra-Fast)
## ✨ Features
- 🎯 **100% OpenAI Compatible** - Drop-in replacement
- 🌊 **Streaming Support** - Real-time responses
- πŸ”§ **Function Calling** - OpenAI-style tool use
- πŸ” **Secure** - Obfuscated API keys
- πŸ“Š **Monitoring** - Health checks & stats
- πŸš€ **Multi-Model** - 7 models in one API
## πŸš€ Deploy to Hugging Face Spaces
### Step 1: Create New Space
1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose:
- **Name**: `replicate-multi-model-api`
- **SDK**: **Docker** ⚠️ (Important!)
- **Hardware**: CPU Basic (free tier)
- **Visibility**: Public
### Step 2: Upload Files
Upload these files to your Space:
```
πŸ“ Your Hugging Face Space:
β”œβ”€β”€ app.py ← Upload replicate_server.py as app.py
β”œβ”€β”€ requirements.txt ← Upload requirements.txt
β”œβ”€β”€ Dockerfile ← Upload Dockerfile
β”œβ”€β”€ README.md ← Upload this file as README.md
β”œβ”€β”€ test_all_models.py ← Upload test_all_models.py (optional)
└── quick_test.py ← Upload quick_test.py (optional)
```
### Step 3: Set Environment Variables (Optional)
In your Space settings, you can set:
- `REPLICATE_API_TOKEN` - Your Replicate API token (if you want to use your own)
**Note**: The app includes an obfuscated token, so this is optional.
### Step 4: Deploy
- Hugging Face will automatically build and deploy
- Wait 5-10 minutes for build completion
- Your API will be live!
## 🎯 Your API Endpoints
Once deployed at `https://your-username-replicate-multi-model-api.hf.space`:
### Main Endpoints
- `POST /v1/chat/completions` - Chat completions (all models)
- `GET /v1/models` - List all 7 models
- `GET /health` - Health check
### Alternative Endpoints
- `POST /chat/completions` - Alternative chat endpoint
- `GET /models` - Alternative models endpoint
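If you call these endpoints from client code, the paths can be kept in one place; a minimal sketch, where `BASE_URL` is a placeholder for your own Space's hostname:

```python
# BASE_URL is a placeholder -- substitute your deployed Space's hostname.
BASE_URL = "https://your-username-replicate-multi-model-api.hf.space"

# Main endpoint paths from the list above
ENDPOINTS = {
    "chat": "/v1/chat/completions",
    "models": "/v1/models",
    "health": "/health",
}

def endpoint_url(name: str) -> str:
    """Return the absolute URL for a named endpoint."""
    return BASE_URL + ENDPOINTS[name]

print(endpoint_url("models"))  # -> .../v1/models
```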
## πŸ§ͺ Test Your Deployment
### 1. Health Check
```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```
### 2. List Models
```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```
### 3. Test Claude 4 Sonnet
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```
### 4. Test GPT-4.1 Mini
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```
### 5. Test Streaming
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```
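When `stream` is true, each response line is an OpenAI-style SSE chunk (`data: {...}`) carrying a content delta. A minimal parser sketch; the sample stream below is illustrative, not captured from the server:

```python
import json

def extract_deltas(sse_text: str) -> str:
    """Accumulate content deltas from an OpenAI-style SSE stream."""
    parts = []
    for line in sse_text.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Illustrative sample of a streamed response
sample = "\n".join([
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "1, 2, "}}]}',
    'data: {"choices": [{"delta": {"content": "3"}}]}',
    "data: [DONE]",
])
print(extract_deltas(sample))  # -> 1, 2, 3
```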
## πŸ”Œ OpenAI SDK Compatibility
Your deployed API works with the OpenAI SDK:
```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy",  # not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ],
)

print(completion.choices[0].message.content)
```
## πŸ“Š Model Selection Guide
### For Different Use Cases:
**🧠 Complex Reasoning & Analysis**
- `claude-4-sonnet` - Best for complex tasks, analysis, coding
**⚑ Speed & Quick Responses**
- `claude-3.5-haiku` - Fastest Claude model
- `gpt-4.1-nano` - Ultra-fast GPT model
**πŸ’° Cost-Effective**
- `gpt-4.1-mini` - Good balance of cost and capability
**🎯 General Purpose**
- `claude-3.5-sonnet` - Excellent all-around model
- `gpt-4.1` - Latest GPT capabilities
**πŸ“ Writing & Creative Tasks**
- `claude-3.7-sonnet` - Great for creative writing
- `claude-3.5-sonnet` - Balanced creativity and logic
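If you route requests programmatically, the guide above collapses into a small lookup table; a sketch, where the use-case labels are our own (any of the 7 model IDs works in requests):

```python
# Use-case labels (illustrative) mapped to model IDs from the guide above
MODEL_BY_USE_CASE = {
    "reasoning": "claude-4-sonnet",
    "speed": "claude-3.5-haiku",
    "cost": "gpt-4.1-mini",
    "general": "claude-3.5-sonnet",
    "writing": "claude-3.7-sonnet",
}

def pick_model(use_case: str) -> str:
    """Return a model ID for a use case, falling back to general purpose."""
    return MODEL_BY_USE_CASE.get(use_case, "claude-3.5-sonnet")

print(pick_model("speed"))  # -> claude-3.5-haiku
```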
## πŸ”§ Configuration
### Environment Variables
- `PORT` - Server port (default: 7860 for HF)
- `HOST` - Server host (default: 0.0.0.0)
- `REPLICATE_API_TOKEN` - Your Replicate token (optional)
### Request Parameters
All models support:
- `max_tokens` - Maximum tokens to generate in the response
- `temperature` - Sampling temperature (0.0-2.0; higher = more creative)
- `top_p` - Nucleus sampling
- `stream` - Enable streaming
- `tools` - Function calling tools
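Since the API accepts OpenAI-style `tools`, a function-calling request body combines these parameters as sketched below; the `get_weather` tool is hypothetical:

```python
import json

# Request body with a hypothetical tool in the OpenAI function-calling schema
payload = {
    "model": "claude-4-sonnet",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "max_tokens": 200,
    "temperature": 0.2,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload)  # send as the POST body to /v1/chat/completions
```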
## πŸ“ˆ Expected Performance
### Response Times (approximate):
- **Claude 3.5 Haiku**: ~2-5 seconds
- **GPT-4.1 Nano**: ~2-4 seconds
- **GPT-4.1 Mini**: ~3-6 seconds
- **Claude 3.5 Sonnet**: ~4-8 seconds
- **Claude 3.7 Sonnet**: ~5-10 seconds
- **GPT-4.1**: ~6-12 seconds
- **Claude 4 Sonnet**: ~8-15 seconds
### Context Lengths:
- **Claude Models**: 200,000 tokens
- **GPT Models**: 128,000 tokens
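A rough way to check whether a prompt fits a model's window is the common ~4-characters-per-token heuristic; this is an approximation, not a real tokenizer:

```python
# Context windows from the list above, in tokens
CONTEXT_TOKENS = {"claude": 200_000, "gpt": 128_000}

def fits_context(prompt: str, model: str, max_tokens: int = 0) -> bool:
    """Heuristic check: ~4 characters per token, plus room for the reply."""
    family = "claude" if model.startswith("claude") else "gpt"
    estimated_tokens = len(prompt) // 4 + max_tokens
    return estimated_tokens <= CONTEXT_TOKENS[family]

print(fits_context("hello " * 10, "claude-3.5-haiku"))  # small prompt fits
```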
## πŸ†˜ Troubleshooting
### Build Issues
1. **Docker build fails**: Check Dockerfile syntax
2. **Dependencies fail**: Verify requirements.txt
3. **Port issues**: Ensure using port 7860
### Runtime Issues
1. **Health check fails**: Check server logs in HF
2. **Models not working**: Verify Replicate API access
3. **Slow responses**: Try faster models (haiku, nano)
### API Issues
1. **Model not found**: Check model name spelling
2. **Streaming broken**: Verify SSE support
3. **Function calling fails**: Check tool definition format
## βœ… Success Checklist
- [ ] Space created with Docker SDK
- [ ] All files uploaded correctly
- [ ] Build completes without errors
- [ ] Health endpoint returns 200
- [ ] Models endpoint lists 7 models
- [ ] At least one model responds correctly
- [ ] Streaming works
- [ ] OpenAI SDK compatibility verified
## πŸŽ‰ You're Live!
Once deployed, your API provides:
βœ… **7 AI Models** in one endpoint
βœ… **OpenAI Compatibility** for easy integration
βœ… **Streaming Support** for real-time responses
βœ… **Function Calling** for tool integration
βœ… **Global Access** via Hugging Face
βœ… **Free Hosting** on HF Spaces
## πŸ“ž Support
For issues:
1. Check Hugging Face Space logs
2. Test locally first: `python replicate_server.py`
3. Verify model names match supported list
4. Check Replicate API status
## πŸš€ Example Applications
Your deployed API can power:
- **Chatbots** with multiple personality models
- **Code Assistants** using Claude for analysis
- **Writing Tools** with model selection
- **Research Tools** with different reasoning models
- **Customer Support** with fast response models
**Your Multi-Model API URL**:
`https://your-username-replicate-multi-model-api.hf.space`
🎊 **Congratulations! You now have 7 AI models in one OpenAI-compatible API!** 🎊