---
title: Multi-Model Replicate OpenAI API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
  - openai
  - claude
  - gpt
  - replicate
  - api
  - multi-model
  - streaming
  - function-calling
---

🚀 Multi-Model Replicate OpenAI API - Hugging Face Spaces

Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.

🤖 Supported Models

Anthropic Claude Models

  • claude-4-sonnet - Latest Claude 4 Sonnet (Most Capable)
  • claude-3.7-sonnet - Claude 3.7 Sonnet
  • claude-3.5-sonnet - Claude 3.5 Sonnet (Balanced)
  • claude-3.5-haiku - Claude 3.5 Haiku (Fastest)

OpenAI GPT Models

  • gpt-4.1 - Latest GPT-4.1
  • gpt-4.1-mini - GPT-4.1 Mini (Cost-Effective)
  • gpt-4.1-nano - GPT-4.1 Nano (Ultra-Fast)

✨ Features

  • 🎯 100% OpenAI Compatible - Drop-in replacement
  • 🌊 Streaming Support - Real-time responses
  • 🔧 Function Calling - OpenAI-style tool/function calls
  • 🔒 Secure - Obfuscated API keys
  • 📊 Monitoring - Health checks & stats
  • 🚀 Multi-Model - 7 models in one API

🚀 Deploy to Hugging Face Spaces

Step 1: Create New Space

  1. Go to huggingface.co/spaces
  2. Click "Create new Space"
  3. Choose:
    • Name: replicate-multi-model-api
    • SDK: Docker ⚠️ (Important!)
    • Hardware: CPU Basic (free tier)
    • Visibility: Public

Step 2: Upload Files

Upload these files to your Space:

πŸ“ Your Hugging Face Space:
β”œβ”€β”€ app.py                 ← Upload replicate_server.py as app.py
β”œβ”€β”€ requirements.txt       ← Upload requirements.txt
β”œβ”€β”€ Dockerfile            ← Upload Dockerfile
β”œβ”€β”€ README.md             ← Upload this file as README.md
β”œβ”€β”€ test_all_models.py    ← Upload test_all_models.py (optional)
└── quick_test.py         ← Upload quick_test.py (optional)
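If you prefer uploading from a script instead of the web UI, the files above can be pushed with the `huggingface_hub` client. This is a minimal sketch, not the project's own tooling: `your-username/replicate-multi-model-api` is a placeholder Space id, and it assumes you have run `huggingface-cli login` first.

```python
import os

# Files the Space expects, per the tree above (the last two are optional).
REQUIRED_FILES = ["app.py", "requirements.txt", "Dockerfile", "README.md"]
OPTIONAL_FILES = ["test_all_models.py", "quick_test.py"]

def missing_files(directory):
    """Return the required files not present in `directory`."""
    return [f for f in REQUIRED_FILES
            if not os.path.isfile(os.path.join(directory, f))]

if __name__ == "__main__":
    # Requires `pip install huggingface_hub` and a prior `huggingface-cli login`.
    from huggingface_hub import HfApi

    api = HfApi()
    for name in REQUIRED_FILES + OPTIONAL_FILES:
        if os.path.isfile(name):
            api.upload_file(
                path_or_fileobj=name,
                path_in_repo=name,
                repo_id="your-username/replicate-multi-model-api",
                repo_type="space",
            )
```

Run `missing_files(".")` before uploading to catch a forgotten file early.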

Step 3: Set Environment Variables (Optional)

In your Space settings, you can set:

  • REPLICATE_API_TOKEN - Your Replicate API token (if you want to use your own)

Note: The app includes an obfuscated token, so this is optional.

Step 4: Deploy

  • Hugging Face will automatically build and deploy
  • Wait 5-10 minutes for build completion
  • Your API will be live!

🎯 Your API Endpoints

Once deployed at https://your-username-replicate-multi-model-api.hf.space:

Main Endpoints

  • POST /v1/chat/completions - Chat completions (all models)
  • GET /v1/models - List all 7 models
  • GET /health - Health check

Alternative Endpoints

  • POST /chat/completions - Alternative chat endpoint
  • GET /models - Alternative models endpoint

🧪 Test Your Deployment

1. Health Check

```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```

2. List Models

```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```
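To confirm all 7 models are registered programmatically, the `/v1/models` response can be checked with nothing but the standard library. A sketch, assuming the endpoint returns the usual OpenAI-style `{"object": "list", "data": [...]}` shape:

```python
import json
import urllib.request

# The 7 model ids this API is documented to serve.
EXPECTED = {
    "claude-4-sonnet", "claude-3.7-sonnet", "claude-3.5-sonnet",
    "claude-3.5-haiku", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
}

def model_ids(models_response):
    """Extract the model ids from an OpenAI-style /v1/models payload."""
    return {entry["id"] for entry in models_response.get("data", [])}

if __name__ == "__main__":
    url = "https://your-username-replicate-multi-model-api.hf.space/v1/models"
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    missing = EXPECTED - model_ids(payload)
    print("missing models:", missing or "none")
```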

3. Test Claude 4 Sonnet

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```

4. Test GPT-4.1 Mini

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```

5. Test Streaming

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```
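Streamed responses arrive as Server-Sent Events: each chunk is a `data: {...}` line, and the stream ends with `data: [DONE]` (the standard OpenAI convention, assumed here to apply to this server too). A minimal stdlib parser for those lines:

```python
import json

def parse_sse_line(line):
    """Decode one SSE line into a chunk dict; return None for blanks/[DONE]."""
    line = line.strip()
    if not line.startswith("data:"):
        return None          # comments, blank keep-alive lines, etc.
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None          # end-of-stream sentinel
    return json.loads(payload)

def delta_text(chunk):
    """Pull the incremental text out of a chat.completion.chunk, if any."""
    if not chunk:
        return ""
    return chunk["choices"][0].get("delta", {}).get("content") or ""
```

Feeding each line of the curl output above through `parse_sse_line` and concatenating the `delta_text` results reassembles the full reply.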

🔌 OpenAI SDK Compatibility

Your deployed API works with the OpenAI SDK:

```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy"  # Not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)

print(completion.choices[0].message.content)
```

📊 Model Selection Guide

For Different Use Cases:

🧠 Complex Reasoning & Analysis

  • claude-4-sonnet - Best for complex tasks, analysis, coding

⚡ Speed & Quick Responses

  • claude-3.5-haiku - Fastest Claude model
  • gpt-4.1-nano - Ultra-fast GPT model

💰 Cost-Effective

  • gpt-4.1-mini - Good balance of cost and capability

🎯 General Purpose

  • claude-3.5-sonnet - Excellent all-around model
  • gpt-4.1 - Latest GPT capabilities

πŸ“ Writing & Creative Tasks

  • claude-3.7-sonnet - Great for creative writing
  • claude-3.5-sonnet - Balanced creativity and logic

🔧 Configuration

Environment Variables

  • PORT - Server port (default: 7860 for HF)
  • HOST - Server host (default: 0.0.0.0)
  • REPLICATE_API_TOKEN - Your Replicate token (optional)

Request Parameters

All models support:

  • max_tokens - Maximum response tokens
  • temperature - Creativity (0.0-2.0)
  • top_p - Nucleus sampling
  • stream - Enable streaming
  • tools - Function calling tools
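The parameters above map directly onto the request body. A small sketch that builds and sanity-checks a payload; `build_chat_payload` is an illustrative helper, not part of the server:

```python
def build_chat_payload(model, user_message, max_tokens=None,
                       temperature=None, top_p=None, stream=False, tools=None):
    """Assemble a /v1/chat/completions request body from the supported knobs."""
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0-2.0")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }
    # Only include optional knobs that were actually set.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if top_p is not None:
        payload["top_p"] = top_p
    if tools is not None:
        payload["tools"] = tools
    return payload
```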

📈 Expected Performance

Response Times (approximate):

  • Claude 3.5 Haiku: ~2-5 seconds
  • GPT-4.1 Nano: ~2-4 seconds
  • GPT-4.1 Mini: ~3-6 seconds
  • Claude 3.5 Sonnet: ~4-8 seconds
  • Claude 3.7 Sonnet: ~5-10 seconds
  • GPT-4.1: ~6-12 seconds
  • Claude 4 Sonnet: ~8-15 seconds

Context Lengths:

  • Claude Models: 200,000 tokens
  • GPT Models: 128,000 tokens

🆘 Troubleshooting

Build Issues

  1. Docker build fails: Check Dockerfile syntax
  2. Dependencies fail: Verify requirements.txt
  3. Port issues: Ensure the app listens on port 7860

Runtime Issues

  1. Health check fails: Check server logs in HF
  2. Models not working: Verify Replicate API access
  3. Slow responses: Try faster models (haiku, nano)

API Issues

  1. Model not found: Check model name spelling
  2. Streaming broken: Verify SSE support
  3. Function calling fails: Check tool definition format
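For that last point, function calling expects OpenAI's tool schema: a list of `{"type": "function", ...}` entries whose `parameters` field is a JSON-Schema object. A hedged example for reference; the `get_weather` function is purely illustrative:

```python
# OpenAI-style tool definition; `get_weather` is a made-up example function.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

def validate_tool(tool):
    """Check the fields the OpenAI tool format requires."""
    fn = tool.get("function", {})
    return (tool.get("type") == "function"
            and isinstance(fn.get("name"), str)
            and fn.get("parameters", {}).get("type") == "object")
```

Passing `TOOLS` as the `tools` request parameter should let the model emit tool calls instead of plain text when appropriate.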

✅ Success Checklist

  • Space created with Docker SDK
  • All files uploaded correctly
  • Build completes without errors
  • Health endpoint returns 200
  • Models endpoint lists 7 models
  • At least one model responds correctly
  • Streaming works
  • OpenAI SDK compatibility verified

🎉 You're Live!

Once deployed, your API provides:

  • ✅ 7 AI Models in one endpoint
  • ✅ OpenAI Compatibility for easy integration
  • ✅ Streaming Support for real-time responses
  • ✅ Function Calling for tool integration
  • ✅ Global Access via Hugging Face
  • ✅ Free Hosting on HF Spaces

📞 Support

For issues:

  1. Check Hugging Face Space logs
  2. Test locally first: python replicate_server.py
  3. Verify model names match supported list
  4. Check Replicate API status

🚀 Example Applications

Your deployed API can power:

  • Chatbots with multiple personality models
  • Code Assistants using Claude for analysis
  • Writing Tools with model selection
  • Research Tools with different reasoning models
  • Customer Support with fast response models

Your Multi-Model API URL: https://your-username-replicate-multi-model-api.hf.space

🎊 Congratulations! You now have 7 AI models in one OpenAI-compatible API! 🎊