---
title: Multi-Model Replicate OpenAI API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
  - openai
  - claude
  - gpt
  - replicate
  - api
  - multi-model
  - streaming
  - function-calling
---

🚀 Multi-Model Replicate OpenAI API - Hugging Face Spaces

Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.

🤖 Supported Models

Anthropic Claude Models

  • claude-4-sonnet - Latest Claude 4 Sonnet (Most Capable)
  • claude-3.7-sonnet - Claude 3.7 Sonnet
  • claude-3.5-sonnet - Claude 3.5 Sonnet (Balanced)
  • claude-3.5-haiku - Claude 3.5 Haiku (Fastest)

OpenAI GPT Models

  • gpt-4.1 - Latest GPT-4.1
  • gpt-4.1-mini - GPT-4.1 Mini (Cost-Effective)
  • gpt-4.1-nano - GPT-4.1 Nano (Ultra-Fast)

✨ Features

  • 🎯 100% OpenAI Compatible - Drop-in replacement
  • 🌊 Streaming Support - Real-time responses
  • 🔧 Function Calling - OpenAI-style tool/function calls
  • 🔒 Secure - Obfuscated API keys
  • 📊 Monitoring - Health checks & stats
  • 🚀 Multi-Model - 7 models in one API

🚀 Deploy to Hugging Face Spaces

Step 1: Create New Space

  1. Go to huggingface.co/spaces
  2. Click "Create new Space"
  3. Choose:
    • Name: replicate-multi-model-api
    • SDK: Docker ⚠️ (Important!)
    • Hardware: CPU Basic (free tier)
    • Visibility: Public

Step 2: Upload Files

Upload these files to your Space:

πŸ“ Your Hugging Face Space:
β”œβ”€β”€ app.py                 ← Upload replicate_server.py as app.py
β”œβ”€β”€ requirements.txt       ← Upload requirements.txt
β”œβ”€β”€ Dockerfile            ← Upload Dockerfile
β”œβ”€β”€ README.md             ← Upload this file as README.md
β”œβ”€β”€ test_all_models.py    ← Upload test_all_models.py (optional)
└── quick_test.py         ← Upload quick_test.py (optional)
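If you prefer uploading from a script instead of the web UI, the files above can be pushed with the `huggingface_hub` client. This is a minimal sketch, not the project's own tooling: `your-username/replicate-multi-model-api` is a placeholder Space id, and it assumes you have run `huggingface-cli login` first.

```python
import os

# Files the Space expects, per the tree above (the last two are optional).
REQUIRED_FILES = ["app.py", "requirements.txt", "Dockerfile", "README.md"]
OPTIONAL_FILES = ["test_all_models.py", "quick_test.py"]

def missing_files(directory):
    """Return the required files not present in `directory`."""
    return [f for f in REQUIRED_FILES
            if not os.path.isfile(os.path.join(directory, f))]

if __name__ == "__main__":
    # Requires `pip install huggingface_hub` and a prior `huggingface-cli login`.
    from huggingface_hub import HfApi

    api = HfApi()
    for name in REQUIRED_FILES + OPTIONAL_FILES:
        if os.path.isfile(name):
            api.upload_file(
                path_or_fileobj=name,
                path_in_repo=name,
                repo_id="your-username/replicate-multi-model-api",
                repo_type="space",
            )
```

Run `missing_files(".")` before uploading to catch a forgotten file early.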

Step 3: Set Environment Variables (Optional)

In your Space settings, you can set:

  • REPLICATE_API_TOKEN - Your Replicate API token (if you want to use your own)

Note: The app includes an obfuscated token, so this is optional.

Step 4: Deploy

  • Hugging Face will automatically build and deploy
  • Wait 5-10 minutes for build completion
  • Your API will be live!

🎯 Your API Endpoints

Once deployed at https://your-username-replicate-multi-model-api.hf.space:

Main Endpoints

  • POST /v1/chat/completions - Chat completions (all models)
  • GET /v1/models - List all 7 models
  • GET /health - Health check

Alternative Endpoints

  • POST /chat/completions - Alternative chat endpoint
  • GET /models - Alternative models endpoint

🧪 Test Your Deployment

1. Health Check

```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```

2. List Models

```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```
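To confirm all 7 models are registered programmatically, the `/v1/models` response can be checked with nothing but the standard library. A sketch, assuming the endpoint returns the usual OpenAI-style `{"object": "list", "data": [...]}` shape:

```python
import json
import urllib.request

# The 7 model ids this API is documented to serve.
EXPECTED = {
    "claude-4-sonnet", "claude-3.7-sonnet", "claude-3.5-sonnet",
    "claude-3.5-haiku", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
}

def model_ids(models_response):
    """Extract the model ids from an OpenAI-style /v1/models payload."""
    return {entry["id"] for entry in models_response.get("data", [])}

if __name__ == "__main__":
    url = "https://your-username-replicate-multi-model-api.hf.space/v1/models"
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    missing = EXPECTED - model_ids(payload)
    print("missing models:", missing or "none")
```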

3. Test Claude 4 Sonnet

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```

4. Test GPT-4.1 Mini

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```

5. Test Streaming

```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```
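Streamed responses arrive as Server-Sent Events: each chunk is a `data: {...}` line, and the stream ends with `data: [DONE]` (the standard OpenAI convention, assumed here to apply to this server too). A minimal stdlib parser for those lines:

```python
import json

def parse_sse_line(line):
    """Decode one SSE line into a chunk dict; return None for blanks/[DONE]."""
    line = line.strip()
    if not line.startswith("data:"):
        return None          # comments, blank keep-alive lines, etc.
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None          # end-of-stream sentinel
    return json.loads(payload)

def delta_text(chunk):
    """Pull the incremental text out of a chat.completion.chunk, if any."""
    if not chunk:
        return ""
    return chunk["choices"][0].get("delta", {}).get("content") or ""
```

Feeding each line of the curl output above through `parse_sse_line` and concatenating the `delta_text` results reassembles the full reply.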

🔌 OpenAI SDK Compatibility

Your deployed API works with the OpenAI SDK:

```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy"  # Not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)

print(completion.choices[0].message.content)
```

📊 Model Selection Guide

For Different Use Cases:

🧠 Complex Reasoning & Analysis

  • claude-4-sonnet - Best for complex tasks, analysis, coding

⚡ Speed & Quick Responses

  • claude-3.5-haiku - Fastest Claude model
  • gpt-4.1-nano - Ultra-fast GPT model

💰 Cost-Effective

  • gpt-4.1-mini - Good balance of cost and capability

🎯 General Purpose

  • claude-3.5-sonnet - Excellent all-around model
  • gpt-4.1 - Latest GPT capabilities

πŸ“ Writing & Creative Tasks

  • claude-3.7-sonnet - Great for creative writing
  • claude-3.5-sonnet - Balanced creativity and logic

🔧 Configuration

Environment Variables

  • PORT - Server port (default: 7860 for HF)
  • HOST - Server host (default: 0.0.0.0)
  • REPLICATE_API_TOKEN - Your Replicate token (optional)

Request Parameters

All models support:

  • max_tokens - Maximum response tokens
  • temperature - Creativity (0.0-2.0)
  • top_p - Nucleus sampling
  • stream - Enable streaming
  • tools - Function calling tools
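The parameters above map directly onto the request body. A small sketch that builds and sanity-checks a payload; `build_chat_payload` is an illustrative helper, not part of the server:

```python
def build_chat_payload(model, user_message, max_tokens=None,
                       temperature=None, top_p=None, stream=False, tools=None):
    """Assemble a /v1/chat/completions request body from the supported knobs."""
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0-2.0")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }
    # Only include optional knobs that were actually set.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if top_p is not None:
        payload["top_p"] = top_p
    if tools is not None:
        payload["tools"] = tools
    return payload
```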

📈 Expected Performance

Response Times (approximate):

  • Claude 3.5 Haiku: ~2-5 seconds
  • GPT-4.1 Nano: ~2-4 seconds
  • GPT-4.1 Mini: ~3-6 seconds
  • Claude 3.5 Sonnet: ~4-8 seconds
  • Claude 3.7 Sonnet: ~5-10 seconds
  • GPT-4.1: ~6-12 seconds
  • Claude 4 Sonnet: ~8-15 seconds

Context Lengths:

  • Claude Models: 200,000 tokens
  • GPT Models: 128,000 tokens

🆘 Troubleshooting

Build Issues

  1. Docker build fails: Check Dockerfile syntax
  2. Dependencies fail: Verify requirements.txt
  3. Port issues: Ensure the app listens on port 7860

Runtime Issues

  1. Health check fails: Check server logs in HF
  2. Models not working: Verify Replicate API access
  3. Slow responses: Try faster models (haiku, nano)

API Issues

  1. Model not found: Check model name spelling
  2. Streaming broken: Verify SSE support
  3. Function calling fails: Check tool definition format
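For that last point, function calling expects OpenAI's tool schema: a list of `{"type": "function", ...}` entries whose `parameters` field is a JSON-Schema object. A hedged example for reference; the `get_weather` function is purely illustrative:

```python
# OpenAI-style tool definition; `get_weather` is a made-up example function.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

def validate_tool(tool):
    """Check the fields the OpenAI tool format requires."""
    fn = tool.get("function", {})
    return (tool.get("type") == "function"
            and isinstance(fn.get("name"), str)
            and fn.get("parameters", {}).get("type") == "object")
```

Passing `TOOLS` as the `tools` request parameter should let the model emit tool calls instead of plain text when appropriate.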

✅ Success Checklist

  • Space created with Docker SDK
  • All files uploaded correctly
  • Build completes without errors
  • Health endpoint returns 200
  • Models endpoint lists 7 models
  • At least one model responds correctly
  • Streaming works
  • OpenAI SDK compatibility verified

🎉 You're Live!

Once deployed, your API provides:

  • ✅ 7 AI Models in one endpoint
  • ✅ OpenAI Compatibility for easy integration
  • ✅ Streaming Support for real-time responses
  • ✅ Function Calling for tool integration
  • ✅ Global Access via Hugging Face
  • ✅ Free Hosting on HF Spaces

📞 Support

For issues:

  1. Check Hugging Face Space logs
  2. Test locally first: python replicate_server.py
  3. Verify model names match supported list
  4. Check Replicate API status

🚀 Example Applications

Your deployed API can power:

  • Chatbots with multiple personality models
  • Code Assistants using Claude for analysis
  • Writing Tools with model selection
  • Research Tools with different reasoning models
  • Customer Support with fast response models

Your Multi-Model API URL: https://your-username-replicate-multi-model-api.hf.space

🎊 Congratulations! You now have 7 AI models in one OpenAI-compatible API! 🎊