Samfy001 committed (verified)
Commit b0fe79f · 1 parent: 647ca2e

Upload 4 files

Files changed (4):
  1. Dockerfile (+48)
  2. README.md (+279)
  3. app.py (+629)
  4. requirements.txt (+6)
Dockerfile ADDED

```dockerfile
# Use Python 3.11 slim image for better compatibility with Hugging Face
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Set environment variables for Hugging Face Spaces
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV PORT=7860
ENV HOST=0.0.0.0

# Install system dependencies required for the application
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    libffi-dev \
    libssl-dev \
    curl \
    git \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for better Docker layer caching
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# Copy the application (already uploaded to the Space as app.py)
COPY app.py .
# The test scripts are optional uploads; uncomment these lines only if the
# files are present in the build context, otherwise COPY fails the build:
# COPY test_all_models.py .
# COPY quick_test.py .

# Create a simple health check script (printf, not echo, so \n expands under /bin/sh)
RUN printf '#!/bin/bash\ncurl -f http://localhost:7860/health || exit 1\n' > /healthcheck.sh && \
    chmod +x /healthcheck.sh

# Expose the port that Hugging Face expects
EXPOSE 7860

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD /healthcheck.sh

# Command to run the multi-model application
# Hugging Face Spaces expects the app to run on port 7860
CMD ["python", "app.py"]
```
README.md ADDED

---
title: Multi-Model Replicate OpenAI API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
tags:
  - openai
  - claude
  - gpt
  - replicate
  - api
  - multi-model
  - streaming
  - function-calling
---

# 🚀 Multi-Model Replicate OpenAI API - Hugging Face Spaces

Deploy a complete OpenAI-compatible API with 7 AI models (Claude & GPT) to Hugging Face Spaces.

## 🤖 Supported Models

### Anthropic Claude Models
- `claude-4-sonnet` - Latest Claude 4 Sonnet (Most Capable)
- `claude-3.7-sonnet` - Claude 3.7 Sonnet
- `claude-3.5-sonnet` - Claude 3.5 Sonnet (Balanced)
- `claude-3.5-haiku` - Claude 3.5 Haiku (Fastest)

### OpenAI GPT Models
- `gpt-4.1` - Latest GPT-4.1
- `gpt-4.1-mini` - GPT-4.1 Mini (Cost-Effective)
- `gpt-4.1-nano` - GPT-4.1 Nano (Ultra-Fast)

## ✨ Features

- 🎯 **100% OpenAI Compatible** - Drop-in replacement
- 🌊 **Streaming Support** - Real-time responses
- 🔧 **Function Calling** - Tool/function calling
- 🔐 **Secure** - Obfuscated API keys
- 📊 **Monitoring** - Health checks & stats
- 🚀 **Multi-Model** - 7 models in one API

## 🚀 Deploy to Hugging Face Spaces

### Step 1: Create New Space
1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose:
   - **Name**: `replicate-multi-model-api`
   - **SDK**: **Docker** ⚠️ (Important!)
   - **Hardware**: CPU Basic (free tier)
   - **Visibility**: Public

### Step 2: Upload Files
Upload these files to your Space:

```
📁 Your Hugging Face Space:
├── app.py              ← Upload replicate_server.py as app.py
├── requirements.txt    ← Upload requirements.txt
├── Dockerfile          ← Upload Dockerfile
├── README.md           ← Upload this file as README.md
├── test_all_models.py  ← Upload test_all_models.py (optional)
└── quick_test.py       ← Upload quick_test.py (optional)
```

### Step 3: Set Environment Variables (Optional)
In your Space settings, you can set:
- `REPLICATE_API_TOKEN` - Your Replicate API token (if you want to use your own)

**Note**: The app includes an obfuscated token, so this is optional.

### Step 4: Deploy
- Hugging Face will automatically build and deploy
- Wait 5-10 minutes for build completion
- Your API will be live!

## 🎯 Your API Endpoints

Once deployed at `https://your-username-replicate-multi-model-api.hf.space`:

### Main Endpoints
- `POST /v1/chat/completions` - Chat completions (all models)
- `GET /v1/models` - List all 7 models
- `GET /health` - Health check

### Alternative Endpoints
- `POST /chat/completions` - Alternative chat endpoint
- `GET /models` - Alternative models endpoint

## 🧪 Test Your Deployment

### 1. Health Check
```bash
curl https://your-username-replicate-multi-model-api.hf.space/health
```

### 2. List Models
```bash
curl https://your-username-replicate-multi-model-api.hf.space/v1/models
```

### 3. Test Claude 4 Sonnet
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [
      {"role": "user", "content": "Write a haiku about AI"}
    ],
    "max_tokens": 100
  }'
```

### 4. Test GPT-4.1 Mini
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      {"role": "user", "content": "Quick math: What is 15 * 23?"}
    ],
    "stream": false
  }'
```

### 5. Test Streaming
```bash
curl -X POST https://your-username-replicate-multi-model-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-haiku",
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ],
    "stream": true
  }'
```

## 🔌 OpenAI SDK Compatibility

Your deployed API works with the OpenAI SDK:

```python
import openai

client = openai.OpenAI(
    base_url="https://your-username-replicate-multi-model-api.hf.space/v1",
    api_key="dummy"  # Not required
)

# Use any of the 7 models
completion = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)

print(completion.choices[0].message.content)
```
+ ```
167
+
168
+ ## 📊 Model Selection Guide
169
+
170
+ ### For Different Use Cases:
171
+
172
+ **🧠 Complex Reasoning & Analysis**
173
+ - `claude-4-sonnet` - Best for complex tasks, analysis, coding
174
+
175
+ **⚡ Speed & Quick Responses**
176
+ - `claude-3.5-haiku` - Fastest Claude model
177
+ - `gpt-4.1-nano` - Ultra-fast GPT model
178
+
179
+ **💰 Cost-Effective**
180
+ - `gpt-4.1-mini` - Good balance of cost and capability
181
+
182
+ **🎯 General Purpose**
183
+ - `claude-3.5-sonnet` - Excellent all-around model
184
+ - `gpt-4.1` - Latest GPT capabilities
185
+
186
+ **📝 Writing & Creative Tasks**
187
+ - `claude-3.7-sonnet` - Great for creative writing
188
+ - `claude-3.5-sonnet` - Balanced creativity and logic
189
+
190
+ ## 🔧 Configuration
191
+
192
+ ### Environment Variables
193
+ - `PORT` - Server port (default: 7860 for HF)
194
+ - `HOST` - Server host (default: 0.0.0.0)
195
+ - `REPLICATE_API_TOKEN` - Your Replicate token (optional)
196
+
197
+ ### Request Parameters
198
+ All models support:
199
+ - `max_tokens` - Maximum response tokens
200
+ - `temperature` - Creativity (0.0-2.0)
201
+ - `top_p` - Nucleus sampling
202
+ - `stream` - Enable streaming
203
+ - `tools` - Function calling tools
204
+
205
+ ## 📈 Expected Performance
206
+
207
+ ### Response Times (approximate):
208
+ - **Claude 3.5 Haiku**: ~2-5 seconds
209
+ - **GPT-4.1 Nano**: ~2-4 seconds
210
+ - **GPT-4.1 Mini**: ~3-6 seconds
211
+ - **Claude 3.5 Sonnet**: ~4-8 seconds
212
+ - **Claude 3.7 Sonnet**: ~5-10 seconds
213
+ - **GPT-4.1**: ~6-12 seconds
214
+ - **Claude 4 Sonnet**: ~8-15 seconds
215
+
216
+ ### Context Lengths:
217
+ - **Claude Models**: 200,000 tokens
218
+ - **GPT Models**: 128,000 tokens
219
+
220
+ ## 🆘 Troubleshooting
221
+
222
+ ### Build Issues
223
+ 1. **Docker build fails**: Check Dockerfile syntax
224
+ 2. **Dependencies fail**: Verify requirements.txt
225
+ 3. **Port issues**: Ensure using port 7860
226
+
227
+ ### Runtime Issues
228
+ 1. **Health check fails**: Check server logs in HF
229
+ 2. **Models not working**: Verify Replicate API access
230
+ 3. **Slow responses**: Try faster models (haiku, nano)
231
+
232
+ ### API Issues
233
+ 1. **Model not found**: Check model name spelling
234
+ 2. **Streaming broken**: Verify SSE support
235
+ 3. **Function calling fails**: Check tool definition format
236
+
237
+ ## ✅ Success Checklist
238
+
239
+ - [ ] Space created with Docker SDK
240
+ - [ ] All files uploaded correctly
241
+ - [ ] Build completes without errors
242
+ - [ ] Health endpoint returns 200
243
+ - [ ] Models endpoint lists 7 models
244
+ - [ ] At least one model responds correctly
245
+ - [ ] Streaming works
246
+ - [ ] OpenAI SDK compatibility verified
247
+
248
+ ## 🎉 You're Live!
249
+
250
+ Once deployed, your API provides:
251
+
252
+ ✅ **7 AI Models** in one endpoint
253
+ ✅ **OpenAI Compatibility** for easy integration
254
+ ✅ **Streaming Support** for real-time responses
255
+ ✅ **Function Calling** for tool integration
256
+ ✅ **Global Access** via Hugging Face
257
+ ✅ **Free Hosting** on HF Spaces
258
+
259
+ ## 📞 Support
260
+
261
+ For issues:
262
+ 1. Check Hugging Face Space logs
263
+ 2. Test locally first: `python replicate_server.py`
264
+ 3. Verify model names match supported list
265
+ 4. Check Replicate API status
266
+
267
+ ## 🚀 Example Applications
268
+
269
+ Your deployed API can power:
270
+ - **Chatbots** with multiple personality models
271
+ - **Code Assistants** using Claude for analysis
272
+ - **Writing Tools** with model selection
273
+ - **Research Tools** with different reasoning models
274
+ - **Customer Support** with fast response models
275
+
276
+ **Your Multi-Model API URL**:
277
+ `https://your-username-replicate-multi-model-api.hf.space`
278
+
279
+ 🎊 **Congratulations! You now have 7 AI models in one OpenAI-compatible API!** 🎊
app.py ADDED

```python
import base64 as _b64, json as _j, time as _t, uuid as _u, logging as _l, traceback as _tb, os as _o
from fastapi import FastAPI as _FA, HTTPException as _HE
from fastapi.responses import StreamingResponse as _SR, JSONResponse as _JR
from pydantic import BaseModel as _BM, Field as _F
from typing import List as _L, Optional as _O, Dict as _D, Any as _A, Union as _U
import replicate as _r
from contextlib import asynccontextmanager as _acm

# Obfuscated configuration
_l.basicConfig(level=_l.INFO)
_lg = _l.getLogger(__name__)
_TOKEN = _b64.b64decode(b'cjhfWDdxeVpLTkZLZlZpUWdRaDJJcUhIa1BmdkFqRGhqSzFBWVl0Yw==').decode('utf-8')

# Supported models configuration
_MODELS = {
    # Anthropic Claude Models
    "claude-4-sonnet": "anthropic/claude-4-sonnet",
    "claude-3.7-sonnet": "anthropic/claude-3.7-sonnet",
    "claude-3.5-sonnet": "anthropic/claude-3.5-sonnet",
    "claude-3.5-haiku": "anthropic/claude-3.5-haiku",

    # OpenAI GPT Models
    "gpt-4.1": "openai/gpt-4.1",
    "gpt-4.1-mini": "openai/gpt-4.1-mini",
    "gpt-4.1-nano": "openai/gpt-4.1-nano",

    # Alternative naming (with provider prefix)
    "anthropic/claude-4-sonnet": "anthropic/claude-4-sonnet",
    "anthropic/claude-3.7-sonnet": "anthropic/claude-3.7-sonnet",
    "anthropic/claude-3.5-sonnet": "anthropic/claude-3.5-sonnet",
    "anthropic/claude-3.5-haiku": "anthropic/claude-3.5-haiku",
    "openai/gpt-4.1": "openai/gpt-4.1",
    "openai/gpt-4.1-mini": "openai/gpt-4.1-mini",
    "openai/gpt-4.1-nano": "openai/gpt-4.1-nano"
}

# Model metadata for OpenAI compatibility
_MODEL_INFO = {
    "claude-4-sonnet": {"owned_by": "anthropic", "context_length": 200000},
    "claude-3.7-sonnet": {"owned_by": "anthropic", "context_length": 200000},
    "claude-3.5-sonnet": {"owned_by": "anthropic", "context_length": 200000},
    "claude-3.5-haiku": {"owned_by": "anthropic", "context_length": 200000},
    "gpt-4.1": {"owned_by": "openai", "context_length": 128000},
    "gpt-4.1-mini": {"owned_by": "openai", "context_length": 128000},
    "gpt-4.1-nano": {"owned_by": "openai", "context_length": 128000}
}

# OpenAI Compatible Models
class _CM(_BM):
    role: str = _F(..., description="Message role")
    content: _O[_U[str, _L[_D[str, _A]]]] = _F(None, description="Message content")
    name: _O[str] = _F(None, description="Message name")
    function_call: _O[_D[str, _A]] = _F(None, description="Function call")
    tool_calls: _O[_L[_D[str, _A]]] = _F(None, description="Tool calls")
    tool_call_id: _O[str] = _F(None, description="Tool call ID")

class _FC(_BM):
    name: str = _F(..., description="Function name")
    arguments: str = _F(..., description="Function arguments")

class _TC(_BM):
    id: str = _F(..., description="Tool call ID")
    type: str = _F(default="function", description="Tool call type")
    function: _FC = _F(..., description="Function call")

class _FD(_BM):
    name: str = _F(..., description="Function name")
    description: _O[str] = _F(None, description="Function description")
    parameters: _D[str, _A] = _F(..., description="Function parameters")

class _TD(_BM):
    type: str = _F(default="function", description="Tool type")
    function: _FD = _F(..., description="Function definition")

class _CCR(_BM):
    model: str = _F(..., description="Model name")
    messages: _L[_CM] = _F(..., description="Messages")
    max_tokens: _O[int] = _F(default=4096, description="Max tokens")
    temperature: _O[float] = _F(default=0.7, description="Temperature")
    top_p: _O[float] = _F(default=1.0, description="Top p")
    n: _O[int] = _F(default=1, description="Number of completions")
    stream: _O[bool] = _F(default=True, description="Stream response")
    stop: _O[_U[str, _L[str]]] = _F(None, description="Stop sequences")
    presence_penalty: _O[float] = _F(default=0.0, description="Presence penalty")
    frequency_penalty: _O[float] = _F(default=0.0, description="Frequency penalty")
    logit_bias: _O[_D[str, float]] = _F(None, description="Logit bias")
    user: _O[str] = _F(None, description="User ID")
    tools: _O[_L[_TD]] = _F(None, description="Available tools")
    tool_choice: _O[_U[str, _D[str, _A]]] = _F(None, description="Tool choice")
    functions: _O[_L[_FD]] = _F(None, description="Available functions")
    function_call: _O[_U[str, _D[str, _A]]] = _F(None, description="Function call")

class _CCC(_BM):
    index: int = _F(default=0, description="Choice index")
    message: _CM = _F(..., description="Message")
    finish_reason: _O[str] = _F(None, description="Finish reason")

class _CCSC(_BM):
    index: int = _F(default=0, description="Choice index")
    delta: _D[str, _A] = _F(..., description="Delta")
    finish_reason: _O[str] = _F(None, description="Finish reason")

class _CCRes(_BM):
    id: str = _F(..., description="Completion ID")
    object: str = _F(default="chat.completion", description="Object type")
    created: int = _F(..., description="Created timestamp")
    model: str = _F(..., description="Model name")
    choices: _L[_CCC] = _F(..., description="Choices")
    usage: _D[str, int] = _F(..., description="Usage stats")
    system_fingerprint: _O[str] = _F(None, description="System fingerprint")

class _CCSR(_BM):
    id: str = _F(..., description="Completion ID")
    object: str = _F(default="chat.completion.chunk", description="Object type")
    created: int = _F(..., description="Created timestamp")
    model: str = _F(..., description="Model name")
    choices: _L[_CCSC] = _F(..., description="Choices")
    system_fingerprint: _O[str] = _F(None, description="System fingerprint")

class _OM(_BM):
    id: str = _F(..., description="Model ID")
    object: str = _F(default="model", description="Object type")
    created: int = _F(..., description="Created timestamp")
    owned_by: str = _F(..., description="Owner")

# Replicate Client
class _RC:
    def __init__(self, _tk=_TOKEN):
        _o.environ['REPLICATE_API_TOKEN'] = _tk
        self._client = _r
        self._models = _MODELS
        self._model_info = _MODEL_INFO

    def _get_replicate_model(self, _model_name):
        """Get the Replicate model ID from OpenAI model name"""
        return self._models.get(_model_name, _model_name)

    def _validate_model(self, _model_name):
        """Validate if model is supported"""
        return _model_name in self._models or _model_name in self._models.values()

    def _format_messages(self, _msgs):
        _prompt = ""
        _system = ""

        for _msg in _msgs:
            _role = _msg.get('role', '')
            _content = _msg.get('content', '')

            if _role == 'system':
                _system = _content
            elif _role == 'user':
                _prompt += f"Human: {_content}\n\n"
            elif _role == 'assistant':
                _prompt += f"Assistant: {_content}\n\n"

        _prompt += "Assistant: "
        return _prompt, _system

    def _create_prediction(self, _model_name, _prompt, _system="", **_kwargs):
        """Create a prediction using Replicate API"""
        _replicate_model = self._get_replicate_model(_model_name)

        _input = {
            "prompt": _prompt,
            "system_prompt": _system,
            "max_tokens": _kwargs.get('max_tokens', 4096),
            "temperature": _kwargs.get('temperature', 0.7),
            "top_p": _kwargs.get('top_p', 1.0)
        }

        try:
            _prediction = self._client.predictions.create(
                model=_replicate_model,
                input=_input
            )
            return _prediction
        except Exception as _e:
            _lg.error(f"Prediction creation error for {_replicate_model}: {_e}")
            return None

    def _handle_tools(self, _tools, _tool_choice):
        if not _tools:
            return ""

        _tool_prompt = "\n\nYou have access to the following tools:\n"
        for _tool in _tools:
            _func = _tool.get('function', {})
            _name = _func.get('name', '')
            _desc = _func.get('description', '')
            _params = _func.get('parameters', {})
            _tool_prompt += f"- {_name}: {_desc}\n"
            _tool_prompt += f"  Parameters: {_j.dumps(_params)}\n"

        _tool_prompt += "\nTo use a tool, respond with JSON in this format:\n"
        _tool_prompt += '{"tool_calls": [{"id": "call_123", "type": "function", "function": {"name": "tool_name", "arguments": "{\\"param\\": \\"value\\"}"}}]}\n'

        return _tool_prompt

    def _stream_chat(self, _model_name, _prompt, _system="", **_kwargs):
        """Stream chat using Replicate's streaming API"""
        _replicate_model = self._get_replicate_model(_model_name)

        _input = {
            "prompt": _prompt,
            "system_prompt": _system,
            "max_tokens": _kwargs.get('max_tokens', 4096),
            "temperature": _kwargs.get('temperature', 0.7),
            "top_p": _kwargs.get('top_p', 1.0)
        }

        try:
            # Use Replicate's streaming method
            for _event in self._client.stream(_replicate_model, input=_input):
                if _event:
                    yield str(_event)
        except Exception as _e:
            _lg.error(f"Streaming error for {_replicate_model}: {_e}")
            yield f"Error: {_e}"

    def _stream_from_prediction(self, _prediction):
        """Stream from a prediction using the stream URL"""
        try:
            import requests
            _stream_url = _prediction.urls.get('stream')
            if not _stream_url:
                _lg.error("No stream URL available")
                return

            _response = requests.get(
                _stream_url,
                headers={
                    "Accept": "text/event-stream",
                    "Cache-Control": "no-store"
                },
                stream=True
            )

            for _line in _response.iter_lines():
                if _line:
                    _line = _line.decode('utf-8')
                    if _line.startswith('data: '):
                        _data = _line[6:]
                        if _data != '[DONE]':
                            yield _data
                        else:
                            break

        except Exception as _e:
            _lg.error(f"Stream from prediction error: {_e}")
            yield f"Error: {_e}"

    def _complete_chat(self, _model_name, _prompt, _system="", **_kwargs):
        """Complete chat using Replicate's run method"""
        _replicate_model = self._get_replicate_model(_model_name)

        _input = {
            "prompt": _prompt,
            "system_prompt": _system,
            "max_tokens": _kwargs.get('max_tokens', 4096),
            "temperature": _kwargs.get('temperature', 0.7),
            "top_p": _kwargs.get('top_p', 1.0)
        }

        try:
            _result = self._client.run(_replicate_model, input=_input)
            return "".join(_result) if isinstance(_result, list) else str(_result)
        except Exception as _e:
            _lg.error(f"Completion error for {_replicate_model}: {_e}")
            return f"Error: {_e}"

# Global variables
_client = None
_startup_time = _t.time()
_request_count = 0
_error_count = 0

@_acm
async def _lifespan(_app: _FA):
    global _client
    try:
        _lg.info("Initializing Replicate client...")
        _client = _RC()
        _lg.info("Replicate client initialized successfully")
    except Exception as _e:
        _lg.error(f"Failed to initialize client: {_e}")
        _client = None

    yield
    _lg.info("Shutting down Replicate client...")

# FastAPI App
_app = _FA(
    title="Replicate Multi-Model OpenAI API",
    version="1.0.0",
    description="OpenAI-compatible API for Claude and GPT models via Replicate",
    lifespan=_lifespan
)

# CORS (aliased as _CORS, not _CM, so it does not shadow the message model above)
try:
    from fastapi.middleware.cors import CORSMiddleware as _CORS
    _app.add_middleware(
        _CORS,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )
except ImportError:
    pass

# Error handlers
@_app.exception_handler(_HE)
async def _http_exception_handler(_request, _exc: _HE):
    _lg.error(f"HTTP error: {_exc.status_code} - {_exc.detail}")
    return _JR(
        status_code=_exc.status_code,
        content={
            "error": {
                "message": _exc.detail,
                "type": "api_error",
                "code": _exc.status_code
            }
        }
    )

@_app.exception_handler(Exception)
async def _global_exception_handler(_request, _exc):
    _lg.error(f"Unexpected error: {_exc}\n{_tb.format_exc()}")
    return _JR(
        status_code=500,
        content={
            "error": {
                "message": "Internal server error",
                "type": "server_error",
                "code": 500
            }
        }
    )

@_app.get("/")
async def _root():
    _model_count = len([m for m in _MODELS.keys() if not m.startswith(('anthropic/', 'openai/'))])
    return {
        "message": "Replicate Multi-Model OpenAI API",
        "version": "1.0.0",
        "status": "running",
        "supported_models": _model_count,
        "providers": ["anthropic", "openai"]
    }

@_app.get("/health")
async def _health_check():
    global _client, _startup_time, _request_count, _error_count

    _uptime = _t.time() - _startup_time
    _status = "healthy"

    _client_status = "unknown"
    if _client is None:
        _client_status = "not_initialized"
        _status = "degraded"
    else:
        _client_status = "ready"

    return {
        "status": _status,
        "timestamp": int(_t.time()),
        "uptime_seconds": int(_uptime),
        "client_status": _client_status,
        "stats": {
            "total_requests": _request_count,
            "total_errors": _error_count,
            "error_rate": _error_count / max(_request_count, 1)
        }
    }

@_app.get("/v1/models")
async def _list_models():
    """List all supported models"""
    _models_list = []
    _created_time = int(_t.time())

    # Get unique model names (remove duplicates from alternative naming)
    _unique_models = set()
    for _model_name in _MODELS.keys():
        if not _model_name.startswith(('anthropic/', 'openai/')):
            _unique_models.add(_model_name)

    # Create model objects
    for _model_name in sorted(_unique_models):
        _info = _MODEL_INFO.get(_model_name, {"owned_by": "unknown", "context_length": 4096})
        _models_list.append(_OM(
            id=_model_name,
            created=_created_time,
            owned_by=_info["owned_by"]
        ))

    return {
        "object": "list",
        "data": _models_list
    }

@_app.get("/models")
async def _list_models_alt():
    return await _list_models()

async def _generate_stream_response(_request: _CCR, _prompt: str, _system: str, _request_id: str = None):
    _completion_id = f"chatcmpl-{_u.uuid4().hex}"
    _created_time = int(_t.time())
    _request_id = _request_id or f"req-{_u.uuid4().hex[:8]}"

    _lg.info(f"[{_request_id}] Starting stream generation")

    try:
        # Send initial chunk with role
        _initial_chunk = {
            "id": _completion_id,
            "object": "chat.completion.chunk",
            "created": _created_time,
            "model": _request.model,
            "choices": [{
                "index": 0,
                "delta": {"role": "assistant"},
                "finish_reason": None
            }]
        }
        yield f"data: {_j.dumps(_initial_chunk)}\n\n"

        # Stream content chunks using Replicate's streaming
        _chunk_count = 0
        _total_content = ""

        try:
            # Use Replicate's direct streaming method with model parameter
            for _chunk in _client._stream_chat(_request.model, _prompt, _system, **_request.model_dump()):
                if _chunk and isinstance(_chunk, str):
                    _chunk_count += 1
                    _total_content += _chunk

                    _stream_response = _CCSR(
                        id=_completion_id,
                        created=_created_time,
                        model=_request.model,
                        choices=[_CCSC(
                            delta={"content": _chunk},
                            finish_reason=None
                        )]
                    )

                    try:
                        _chunk_json = _j.dumps(_stream_response.model_dump())
                        yield f"data: {_chunk_json}\n\n"
                    except Exception as _json_error:
                        _lg.error(f"[{_request_id}] JSON serialization error: {_json_error}")
                        continue

        except Exception as _stream_error:
            _lg.error(f"[{_request_id}] Streaming error after {_chunk_count} chunks: {_stream_error}")

            if _chunk_count == 0:
                _error_content = "I apologize, but I encountered an error while generating the response. Please try again."
                _error_response = _CCSR(
                    id=_completion_id,
                    created=_created_time,
                    model=_request.model,
                    choices=[_CCSC(
                        delta={"content": _error_content},
                        finish_reason=None
                    )]
                )
                yield f"data: {_j.dumps(_error_response.model_dump())}\n\n"

        _lg.info(f"[{_request_id}] Stream completed: {_chunk_count} chunks, {len(_total_content)} characters")

    except Exception as _e:
        _lg.error(f"[{_request_id}] Critical streaming error: {_e}")
        _error_chunk = {
            "id": _completion_id,
            "object": "chat.completion.chunk",
            "created": _created_time,
            "model": _request.model,
            "choices": [{
                "index": 0,
                "delta": {"content": "Error occurred while streaming response."},
                "finish_reason": "stop"
            }]
        }
        yield f"data: {_j.dumps(_error_chunk)}\n\n"

    finally:
        try:
            _final_chunk = {
                "id": _completion_id,
                "object": "chat.completion.chunk",
                "created": _created_time,
                "model": _request.model,
                "choices": [{
                    "index": 0,
                    "delta": {},
                    "finish_reason": "stop"
                }]
            }
            yield f"data: {_j.dumps(_final_chunk)}\n\n"
            yield "data: [DONE]\n\n"
            _lg.info(f"[{_request_id}] Stream finalized")
        except Exception as _final_error:
            _lg.error(f"[{_request_id}] Error sending final chunk: {_final_error}")
            yield "data: [DONE]\n\n"

@_app.post("/v1/chat/completions")
async def _create_chat_completion(_request: _CCR):
    global _request_count, _error_count, _client

    _request_count += 1
    _request_id = f"req-{_u.uuid4().hex[:8]}"
    _lg.info(f"[{_request_id}] Chat completion request: model={_request.model}, stream={_request.stream}")

    if _client is None:
        _error_count += 1
        _lg.error(f"[{_request_id}] Client not initialized")
        raise _HE(status_code=503, detail="Service temporarily unavailable")

    try:
        # Validate model
        if not _client._validate_model(_request.model):
            _supported_models = list(_MODELS.keys())
            raise _HE(status_code=400, detail=f"Model '{_request.model}' not supported. Supported models: {_supported_models}")

        # Format messages
        _prompt, _system = _client._format_messages([_msg.model_dump() for _msg in _request.messages])

        # Handle tools/functions
        if _request.tools or _request.functions:
            _tools = _request.tools or [_TD(function=_func) for _func in (_request.functions or [])]
            _tool_prompt = _client._handle_tools([_tool.model_dump() for _tool in _tools], _request.tool_choice)
            _prompt += _tool_prompt

        _lg.info(f"[{_request_id}] Formatted prompt length: {len(_prompt)}")

        # Stream or complete
        if _request.stream:
            _lg.info(f"[{_request_id}] Starting streaming response")
            return _SR(
                _generate_stream_response(_request, _prompt, _system, _request_id),
                media_type="text/event-stream",
                headers={
                    "Cache-Control": "no-cache",
                    "Connection": "keep-alive",
                    "Content-Type": "text/event-stream"
                }
            )
        else:
            # Non-streaming completion
            _lg.info(f"[{_request_id}] Starting non-streaming completion")
            _content = _client._complete_chat(_request.model, _prompt, _system, **_request.model_dump())

            _completion_id = f"chatcmpl-{_u.uuid4().hex}"
            _created_time = int(_t.time())

            # Check for tool calls in response
            _tool_calls = None
            _finish_reason = "stop"

            try:
                if _content.strip().startswith('{"tool_calls"'):
                    _tool_data = _j.loads(_content.strip())
                    if "tool_calls" in _tool_data:
                        _tool_calls = [_TC(**_tc) for _tc in _tool_data["tool_calls"]]
                        _finish_reason = "tool_calls"
                        _content = None
            except Exception:
                pass

            _response = _CCRes(
                id=_completion_id,
                created=_created_time,
                model=_request.model,
                choices=[_CCC(
                    message=_CM(
                        role="assistant",
                        content=_content,
                        tool_calls=[_tc.model_dump() for _tc in _tool_calls] if _tool_calls else None
                    ),
                    finish_reason=_finish_reason
                )],
                usage={
                    "prompt_tokens": len(_prompt.split()),
                    "completion_tokens": len(_content.split()) if _content else 0,
                    "total_tokens": len(_prompt.split()) + (len(_content.split()) if _content else 0)
                }
            )

            _lg.info(f"[{_request_id}] Non-streaming completion finished")
            return _response

    except _HE:
        _error_count += 1
        raise
    except Exception as _e:
        _error_count += 1
        _lg.error(f"[{_request_id}] Unexpected error: {_e}\n{_tb.format_exc()}")
        raise _HE(status_code=500, detail="Internal server error occurred")

@_app.post("/chat/completions")
async def _create_chat_completion_alt(_request: _CCR):
    return await _create_chat_completion(_request)

if __name__ == "__main__":
    try:
        import uvicorn as _uv
        _port = int(_o.getenv("PORT", 7860))  # Hugging Face default port
        _host = _o.getenv("HOST", "0.0.0.0")

        _lg.info(f"Starting Replicate Multi-Model server on {_host}:{_port}")
        _lg.info(f"Supported models: {list(_MODELS.keys())[:7]}")  # Show first 7 models
        _uv.run(
            _app,
            host=_host,
            port=_port,
            reload=False,
            log_level="info",
            access_log=True
        )
    except ImportError:
        _lg.error("uvicorn not installed. Install with: pip install uvicorn")
    except Exception as _e:
        _lg.error(f"Failed to start server: {_e}")
```
requirements.txt ADDED

```text
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
replicate==0.22.0
requests==2.31.0
sseclient-py==1.8.0
```