kamau1 committed on
Commit 639f3bb · 1 Parent(s): 0943b9d

Initial Commit

.DS_Store ADDED
Binary file (6.15 kB).
 
CONFIGURATION_GUIDE.md ADDED
@@ -0,0 +1,270 @@
+ # 🔧 Sema Chat API Configuration Guide
+
+ ## 🎯 **MiniMax Integration**
+
+ ### Configuration
+ ```bash
+ MODEL_TYPE=minimax
+ MODEL_NAME=MiniMax-M1
+ MINIMAX_API_KEY=your_minimax_api_key
+ MINIMAX_API_URL=https://api.minimax.chat/v1/text/chatcompletion_v2
+ MINIMAX_MODEL_VERSION=abab6.5s-chat
+ ```
+
+ ### Features
+ - ✅ **Reasoning Capabilities**: Shows the model's thinking process
+ - ✅ **Streaming Support**: Real-time response generation
+ - ✅ **Custom API Integration**: Direct integration with the MiniMax API
+ - ✅ **Reasoning Content**: Displays both the reasoning and the final response
+
+ ### Example Usage
+ ```bash
+ curl -X POST "http://localhost:7860/api/v1/chat" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "message": "Solve this math problem: 2x + 5 = 15",
+     "session_id": "minimax-test"
+   }'
+ ```
+
+ **Response includes reasoning:**
+ ```json
+ {
+   "message": "[Reasoning: I need to solve for x. First, subtract 5 from both sides: 2x = 10. Then divide by 2: x = 5]\n\nTo solve 2x + 5 = 15:\n1. Subtract 5 from both sides: 2x = 10\n2. Divide by 2: x = 5\n\nTherefore, x = 5.",
+   "session_id": "minimax-test",
+   "model_name": "MiniMax-M1"
+ }
+ ```
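If a client needs the final answer without the bracketed prefix, the `[Reasoning: ...]` block shown above can be split off with plain string handling. A minimal sketch, assuming the exact prefix format in the example response; `split_reasoning` is a hypothetical helper, not part of the API:

```python
def split_reasoning(message: str) -> tuple[str, str]:
    """Split a '[Reasoning: ...]' prefix off a response message.

    Returns (reasoning, answer); reasoning is '' when no prefix is present.
    """
    prefix = "[Reasoning:"
    if message.startswith(prefix):
        end = message.find("]")  # assumes no ']' inside the reasoning text
        if end != -1:
            reasoning = message[len(prefix):end].strip()
            answer = message[end + 1:].lstrip()
            return reasoning, answer
    return "", message


reasoning, answer = split_reasoning(
    "[Reasoning: subtract 5, then divide by 2]\n\nTherefore, x = 5."
)
```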
+
+ ---
+
+ ## 🔥 **Gemma Integration**
+
+ ### Option 1: Local Gemma (Free Tier)
+ ```bash
+ MODEL_TYPE=local
+ MODEL_NAME=google/gemma-2b-it
+ DEVICE=auto
+ ```
+
+ ### Option 2: Gemma via HuggingFace API
+ ```bash
+ MODEL_TYPE=hf_api
+ MODEL_NAME=google/gemma-2b-it
+ HF_API_TOKEN=your_hf_token
+ ```
+
+ ### Option 3: Gemma via Google AI Studio
+ ```bash
+ MODEL_TYPE=google
+ MODEL_NAME=gemma-2-9b-it
+ GOOGLE_API_KEY=your_google_api_key
+ ```
+
+ ### Available Gemma and Gemini Models
+ - **gemma-2-2b-it** (2B parameters, instruction-tuned)
+ - **gemma-2-9b-it** (9B parameters, instruction-tuned)
+ - **gemma-2-27b-it** (27B parameters, instruction-tuned)
+ - **gemini-1.5-flash** (fast, efficient)
+ - **gemini-1.5-pro** (most capable)
+
+ ### Example Usage
+ ```bash
+ curl -X POST "http://localhost:7860/api/v1/chat" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "message": "Explain quantum computing in simple terms",
+     "session_id": "gemma-test",
+     "temperature": 0.7
+   }'
+ ```
+
+ ---
+
+ ## 🚀 **Complete Backend Comparison**
+
+ | Backend | Cost | Setup | Streaming | Special Features |
+ |---------|------|-------|-----------|------------------|
+ | **Local** | Free | Medium | ✅ | Offline, private |
+ | **HF API** | Free/Paid | Easy | ✅ | Many models |
+ | **OpenAI** | Paid | Easy | ✅ | High quality |
+ | **Anthropic** | Paid | Easy | ✅ | Long context |
+ | **MiniMax** | Paid | Easy | ✅ | Reasoning |
+ | **Google** | Free/Paid | Easy | ✅ | Multimodal |
+
+ ---
+
+ ## 🔧 **Configuration Examples**
+
+ ### Free Tier Setup (HuggingFace Spaces)
+ ```bash
+ # Best for free deployment
+ MODEL_TYPE=local
+ MODEL_NAME=TinyLlama/TinyLlama-1.1B-Chat-v1.0
+ DEVICE=cpu
+ MAX_NEW_TOKENS=256
+ TEMPERATURE=0.7
+ ```
+
+ ### Production Setup (API-based)
+ ```bash
+ # Best for production with fallbacks
+ MODEL_TYPE=openai
+ MODEL_NAME=gpt-3.5-turbo
+ OPENAI_API_KEY=your_key
+
+ # Fallback configuration
+ FALLBACK_MODEL_TYPE=hf_api
+ FALLBACK_MODEL_NAME=microsoft/DialoGPT-medium
+ HF_API_TOKEN=your_token
+ ```
+
+ ### Research Setup (Multiple Models)
+ ```bash
+ # Primary: latest Gemini
+ MODEL_TYPE=google
+ MODEL_NAME=gemini-1.5-pro
+ GOOGLE_API_KEY=your_key
+
+ # For reasoning tasks
+ REASONING_MODEL_TYPE=minimax
+ REASONING_MODEL_NAME=MiniMax-M1
+ MINIMAX_API_KEY=your_key
+ ```
+
+ ---
+
+ ## 🎯 **Model Selection Guide**
+
+ ### For **Free Deployment** (HuggingFace Spaces):
+ 1. **TinyLlama/TinyLlama-1.1B-Chat-v1.0** - Smallest, fastest
+ 2. **microsoft/DialoGPT-medium** - Better conversations
+ 3. **Qwen/Qwen2.5-0.5B-Instruct** - Good instruction following
+
+ ### For **Reasoning Tasks**:
+ 1. **MiniMax M1** - Shows its thinking process
+ 2. **Claude-3 Opus** - Deep reasoning
+ 3. **GPT-4** - Complex problem solving
+
+ ### For **Conversations**:
+ 1. **Claude-3 Haiku** - Natural, fast
+ 2. **GPT-3.5-turbo** - Balanced cost/quality
+ 3. **Gemini-1.5-flash** - Fast, capable
+
+ ### For **Multilingual Use**:
+ 1. **Gemma-2-9b-it** - Good multilingual coverage
+ 2. **GPT-4** - Excellent multilingual coverage
+ 3. **Local models** - Depends on their training data
+
+ ---
+
+ ## 🔄 **Dynamic Model Switching**
+
+ The API supports runtime model switching:
+
+ ```bash
+ # Switch to MiniMax for reasoning
+ curl -X POST "http://localhost:7860/api/v1/model/switch" \
+   -H "Content-Type: application/json" \
+   -d '{"model_type": "minimax", "model_name": "MiniMax-M1"}'
+
+ # Switch back to a fast model
+ curl -X POST "http://localhost:7860/api/v1/model/switch" \
+   -H "Content-Type: application/json" \
+   -d '{"model_type": "google", "model_name": "gemini-1.5-flash"}'
+ ```
+
+ ---
+
+ ## 🧪 **Testing Your Setup**
+
+ ### Test All Backends
+ ```bash
+ python examples/test_backends.py
+ ```
+
+ ### Test a Specific Backend
+ ```bash
+ # Test MiniMax
+ MINIMAX_API_KEY=your_key python -c "
+ import asyncio
+ from app.services.model_backends.minimax_api import MiniMaxAPIBackend
+ from app.models.schemas import ChatMessage
+
+ async def test():
+     backend = MiniMaxAPIBackend('MiniMax-M1', api_key='your_key', api_url='your_url')
+     await backend.load_model()
+     messages = [ChatMessage(role='user', content='Hello')]
+     response = await backend.generate_response(messages)
+     print(response.message)
+
+ asyncio.run(test())
+ "
+ ```
+
+ ### Test Gemma
+ ```bash
+ # Test local Gemma
+ MODEL_TYPE=local MODEL_NAME=google/gemma-2b-it python tests/test_api.py
+
+ # Test Gemma via Google AI
+ MODEL_TYPE=google MODEL_NAME=gemma-2-9b-it GOOGLE_API_KEY=your_key python tests/test_api.py
+ ```
+
+ ---
+
+ ## 🚀 **Deployment Examples**
+
+ ### HuggingFace Spaces (Free)
+ ```yaml
+ # In your Space settings
+ MODEL_TYPE: local
+ MODEL_NAME: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+ DEVICE: cpu
+ ```
+
+ ### HuggingFace Spaces (With API)
+ ```yaml
+ # In your Space settings
+ MODEL_TYPE: google
+ MODEL_NAME: gemma-2-9b-it
+ GOOGLE_API_KEY: your_secret_key
+ ```
+
+ ### Docker Deployment
+ ```bash
+ docker run -e MODEL_TYPE=minimax \
+   -e MINIMAX_API_KEY=your_key \
+   -e MINIMAX_API_URL=your_url \
+   -p 7860:7860 \
+   sema-chat-api
+ ```
+
+ ---
+
+ ## 💡 **Pro Tips**
+
+ 1. **Start small**: Begin with TinyLlama for testing
+ 2. **Use APIs for production**: More reliable than local models
+ 3. **Enable streaming**: Better user experience
+ 4. **Monitor usage**: Track API costs and limits
+ 5. **Have fallbacks**: Configure multiple backends
+ 6. **Test thoroughly**: Use the provided test scripts
+
+ ---
+
+ ## 🔗 **Getting API Keys**
+
+ - **HuggingFace**: https://huggingface.co/settings/tokens
+ - **OpenAI**: https://platform.openai.com/api-keys
+ - **Anthropic**: https://console.anthropic.com/
+ - **Google AI**: https://aistudio.google.com/
+ - **MiniMax**: Contact MiniMax for API access
+
+ ---
+
+ **Your architecture is now ready for both MiniMax and Gemma! 🎉**
Dockerfile ADDED
@@ -0,0 +1,67 @@
+ # Sema Chat API Dockerfile
+ # Multi-stage build for an optimized production image
+
+ FROM python:3.11-slim AS builder
+
+ # Set environment variables
+ ENV PYTHONDONTWRITEBYTECODE=1 \
+     PYTHONUNBUFFERED=1 \
+     PIP_NO_CACHE_DIR=1 \
+     PIP_DISABLE_PIP_VERSION_CHECK=1
+
+ # Install build-time system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     curl \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Create and activate virtual environment
+ RUN python -m venv /opt/venv
+ ENV PATH="/opt/venv/bin:$PATH"
+
+ # Copy requirements and install Python dependencies
+ COPY requirements.txt .
+ RUN pip install --upgrade pip && \
+     pip install -r requirements.txt
+
+ # Production stage
+ FROM python:3.11-slim
+
+ # Set environment variables
+ ENV PYTHONDONTWRITEBYTECODE=1 \
+     PYTHONUNBUFFERED=1 \
+     PATH="/opt/venv/bin:$PATH" \
+     PYTHONPATH="/app"
+
+ # Install runtime dependencies
+ RUN apt-get update && apt-get install -y \
+     curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy virtual environment from builder stage
+ COPY --from=builder /opt/venv /opt/venv
+
+ # Create app directory and non-root user
+ RUN groupadd -r appuser && useradd -r -g appuser appuser
+ WORKDIR /app
+
+ # Copy application code
+ COPY . .
+
+ # Create necessary directories
+ RUN mkdir -p logs && \
+     chown -R appuser:appuser /app
+
+ # Switch to non-root user
+ USER appuser
+
+ # Expose port
+ EXPOSE 7860
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:7860/health || exit 1
+
+ # Default command
+ CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
HUGGINGFACE_DEPLOYMENT.md ADDED
@@ -0,0 +1,295 @@
+ # 🚀 HuggingFace Spaces Deployment Guide
+
+ ## 📋 **Quick Setup for Gemma**
+
+ ### Step 1: Create Your HuggingFace Space
+ 1. Go to [HuggingFace Spaces](https://huggingface.co/spaces)
+ 2. Click **"Create new Space"**
+ 3. Choose:
+    - **Space name**: `your-username/sema-chat-gemma`
+    - **License**: MIT
+    - **Space SDK**: Docker
+    - **Space hardware**: CPU basic (free) or T4 small (paid)
+
+ ### Step 2: Clone and Upload Files
+ ```bash
+ # Clone your new space
+ git clone https://huggingface.co/spaces/your-username/sema-chat-gemma
+ cd sema-chat-gemma
+
+ # Copy all files from backend/sema-chat/
+ cp -r /path/to/sema/backend/sema-chat/* .
+
+ # Add and commit
+ git add .
+ git commit -m "Initial Sema Chat API with Gemma support"
+ git push
+ ```
+
+ ### Step 3: Configure Environment Variables
+ In your Space settings, add these environment variables:
+
+ #### **Option A: Local Gemma (Free Tier)**
+ ```
+ MODEL_TYPE=local
+ MODEL_NAME=google/gemma-2b-it
+ DEVICE=cpu
+ TEMPERATURE=0.7
+ MAX_NEW_TOKENS=256
+ DEBUG=false
+ ENVIRONMENT=production
+ ```
+
+ #### **Option B: Gemma via Google AI Studio (Recommended)**
+ ```
+ MODEL_TYPE=google
+ MODEL_NAME=gemma-2-9b-it
+ GOOGLE_API_KEY=your_google_api_key_here
+ TEMPERATURE=0.7
+ MAX_NEW_TOKENS=512
+ DEBUG=false
+ ENVIRONMENT=production
+ ```
+
+ #### **Option C: Gemma via HuggingFace API**
+ ```
+ MODEL_TYPE=hf_api
+ MODEL_NAME=google/gemma-2b-it
+ HF_API_TOKEN=your_hf_token_here
+ TEMPERATURE=0.7
+ MAX_NEW_TOKENS=512
+ DEBUG=false
+ ENVIRONMENT=production
+ ```
+
+ ---
+
+ ## 🔑 **Getting API Keys**
+
+ ### Google AI Studio API Key (Recommended)
+ 1. Go to [Google AI Studio](https://aistudio.google.com/)
+ 2. Sign in with your Google account
+ 3. Click **"Get API Key"**
+ 4. Create a new API key
+ 5. Copy the key and add it to your Space settings
+
+ ### HuggingFace API Token (Alternative)
+ 1. Go to [HuggingFace Settings](https://huggingface.co/settings/tokens)
+ 2. Click **"New token"**
+ 3. Choose **"Read"** access
+ 4. Copy the token and add it to your Space settings
+
+ ---
+
+ ## 📁 **Required Files Structure**
+
+ Make sure your Space has these files:
+ ```
+ your-space/
+ ├── app/                  # Main application code
+ ├── requirements.txt      # Python dependencies
+ ├── Dockerfile            # Container configuration
+ ├── README.md             # Space documentation
+ └── .gitignore            # Git ignore file
+ ```
+
+ ---
+
+ ## 🐳 **Dockerfile Configuration**
+
+ Your Dockerfile should be:
+ ```dockerfile
+ FROM python:3.11-slim
+
+ # Set environment variables
+ ENV PYTHONDONTWRITEBYTECODE=1 \
+     PYTHONUNBUFFERED=1 \
+     PYTHONPATH="/app"
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy requirements and install dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Create non-root user
+ RUN useradd -m -u 1000 user
+ USER user
+
+ # Expose port 7860 (HuggingFace Spaces standard)
+ EXPOSE 7860
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:7860/health || exit 1
+
+ # Start the application
+ CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
+ ```
+
+ ---
+
+ ## 🎯 **Recommended Configuration for a First Version**
+
+ For your first deployment, I recommend **Google AI Studio** with Gemma:
+
+ ### Environment Variables:
+ ```
+ MODEL_TYPE=google
+ MODEL_NAME=gemma-2-9b-it
+ GOOGLE_API_KEY=your_api_key_here
+ TEMPERATURE=0.7
+ MAX_NEW_TOKENS=512
+ DEBUG=false
+ ENVIRONMENT=production
+ ENABLE_STREAMING=true
+ RATE_LIMIT=30
+ SESSION_TIMEOUT=30
+ ```
+
+ ### Why This Setup?
+ - ✅ **Fast deployment** - No model download needed
+ - ✅ **Reliable** - Google's infrastructure
+ - ✅ **Cost-effective** - Free tier available
+ - ✅ **Good performance** - Gemma 2 9B is a capable model
+ - ✅ **Streaming support** - Real-time responses
+
+ ---
+
+ ## 🧪 **Testing Your Deployment**
+
+ ### 1. Check Health
+ ```bash
+ curl https://your-username-sema-chat-gemma.hf.space/health
+ ```
+
+ ### 2. Test Chat
+ ```bash
+ curl -X POST "https://your-username-sema-chat-gemma.hf.space/api/v1/chat" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "message": "Hello! Can you introduce yourself?",
+     "session_id": "test-session"
+   }'
+ ```
+
+ ### 3. Test Streaming
+ ```bash
+ curl -N -H "Accept: text/event-stream" \
+   "https://your-username-sema-chat-gemma.hf.space/api/v1/chat/stream?message=Tell%20me%20about%20AI&session_id=test"
+ ```
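The stream above arrives as Server-Sent Events with `event:` and `data:` lines. A minimal Python consumer can be sketched as below; the `chunk`/`done` event names match what the API emits, while the live-HTTP wiring is left to the reader (the parser itself works on any iterable of lines):

```python
import json


def parse_sse(lines):
    """Parse 'event:'/'data:' lines of an SSE stream into (event, data) pairs.

    Yields one pair per blank-line-terminated SSE message.
    """
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":  # a blank line ends one SSE message
            if data:
                yield event, "\n".join(data)
            event, data = "message", []


# Example over a captured stream; a live client would iterate over the
# HTTP response body line by line instead of this list.
sample = [
    "event: chunk\n",
    'data: {"content": "Hello", "is_final": false}\n',
    "\n",
    "event: done\n",
    'data: {"message": "Stream completed"}\n',
    "\n",
]
events = [(name, json.loads(payload)) for name, payload in parse_sse(sample)]
```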
+
+ ### 4. Access Swagger UI
+ Visit: `https://your-username-sema-chat-gemma.hf.space/`
+
+ ---
+
+ ## 🔧 **Troubleshooting**
+
+ ### Common Issues:
+
+ #### 1. **Space Won't Start**
+ - Check the logs in your Space settings
+ - Verify all required files are present
+ - Check the Dockerfile syntax
+
+ #### 2. **Model Loading Fails**
+ - Verify the API key is correct
+ - Check the model name spelling
+ - Try a smaller model first
+
+ #### 3. **API Errors**
+ - Check environment variables
+ - Verify network connectivity
+ - Review application logs
+
+ #### 4. **Slow Responses**
+ - Use a smaller model (gemma-2-2b-it)
+ - Reduce MAX_NEW_TOKENS
+ - Enable streaming for better UX
+
+ ### Debug Commands:
+ ```bash
+ # Check the active model configuration
+ curl https://your-space.hf.space/api/v1/model/info
+
+ # Check detailed health
+ curl https://your-space.hf.space/api/v1/health
+
+ # View logs in Space settings
+ ```
+
+ ---
+
+ ## 🚀 **Step-by-Step Deployment**
+
+ ### 1. Prepare Your Space
+ ```bash
+ # Create and clone your space
+ git clone https://huggingface.co/spaces/your-username/sema-chat-gemma
+ cd sema-chat-gemma
+
+ # Copy files
+ cp -r ../sema/backend/sema-chat/* .
+ ```
+
+ ### 2. Set Environment Variables
+ Go to your Space settings and add:
+ ```
+ MODEL_TYPE=google
+ MODEL_NAME=gemma-2-9b-it
+ GOOGLE_API_KEY=your_key_here
+ ```
+
+ ### 3. Deploy
+ ```bash
+ git add .
+ git commit -m "Deploy Sema Chat with Gemma"
+ git push
+ ```
+
+ ### 4. Wait for the Build
+ - The Space builds automatically (5-10 minutes)
+ - Check the build logs for any errors
+ - Once running, test the endpoints
+
+ ### 5. Share Your Space
+ Your API will be available at:
+ `https://your-username-sema-chat-gemma.hf.space/`
+
+ ---
+
+ ## 💡 **Pro Tips**
+
+ 1. **Start with Google AI Studio** - Easiest setup
+ 2. **Use environment variables** - Never hardcode API keys
+ 3. **Enable streaming** - Better user experience
+ 4. **Monitor usage** - Check API quotas
+ 5. **Test thoroughly** - Use the provided test scripts
+ 6. **Document your API** - Swagger UI is auto-generated
+
+ ---
+
+ ## 🎉 **You're Ready!**
+
+ With this setup, you'll have a production-ready chatbot API with:
+ - ✅ Gemma 2 9B via Google AI Studio
+ - ✅ Streaming responses
+ - ✅ Session management
+ - ✅ Rate limiting
+ - ✅ Health monitoring
+ - ✅ Interactive Swagger UI
+
+ **Your Space URL will be:**
+ `https://your-username-sema-chat-gemma.hf.space/`
+
+ Happy deploying! 🚀
README.md CHANGED
@@ -1,12 +1,217 @@
- ---
- title: Sema Chat
- emoji: 👀
- colorFrom: green
- colorTo: red
- sdk: docker
- pinned: false
- license: mit
- short_description: Chat Service for sema ai
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Sema Chat API 💬
+
+ A modern chatbot API with streaming support, flexible model backends, and production-ready features. Built with FastAPI and designed to keep pace with rapid GenAI advancements.
+
+ ## 🚀 Quick Start with Gemma
+
+ ### Option 1: Automated HuggingFace Spaces Deployment
+ ```bash
+ cd backend/sema-chat
+ ./setup_huggingface.sh
+ ```
+
+ ### Option 2: Manual Local Setup
+ ```bash
+ cd backend/sema-chat
+ pip install -r requirements.txt
+
+ # Copy and configure environment
+ cp .env.example .env
+
+ # For Gemma via Google AI Studio (recommended), edit .env:
+ MODEL_TYPE=google
+ MODEL_NAME=gemma-2-9b-it
+ GOOGLE_API_KEY=your_google_api_key
+
+ # Run the API
+ uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
+ ```
+
+ ### Option 3: Local Gemma (Free, No API Key)
+ ```bash
+ # Edit .env:
+ MODEL_TYPE=local
+ MODEL_NAME=google/gemma-2b-it
+ DEVICE=cpu
+
+ # Run (downloads the model on first run)
+ uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
+ ```
+
+ ## 🌐 Access Your API
+
+ Once running, access:
+ - **Swagger UI**: http://localhost:7860/
+ - **Health Check**: http://localhost:7860/api/v1/health
+ - **Chat Endpoint**: http://localhost:7860/api/v1/chat
+
+ ## 🧪 Quick Test
+
+ ```bash
+ # Test chat
+ curl -X POST "http://localhost:7860/api/v1/chat" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "message": "Hello! Can you introduce yourself?",
+     "session_id": "test-session"
+   }'
+
+ # Test streaming
+ curl -N -H "Accept: text/event-stream" \
+   "http://localhost:7860/api/v1/chat/stream?message=Tell%20me%20about%20AI&session_id=test"
+ ```
+
+ ## 🎯 Features
+
+ ### Core Capabilities
+ - ✅ **Real-time Streaming**: Server-Sent Events and WebSocket support
+ - ✅ **Multiple Model Backends**: Local, HuggingFace API, OpenAI, Anthropic, Google AI, MiniMax
+ - ✅ **Session Management**: Persistent conversation contexts
+ - ✅ **Rate Limiting**: Built-in protection with configurable limits
+ - ✅ **Health Monitoring**: Comprehensive health checks and metrics
+
+ ### Supported Models
+ - **Local**: TinyLlama, DialoGPT, Gemma, Qwen
+ - **Google AI**: Gemma-2-9b-it, Gemini-1.5-flash, Gemini-1.5-pro
+ - **OpenAI**: GPT-3.5-turbo, GPT-4, GPT-4-turbo
+ - **Anthropic**: Claude-3-haiku, Claude-3-sonnet, Claude-3-opus
+ - **HuggingFace API**: Any model available via the Inference API
+ - **MiniMax**: M1 model with reasoning capabilities
+
+ ## 🔧 Configuration
+
+ ### Environment Variables
+ ```bash
+ # Model backend (local, google, openai, anthropic, hf_api, minimax)
+ MODEL_TYPE=google
+ MODEL_NAME=gemma-2-9b-it
+
+ # API keys (as needed)
+ GOOGLE_API_KEY=your_key
+ OPENAI_API_KEY=your_key
+ ANTHROPIC_API_KEY=your_key
+ HF_API_TOKEN=your_token
+ MINIMAX_API_KEY=your_key
+
+ # Generation settings
+ TEMPERATURE=0.7
+ MAX_NEW_TOKENS=512
+ TOP_P=0.9
+
+ # Server settings
+ HOST=0.0.0.0
+ PORT=7860
+ DEBUG=false
+ ```
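The repository reads these variables through `app/core/config.py` ("environment-based configuration"), whose source is not shown in this commit chunk. As a minimal illustration of the pattern, not the project's actual implementation, settings like these can be loaded with a plain dataclass; all defaults below are assumptions:

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    """Environment-backed settings; field names mirror the variables above."""
    model_type: str = field(default_factory=lambda: os.getenv("MODEL_TYPE", "local"))
    model_name: str = field(default_factory=lambda: os.getenv(
        "MODEL_NAME", "TinyLlama/TinyLlama-1.1B-Chat-v1.0"))
    temperature: float = field(default_factory=lambda: float(os.getenv("TEMPERATURE", "0.7")))
    max_new_tokens: int = field(default_factory=lambda: int(os.getenv("MAX_NEW_TOKENS", "512")))
    rate_limit: int = field(default_factory=lambda: int(os.getenv("RATE_LIMIT", "30")))


# e.g. values injected by the Space's settings UI or a .env file
os.environ["MODEL_TYPE"] = "google"
settings = Settings()
```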
+
+ ## 📚 Documentation
+
+ - **[Configuration Guide](CONFIGURATION_GUIDE.md)** - Detailed setup for all backends
+ - **[HuggingFace Deployment](HUGGINGFACE_DEPLOYMENT.md)** - Step-by-step deployment guide
+ - **[API Documentation](http://localhost:7860/)** - Interactive Swagger UI
+
+ ## 🧪 Testing
+
+ ```bash
+ # Run comprehensive tests
+ python tests/test_api.py
+
+ # Test different backends
+ python examples/test_backends.py
+
+ # Test a specific backend
+ python examples/test_backends.py --backend google
+ ```
+
+ ## 🚀 Deployment
+
+ ### HuggingFace Spaces (Recommended)
+ 1. Run the setup script: `./setup_huggingface.sh`
+ 2. Create your Space on HuggingFace
+ 3. Push the generated code
+ 4. Set environment variables in the Space settings
+ 5. Your API will be live at: `https://username-spacename.hf.space/`
+
+ ### Docker
+ ```bash
+ docker build -t sema-chat-api .
+ docker run -e MODEL_TYPE=google \
+   -e GOOGLE_API_KEY=your_key \
+   -p 7860:7860 \
+   sema-chat-api
+ ```
+
+ ## 🔗 API Endpoints
+
+ ### Chat
+ - **`POST /api/v1/chat`** - Send a chat message
+ - **`GET /api/v1/chat/stream`** - Streaming chat (SSE)
+ - **`WebSocket /api/v1/chat/ws`** - Real-time WebSocket chat
+
+ ### Sessions
+ - **`GET /api/v1/sessions/{id}`** - Get conversation history
+ - **`DELETE /api/v1/sessions/{id}`** - Clear a conversation
+ - **`GET /api/v1/sessions`** - List active sessions
+
+ ### System
+ - **`GET /api/v1/health`** - Comprehensive health check
+ - **`GET /api/v1/model/info`** - Current model information
+ - **`GET /api/v1/status`** - Basic status
+
+ ## 💡 Why This Architecture?
+
+ 1. **Future-proof**: The modular design adapts to rapid GenAI advancements
+ 2. **Flexible**: Switch between local models and APIs with environment variables
+ 3. **Production-ready**: Rate limiting, monitoring, and error handling built in
+ 4. **Cost-effective**: Start free with local models, scale with APIs
+ 5. **Developer-friendly**: Comprehensive docs, tests, and examples
+
+ ## 🛠️ Development
+
+ ### Project Structure
+ ```
+ app/
+ ├── main.py                # FastAPI application
+ ├── api/v1/endpoints.py    # API routes
+ ├── core/
+ │   ├── config.py          # Environment-based configuration
+ │   └── logging.py         # Structured logging
+ ├── models/schemas.py      # Pydantic request/response models
+ ├── services/
+ │   ├── chat_manager.py    # Chat orchestration
+ │   ├── model_manager.py   # Backend selection
+ │   ├── session_manager.py # Conversation management
+ │   └── model_backends/    # Model implementations
+ └── utils/helpers.py       # Utility functions
+ ```
+
+ ### Adding New Backends
+ 1. Create a new backend in `app/services/model_backends/`
+ 2. Inherit from the `ModelBackend` base class
+ 3. Implement the required methods
+ 4. Add it to `ModelManager._create_backend()`
+ 5. Update the configuration and documentation
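As a sketch of steps 2-3, a new backend might look like the toy class below. The real `ModelBackend` base class lives in `app/services/model_backends/base.py` and is not shown in this commit chunk; the interface here is inferred from the `load_model()`/`generate_response()` calls in the docs, and `EchoBackend` is purely illustrative:

```python
import asyncio
from abc import ABC, abstractmethod


class ModelBackend(ABC):
    """Assumed shape of the base class in app/services/model_backends/base.py."""

    def __init__(self, model_name: str):
        self.model_name = model_name

    @abstractmethod
    async def load_model(self) -> None: ...

    @abstractmethod
    async def generate_response(self, messages: list) -> str: ...


class EchoBackend(ModelBackend):
    """Toy backend: echoes the last user message back."""

    async def load_model(self) -> None:
        self.loaded = True  # a real backend would open a client or load weights

    async def generate_response(self, messages: list) -> str:
        return f"echo: {messages[-1]['content']}"


async def demo() -> str:
    backend = EchoBackend("echo-1")
    await backend.load_model()
    return await backend.generate_response([{"role": "user", "content": "Hello"}])


reply = asyncio.run(demo())
```

Note that the project's real `generate_response` returns a response object (the test snippets access `response.message`), so a production backend would return the appropriate schema type rather than a bare string.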
+
+ ## 🤝 Contributing
+
+ 1. Fork the repository
+ 2. Create a feature branch
+ 3. Add tests for new functionality
+ 4. Ensure all tests pass
+ 5. Submit a pull request
+
+ ## 📄 License
+
+ MIT License - see the LICENSE file for details.
+
+ ## 🙏 Acknowledgments
+
+ - **HuggingFace** for model hosting and the Spaces platform
+ - **Google** for the Gemma models and AI Studio
+ - **FastAPI** for the excellent web framework
+ - **OpenAI, Anthropic, MiniMax** for their APIs
+
  ---
 
+ **Ready to chat? Deploy your Sema Chat API today! 🚀💬**
app/__init__.py ADDED
@@ -0,0 +1 @@
+ # Sema Chat API Package
app/api/__init__.py ADDED
@@ -0,0 +1 @@
+ # API package
app/api/v1/__init__.py ADDED
@@ -0,0 +1 @@
+ # API v1 package
app/api/v1/endpoints.py ADDED
@@ -0,0 +1,363 @@
1
+ """
2
+ API v1 endpoints for Sema Chat API
3
+ """
4
+
5
+ import asyncio
6
+ import time
7
+ import uuid
8
+ from typing import List, Optional, Dict, Any
9
+ from datetime import datetime
10
+
11
+ from fastapi import APIRouter, HTTPException, Request, Query, WebSocket, WebSocketDisconnect
12
+ from fastapi.responses import StreamingResponse
13
+ from sse_starlette.sse import EventSourceResponse
14
+ from slowapi import Limiter
15
+ from slowapi.util import get_remote_address
16
+ from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
17
+ from fastapi.responses import Response
18
+
19
+ from ...models.schemas import (
20
+ ChatRequest, ChatResponse, StreamChunk, ConversationHistory,
21
+ HealthResponse, ErrorResponse, ModelInfo, SessionInfo
22
+ )
23
+ from ...services.chat_manager import get_chat_manager
24
+ from ...services.model_manager import get_model_manager
25
+ from ...services.session_manager import get_session_manager
26
+ from ...services.model_backends.base import ModelBackendError, ModelNotLoadedError, GenerationError
27
+ from ...core.config import settings
28
+ from ...core.logging import get_logger
29
+
30
+ # Initialize router and rate limiter
31
+ router = APIRouter()
32
+ limiter = Limiter(key_func=get_remote_address)
33
+ logger = get_logger()
34
+
35
+ # WebSocket connection manager
36
+ class ConnectionManager:
37
+ def __init__(self):
38
+ self.active_connections: List[WebSocket] = []
39
+
40
+ async def connect(self, websocket: WebSocket):
41
+ await websocket.accept()
42
+ self.active_connections.append(websocket)
43
+
44
+ def disconnect(self, websocket: WebSocket):
45
+ self.active_connections.remove(websocket)
46
+
47
+ async def send_personal_message(self, message: str, websocket: WebSocket):
48
+ await websocket.send_text(message)
49
+
50
+ manager = ConnectionManager()
51
+
52
+
53
+ @router.post("/chat", response_model=ChatResponse)
+ @limiter.limit(f"{settings.rate_limit}/minute")
+ async def chat(request: ChatRequest, req: Request):
+     """
+     Send a chat message and get a complete response
+     """
+     start_time = time.time()
+
+     try:
+         chat_manager = await get_chat_manager()
+         response = await chat_manager.process_chat_request(request)
+
+         # Add timing information
+         total_time = time.time() - start_time
+         response.generation_time = getattr(response, 'generation_time', total_time)
+
+         logger.info("chat_request_completed",
+                     session_id=request.session_id,
+                     message_length=len(request.message),
+                     response_length=len(response.message),
+                     total_time=total_time)
+
+         return response
+
+     except ModelNotLoadedError as e:
+         logger.error("model_not_loaded", error=str(e), session_id=request.session_id)
+         raise HTTPException(status_code=503, detail="Model not available")
+
+     except GenerationError as e:
+         logger.error("generation_error", error=str(e), session_id=request.session_id)
+         raise HTTPException(status_code=500, detail="Failed to generate response")
+
+     except Exception as e:
+         logger.error("chat_request_failed", error=str(e), session_id=request.session_id)
+         raise HTTPException(status_code=500, detail="Internal server error")
+
+
+ @router.get("/chat/stream")
+ @limiter.limit(f"{settings.rate_limit}/minute")
+ async def chat_stream(
+     message: str = Query(..., description="Chat message"),
+     session_id: str = Query(..., description="Session ID"),
+     system_prompt: Optional[str] = Query(None, description="Custom system prompt"),
+     temperature: Optional[float] = Query(None, ge=0.0, le=1.0, description="Temperature"),
+     max_tokens: Optional[int] = Query(None, ge=1, le=2048, description="Max tokens"),
+     req: Request = None
+ ):
+     """
+     Send a chat message and get a streaming response via Server-Sent Events
+     """
+     try:
+         # Create chat request
+         chat_request = ChatRequest(
+             message=message,
+             session_id=session_id,
+             system_prompt=system_prompt,
+             temperature=temperature,
+             max_tokens=max_tokens,
+             stream=True
+         )
+
+         chat_manager = await get_chat_manager()
+
+         async def event_generator():
+             try:
+                 async for chunk in chat_manager.process_streaming_chat_request(chat_request):
+                     # Format as SSE event
+                     chunk_data = {
+                         "content": chunk.content,
+                         "session_id": chunk.session_id,
+                         "message_id": chunk.message_id,
+                         "chunk_id": chunk.chunk_id,
+                         "is_final": chunk.is_final,
+                         "timestamp": chunk.timestamp.isoformat()
+                     }
+
+                     yield {
+                         "event": "chunk",
+                         "data": chunk_data
+                     }
+
+                     if chunk.is_final:
+                         yield {
+                             "event": "done",
+                             "data": {"message": "Stream completed"}
+                         }
+                         break
+
+             except Exception as e:
+                 logger.error("streaming_error", error=str(e), session_id=session_id)
+                 yield {
+                     "event": "error",
+                     "data": {"error": str(e)}
+                 }
+
+         return EventSourceResponse(event_generator())
+
+     except Exception as e:
+         logger.error("stream_setup_failed", error=str(e), session_id=session_id)
+         raise HTTPException(status_code=500, detail="Failed to set up stream")
+
+
+ @router.websocket("/chat/ws")
+ async def websocket_chat(websocket: WebSocket):
+     """
+     WebSocket endpoint for real-time chat
+     """
+     await manager.connect(websocket)
+     session_id = None
+
+     try:
+         while True:
+             # Receive message from client
+             data = await websocket.receive_json()
+
+             # Extract request data
+             message = data.get("message")
+             session_id = data.get("session_id")
+             system_prompt = data.get("system_prompt")
+             temperature = data.get("temperature")
+             max_tokens = data.get("max_tokens")
+
+             if not message or not session_id:
+                 await websocket.send_json({
+                     "error": "Message and session_id are required"
+                 })
+                 continue
+
+             # Create chat request
+             chat_request = ChatRequest(
+                 message=message,
+                 session_id=session_id,
+                 system_prompt=system_prompt,
+                 temperature=temperature,
+                 max_tokens=max_tokens,
+                 stream=True
+             )
+
+             # Process streaming request
+             chat_manager = await get_chat_manager()
+
+             try:
+                 async for chunk in chat_manager.process_streaming_chat_request(chat_request):
+                     await websocket.send_json({
+                         "type": "chunk",
+                         "content": chunk.content,
+                         "session_id": chunk.session_id,
+                         "message_id": chunk.message_id,
+                         "chunk_id": chunk.chunk_id,
+                         "is_final": chunk.is_final,
+                         "timestamp": chunk.timestamp.isoformat()
+                     })
+
+                     if chunk.is_final:
+                         break
+
+             except Exception as e:
+                 logger.error("websocket_generation_error", error=str(e), session_id=session_id)
+                 await websocket.send_json({
+                     "type": "error",
+                     "error": str(e)
+                 })
+
+     except WebSocketDisconnect:
+         manager.disconnect(websocket)
+         logger.info("websocket_disconnected", session_id=session_id)
+     except Exception as e:
+         logger.error("websocket_error", error=str(e), session_id=session_id)
+         manager.disconnect(websocket)
+
+
+ @router.get("/sessions/{session_id}", response_model=ConversationHistory)
+ async def get_session(session_id: str):
+     """
+     Get conversation history for a session
+     """
+     try:
+         chat_manager = await get_chat_manager()
+         history = await chat_manager.get_conversation_history(session_id)
+
+         if not history:
+             raise HTTPException(status_code=404, detail="Session not found")
+
+         return history
+
+     except HTTPException:
+         raise
+     except Exception as e:
+         logger.error("get_session_failed", error=str(e), session_id=session_id)
+         raise HTTPException(status_code=500, detail="Failed to get session")
+
+
+ @router.delete("/sessions/{session_id}")
+ async def clear_session(session_id: str):
+     """
+     Clear conversation history for a session
+     """
+     try:
+         chat_manager = await get_chat_manager()
+         success = await chat_manager.clear_conversation(session_id)
+
+         if not success:
+             raise HTTPException(status_code=404, detail="Session not found")
+
+         return {"message": "Session cleared successfully"}
+
+     except HTTPException:
+         raise
+     except Exception as e:
+         logger.error("clear_session_failed", error=str(e), session_id=session_id)
+         raise HTTPException(status_code=500, detail="Failed to clear session")
+
+
+ @router.get("/sessions", response_model=List[Dict[str, Any]])
+ async def get_active_sessions():
+     """
+     Get list of active chat sessions
+     """
+     try:
+         chat_manager = await get_chat_manager()
+         sessions = await chat_manager.get_active_sessions()
+         return sessions
+
+     except Exception as e:
+         logger.error("get_active_sessions_failed", error=str(e))
+         raise HTTPException(status_code=500, detail="Failed to get active sessions")
+
+
+ @router.get("/model/info", response_model=ModelInfo)
+ async def get_model_info():
+     """
+     Get information about the current model
+     """
+     try:
+         model_manager = await get_model_manager()
+         info = model_manager.get_model_info()
+
+         return ModelInfo(
+             name=info["name"],
+             type=info["type"],
+             loaded=info["loaded"],
+             parameters=info.get("parameters"),
+             capabilities=info.get("capabilities", [])
+         )
+
+     except Exception as e:
+         logger.error("get_model_info_failed", error=str(e))
+         raise HTTPException(status_code=500, detail="Failed to get model info")
+
+
+ @router.get("/health", response_model=HealthResponse)
+ async def health_check():
+     """
+     Comprehensive health check endpoint
+     """
+     try:
+         chat_manager = await get_chat_manager()
+         health_data = await chat_manager.health_check()
+
+         # Extract key information
+         overall_status = health_data.get("overall", {})
+         model_info = health_data.get("model_manager", {})
+         session_info = health_data.get("session_manager", {})
+
+         return HealthResponse(
+             status=overall_status.get("status", "unknown"),
+             version=settings.app_version,
+             model_type=settings.model_type,
+             model_name=settings.model_name,
+             model_loaded=model_info.get("status") == "healthy",
+             uptime=time.time(),  # Simplified uptime
+             active_sessions=session_info.get("active_sessions", 0),
+             timestamp=datetime.utcnow()
+         )
+
+     except Exception as e:
+         logger.error("health_check_failed", error=str(e))
+         return HealthResponse(
+             status="unhealthy",
+             version=settings.app_version,
+             model_type=settings.model_type,
+             model_name=settings.model_name,
+             model_loaded=False,
+             uptime=time.time(),
+             active_sessions=0,
+             timestamp=datetime.utcnow()
+         )
+
+
+ @router.get("/status")
+ async def status():
+     """
+     Simple status endpoint
+     """
+     return {
+         "status": "ok",
+         "service": "sema-chat-api",
+         "version": settings.app_version,
+         "timestamp": datetime.utcnow().isoformat()
+     }
+
+
+ @router.get("/metrics")
+ async def metrics():
+     """
+     Prometheus metrics endpoint
+     """
+     if not settings.enable_metrics:
+         raise HTTPException(status_code=404, detail="Metrics not enabled")
+
+     return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
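The `event_generator` above emits `chunk` events followed by a final `done` event. A client has to split the SSE wire format back into events and reassemble the message from the `content` fields. A minimal stdlib sketch of that client-side parsing (the `parse_sse` helper is hypothetical, and it assumes each `data:` line carries JSON matching the `chunk_data` dict above):

```python
import json

def parse_sse(raw: str):
    """Parse an SSE payload into (event, data) pairs.

    Assumes the layout produced by the endpoint above: an ``event:`` line,
    a ``data:`` line, and a blank line terminating each event.
    """
    events = []
    event_name, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and (event_name or data_lines):
            payload = "\n".join(data_lines)
            try:
                payload = json.loads(payload)  # keep raw string if not JSON
            except json.JSONDecodeError:
                pass
            events.append((event_name, payload))
            event_name, data_lines = None, []
    return events

# Reassemble the streamed message from "chunk" events
raw = (
    'event: chunk\ndata: {"content": "Hello", "chunk_id": 0, "is_final": false}\n\n'
    'event: chunk\ndata: {"content": " world", "chunk_id": 1, "is_final": true}\n\n'
    'event: done\ndata: {"message": "Stream completed"}\n\n'
)
text = "".join(d["content"] for e, d in parse_sse(raw) if e == "chunk")
print(text)  # Hello world
```

In a real client you would feed the response body through this incrementally (e.g. with `httpx` streaming) rather than buffering the whole payload.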
app/core/__init__.py ADDED
@@ -0,0 +1 @@
+ # Core configuration and utilities
app/core/config.py ADDED
@@ -0,0 +1,178 @@
+ """
+ Configuration management for Sema Chat API
+ Environment-driven settings for flexible model backends
+ """
+
+ from typing import List, Optional
+ from pydantic import BaseSettings, Field
+ from dotenv import load_dotenv
+
+ # Load environment variables from .env file
+ load_dotenv()
+
+
+ class Settings(BaseSettings):
+     """Application settings with environment variable support"""
+
+     # =========================================================================
+     # APPLICATION SETTINGS
+     # =========================================================================
+
+     app_name: str = Field(default="Sema Chat API", env="APP_NAME")
+     app_version: str = Field(default="1.0.0", env="APP_VERSION")
+     environment: str = Field(default="development", env="ENVIRONMENT")
+     debug: bool = Field(default=True, env="DEBUG")
+
+     # =========================================================================
+     # SERVER SETTINGS
+     # =========================================================================
+
+     host: str = Field(default="0.0.0.0", env="HOST")
+     port: int = Field(default=7860, env="PORT")
+     cors_origins: List[str] = Field(default=["*"], env="CORS_ORIGINS")
+
+     # =========================================================================
+     # MODEL CONFIGURATION
+     # =========================================================================
+
+     model_type: str = Field(default="local", env="MODEL_TYPE")
+     model_name: str = Field(default="TinyLlama/TinyLlama-1.1B-Chat-v1.0", env="MODEL_NAME")
+
+     # Local model settings
+     device: str = Field(default="auto", env="DEVICE")
+     max_length: int = Field(default=2048, env="MAX_LENGTH")
+     temperature: float = Field(default=0.7, env="TEMPERATURE")
+     top_p: float = Field(default=0.9, env="TOP_P")
+     top_k: int = Field(default=50, env="TOP_K")
+     max_new_tokens: int = Field(default=512, env="MAX_NEW_TOKENS")
+
+     # =========================================================================
+     # API KEYS AND TOKENS
+     # =========================================================================
+
+     # HuggingFace
+     hf_api_token: Optional[str] = Field(default=None, env="HF_API_TOKEN")
+     hf_inference_url: str = Field(
+         default="https://api-inference.huggingface.co/models/",
+         env="HF_INFERENCE_URL"
+     )
+
+     # OpenAI
+     openai_api_key: Optional[str] = Field(default=None, env="OPENAI_API_KEY")
+     openai_org_id: Optional[str] = Field(default=None, env="OPENAI_ORG_ID")
+
+     # Anthropic
+     anthropic_api_key: Optional[str] = Field(default=None, env="ANTHROPIC_API_KEY")
+
+     # MiniMax
+     minimax_api_key: Optional[str] = Field(default=None, env="MINIMAX_API_KEY")
+     minimax_api_url: Optional[str] = Field(default=None, env="MINIMAX_API_URL")
+     minimax_model_version: Optional[str] = Field(default=None, env="MINIMAX_MODEL_VERSION")
+
+     # Google AI Studio
+     google_api_key: Optional[str] = Field(default=None, env="GOOGLE_API_KEY")
+
+     # =========================================================================
+     # RATE LIMITING AND PERFORMANCE
+     # =========================================================================
+
+     rate_limit: int = Field(default=60, env="RATE_LIMIT")  # requests per minute
+     max_concurrent_streams: int = Field(default=10, env="MAX_CONCURRENT_STREAMS")
+     stream_delay: float = Field(default=0.01, env="STREAM_DELAY")
+
+     # =========================================================================
+     # SESSION MANAGEMENT
+     # =========================================================================
+
+     session_timeout: int = Field(default=30, env="SESSION_TIMEOUT")  # minutes
+     max_sessions_per_user: int = Field(default=5, env="MAX_SESSIONS_PER_USER")
+     max_messages_per_session: int = Field(default=100, env="MAX_MESSAGES_PER_SESSION")
+
+     # =========================================================================
+     # STREAMING SETTINGS
+     # =========================================================================
+
+     enable_streaming: bool = Field(default=True, env="ENABLE_STREAMING")
+
+     # =========================================================================
+     # LOGGING AND MONITORING
+     # =========================================================================
+
+     log_level: str = Field(default="INFO", env="LOG_LEVEL")
+     structured_logging: bool = Field(default=True, env="STRUCTURED_LOGGING")
+     log_file: Optional[str] = Field(default=None, env="LOG_FILE")
+
+     enable_metrics: bool = Field(default=True, env="ENABLE_METRICS")
+     metrics_path: str = Field(default="/metrics", env="METRICS_PATH")
+
+     # =========================================================================
+     # EXTERNAL SERVICES
+     # =========================================================================
+
+     redis_url: Optional[str] = Field(default=None, env="REDIS_URL")
+
+     # =========================================================================
+     # SECURITY
+     # =========================================================================
+
+     api_key: Optional[str] = Field(default=None, env="API_KEY")
+     jwt_secret: Optional[str] = Field(default=None, env="JWT_SECRET")
+
+     # =========================================================================
+     # SYSTEM PROMPTS
+     # =========================================================================
+
+     system_prompt: str = Field(
+         default="You are a helpful, harmless, and honest AI assistant. Respond in a friendly and professional manner.",
+         env="SYSTEM_PROMPT"
+     )
+
+     system_prompt_chat: Optional[str] = Field(default=None, env="SYSTEM_PROMPT_CHAT")
+     system_prompt_code: Optional[str] = Field(default=None, env="SYSTEM_PROMPT_CODE")
+     system_prompt_creative: Optional[str] = Field(default=None, env="SYSTEM_PROMPT_CREATIVE")
+
+     class Config:
+         env_file = ".env"
+         case_sensitive = False
+
+     def get_system_prompt(self, prompt_type: str = "default") -> str:
+         """Get system prompt based on type"""
+         if prompt_type == "chat" and self.system_prompt_chat:
+             return self.system_prompt_chat
+         elif prompt_type == "code" and self.system_prompt_code:
+             return self.system_prompt_code
+         elif prompt_type == "creative" and self.system_prompt_creative:
+             return self.system_prompt_creative
+         return self.system_prompt
+
+     def is_local_model(self) -> bool:
+         """Check if using local model backend"""
+         return self.model_type.lower() == "local"
+
+     def is_api_model(self) -> bool:
+         """Check if using an API-based model backend"""
+         # Keep in sync with validate_model_config below
+         return self.model_type.lower() in ["hf_api", "openai", "anthropic", "minimax", "google"]
+
+     def validate_model_config(self) -> bool:
+         """Validate model configuration based on type"""
+         if self.model_type == "hf_api" and not self.hf_api_token:
+             return False
+         elif self.model_type == "openai" and not self.openai_api_key:
+             return False
+         elif self.model_type == "anthropic" and not self.anthropic_api_key:
+             return False
+         elif self.model_type == "minimax" and (not self.minimax_api_key or not self.minimax_api_url):
+             return False
+         elif self.model_type == "google" and not self.google_api_key:
+             return False
+         return True
+
+
+ # Global settings instance
+ settings = Settings()
+
+
+ def get_settings() -> Settings:
+     """Get application settings"""
+     return settings
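Each `Field(..., env=...)` above resolves in order: the process environment first, then the `.env` file loaded by `load_dotenv()`, then the coded default. A stdlib sketch of that precedence (the `resolve` helper is hypothetical; the real lookup and type coercion are done by pydantic):

```python
import os

def resolve(name, default, cast=str):
    """Mimic the env-var lookup: a set variable wins, else the coded default."""
    raw = os.environ.get(name)
    return cast(raw) if raw is not None else default

os.environ["MODEL_TYPE"] = "minimax"   # e.g. exported in the shell or via .env
print(resolve("MODEL_TYPE", "local"))  # minimax
os.environ.pop("RATE_LIMIT", None)
print(resolve("RATE_LIMIT", 60, int))  # 60 (unset, so the default applies)
```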
app/core/logging.py ADDED
@@ -0,0 +1,85 @@
+ """
+ Structured logging configuration for Sema Chat API
+ """
+
+ import logging
+ import sys
+ from typing import Any
+ import structlog
+ from .config import settings
+
+
+ def configure_logging():
+     """Configure structured logging for the application"""
+
+     # Configure structlog
+     structlog.configure(
+         processors=[
+             structlog.stdlib.filter_by_level,
+             structlog.stdlib.add_logger_name,
+             structlog.stdlib.add_log_level,
+             structlog.stdlib.PositionalArgumentsFormatter(),
+             structlog.processors.TimeStamper(fmt="iso"),
+             structlog.processors.StackInfoRenderer(),
+             structlog.processors.format_exc_info,
+             structlog.processors.UnicodeDecoder(),
+             structlog.processors.JSONRenderer() if settings.structured_logging else structlog.dev.ConsoleRenderer(),
+         ],
+         context_class=dict,
+         logger_factory=structlog.stdlib.LoggerFactory(),
+         wrapper_class=structlog.stdlib.BoundLogger,
+         cache_logger_on_first_use=True,
+     )
+
+     # Configure standard logging
+     logging.basicConfig(
+         format="%(message)s",
+         stream=sys.stdout,
+         level=getattr(logging, settings.log_level.upper()),
+     )
+
+     # Configure file logging if specified
+     if settings.log_file:
+         file_handler = logging.FileHandler(settings.log_file)
+         file_handler.setLevel(getattr(logging, settings.log_level.upper()))
+
+         if settings.structured_logging:
+             file_handler.setFormatter(logging.Formatter('%(message)s'))
+         else:
+             file_handler.setFormatter(
+                 logging.Formatter(
+                     '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+                 )
+             )
+
+         logging.getLogger().addHandler(file_handler)
+
+
+ def get_logger(name: str = None) -> structlog.BoundLogger:
+     """Get a structured logger instance"""
+     return structlog.get_logger(name)
+
+
+ class LoggerMixin:
+     """Mixin class to add logging capabilities to any class"""
+
+     @property
+     def logger(self) -> structlog.BoundLogger:
+         """Get logger for this class"""
+         return get_logger(self.__class__.__name__)
+
+     def log_info(self, message: str, **kwargs: Any):
+         """Log info message with context"""
+         self.logger.info(message, **kwargs)
+
+     def log_error(self, message: str, **kwargs: Any):
+         """Log error message with context"""
+         self.logger.error(message, **kwargs)
+
+     def log_warning(self, message: str, **kwargs: Any):
+         """Log warning message with context"""
+         self.logger.warning(message, **kwargs)
+
+     def log_debug(self, message: str, **kwargs: Any):
+         """Log debug message with context"""
+         self.logger.debug(message, **kwargs)
app/main.py ADDED
@@ -0,0 +1,187 @@
+ """
+ Sema Chat API - Main Application
+ Modern chatbot API with streaming capabilities and flexible model backends
+ """
+
+ from fastapi import FastAPI, Request
+ from fastapi.middleware.cors import CORSMiddleware
+ from fastapi.middleware.trustedhost import TrustedHostMiddleware
+ from fastapi.responses import RedirectResponse
+ from slowapi import _rate_limit_exceeded_handler
+ from slowapi.errors import RateLimitExceeded
+ import time
+
+ from .core.config import settings
+ from .core.logging import configure_logging, get_logger
+ from .services.chat_manager import initialize_chat_manager, shutdown_chat_manager
+ from .api.v1.endpoints import router as v1_router, limiter
+
+ # Configure logging
+ configure_logging()
+ logger = get_logger()
+
+ # Application startup time
+ startup_time = time.time()
+
+
+ def create_application() -> FastAPI:
+     """Create and configure the FastAPI application"""
+
+     # Create FastAPI app
+     app = FastAPI(
+         title=settings.app_name,
+         description="Modern chatbot API with streaming capabilities and flexible model backends",
+         version=settings.app_version,
+         docs_url="/docs" if settings.debug else "/",  # Swagger UI at root for HF Spaces
+         redoc_url="/redoc",
+         openapi_url="/openapi.json"
+     )
+
+     # Add rate limiting
+     app.state.limiter = limiter
+     app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
+
+     # Add CORS middleware
+     app.add_middleware(
+         CORSMiddleware,
+         allow_origins=settings.cors_origins,
+         allow_credentials=True,
+         allow_methods=["*"],
+         allow_headers=["*"],
+     )
+
+     # Add trusted host middleware for production
+     if settings.environment == "production":
+         app.add_middleware(
+             TrustedHostMiddleware,
+             allowed_hosts=["*"]  # Configure appropriately for production
+         )
+
+     # Add request timing middleware
+     @app.middleware("http")
+     async def add_process_time_header(request: Request, call_next):
+         start_time = time.time()
+         response = await call_next(request)
+         process_time = time.time() - start_time
+         response.headers["X-Process-Time"] = str(process_time)
+         response.headers["X-Request-ID"] = str(id(request))
+         return response
+
+     # Include API routes
+     app.include_router(v1_router, prefix="/api/v1", tags=["Chat API v1"])
+
+     # Root redirect for HuggingFace Spaces
+     @app.get("/", include_in_schema=False)
+     async def root():
+         """Redirect root to docs for HuggingFace Spaces"""
+         return RedirectResponse(url="/docs")
+
+     return app
+
+
+ # Create the application instance
+ app = create_application()
+
+
+ @app.on_event("startup")
+ async def startup_event():
+     """Initialize the application on startup"""
+     logger.info("application_startup",
+                 version=settings.app_version,
+                 environment=settings.environment,
+                 model_type=settings.model_type,
+                 model_name=settings.model_name)
+
+     print(f"\nπŸš€ Starting {settings.app_name} v{settings.app_version}")
+     print(f"πŸ“Š Environment: {settings.environment}")
+     print(f"πŸ€– Model Backend: {settings.model_type}")
+     print(f"🎯 Model: {settings.model_name}")
+     print("πŸ”„ Initializing chat services...")
+
+     try:
+         # Initialize chat manager (which initializes model and session managers)
+         success = await initialize_chat_manager()
+
+         if success:
+             logger.info("chat_services_initialized")
+             print("βœ… Chat services initialized successfully")
+             print("🌐 API Documentation: http://localhost:7860/docs")
+             print("πŸ“‘ WebSocket Chat: ws://localhost:7860/api/v1/chat/ws")
+             print("πŸ”„ Streaming Chat: http://localhost:7860/api/v1/chat/stream")
+             print("πŸ’¬ Regular Chat: http://localhost:7860/api/v1/chat")
+             print("❀️ Health Check: http://localhost:7860/api/v1/health")
+
+             if settings.enable_metrics:
+                 print("πŸ“ˆ Metrics: http://localhost:7860/api/v1/metrics")
+
+             print("\nπŸŽ‰ Sema Chat API is ready for conversations!")
+             print("=" * 60)
+         else:
+             logger.error("chat_services_initialization_failed")
+             print("❌ Failed to initialize chat services")
+             print("πŸ”§ Please check your configuration and try again")
+             raise RuntimeError("Chat services initialization failed")
+
+     except Exception as e:
+         logger.error("startup_failed", error=str(e))
+         print(f"πŸ’₯ Startup failed: {e}")
+         raise
+
+
+ @app.on_event("shutdown")
+ async def shutdown_event():
+     """Cleanup on application shutdown"""
+     logger.info("application_shutdown")
+     print("\nπŸ›‘ Shutting down Sema Chat API...")
+
+     try:
+         await shutdown_chat_manager()
+         print("βœ… Chat services shutdown complete")
+     except Exception as e:
+         logger.error("shutdown_failed", error=str(e))
+         print(f"⚠️ Shutdown warning: {e}")
+
+     print("πŸ‘‹ Goodbye!\n")
+
+
+ # Health check endpoint at app level
+ @app.get("/health", tags=["Health"])
+ async def app_health():
+     """Simple app-level health check"""
+     uptime = time.time() - startup_time
+     return {
+         "status": "healthy",
+         "service": "sema-chat-api",
+         "version": settings.app_version,
+         "uptime_seconds": uptime,
+         "model_type": settings.model_type,
+         "model_name": settings.model_name
+     }
+
+
+ # Status endpoint
+ @app.get("/status", tags=["Health"])
+ async def app_status():
+     """Simple status endpoint"""
+     return {
+         "status": "ok",
+         "service": "sema-chat-api",
+         "version": settings.app_version,
+         "environment": settings.environment
+     }
+
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     print(f"πŸš€ Starting Sema Chat API on {settings.host}:7860")
+     print(f"πŸ”§ Debug mode: {settings.debug}")
+     print(f"πŸ€– Model: {settings.model_type}/{settings.model_name}")
+
+     uvicorn.run(
+         "app.main:app",
+         host=settings.host,
+         port=7860,
+         reload=settings.debug,
+         log_level=settings.log_level.lower()
+     )
app/models/__init__.py ADDED
@@ -0,0 +1 @@
+ # Pydantic models and schemas
app/models/schemas.py ADDED
@@ -0,0 +1,215 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Pydantic models for request/response validation
3
+ """
4
+
5
+ from typing import List, Optional, Dict, Any, Union
6
+ from pydantic import BaseModel, Field, validator
7
+ from datetime import datetime
8
+ import uuid
9
+
10
+
11
+ class ChatMessage(BaseModel):
12
+ """Individual chat message model"""
13
+
14
+ role: str = Field(..., description="Message role: 'user' or 'assistant'")
15
+ content: str = Field(..., description="Message content")
16
+ timestamp: datetime = Field(default_factory=datetime.utcnow, description="Message timestamp")
17
+ metadata: Optional[Dict[str, Any]] = Field(default=None, description="Additional message metadata")
18
+
19
+ @validator('role')
20
+ def validate_role(cls, v):
21
+ if v not in ['user', 'assistant', 'system']:
22
+ raise ValueError('Role must be user, assistant, or system')
23
+ return v
24
+
25
+
26
+ class ChatRequest(BaseModel):
27
+ """Chat request model"""
28
+
29
+ message: str = Field(
30
+ ...,
31
+ description="User message",
32
+ min_length=1,
33
+ max_length=4000,
34
+ example="Hello, how are you today?"
35
+ )
36
+ session_id: str = Field(
37
+ ...,
38
+ description="Session identifier for conversation context",
39
+ example="user-123-session"
40
+ )
41
+ system_prompt: Optional[str] = Field(
42
+ default=None,
43
+ description="Custom system prompt for this conversation",
44
+ max_length=1000
45
+ )
46
+ temperature: Optional[float] = Field(
47
+ default=None,
48
+ description="Sampling temperature (0.0 to 1.0)",
49
+ ge=0.0,
50
+ le=1.0
51
+ )
52
+ max_tokens: Optional[int] = Field(
53
+ default=None,
54
+ description="Maximum tokens to generate",
55
+ ge=1,
56
+ le=2048
57
+ )
58
+ stream: Optional[bool] = Field(
59
+ default=False,
60
+ description="Whether to stream the response"
61
+ )
62
+
63
+
64
+ class ChatResponse(BaseModel):
65
+ """Chat response model"""
66
+
67
+ message: str = Field(..., description="Assistant response message")
68
+ session_id: str = Field(..., description="Session identifier")
69
+ message_id: str = Field(..., description="Unique message identifier")
70
+ model_name: str = Field(..., description="Model used for generation")
71
+ timestamp: datetime = Field(default_factory=datetime.utcnow, description="Response timestamp")
72
+ generation_time: float = Field(..., description="Time taken to generate response (seconds)")
73
+ token_count: Optional[int] = Field(default=None, description="Number of tokens in response")
74
+ finish_reason: Optional[str] = Field(default=None, description="Reason generation finished")
75
+
76
+ class Config:
77
+ json_schema_extra = {
78
+ "example": {
79
+ "message": "Hello! I'm doing well, thank you for asking. How can I help you today?",
80
+ "session_id": "user-123-session",
81
+ "message_id": "msg-456-789",
82
+ "model_name": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
83
+ "timestamp": "2024-01-15T10:30:00Z",
84
+ "generation_time": 1.234,
85
+ "token_count": 25,
86
+ "finish_reason": "stop"
87
+ }
88
+ }
89
+
90
+
91
+ class StreamChunk(BaseModel):
92
+ """Streaming response chunk model"""
93
+
94
+ content: str = Field(..., description="Chunk content")
95
+ session_id: str = Field(..., description="Session identifier")
96
+ message_id: str = Field(..., description="Message identifier")
97
+ chunk_id: int = Field(..., description="Chunk sequence number")
98
+ is_final: bool = Field(default=False, description="Whether this is the final chunk")
99
+ timestamp: datetime = Field(default_factory=datetime.utcnow, description="Chunk timestamp")
100
+
101
+
102
+ class ConversationHistory(BaseModel):
103
+ """Conversation history model"""
104
+
105
+ session_id: str = Field(..., description="Session identifier")
106
+ messages: List[ChatMessage] = Field(..., description="List of messages in conversation")
107
+     created_at: datetime = Field(default_factory=datetime.utcnow, description="Session creation time")
+     updated_at: datetime = Field(default_factory=datetime.utcnow, description="Last update time")
+     message_count: int = Field(..., description="Total number of messages")
+
+     class Config:
+         json_schema_extra = {
+             "example": {
+                 "session_id": "user-123-session",
+                 "messages": [
+                     {
+                         "role": "user",
+                         "content": "Hello!",
+                         "timestamp": "2024-01-15T10:30:00Z"
+                     },
+                     {
+                         "role": "assistant",
+                         "content": "Hello! How can I help you today?",
+                         "timestamp": "2024-01-15T10:30:01Z"
+                     }
+                 ],
+                 "created_at": "2024-01-15T10:30:00Z",
+                 "updated_at": "2024-01-15T10:30:01Z",
+                 "message_count": 2
+             }
+         }
+
+
+ class SessionInfo(BaseModel):
+     """Session information model"""
+
+     session_id: str = Field(..., description="Session identifier")
+     created_at: datetime = Field(..., description="Session creation time")
+     updated_at: datetime = Field(..., description="Last activity time")
+     message_count: int = Field(..., description="Number of messages in session")
+     model_name: str = Field(..., description="Model used in this session")
+     is_active: bool = Field(..., description="Whether session is active")
+
+
+ class HealthResponse(BaseModel):
+     """Health check response model"""
+
+     status: str = Field(..., description="API health status")
+     version: str = Field(..., description="API version")
+     model_type: str = Field(..., description="Current model backend type")
+     model_name: str = Field(..., description="Current model name")
+     model_loaded: bool = Field(..., description="Whether model is loaded and ready")
+     uptime: float = Field(..., description="API uptime in seconds")
+     active_sessions: int = Field(..., description="Number of active chat sessions")
+     timestamp: datetime = Field(default_factory=datetime.utcnow, description="Health check timestamp")
+
+     class Config:
+         json_schema_extra = {
+             "example": {
+                 "status": "healthy",
+                 "version": "1.0.0",
+                 "model_type": "local",
+                 "model_name": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+                 "model_loaded": True,
+                 "uptime": 3600.5,
+                 "active_sessions": 5,
+                 "timestamp": "2024-01-15T10:30:00Z"
+             }
+         }
+
+
+ class ErrorResponse(BaseModel):
+     """Error response model"""
+
+     error: str = Field(..., description="Error type")
+     message: str = Field(..., description="Error message")
+     details: Optional[Dict[str, Any]] = Field(default=None, description="Additional error details")
+     timestamp: datetime = Field(default_factory=datetime.utcnow, description="Error timestamp")
+     request_id: Optional[str] = Field(default=None, description="Request identifier for debugging")
+
+     class Config:
+         json_schema_extra = {
+             "example": {
+                 "error": "validation_error",
+                 "message": "Message content is required",
+                 "details": {"field": "message", "constraint": "min_length"},
+                 "timestamp": "2024-01-15T10:30:00Z",
+                 "request_id": "req-123-456"
+             }
+         }
+
+
+ class ModelInfo(BaseModel):
+     """Model information model"""
+
+     name: str = Field(..., description="Model name")
+     type: str = Field(..., description="Model backend type")
+     loaded: bool = Field(..., description="Whether model is loaded")
+     parameters: Optional[Dict[str, Any]] = Field(default=None, description="Model parameters")
+     capabilities: List[str] = Field(default_factory=list, description="Model capabilities")
+
+     class Config:
+         json_schema_extra = {
+             "example": {
+                 "name": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+                 "type": "local",
+                 "loaded": True,
+                 "parameters": {
+                     "temperature": 0.7,
+                     "max_tokens": 512,
+                     "top_p": 0.9
+                 },
+                 "capabilities": ["chat", "instruction_following", "streaming"]
+             }
+         }
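The response models above all serialize to flat JSON payloads matching their `json_schema_extra` examples. As a rough illustration of the shape (a stdlib-only stand-in with a hypothetical `HealthStatus` name, not the Pydantic model from the commit):

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Simplified stand-in mirroring the HealthResponse fields above (illustrative only)
@dataclass
class HealthStatus:
    status: str
    version: str
    model_type: str
    model_name: str
    model_loaded: bool
    uptime: float
    active_sessions: int
    # default_factory gives each instance its own timestamp, like the Pydantic Field
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

payload = asdict(HealthStatus(
    status="healthy", version="1.0.0", model_type="local",
    model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    model_loaded=True, uptime=3600.5, active_sessions=5,
))
print(payload["status"], payload["active_sessions"])  # β†’ healthy 5
```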
app/services/__init__.py ADDED
@@ -0,0 +1 @@
+ # Services package
app/services/chat_manager.py ADDED
@@ -0,0 +1,398 @@
+ """
+ Chat Manager - Main orchestrator for chat functionality
+ Coordinates between model backends and session management
+ """
+
+ import time
+ import uuid
+ from typing import AsyncGenerator, List, Optional, Dict, Any
+ from datetime import datetime
+
+ from ..core.config import settings
+ from ..core.logging import LoggerMixin
+ from ..models.schemas import (
+     ChatMessage, ChatRequest, ChatResponse, StreamChunk,
+     ConversationHistory, ErrorResponse
+ )
+ from .model_manager import get_model_manager
+ from .session_manager import get_session_manager
+ from .model_backends.base import ModelBackendError, ModelNotLoadedError, GenerationError
+
+
+ class ChatManager(LoggerMixin):
+     """
+     Main chat service that orchestrates conversation handling
+     """
+
+     def __init__(self):
+         self.is_initialized = False
+         self.active_streams = 0
+         self.max_concurrent_streams = settings.max_concurrent_streams
+
+     async def initialize(self) -> bool:
+         """Initialize the chat manager"""
+         try:
+             self.log_info("Initializing chat manager")
+
+             # Initialize model manager
+             model_manager = await get_model_manager()
+             if not await model_manager.initialize():
+                 self.log_error("Failed to initialize model manager")
+                 return False
+
+             # Initialize session manager
+             session_manager = await get_session_manager()
+             if not await session_manager.initialize():
+                 self.log_error("Failed to initialize session manager")
+                 return False
+
+             self.is_initialized = True
+             self.log_info("Chat manager initialized successfully")
+             return True
+
+         except Exception as e:
+             self.log_error("Chat manager initialization failed", error=str(e))
+             return False
+
+     async def shutdown(self):
+         """Shutdown the chat manager"""
+         try:
+             self.log_info("Shutting down chat manager")
+
+             # Shutdown managers
+             model_manager = await get_model_manager()
+             await model_manager.shutdown()
+
+             session_manager = await get_session_manager()
+             await session_manager.shutdown()
+
+             self.is_initialized = False
+             self.log_info("Chat manager shutdown complete")
+
+         except Exception as e:
+             self.log_error("Chat manager shutdown failed", error=str(e))
+
+     async def process_chat_request(self, request: ChatRequest) -> ChatResponse:
+         """
+         Process a non-streaming chat request
+
+         Args:
+             request: Chat request containing message and parameters
+
+         Returns:
+             ChatResponse: Complete response
+         """
+         if not self.is_initialized:
+             raise RuntimeError("Chat manager not initialized")
+
+         start_time = time.time()
+
+         try:
+             # Get managers
+             model_manager = await get_model_manager()
+             session_manager = await get_session_manager()
+
+             if not model_manager.is_ready():
+                 raise ModelNotLoadedError("Model not ready for inference")
+
+             # Ensure session exists
+             await session_manager.create_session(request.session_id)
+
+             # Add user message to session
+             user_message = ChatMessage(
+                 role="user",
+                 content=request.message,
+                 timestamp=datetime.utcnow(),
+                 metadata={"session_id": request.session_id}
+             )
+             await session_manager.add_message(request.session_id, user_message)
+
+             # Get conversation history
+             messages = await self._prepare_messages_for_model(
+                 request.session_id,
+                 request.system_prompt
+             )
+
+             # Generate response
+             backend = model_manager.get_backend()
+             response = await backend.generate_response(
+                 messages=messages,
+                 temperature=request.temperature or settings.temperature,
+                 max_tokens=request.max_tokens or settings.max_new_tokens
+             )
+
+             # Add assistant message to session
+             assistant_message = ChatMessage(
+                 role="assistant",
+                 content=response.message,
+                 timestamp=datetime.utcnow(),
+                 metadata={"session_id": request.session_id, "message_id": response.message_id}
+             )
+             await session_manager.add_message(request.session_id, assistant_message)
+
+             # Update response with correct session info
+             response.session_id = request.session_id
+
+             self.log_info("Chat request processed",
+                           session_id=request.session_id,
+                           generation_time=response.generation_time,
+                           total_time=time.time() - start_time)
+
+             return response
+
+         except ModelBackendError as e:
+             self.log_error("Model backend error", error=str(e), session_id=request.session_id)
+             raise
+         except Exception as e:
+             self.log_error("Chat request processing failed", error=str(e), session_id=request.session_id)
+             raise
+
+     async def process_streaming_chat_request(
+         self,
+         request: ChatRequest
+     ) -> AsyncGenerator[StreamChunk, None]:
+         """
+         Process a streaming chat request
+
+         Args:
+             request: Chat request containing message and parameters
+
+         Yields:
+             StreamChunk: Response chunks
+         """
+         if not self.is_initialized:
+             raise RuntimeError("Chat manager not initialized")
+
+         # Check concurrent stream limit
+         if self.active_streams >= self.max_concurrent_streams:
+             raise RuntimeError(f"Maximum concurrent streams ({self.max_concurrent_streams}) exceeded")
+
+         self.active_streams += 1
+
+         try:
+             # Get managers
+             model_manager = await get_model_manager()
+             session_manager = await get_session_manager()
+
+             if not model_manager.is_ready():
+                 raise ModelNotLoadedError("Model not ready for inference")
+
+             # Ensure session exists
+             await session_manager.create_session(request.session_id)
+
+             # Add user message to session
+             user_message = ChatMessage(
+                 role="user",
+                 content=request.message,
+                 timestamp=datetime.utcnow(),
+                 metadata={"session_id": request.session_id}
+             )
+             await session_manager.add_message(request.session_id, user_message)
+
+             # Get conversation history
+             messages = await self._prepare_messages_for_model(
+                 request.session_id,
+                 request.system_prompt
+             )
+
+             # Generate streaming response
+             backend = model_manager.get_backend()
+             full_response = ""
+             message_id = None
+
+             async for chunk in backend.generate_stream(
+                 messages=messages,
+                 temperature=request.temperature or settings.temperature,
+                 max_tokens=request.max_tokens or settings.max_new_tokens
+             ):
+                 if message_id is None:
+                     message_id = chunk.message_id
+
+                 full_response += chunk.content
+                 yield chunk
+
+             # Add complete assistant message to session
+             if full_response.strip():
+                 assistant_message = ChatMessage(
+                     role="assistant",
+                     content=full_response.strip(),
+                     timestamp=datetime.utcnow(),
+                     metadata={"session_id": request.session_id, "message_id": message_id}
+                 )
+                 await session_manager.add_message(request.session_id, assistant_message)
+
+             self.log_info("Streaming chat request processed",
+                           session_id=request.session_id,
+                           response_length=len(full_response))
+
+         except ModelBackendError as e:
+             self.log_error("Model backend error in streaming", error=str(e), session_id=request.session_id)
+             raise
+         except Exception as e:
+             self.log_error("Streaming chat request failed", error=str(e), session_id=request.session_id)
+             raise
+         finally:
+             self.active_streams -= 1
+
+     async def _prepare_messages_for_model(
+         self,
+         session_id: str,
+         custom_system_prompt: Optional[str] = None
+     ) -> List[ChatMessage]:
+         """
+         Prepare messages for model input, including system prompt and history
+
+         Args:
+             session_id: Session identifier
+             custom_system_prompt: Optional custom system prompt
+
+         Returns:
+             List of ChatMessage objects ready for model input
+         """
+         session_manager = await get_session_manager()
+
+         # Get conversation history
+         history_messages = await session_manager.get_session_messages(session_id)
+
+         # Prepare messages list
+         messages = []
+
+         # Add system prompt if provided
+         system_prompt = custom_system_prompt or settings.get_system_prompt()
+         if system_prompt:
+             messages.append(ChatMessage(
+                 role="system",
+                 content=system_prompt,
+                 timestamp=datetime.utcnow()
+             ))
+
+         # Add conversation history
+         messages.extend(history_messages)
+
+         return messages
+
+     async def get_conversation_history(self, session_id: str) -> Optional[ConversationHistory]:
+         """
+         Get conversation history for a session
+
+         Args:
+             session_id: Session identifier
+
+         Returns:
+             ConversationHistory or None if session not found
+         """
+         try:
+             session_manager = await get_session_manager()
+             return await session_manager.get_session(session_id)
+
+         except Exception as e:
+             self.log_error("Failed to get conversation history", error=str(e), session_id=session_id)
+             return None
+
+     async def clear_conversation(self, session_id: str) -> bool:
+         """
+         Clear conversation history for a session
+
+         Args:
+             session_id: Session identifier
+
+         Returns:
+             bool: True if cleared successfully
+         """
+         try:
+             session_manager = await get_session_manager()
+             return await session_manager.delete_session(session_id)
+
+         except Exception as e:
+             self.log_error("Failed to clear conversation", error=str(e), session_id=session_id)
+             return False
+
+     async def get_active_sessions(self) -> List[Dict[str, Any]]:
+         """Get information about active chat sessions"""
+         try:
+             session_manager = await get_session_manager()
+             sessions = await session_manager.get_active_sessions()
+
+             return [
+                 {
+                     "session_id": session.session_id,
+                     "created_at": session.created_at.isoformat(),
+                     "updated_at": session.updated_at.isoformat(),
+                     "message_count": session.message_count,
+                     "model_name": session.model_name,
+                     "is_active": session.is_active
+                 }
+                 for session in sessions
+             ]
+
+         except Exception as e:
+             self.log_error("Failed to get active sessions", error=str(e))
+             return []
+
+     async def health_check(self) -> Dict[str, Any]:
+         """Perform a comprehensive health check"""
+         try:
+             health_status = {
+                 "chat_manager": {
+                     "status": "healthy" if self.is_initialized else "unhealthy",
+                     "initialized": self.is_initialized,
+                     "active_streams": self.active_streams,
+                     "max_concurrent_streams": self.max_concurrent_streams
+                 }
+             }
+
+             # Check model manager
+             model_manager = await get_model_manager()
+             model_health = await model_manager.health_check()
+             health_status["model_manager"] = model_health
+
+             # Check session manager
+             session_manager = await get_session_manager()
+             active_sessions = await session_manager.get_active_sessions()
+             health_status["session_manager"] = {
+                 "status": "healthy",
+                 "active_sessions": len(active_sessions),
+                 "storage_type": "redis" if session_manager.use_redis else "memory"
+             }
+
+             # Overall status
+             overall_healthy = (
+                 self.is_initialized and
+                 model_health.get("status") == "healthy"
+             )
+
+             health_status["overall"] = {
+                 "status": "healthy" if overall_healthy else "unhealthy",
+                 "timestamp": datetime.utcnow().isoformat()
+             }
+
+             return health_status
+
+         except Exception as e:
+             self.log_error("Health check failed", error=str(e))
+             return {
+                 "overall": {
+                     "status": "unhealthy",
+                     "error": str(e),
+                     "timestamp": datetime.utcnow().isoformat()
+                 }
+             }
+
+
+ # Global chat manager instance
+ chat_manager = ChatManager()
+
+
+ async def get_chat_manager() -> ChatManager:
+     """Get the global chat manager instance"""
+     return chat_manager
+
+
+ async def initialize_chat_manager() -> bool:
+     """Initialize the global chat manager"""
+     return await chat_manager.initialize()
+
+
+ async def shutdown_chat_manager():
+     """Shutdown the global chat manager"""
+     await chat_manager.shutdown()
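The key concurrency detail in `process_streaming_chat_request` above is the `active_streams` counter: incremented before streaming begins, checked against `max_concurrent_streams`, and decremented in a `finally` block so the slot is released even if the consumer aborts mid-stream. A standalone sketch of just that guard (the `StreamLimiter` class is a hypothetical stand-in, not part of the commit):

```python
import asyncio

# Minimal stand-in for ChatManager's concurrent-stream guard
class StreamLimiter:
    def __init__(self, max_concurrent_streams: int):
        self.max_concurrent_streams = max_concurrent_streams
        self.active_streams = 0

    async def stream(self, chunks):
        # Reject new streams once the limit is reached
        if self.active_streams >= self.max_concurrent_streams:
            raise RuntimeError(
                f"Maximum concurrent streams ({self.max_concurrent_streams}) exceeded"
            )
        self.active_streams += 1
        try:
            for chunk in chunks:
                yield chunk  # forward each chunk as it is produced
        finally:
            # always release the slot, even on consumer abort or error
            self.active_streams -= 1

async def demo():
    limiter = StreamLimiter(max_concurrent_streams=1)
    out = [c async for c in limiter.stream(["Hel", "lo"])]
    return out, limiter.active_streams

print(asyncio.run(demo()))  # β†’ (['Hel', 'lo'], 0)
```

The `finally` is what keeps the counter from leaking: without it, a client disconnecting mid-stream would permanently consume a slot.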
app/services/model_backends/__init__.py ADDED
@@ -0,0 +1 @@
+ # Model backends package
app/services/model_backends/anthropic_api.py ADDED
@@ -0,0 +1,319 @@
+ """
+ Anthropic API backend
+ Uses Anthropic's API for Claude models
+ """
+
+ import asyncio
+ import time
+ import uuid
+ from typing import AsyncGenerator, List, Dict, Any, Optional
+ from datetime import datetime
+ import anthropic
+
+ from .base import ModelBackend, ModelLoadError, GenerationError, ModelNotLoadedError
+ from ...models.schemas import ChatMessage, ChatResponse, StreamChunk
+ from ...core.config import settings
+
+
+ class AnthropicAPIBackend(ModelBackend):
+     """Anthropic API backend for Claude models"""
+
+     def __init__(self, model_name: str, **kwargs):
+         super().__init__(model_name, **kwargs)
+         self.client = None
+         self.api_key = kwargs.get('api_key', settings.anthropic_api_key)
+         self.capabilities = ["chat", "streaming", "api_based", "long_context"]
+
+         # Generation parameters
+         self.parameters = {
+             'temperature': kwargs.get('temperature', settings.temperature),
+             'max_tokens': kwargs.get('max_tokens', settings.max_new_tokens),
+             'top_p': kwargs.get('top_p', settings.top_p),
+         }
+
+     async def load_model(self) -> bool:
+         """Initialize the Anthropic API client"""
+         try:
+             if not self.api_key:
+                 raise ModelLoadError("Anthropic API key is required")
+
+             self.log_info("Initializing Anthropic API client", model=self.model_name)
+
+             # Initialize the Anthropic client
+             self.client = anthropic.AsyncAnthropic(
+                 api_key=self.api_key
+             )
+
+             # Test the connection
+             await self._test_connection()
+
+             self.is_loaded = True
+             self.log_info("Anthropic API client initialized successfully", model=self.model_name)
+
+             return True
+
+         except Exception as e:
+             self.log_error("Failed to initialize Anthropic API client", error=str(e), model=self.model_name)
+             raise ModelLoadError(f"Failed to initialize Anthropic API for {self.model_name}: {str(e)}")
+
+     async def unload_model(self) -> bool:
+         """Clean up the API client"""
+         try:
+             if self.client:
+                 await self.client.close()
+                 self.client = None
+             self.is_loaded = False
+             self.log_info("Anthropic API client cleaned up", model=self.model_name)
+             return True
+
+         except Exception as e:
+             self.log_error("Failed to cleanup Anthropic API client", error=str(e), model=self.model_name)
+             return False
+
+     async def _test_connection(self):
+         """Test the Anthropic API connection"""
+         try:
+             # Simple test request
+             response = await self.client.messages.create(
+                 model=self.model_name,
+                 max_tokens=5,
+                 temperature=0.1,
+                 messages=[{"role": "user", "content": "Hello"}]
+             )
+
+             self.log_info("Anthropic API connection test successful", model=self.model_name)
+
+         except Exception as e:
+             self.log_error("Anthropic API connection test failed", error=str(e), model=self.model_name)
+             raise
+
+     def _format_messages_for_api(self, messages: List[ChatMessage]) -> tuple:
+         """Format messages for Anthropic API (separate system and messages)"""
+         system_message = None
+         formatted_messages = []
+
+         for msg in messages:
+             if msg.role == "system":
+                 system_message = msg.content
+             else:
+                 formatted_messages.append({
+                     "role": msg.role,
+                     "content": msg.content
+                 })
+
+         return system_message, formatted_messages
+
+     async def generate_response(
+         self,
+         messages: List[ChatMessage],
+         temperature: float = 0.7,
+         max_tokens: int = 512,
+         **kwargs
+     ) -> ChatResponse:
+         """Generate a complete response using Anthropic API"""
+         if not self.is_loaded:
+             raise ModelNotLoadedError("Anthropic API client not initialized")
+
+         start_time = time.time()
+         message_id = str(uuid.uuid4())
+
+         try:
+             # Validate parameters
+             params = self.validate_parameters(
+                 temperature=temperature,
+                 max_tokens=max_tokens,
+                 **kwargs
+             )
+
+             # Format messages
+             system_message, api_messages = self._format_messages_for_api(messages)
+
+             # Prepare request parameters
+             request_params = {
+                 "model": self.model_name,
+                 "messages": api_messages,
+                 "max_tokens": params['max_tokens'],
+                 "temperature": params['temperature'],
+                 "top_p": params.get('top_p', 0.9),
+                 "stream": False
+             }
+
+             # Add system message if present
+             if system_message:
+                 request_params["system"] = system_message
+
+             # Make API call
+             response = await self.client.messages.create(**request_params)
+
+             # Extract response
+             response_text = response.content[0].text if response.content else ""
+             finish_reason = getattr(response, 'stop_reason', 'stop')
+             token_count = getattr(response.usage, 'output_tokens', None) if hasattr(response, 'usage') else None
+
+             generation_time = time.time() - start_time
+
+             return ChatResponse(
+                 message=response_text.strip(),
+                 session_id=messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown',
+                 message_id=message_id,
+                 model_name=self.model_name,
+                 generation_time=generation_time,
+                 token_count=token_count,
+                 finish_reason=finish_reason
+             )
+
+         except Exception as e:
+             self.log_error("Anthropic API generation failed", error=str(e), model=self.model_name)
+             raise GenerationError(f"Failed to generate response via Anthropic API: {str(e)}")
+
+     async def generate_stream(
+         self,
+         messages: List[ChatMessage],
+         temperature: float = 0.7,
+         max_tokens: int = 512,
+         **kwargs
+     ) -> AsyncGenerator[StreamChunk, None]:
+         """Generate a streaming response using Anthropic API"""
+         if not self.is_loaded:
+             raise ModelNotLoadedError("Anthropic API client not initialized")
+
+         message_id = str(uuid.uuid4())
+         session_id = messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown'
+         chunk_id = 0
+
+         try:
+             # Validate parameters
+             params = self.validate_parameters(
+                 temperature=temperature,
+                 max_tokens=max_tokens,
+                 **kwargs
+             )
+
+             # Format messages
+             system_message, api_messages = self._format_messages_for_api(messages)
+
+             # Prepare request parameters
+             request_params = {
+                 "model": self.model_name,
+                 "messages": api_messages,
+                 "max_tokens": params['max_tokens'],
+                 "temperature": params['temperature'],
+                 "top_p": params.get('top_p', 0.9),
+                 "stream": True
+             }
+
+             # Add system message if present
+             if system_message:
+                 request_params["system"] = system_message
+
+             # Create streaming request
+             stream = await self.client.messages.create(**request_params)
+
+             # Process streaming chunks
+             async for chunk in stream:
+                 if chunk.type == "content_block_delta":
+                     if hasattr(chunk.delta, 'text') and chunk.delta.text:
+                         yield StreamChunk(
+                             content=chunk.delta.text,
+                             session_id=session_id,
+                             message_id=message_id,
+                             chunk_id=chunk_id,
+                             is_final=False
+                         )
+                         chunk_id += 1
+
+                         # Add small delay to prevent overwhelming the client
+                         await asyncio.sleep(settings.stream_delay)
+
+                 elif chunk.type == "message_stop":
+                     break
+
+             # Send final chunk
+             yield StreamChunk(
+                 content="",
+                 session_id=session_id,
+                 message_id=message_id,
+                 chunk_id=chunk_id,
+                 is_final=True
+             )
+
+         except Exception as e:
+             self.log_error("Anthropic API streaming failed", error=str(e), model=self.model_name)
+             raise GenerationError(f"Failed to generate streaming response via Anthropic API: {str(e)}")
+
+     def get_model_info(self) -> Dict[str, Any]:
+         """Get information about the current model"""
+         return {
+             "name": self.model_name,
+             "type": "anthropic_api",
+             "loaded": self.is_loaded,
+             "provider": "Anthropic",
+             "capabilities": self.capabilities,
+             "parameters": self.parameters,
+             "requires_api_key": True,
+             "api_key_configured": bool(self.api_key),
+             "context_window": self._get_context_window()
+         }
+
+     def _get_context_window(self) -> int:
+         """Get the context window size for the model"""
+         context_windows = {
+             "claude-3-haiku-20240307": 200000,
+             "claude-3-sonnet-20240229": 200000,
+             "claude-3-opus-20240229": 200000,
+             "claude-3-5-sonnet-20241022": 200000,
+             "claude-3-5-haiku-20241022": 200000,
+         }
+         return context_windows.get(self.model_name, 100000)
+
+     async def health_check(self) -> Dict[str, Any]:
+         """Perform a health check on the Anthropic API"""
+         try:
+             if not self.is_loaded:
+                 return {
+                     "status": "unhealthy",
+                     "reason": "client_not_initialized",
+                     "model_name": self.model_name
+                 }
+
+             # Test API connectivity
+             start_time = time.time()
+             test_messages = [ChatMessage(role="user", content="Hi")]
+
+             try:
+                 response = await asyncio.wait_for(
+                     self.generate_response(
+                         test_messages,
+                         temperature=0.1,
+                         max_tokens=5
+                     ),
+                     timeout=10.0
+                 )
+
+                 response_time = time.time() - start_time
+
+                 return {
+                     "status": "healthy",
+                     "model_name": self.model_name,
+                     "response_time": response_time,
+                     "provider": "Anthropic",
+                     "context_window": self._get_context_window()
+                 }
+
+             except asyncio.TimeoutError:
+                 return {
+                     "status": "unhealthy",
+                     "reason": "api_timeout",
+                     "model_name": self.model_name,
+                     "provider": "Anthropic"
+                 }
+
+         except Exception as e:
+             self.log_error("Anthropic API health check failed", error=str(e), model=self.model_name)
+             return {
+                 "status": "unhealthy",
+                 "reason": "api_error",
+                 "error": str(e),
+                 "model_name": self.model_name,
+                 "provider": "Anthropic"
+             }
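A detail worth noting in the backend above: `_format_messages_for_api` splits the system prompt out of the message list, because Anthropic's Messages API takes the system prompt as a separate top-level field rather than as a `{"role": "system"}` message. The core logic can be sketched standalone with plain dicts (the `split_system_message` name is illustrative, not from the commit):

```python
# Standalone sketch of the system/message split performed by
# AnthropicAPIBackend._format_messages_for_api above
def split_system_message(messages):
    system_message = None
    formatted = []
    for msg in messages:
        if msg["role"] == "system":
            # goes into the request's top-level "system" field
            system_message = msg["content"]
        else:
            formatted.append({"role": msg["role"], "content": msg["content"]})
    return system_message, formatted

system, convo = split_system_message([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
])
print(system)  # β†’ You are helpful.
print(convo)   # β†’ [{'role': 'user', 'content': 'Hi'}]
```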
app/services/model_backends/base.py ADDED
@@ -0,0 +1,222 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Abstract base class for model backends
3
+ Defines the interface that all model backends must implement
4
+ """
5
+
6
+ from abc import ABC, abstractmethod
7
+ from typing import AsyncGenerator, List, Dict, Any, Optional
8
+ from ...models.schemas import ChatMessage, ChatResponse, StreamChunk
9
+ from ...core.logging import LoggerMixin
10
+
11
+
12
+ class ModelBackend(ABC, LoggerMixin):
13
+ """Abstract base class for all model backends"""
14
+
15
+ def __init__(self, model_name: str, **kwargs):
16
+ self.model_name = model_name
17
+ self.is_loaded = False
18
+ self.capabilities = []
19
+ self.parameters = {}
20
+
21
+ @abstractmethod
22
+ async def load_model(self) -> bool:
23
+ """
24
+ Load the model and prepare it for inference
25
+
26
+ Returns:
27
+ bool: True if model loaded successfully, False otherwise
28
+ """
29
+ pass
30
+
31
+ @abstractmethod
32
+ async def unload_model(self) -> bool:
33
+ """
34
+ Unload the model and free resources
35
+
36
+ Returns:
37
+ bool: True if model unloaded successfully, False otherwise
38
+ """
39
+ pass
40
+
41
+ @abstractmethod
42
+ async def generate_response(
43
+ self,
44
+ messages: List[ChatMessage],
45
+ temperature: float = 0.7,
46
+ max_tokens: int = 512,
47
+ **kwargs
48
+ ) -> ChatResponse:
49
+ """
50
+ Generate a complete response (non-streaming)
51
+
52
+ Args:
53
+ messages: List of conversation messages
54
+ temperature: Sampling temperature
55
+ max_tokens: Maximum tokens to generate
56
+ **kwargs: Additional generation parameters
57
+
58
+ Returns:
59
+ ChatResponse: Complete response
60
+ """
61
+ pass
62
+
63
+ @abstractmethod
64
+ async def generate_stream(
65
+ self,
66
+ messages: List[ChatMessage],
67
+ temperature: float = 0.7,
68
+ max_tokens: int = 512,
69
+ **kwargs
70
+ ) -> AsyncGenerator[StreamChunk, None]:
71
+ """
72
+ Generate a streaming response
73
+
74
+ Args:
75
+ messages: List of conversation messages
76
+ temperature: Sampling temperature
77
+ max_tokens: Maximum tokens to generate
78
+ **kwargs: Additional generation parameters
79
+
80
+ Yields:
81
+ StreamChunk: Response chunks
82
+ """
83
+ pass
84
+
85
+ @abstractmethod
86
+ def get_model_info(self) -> Dict[str, Any]:
87
+ """
88
+ Get information about the current model
89
+
90
+ Returns:
91
+ Dict containing model information
92
+ """
93
+ pass
94
+
95
+ def supports_streaming(self) -> bool:
96
+ """Check if this backend supports streaming"""
97
+ return "streaming" in self.capabilities
98
+
99
+ def supports_chat(self) -> bool:
100
+ """Check if this backend supports chat/conversation"""
101
+ return "chat" in self.capabilities
102
+
103
+ def is_model_loaded(self) -> bool:
104
+ """Check if model is loaded and ready"""
105
+ return self.is_loaded
106
+
107
+ def format_messages_for_model(self, messages: List[ChatMessage]) -> Any:
108
+ """
109
+ Format messages for the specific model format
110
+ Override in subclasses if needed
111
+
112
+ Args:
113
+ messages: List of ChatMessage objects
114
+
115
+ Returns:
116
+ Formatted messages for the model
117
+ """
118
+ return [{"role": msg.role, "content": msg.content} for msg in messages]
119
+
120
+ def validate_parameters(self, **kwargs) -> Dict[str, Any]:
121
+ """
122
+ Validate and normalize generation parameters
123
+
124
+ Args:
125
+ **kwargs: Generation parameters
126
+
127
+ Returns:
128
+        Dict of validated parameters
+        """
+        validated = {}
+
+        # Temperature validation
+        temperature = kwargs.get('temperature', 0.7)
+        validated['temperature'] = max(0.0, min(1.0, float(temperature)))
+
+        # Max tokens validation
+        max_tokens = kwargs.get('max_tokens', 512)
+        validated['max_tokens'] = max(1, min(2048, int(max_tokens)))
+
+        # Top-p validation
+        top_p = kwargs.get('top_p', 0.9)
+        validated['top_p'] = max(0.0, min(1.0, float(top_p)))
+
+        # Top-k validation
+        top_k = kwargs.get('top_k', 50)
+        validated['top_k'] = max(1, int(top_k))
+
+        return validated
+
+    async def health_check(self) -> Dict[str, Any]:
+        """
+        Perform a health check on the model backend
+
+        Returns:
+            Dict containing health status
+        """
+        try:
+            if not self.is_loaded:
+                return {
+                    "status": "unhealthy",
+                    "reason": "model_not_loaded",
+                    "model_name": self.model_name
+                }
+
+            # Try a simple generation to test the model
+            test_messages = [
+                ChatMessage(role="user", content="Hello")
+            ]
+
+            # Use a timeout for the health check
+            import asyncio
+            try:
+                response = await asyncio.wait_for(
+                    self.generate_response(
+                        test_messages,
+                        temperature=0.1,
+                        max_tokens=10
+                    ),
+                    timeout=10.0
+                )
+
+                return {
+                    "status": "healthy",
+                    "model_name": self.model_name,
+                    "response_time": getattr(response, 'generation_time', 0.0)
+                }
+
+            except asyncio.TimeoutError:
+                return {
+                    "status": "unhealthy",
+                    "reason": "timeout",
+                    "model_name": self.model_name
+                }
+
+        except Exception as e:
+            self.log_error("Health check failed", error=str(e), model=self.model_name)
+            return {
+                "status": "unhealthy",
+                "reason": "generation_error",
+                "error": str(e),
+                "model_name": self.model_name
+            }
+
+
+class ModelBackendError(Exception):
+    """Base exception for model backend errors"""
+    pass
+
+
+class ModelLoadError(ModelBackendError):
+    """Exception raised when model loading fails"""
+    pass
+
+
+class GenerationError(ModelBackendError):
+    """Exception raised when text generation fails"""
+    pass
+
+
+class ModelNotLoadedError(ModelBackendError):
+    """Exception raised when trying to use an unloaded model"""
+    pass
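The clamping rules enforced by `validate_parameters` above can be exercised on their own. A minimal standalone sketch (a plain function with the same bounds, rather than the backend method, so it runs without the app):

```python
# Standalone sketch of the bounds in ModelBackend.validate_parameters:
# temperature and top_p clamped to [0.0, 1.0], max_tokens to [1, 2048], top_k >= 1.
def clamp_params(temperature=0.7, max_tokens=512, top_p=0.9, top_k=50):
    return {
        "temperature": max(0.0, min(1.0, float(temperature))),
        "max_tokens": max(1, min(2048, int(max_tokens))),
        "top_p": max(0.0, min(1.0, float(top_p))),
        "top_k": max(1, int(top_k)),
    }

# Out-of-range values are clamped rather than rejected.
out_of_range = clamp_params(temperature=5.0, max_tokens=0, top_k=-3)
```

Clamping (instead of raising) means a request with a wild `temperature` still generates, just with the nearest valid value.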
app/services/model_backends/google_api.py ADDED
@@ -0,0 +1,304 @@
+"""
+Google AI Studio API backend
+Uses Google's AI Studio API for Gemma and other Google models
+"""
+
+import asyncio
+import time
+import uuid
+import json
+from typing import AsyncGenerator, List, Dict, Any, Optional
+from datetime import datetime
+import httpx
+
+from .base import ModelBackend, ModelLoadError, GenerationError, ModelNotLoadedError
+from ...models.schemas import ChatMessage, ChatResponse, StreamChunk
+from ...core.config import settings
+
+
+class GoogleAIBackend(ModelBackend):
+    """Google AI Studio API backend for Gemma and other Google models"""
+
+    def __init__(self, model_name: str, **kwargs):
+        super().__init__(model_name, **kwargs)
+        self.api_key = kwargs.get('api_key', settings.google_api_key)
+        self.base_url = "https://generativelanguage.googleapis.com/v1beta"
+        self.capabilities = ["chat", "streaming", "api_based"]
+
+        # Generation parameters
+        self.parameters = {
+            'temperature': kwargs.get('temperature', settings.temperature),
+            'max_tokens': kwargs.get('max_tokens', settings.max_new_tokens),
+            'top_p': kwargs.get('top_p', settings.top_p),
+            'top_k': kwargs.get('top_k', settings.top_k),
+        }
+
+    async def load_model(self) -> bool:
+        """Initialize the Google AI API client"""
+        try:
+            if not self.api_key:
+                raise ModelLoadError("Google AI API key is required")
+
+            self.log_info("Initializing Google AI API client", model=self.model_name)
+
+            # Test the connection
+            await self._test_connection()
+
+            self.is_loaded = True
+            self.log_info("Google AI API client initialized successfully", model=self.model_name)
+
+            return True
+
+        except Exception as e:
+            self.log_error("Failed to initialize Google AI API client", error=str(e), model=self.model_name)
+            raise ModelLoadError(f"Failed to initialize Google AI API for {self.model_name}: {str(e)}")
+
+    async def unload_model(self) -> bool:
+        """Clean up the API client"""
+        try:
+            self.is_loaded = False
+            self.log_info("Google AI API client cleaned up", model=self.model_name)
+            return True
+
+        except Exception as e:
+            self.log_error("Failed to cleanup Google AI API client", error=str(e), model=self.model_name)
+            return False
+
+    async def _test_connection(self):
+        """Test the Google AI API connection"""
+        try:
+            url = f"{self.base_url}/models/{self.model_name}:generateContent"
+
+            test_data = {
+                "contents": [
+                    {
+                        "parts": [{"text": "Hello"}]
+                    }
+                ],
+                "generationConfig": {
+                    "maxOutputTokens": 5,
+                    "temperature": 0.1
+                }
+            }
+
+            async with httpx.AsyncClient() as client:
+                response = await client.post(
+                    f"{url}?key={self.api_key}",
+                    headers={'Content-Type': 'application/json'},
+                    json=test_data,
+                    timeout=10.0
+                )
+
+                if response.status_code != 200:
+                    raise Exception(f"API test failed with status {response.status_code}: {response.text}")
+
+            self.log_info("Google AI API connection test successful", model=self.model_name)
+
+        except Exception as e:
+            self.log_error("Google AI API connection test failed", error=str(e), model=self.model_name)
+            raise
+
+    def _format_messages_for_api(self, messages: List[ChatMessage]) -> Dict[str, Any]:
+        """Format messages for Google AI API"""
+        contents = []
+        system_instruction = None
+
+        for msg in messages:
+            if msg.role == "system":
+                system_instruction = msg.content
+            elif msg.role == "user":
+                contents.append({
+                    "role": "user",
+                    "parts": [{"text": msg.content}]
+                })
+            elif msg.role == "assistant":
+                contents.append({
+                    "role": "model",
+                    "parts": [{"text": msg.content}]
+                })
+
+        result = {"contents": contents}
+        if system_instruction:
+            result["systemInstruction"] = {"parts": [{"text": system_instruction}]}
+
+        return result
+
+    async def generate_response(
+        self,
+        messages: List[ChatMessage],
+        temperature: float = 0.7,
+        max_tokens: int = 512,
+        **kwargs
+    ) -> ChatResponse:
+        """Generate a complete response using Google AI API"""
+        if not self.is_loaded:
+            raise ModelNotLoadedError("Google AI API client not initialized")
+
+        start_time = time.time()
+        message_id = str(uuid.uuid4())
+
+        try:
+            # Validate parameters
+            params = self.validate_parameters(
+                temperature=temperature,
+                max_tokens=max_tokens,
+                **kwargs
+            )
+
+            # Format messages
+            api_data = self._format_messages_for_api(messages)
+
+            # Add generation config
+            api_data["generationConfig"] = {
+                "maxOutputTokens": params['max_tokens'],
+                "temperature": params['temperature'],
+                "topP": params.get('top_p', 0.9),
+                "topK": params.get('top_k', 40)
+            }
+
+            # Make API call
+            url = f"{self.base_url}/models/{self.model_name}:generateContent"
+
+            async with httpx.AsyncClient() as client:
+                response = await client.post(
+                    f"{url}?key={self.api_key}",
+                    headers={'Content-Type': 'application/json'},
+                    json=api_data,
+                    timeout=30.0
+                )
+
+            if response.status_code != 200:
+                raise GenerationError(f"API request failed with status {response.status_code}: {response.text}")
+
+            response_data = response.json()
+
+            # Extract response text
+            if 'candidates' in response_data and response_data['candidates']:
+                candidate = response_data['candidates'][0]
+                if 'content' in candidate and 'parts' in candidate['content']:
+                    parts = candidate['content']['parts']
+                    response_text = ''.join(part.get('text', '') for part in parts)
+                else:
+                    response_text = str(response_data)
+            else:
+                response_text = str(response_data)
+
+            generation_time = time.time() - start_time
+
+            return ChatResponse(
+                message=response_text.strip(),
+                session_id=messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown',
+                message_id=message_id,
+                model_name=self.model_name,
+                generation_time=generation_time,
+                token_count=len(response_text.split()),  # Approximate token count
+                finish_reason="stop"
+            )
+
+        except Exception as e:
+            self.log_error("Google AI API generation failed", error=str(e), model=self.model_name)
+            raise GenerationError(f"Failed to generate response via Google AI API: {str(e)}")
+
+    async def generate_stream(
+        self,
+        messages: List[ChatMessage],
+        temperature: float = 0.7,
+        max_tokens: int = 512,
+        **kwargs
+    ) -> AsyncGenerator[StreamChunk, None]:
+        """Generate a streaming response using Google AI API"""
+        if not self.is_loaded:
+            raise ModelNotLoadedError("Google AI API client not initialized")
+
+        message_id = str(uuid.uuid4())
+        session_id = messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown'
+        chunk_id = 0
+
+        try:
+            # Validate parameters
+            params = self.validate_parameters(
+                temperature=temperature,
+                max_tokens=max_tokens,
+                **kwargs
+            )
+
+            # Format messages
+            api_data = self._format_messages_for_api(messages)
+
+            # Add generation config
+            api_data["generationConfig"] = {
+                "maxOutputTokens": params['max_tokens'],
+                "temperature": params['temperature'],
+                "topP": params.get('top_p', 0.9),
+                "topK": params.get('top_k', 40)
+            }
+
+            # Make streaming API call
+            url = f"{self.base_url}/models/{self.model_name}:streamGenerateContent"
+
+            async with httpx.AsyncClient() as client:
+                async with client.stream(
+                    'POST',
+                    f"{url}?key={self.api_key}",
+                    headers={'Content-Type': 'application/json'},
+                    json=api_data,
+                    timeout=60.0
+                ) as response:
+
+                    if response.status_code != 200:
+                        raise GenerationError(f"Streaming request failed with status {response.status_code}")
+
+                    async for line in response.aiter_lines():
+                        if line.strip():
+                            try:
+                                # Google AI API returns JSON objects separated by newlines
+                                data = json.loads(line)
+
+                                if 'candidates' in data and data['candidates']:
+                                    candidate = data['candidates'][0]
+                                    if 'content' in candidate and 'parts' in candidate['content']:
+                                        parts = candidate['content']['parts']
+                                        content = ''.join(part.get('text', '') for part in parts)
+
+                                        if content:
+                                            yield StreamChunk(
+                                                content=content,
+                                                session_id=session_id,
+                                                message_id=message_id,
+                                                chunk_id=chunk_id,
+                                                is_final=False
+                                            )
+                                            chunk_id += 1
+
+                                            # Add small delay
+                                            await asyncio.sleep(settings.stream_delay)
+
+                            except json.JSONDecodeError:
+                                continue
+
+            # Send final chunk
+            yield StreamChunk(
+                content="",
+                session_id=session_id,
+                message_id=message_id,
+                chunk_id=chunk_id,
+                is_final=True
+            )
+
+        except Exception as e:
+            self.log_error("Google AI API streaming failed", error=str(e), model=self.model_name)
+            raise GenerationError(f"Failed to generate streaming response via Google AI API: {str(e)}")
+
+    def get_model_info(self) -> Dict[str, Any]:
+        """Get information about the current model"""
+        return {
+            "name": self.model_name,
+            "type": "google_ai",
+            "loaded": self.is_loaded,
+            "provider": "Google AI Studio",
+            "capabilities": self.capabilities,
+            "parameters": self.parameters,
+            "requires_api_key": True,
+            "api_key_configured": bool(self.api_key),
+            "base_url": self.base_url
+        }
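The role mapping done by `_format_messages_for_api` (system messages pulled out into `systemInstruction`, `assistant` renamed to `model`) can be sketched as a standalone function. Plain dicts stand in for the app's `ChatMessage` schema here, so the sketch runs on its own:

```python
# Standalone sketch of GoogleAIBackend._format_messages_for_api.
# Messages are plain {"role", "content"} dicts (an assumption made so this
# runs without the app's ChatMessage schema).
from typing import Any, Dict, List

def format_for_google_ai(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    contents = []
    system_instruction = None
    for msg in messages:
        if msg["role"] == "system":
            # System text is not a turn; it becomes systemInstruction.
            system_instruction = msg["content"]
        elif msg["role"] == "user":
            contents.append({"role": "user", "parts": [{"text": msg["content"]}]})
        elif msg["role"] == "assistant":
            # The Google AI API calls the assistant role "model".
            contents.append({"role": "model", "parts": [{"text": msg["content"]}]})
    result: Dict[str, Any] = {"contents": contents}
    if system_instruction:
        result["systemInstruction"] = {"parts": [{"text": system_instruction}]}
    return result

payload = format_for_google_ai([
    {"role": "system", "content": "Be brief"},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
])
```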
app/services/model_backends/hf_api.py ADDED
@@ -0,0 +1,303 @@
+"""
+HuggingFace Inference API backend
+Uses HuggingFace's hosted inference API for model access
+"""
+
+import asyncio
+import time
+import uuid
+import json
+from typing import AsyncGenerator, List, Dict, Any, Optional
+from datetime import datetime
+import httpx
+from huggingface_hub import InferenceClient
+
+from .base import ModelBackend, ModelLoadError, GenerationError, ModelNotLoadedError
+from ...models.schemas import ChatMessage, ChatResponse, StreamChunk
+from ...core.config import settings
+
+
+class HuggingFaceAPIBackend(ModelBackend):
+    """HuggingFace Inference API backend"""
+
+    def __init__(self, model_name: str, **kwargs):
+        super().__init__(model_name, **kwargs)
+        self.client = None
+        self.api_token = kwargs.get('api_token', settings.hf_api_token)
+        self.inference_url = kwargs.get('inference_url', settings.hf_inference_url)
+        self.capabilities = ["chat", "streaming", "api_based"]
+
+        # Generation parameters
+        self.parameters = {
+            'temperature': kwargs.get('temperature', settings.temperature),
+            'max_tokens': kwargs.get('max_tokens', settings.max_new_tokens),
+            'top_p': kwargs.get('top_p', settings.top_p),
+        }
+
+    async def load_model(self) -> bool:
+        """Initialize the HuggingFace API client"""
+        try:
+            if not self.api_token:
+                raise ModelLoadError("HuggingFace API token is required")
+
+            self.log_info("Initializing HuggingFace API client", model=self.model_name)
+
+            # Initialize the inference client
+            self.client = InferenceClient(
+                model=self.model_name,
+                token=self.api_token
+            )
+
+            # Test the connection with a simple request
+            await self._test_connection()
+
+            self.is_loaded = True
+            self.log_info("HuggingFace API client initialized successfully", model=self.model_name)
+
+            return True
+
+        except Exception as e:
+            self.log_error("Failed to initialize HuggingFace API client", error=str(e), model=self.model_name)
+            raise ModelLoadError(f"Failed to initialize HuggingFace API for {self.model_name}: {str(e)}")
+
+    async def unload_model(self) -> bool:
+        """Clean up the API client"""
+        try:
+            self.client = None
+            self.is_loaded = False
+            self.log_info("HuggingFace API client cleaned up", model=self.model_name)
+            return True
+
+        except Exception as e:
+            self.log_error("Failed to cleanup API client", error=str(e), model=self.model_name)
+            return False
+
+    async def _test_connection(self):
+        """Test the API connection"""
+        try:
+            # Simple test message
+            test_messages = [{"role": "user", "content": "Hello"}]
+
+            # Use asyncio to run the sync client method
+            loop = asyncio.get_event_loop()
+            response = await loop.run_in_executor(
+                None,
+                lambda: self.client.chat_completion(
+                    messages=test_messages,
+                    max_tokens=10,
+                    temperature=0.1
+                )
+            )
+
+            self.log_info("API connection test successful", model=self.model_name)
+
+        except Exception as e:
+            self.log_error("API connection test failed", error=str(e), model=self.model_name)
+            raise
+
+    def _format_messages_for_api(self, messages: List[ChatMessage]) -> List[Dict[str, str]]:
+        """Format messages for HuggingFace API"""
+        formatted = []
+        for msg in messages:
+            formatted.append({
+                "role": msg.role,
+                "content": msg.content
+            })
+        return formatted
+
+    async def generate_response(
+        self,
+        messages: List[ChatMessage],
+        temperature: float = 0.7,
+        max_tokens: int = 512,
+        **kwargs
+    ) -> ChatResponse:
+        """Generate a complete response using HuggingFace API"""
+        if not self.is_loaded:
+            raise ModelNotLoadedError("HuggingFace API client not initialized")
+
+        start_time = time.time()
+        message_id = str(uuid.uuid4())
+
+        try:
+            # Validate parameters
+            params = self.validate_parameters(
+                temperature=temperature,
+                max_tokens=max_tokens,
+                **kwargs
+            )
+
+            # Format messages
+            api_messages = self._format_messages_for_api(messages)
+
+            # Make API call
+            loop = asyncio.get_event_loop()
+            response = await loop.run_in_executor(
+                None,
+                lambda: self.client.chat_completion(
+                    messages=api_messages,
+                    max_tokens=params['max_tokens'],
+                    temperature=params['temperature'],
+                    top_p=params.get('top_p', 0.9),
+                    stream=False
+                )
+            )
+
+            # Extract response text
+            if hasattr(response, 'choices') and response.choices:
+                response_text = response.choices[0].message.content
+                finish_reason = getattr(response.choices[0], 'finish_reason', 'stop')
+            else:
+                response_text = str(response)
+                finish_reason = 'unknown'
+
+            generation_time = time.time() - start_time
+
+            return ChatResponse(
+                message=response_text.strip(),
+                session_id=messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown',
+                message_id=message_id,
+                model_name=self.model_name,
+                generation_time=generation_time,
+                token_count=len(response_text.split()),  # Approximate token count
+                finish_reason=finish_reason
+            )
+
+        except Exception as e:
+            self.log_error("HuggingFace API generation failed", error=str(e), model=self.model_name)
+            raise GenerationError(f"Failed to generate response via HuggingFace API: {str(e)}")
+
+    async def generate_stream(
+        self,
+        messages: List[ChatMessage],
+        temperature: float = 0.7,
+        max_tokens: int = 512,
+        **kwargs
+    ) -> AsyncGenerator[StreamChunk, None]:
+        """Generate a streaming response using HuggingFace API"""
+        if not self.is_loaded:
+            raise ModelNotLoadedError("HuggingFace API client not initialized")
+
+        message_id = str(uuid.uuid4())
+        session_id = messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown'
+        chunk_id = 0
+
+        try:
+            # Validate parameters
+            params = self.validate_parameters(
+                temperature=temperature,
+                max_tokens=max_tokens,
+                **kwargs
+            )
+
+            # Format messages
+            api_messages = self._format_messages_for_api(messages)
+
+            # Create streaming generator
+            loop = asyncio.get_event_loop()
+
+            def stream_generator():
+                return self.client.chat_completion(
+                    messages=api_messages,
+                    max_tokens=params['max_tokens'],
+                    temperature=params['temperature'],
+                    top_p=params.get('top_p', 0.9),
+                    stream=True
+                )
+
+            # Get the streaming response
+            stream = await loop.run_in_executor(None, stream_generator)
+
+            # Process streaming chunks
+            for chunk in stream:
+                if hasattr(chunk, 'choices') and chunk.choices:
+                    delta = chunk.choices[0].delta
+                    if hasattr(delta, 'content') and delta.content:
+                        yield StreamChunk(
+                            content=delta.content,
+                            session_id=session_id,
+                            message_id=message_id,
+                            chunk_id=chunk_id,
+                            is_final=False
+                        )
+                        chunk_id += 1
+
+                        # Add small delay to prevent overwhelming the client
+                        await asyncio.sleep(settings.stream_delay)
+
+            # Send final chunk
+            yield StreamChunk(
+                content="",
+                session_id=session_id,
+                message_id=message_id,
+                chunk_id=chunk_id,
+                is_final=True
+            )
+
+        except Exception as e:
+            self.log_error("HuggingFace API streaming failed", error=str(e), model=self.model_name)
+            raise GenerationError(f"Failed to generate streaming response via HuggingFace API: {str(e)}")
+
+    def get_model_info(self) -> Dict[str, Any]:
+        """Get information about the current model"""
+        return {
+            "name": self.model_name,
+            "type": "huggingface_api",
+            "loaded": self.is_loaded,
+            "api_endpoint": self.inference_url,
+            "capabilities": self.capabilities,
+            "parameters": self.parameters,
+            "requires_token": True,
+            "token_configured": bool(self.api_token)
+        }
+
+    async def health_check(self) -> Dict[str, Any]:
+        """Perform a health check on the HuggingFace API"""
+        try:
+            if not self.is_loaded:
+                return {
+                    "status": "unhealthy",
+                    "reason": "client_not_initialized",
+                    "model_name": self.model_name
+                }
+
+            # Test API connectivity
+            start_time = time.time()
+            test_messages = [ChatMessage(role="user", content="Hi")]
+
+            try:
+                response = await asyncio.wait_for(
+                    self.generate_response(
+                        test_messages,
+                        temperature=0.1,
+                        max_tokens=5
+                    ),
+                    timeout=15.0
+                )
+
+                response_time = time.time() - start_time
+
+                return {
+                    "status": "healthy",
+                    "model_name": self.model_name,
+                    "response_time": response_time,
+                    "api_endpoint": self.inference_url
+                }
+
+            except asyncio.TimeoutError:
+                return {
+                    "status": "unhealthy",
+                    "reason": "api_timeout",
+                    "model_name": self.model_name,
+                    "api_endpoint": self.inference_url
+                }
+
+        except Exception as e:
+            self.log_error("HuggingFace API health check failed", error=str(e), model=self.model_name)
+            return {
+                "status": "unhealthy",
+                "reason": "api_error",
+                "error": str(e),
+                "model_name": self.model_name,
+                "api_endpoint": self.inference_url
+            }
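This backend repeatedly wraps the synchronous `InferenceClient` call in `loop.run_in_executor` so it does not block the event loop. A minimal sketch of that pattern, with a hypothetical `slow_sync_call` standing in for `client.chat_completion`:

```python
# Sketch of the run_in_executor pattern used by HuggingFaceAPIBackend.
# slow_sync_call is a hypothetical stand-in for the blocking
# InferenceClient.chat_completion call.
import asyncio

def slow_sync_call(prompt: str) -> str:
    # Pretend this does blocking network I/O.
    return f"echo: {prompt}"

async def call_without_blocking(prompt: str) -> str:
    loop = asyncio.get_event_loop()
    # run_in_executor(None, ...) moves the blocking call onto the
    # default thread pool, leaving the event loop free.
    return await loop.run_in_executor(None, lambda: slow_sync_call(prompt))

result = asyncio.run(call_without_blocking("Hi"))
```

The same lambda-capturing style appears in `_test_connection` and `generate_response` above.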
app/services/model_backends/local_hf.py ADDED
@@ -0,0 +1,330 @@
1
+ """
2
+ Local HuggingFace model backend
3
+ Loads and runs models locally using transformers library
4
+ """
5
+
6
+ import asyncio
7
+ import time
8
+ import uuid
9
+ from typing import AsyncGenerator, List, Dict, Any, Optional
10
+ from datetime import datetime
11
+ import torch
12
+ from transformers import (
13
+ AutoTokenizer,
14
+ AutoModelForCausalLM,
15
+ TextIteratorStreamer,
16
+ GenerationConfig
17
+ )
18
+ from threading import Thread
19
+ from queue import Queue
20
+
21
+ from .base import ModelBackend, ModelLoadError, GenerationError, ModelNotLoadedError
22
+ from ...models.schemas import ChatMessage, ChatResponse, StreamChunk
23
+ from ...core.config import settings
24
+
25
+
26
+ class LocalHuggingFaceBackend(ModelBackend):
27
+ """Local HuggingFace model backend using transformers"""
28
+
29
+ def __init__(self, model_name: str, **kwargs):
30
+ super().__init__(model_name, **kwargs)
31
+ self.tokenizer = None
32
+ self.model = None
33
+ self.device = kwargs.get('device', settings.device)
34
+ self.capabilities = ["chat", "streaming", "instruction_following"]
35
+
36
+ # Generation parameters
37
+ self.parameters = {
38
+ 'temperature': kwargs.get('temperature', settings.temperature),
39
+ 'max_tokens': kwargs.get('max_tokens', settings.max_new_tokens),
40
+ 'top_p': kwargs.get('top_p', settings.top_p),
41
+ 'top_k': kwargs.get('top_k', settings.top_k),
42
+ }
43
+
44
+ async def load_model(self) -> bool:
45
+ """Load the HuggingFace model and tokenizer"""
46
+ try:
47
+ self.log_info("Loading local HuggingFace model", model=self.model_name)
48
+
49
+ # Determine device
50
+ if self.device == "auto":
51
+ self.device = "cuda" if torch.cuda.is_available() else "cpu"
52
+
53
+ self.log_info("Using device", device=self.device)
54
+
55
+ # Load tokenizer
56
+ self.log_info("Loading tokenizer")
57
+ self.tokenizer = AutoTokenizer.from_pretrained(
58
+ self.model_name,
59
+ trust_remote_code=True,
60
+ padding_side="left"
61
+ )
62
+
63
+ # Add pad token if not present
64
+ if self.tokenizer.pad_token is None:
65
+ self.tokenizer.pad_token = self.tokenizer.eos_token
66
+
67
+ # Load model
68
+ self.log_info("Loading model")
69
+ self.model = AutoModelForCausalLM.from_pretrained(
70
+ self.model_name,
71
+ trust_remote_code=True,
72
+ torch_dtype=torch.float16 if self.device == "cuda" else torch.float32,
73
+ device_map="auto" if self.device == "cuda" else None,
74
+ low_cpu_mem_usage=True
75
+ )
76
+
77
+ if self.device == "cpu":
78
+ self.model = self.model.to(self.device)
79
+
80
+ # Set model to evaluation mode
81
+ self.model.eval()
82
+
83
+ self.is_loaded = True
84
+ self.log_info("Model loaded successfully",
85
+ model=self.model_name,
86
+ device=self.device,
87
+ parameters=self.model.num_parameters() if hasattr(self.model, 'num_parameters') else 'unknown')
88
+
89
+ return True
90
+
91
+ except Exception as e:
92
+ self.log_error("Failed to load model", error=str(e), model=self.model_name)
93
+ raise ModelLoadError(f"Failed to load model {self.model_name}: {str(e)}")
94
+
95
+ async def unload_model(self) -> bool:
96
+ """Unload the model and free memory"""
97
+ try:
98
+ if self.model is not None:
99
+ del self.model
100
+ self.model = None
101
+
102
+ if self.tokenizer is not None:
103
+ del self.tokenizer
104
+ self.tokenizer = None
105
+
106
+ # Clear CUDA cache if using GPU
107
+ if torch.cuda.is_available():
108
+ torch.cuda.empty_cache()
109
+
110
+ self.is_loaded = False
111
+ self.log_info("Model unloaded successfully", model=self.model_name)
112
+ return True
113
+
114
+ except Exception as e:
115
+ self.log_error("Failed to unload model", error=str(e), model=self.model_name)
116
+ return False
117
+
118
+ def _prepare_chat_input(self, messages: List[ChatMessage]) -> str:
119
+ """Prepare chat messages for the model"""
120
+ if not self.tokenizer:
121
+ raise ModelNotLoadedError("Tokenizer not loaded")
122
+
123
+ # Check if tokenizer has chat template
124
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template:
125
+ # Use the model's chat template
126
+ formatted_messages = [{"role": msg.role, "content": msg.content} for msg in messages]
127
+ return self.tokenizer.apply_chat_template(
128
+ formatted_messages,
129
+ tokenize=False,
130
+ add_generation_prompt=True
131
+ )
132
+ else:
133
+ # Fallback to simple concatenation
134
+ chat_text = ""
135
+ for msg in messages:
136
+ if msg.role == "system":
137
+ chat_text += f"System: {msg.content}\n"
138
+ elif msg.role == "user":
139
+ chat_text += f"User: {msg.content}\n"
140
+ elif msg.role == "assistant":
141
+ chat_text += f"Assistant: {msg.content}\n"
142
+
143
+ chat_text += "Assistant: "
144
+ return chat_text
145
+
146
+ async def generate_response(
147
+ self,
148
+ messages: List[ChatMessage],
149
+ temperature: float = 0.7,
150
+ max_tokens: int = 512,
151
+ **kwargs
152
+ ) -> ChatResponse:
153
+ """Generate a complete response"""
154
+ if not self.is_loaded:
155
+ raise ModelNotLoadedError("Model not loaded")
156
+
157
+ start_time = time.time()
158
+ message_id = str(uuid.uuid4())
159
+
160
+ try:
161
+ # Validate parameters
162
+ params = self.validate_parameters(
163
+ temperature=temperature,
164
+ max_tokens=max_tokens,
165
+ **kwargs
166
+ )
167
+
168
+ # Prepare input
169
+ chat_input = self._prepare_chat_input(messages)
170
+
171
+ # Tokenize input
172
+ inputs = self.tokenizer(
173
+ chat_input,
174
+ return_tensors="pt",
175
+ padding=True,
176
+ truncation=True,
177
+ max_length=settings.max_length - params['max_tokens']
178
+ ).to(self.device)
179
+
180
+ # Generate response
181
+ with torch.no_grad():
182
+ outputs = self.model.generate(
183
+ **inputs,
184
+ max_new_tokens=params['max_tokens'],
185
+ temperature=params['temperature'],
186
+ top_p=params['top_p'],
187
+ top_k=params['top_k'],
188
+ do_sample=True,
189
+ pad_token_id=self.tokenizer.pad_token_id,
190
+ eos_token_id=self.tokenizer.eos_token_id,
191
+ repetition_penalty=1.1,
192
+ no_repeat_ngram_size=3
193
+ )
194
+
195
+ # Decode response
196
+ input_length = inputs['input_ids'].shape[1]
197
+ generated_tokens = outputs[0][input_length:]
198
+ response_text = self.tokenizer.decode(generated_tokens, skip_special_tokens=True)
199
+
200
+ generation_time = time.time() - start_time
201
+
202
+ return ChatResponse(
203
+ message=response_text.strip(),
204
+ session_id=messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown',
205
+ message_id=message_id,
206
+ model_name=self.model_name,
207
+ generation_time=generation_time,
208
+ token_count=len(generated_tokens),
209
+ finish_reason="stop"
210
+ )
211
+
212
+ except Exception as e:
213
+ self.log_error("Generation failed", error=str(e), model=self.model_name)
214
+ raise GenerationError(f"Failed to generate response: {str(e)}")
215
+
216
+ async def generate_stream(
217
+ self,
218
+ messages: List[ChatMessage],
219
+ temperature: float = 0.7,
220
+ max_tokens: int = 512,
221
+ **kwargs
222
+ ) -> AsyncGenerator[StreamChunk, None]:
223
+ """Generate a streaming response"""
224
+ if not self.is_loaded:
225
+ raise ModelNotLoadedError("Model not loaded")
226
+
227
+ message_id = str(uuid.uuid4())
228
+ session_id = messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown'
229
+ chunk_id = 0
230
+
231
+ try:
232
+ # Validate parameters
233
+ params = self.validate_parameters(
234
+ temperature=temperature,
235
+ max_tokens=max_tokens,
236
+ **kwargs
237
+ )
238
+
239
+ # Prepare input
240
+ chat_input = self._prepare_chat_input(messages)
241
+
242
+ # Tokenize input
243
+ inputs = self.tokenizer(
244
+ chat_input,
245
+ return_tensors="pt",
246
+ padding=True,
247
+ truncation=True,
248
+ max_length=settings.max_length - params['max_tokens']
249
+ ).to(self.device)
250
+
251
+ # Create streamer
252
+ streamer = TextIteratorStreamer(
253
+ self.tokenizer,
254
+ skip_prompt=True,
255
+ skip_special_tokens=True
256
+ )
257
+
258
+ # Generation parameters
259
+ generation_kwargs = {
260
+ **inputs,
261
+ 'max_new_tokens': params['max_tokens'],
262
+ 'temperature': params['temperature'],
263
+ 'top_p': params['top_p'],
264
+ 'top_k': params['top_k'],
265
+ 'do_sample': True,
266
+                'pad_token_id': self.tokenizer.pad_token_id,
+                'eos_token_id': self.tokenizer.eos_token_id,
+                'repetition_penalty': 1.1,
+                'no_repeat_ngram_size': 3,
+                'streamer': streamer
+            }
+
+            # Start generation in a separate thread
+            generation_thread = Thread(
+                target=self.model.generate,
+                kwargs=generation_kwargs
+            )
+            generation_thread.start()
+
+            # Stream the response
+            for chunk_text in streamer:
+                if chunk_text:  # Skip empty chunks
+                    yield StreamChunk(
+                        content=chunk_text,
+                        session_id=session_id,
+                        message_id=message_id,
+                        chunk_id=chunk_id,
+                        is_final=False
+                    )
+                    chunk_id += 1
+
+                # Add small delay to prevent overwhelming the client
+                await asyncio.sleep(settings.stream_delay)
+
+            # Send final chunk
+            yield StreamChunk(
+                content="",
+                session_id=session_id,
+                message_id=message_id,
+                chunk_id=chunk_id,
+                is_final=True
+            )
+
+            # Wait for generation thread to complete
+            generation_thread.join()
+
+        except Exception as e:
+            self.log_error("Streaming generation failed", error=str(e), model=self.model_name)
+            raise GenerationError(f"Failed to generate streaming response: {str(e)}")
+
+    def get_model_info(self) -> Dict[str, Any]:
+        """Get information about the current model"""
+        info = {
+            "name": self.model_name,
+            "type": "local_huggingface",
+            "loaded": self.is_loaded,
+            "device": self.device,
+            "capabilities": self.capabilities,
+            "parameters": self.parameters
+        }
+
+        if self.model and hasattr(self.model, 'config'):
+            info["model_config"] = {
+                "vocab_size": getattr(self.model.config, 'vocab_size', None),
+                "hidden_size": getattr(self.model.config, 'hidden_size', None),
+                "num_layers": getattr(self.model.config, 'num_hidden_layers', None),
+                "num_attention_heads": getattr(self.model.config, 'num_attention_heads', None),
+            }
+
+        return info
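The thread-plus-streamer pattern in `generate_stream` above can be sketched in isolation. This is a minimal stand-in, not the repo's code: `ToyStreamer` plays the role of transformers' `TextIteratorStreamer`, and `fake_generate` stands in for `model.generate(..., streamer=streamer)`; all names here are illustrative.

```python
import queue
from threading import Thread

class ToyStreamer:
    """Queue-backed stand-in for TextIteratorStreamer: the producer
    thread puts text chunks, the consumer iterates until end()."""
    _DONE = object()

    def __init__(self):
        self._q = queue.Queue()

    def put(self, text):
        self._q.put(text)

    def end(self):
        self._q.put(self._DONE)

    def __iter__(self):
        while True:
            item = self._q.get()
            if item is self._DONE:
                return
            yield item

def fake_generate(streamer):
    # Stands in for model.generate(..., streamer=streamer) running
    # in a worker thread while the caller consumes chunks.
    for token in ["Hello", " ", "world"]:
        streamer.put(token)
    streamer.end()

streamer = ToyStreamer()
t = Thread(target=fake_generate, kwargs={"streamer": streamer})
t.start()
chunks = [c for c in streamer if c]  # skip empty chunks, as the backend does
t.join()
print("".join(chunks))  # -> Hello world
```

The point of the pattern is that `generate` blocks the worker thread while the consumer loop yields chunks as they arrive, so the HTTP handler can stream without waiting for the full completion.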
app/services/model_backends/minimax_api.py ADDED
@@ -0,0 +1,341 @@
+"""
+MiniMax API backend
+Uses MiniMax's API for their M1 model with reasoning capabilities
+"""
+
+import asyncio
+import time
+import uuid
+import json
+from typing import AsyncGenerator, List, Dict, Any, Optional
+from datetime import datetime
+import httpx
+
+from .base import ModelBackend, ModelLoadError, GenerationError, ModelNotLoadedError
+from ...models.schemas import ChatMessage, ChatResponse, StreamChunk
+from ...core.config import settings
+
+
+class MiniMaxAPIBackend(ModelBackend):
+    """MiniMax API backend for the M1 model"""
+
+    def __init__(self, model_name: str, **kwargs):
+        super().__init__(model_name, **kwargs)
+        self.api_url = kwargs.get('api_url', settings.minimax_api_url)
+        self.api_key = kwargs.get('api_key', settings.minimax_api_key)
+        self.model_version = kwargs.get('model_version', settings.minimax_model_version)
+        self.capabilities = ["chat", "streaming", "reasoning", "api_based"]
+
+        # Generation parameters
+        self.parameters = {
+            'temperature': kwargs.get('temperature', settings.temperature),
+            'max_tokens': kwargs.get('max_tokens', settings.max_new_tokens),
+            'top_p': kwargs.get('top_p', settings.top_p),
+        }
+
+    async def load_model(self) -> bool:
+        """Initialize the MiniMax API client"""
+        try:
+            if not self.api_key or not self.api_url:
+                raise ModelLoadError("MiniMax API key and URL are required")
+
+            self.log_info("Initializing MiniMax API client", model=self.model_name)
+
+            # Test the connection
+            await self._test_connection()
+
+            self.is_loaded = True
+            self.log_info("MiniMax API client initialized successfully", model=self.model_name)
+
+            return True
+
+        except Exception as e:
+            self.log_error("Failed to initialize MiniMax API client", error=str(e), model=self.model_name)
+            raise ModelLoadError(f"Failed to initialize MiniMax API for {self.model_name}: {str(e)}")
+
+    async def unload_model(self) -> bool:
+        """Clean up the API client"""
+        try:
+            self.is_loaded = False
+            self.log_info("MiniMax API client cleaned up", model=self.model_name)
+            return True
+
+        except Exception as e:
+            self.log_error("Failed to cleanup MiniMax API client", error=str(e), model=self.model_name)
+            return False
+
+    async def _test_connection(self):
+        """Test the MiniMax API connection"""
+        try:
+            test_data = {
+                'model': self.model_version,
+                'messages': [{"role": "user", "content": "Hello"}],
+                'stream': False,
+                'max_tokens': 5,
+                'temperature': 0.1
+            }
+
+            async with httpx.AsyncClient() as client:
+                response = await client.post(
+                    self.api_url,
+                    headers={
+                        'Content-Type': 'application/json',
+                        'Authorization': f'Bearer {self.api_key}'
+                    },
+                    json=test_data,
+                    timeout=10.0
+                )
+
+            if response.status_code != 200:
+                raise Exception(f"API test failed with status {response.status_code}")
+
+            self.log_info("MiniMax API connection test successful", model=self.model_name)
+
+        except Exception as e:
+            self.log_error("MiniMax API connection test failed", error=str(e), model=self.model_name)
+            raise
+
+    def _format_messages_for_api(self, messages: List[ChatMessage]) -> List[Dict[str, str]]:
+        """Format messages for the MiniMax API"""
+        formatted = []
+        for msg in messages:
+            formatted.append({
+                "role": msg.role,
+                "content": msg.content
+            })
+        return formatted
+
+    async def generate_response(
+        self,
+        messages: List[ChatMessage],
+        temperature: float = 0.7,
+        max_tokens: int = 512,
+        **kwargs
+    ) -> ChatResponse:
+        """Generate a complete response using the MiniMax API"""
+        if not self.is_loaded:
+            raise ModelNotLoadedError("MiniMax API client not initialized")
+
+        start_time = time.time()
+        message_id = str(uuid.uuid4())
+
+        try:
+            # Validate parameters
+            params = self.validate_parameters(
+                temperature=temperature,
+                max_tokens=max_tokens,
+                **kwargs
+            )
+
+            # Format messages
+            api_messages = self._format_messages_for_api(messages)
+
+            # Prepare request data
+            request_data = {
+                'model': self.model_version,
+                'messages': api_messages,
+                'stream': False,
+                'max_tokens': params['max_tokens'],
+                'temperature': params['temperature'],
+                'top_p': params.get('top_p', 0.9)
+            }
+
+            # Make API call
+            async with httpx.AsyncClient() as client:
+                response = await client.post(
+                    self.api_url,
+                    headers={
+                        'Content-Type': 'application/json',
+                        'Authorization': f'Bearer {self.api_key}'
+                    },
+                    json=request_data,
+                    timeout=30.0
+                )
+
+            if response.status_code != 200:
+                raise GenerationError(f"API request failed with status {response.status_code}")
+
+            response_data = response.json()
+
+            # Extract response text
+            if 'choices' in response_data and response_data['choices']:
+                choice = response_data['choices'][0]
+                if 'message' in choice:
+                    response_text = choice['message'].get('content', '')
+                    reasoning_content = choice['message'].get('reasoning_content', '')
+
+                    # Combine reasoning and main content if both exist
+                    if reasoning_content and response_text:
+                        full_response = f"[Reasoning: {reasoning_content}]\n\n{response_text}"
+                    else:
+                        full_response = response_text or reasoning_content
+                else:
+                    full_response = str(response_data)
+            else:
+                full_response = str(response_data)
+
+            generation_time = time.time() - start_time
+
+            return ChatResponse(
+                message=full_response.strip(),
+                session_id=messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown',
+                message_id=message_id,
+                model_name=self.model_name,
+                generation_time=generation_time,
+                token_count=len(full_response.split()),  # Approximate token count
+                finish_reason="stop"
+            )
+
+        except Exception as e:
+            self.log_error("MiniMax API generation failed", error=str(e), model=self.model_name)
+            raise GenerationError(f"Failed to generate response via MiniMax API: {str(e)}")
+
+    async def generate_stream(
+        self,
+        messages: List[ChatMessage],
+        temperature: float = 0.7,
+        max_tokens: int = 512,
+        **kwargs
+    ) -> AsyncGenerator[StreamChunk, None]:
+        """Generate a streaming response using the MiniMax API"""
+        if not self.is_loaded:
+            raise ModelNotLoadedError("MiniMax API client not initialized")
+
+        message_id = str(uuid.uuid4())
+        session_id = messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown'
+        chunk_id = 0
+
+        try:
+            # Validate parameters
+            params = self.validate_parameters(
+                temperature=temperature,
+                max_tokens=max_tokens,
+                **kwargs
+            )
+
+            # Format messages
+            api_messages = self._format_messages_for_api(messages)
+
+            # Prepare request data
+            request_data = {
+                'model': self.model_version,
+                'messages': api_messages,
+                'stream': True,
+                'max_tokens': params['max_tokens'],
+                'temperature': params['temperature'],
+                'top_p': params.get('top_p', 0.9)
+            }
+
+            # Make streaming API call
+            async with httpx.AsyncClient() as client:
+                async with client.stream(
+                    'POST',
+                    self.api_url,
+                    headers={
+                        'Content-Type': 'application/json',
+                        'Authorization': f'Bearer {self.api_key}'
+                    },
+                    json=request_data,
+                    timeout=60.0
+                ) as response:
+
+                    if response.status_code != 200:
+                        raise GenerationError(f"Streaming request failed with status {response.status_code}")
+
+                    async for line in response.aiter_lines():
+                        if line.startswith('data:'):
+                            try:
+                                data = json.loads(line[5:])  # Remove 'data:' prefix
+
+                                if 'choices' not in data:
+                                    continue
+
+                                choice = data['choices'][0]
+
+                                # Handle delta content
+                                if 'delta' in choice:
+                                    delta = choice['delta']
+                                    reasoning_content = delta.get('reasoning_content', '')
+                                    content = delta.get('content', '')
+
+                                    # Send reasoning content if available
+                                    if reasoning_content:
+                                        yield StreamChunk(
+                                            content=f"[Thinking: {reasoning_content}]",
+                                            session_id=session_id,
+                                            message_id=message_id,
+                                            chunk_id=chunk_id,
+                                            is_final=False
+                                        )
+                                        chunk_id += 1
+
+                                    # Send main content
+                                    if content:
+                                        yield StreamChunk(
+                                            content=content,
+                                            session_id=session_id,
+                                            message_id=message_id,
+                                            chunk_id=chunk_id,
+                                            is_final=False
+                                        )
+                                        chunk_id += 1
+
+                                # Handle complete message
+                                elif 'message' in choice:
+                                    message_data = choice['message']
+                                    reasoning_content = message_data.get('reasoning_content', '')
+                                    main_content = message_data.get('content', '')
+
+                                    if reasoning_content:
+                                        yield StreamChunk(
+                                            content=f"\n[Final reasoning: {reasoning_content}]\n",
+                                            session_id=session_id,
+                                            message_id=message_id,
+                                            chunk_id=chunk_id,
+                                            is_final=False
+                                        )
+                                        chunk_id += 1
+
+                                    if main_content:
+                                        yield StreamChunk(
+                                            content=main_content,
+                                            session_id=session_id,
+                                            message_id=message_id,
+                                            chunk_id=chunk_id,
+                                            is_final=False
+                                        )
+                                        chunk_id += 1
+
+                            except json.JSONDecodeError:
+                                continue
+
+                            # Add small delay
+                            await asyncio.sleep(settings.stream_delay)
+
+            # Send final chunk
+            yield StreamChunk(
+                content="",
+                session_id=session_id,
+                message_id=message_id,
+                chunk_id=chunk_id,
+                is_final=True
+            )
+
+        except Exception as e:
+            self.log_error("MiniMax API streaming failed", error=str(e), model=self.model_name)
+            raise GenerationError(f"Failed to generate streaming response via MiniMax API: {str(e)}")
+
+    def get_model_info(self) -> Dict[str, Any]:
+        """Get information about the current model"""
+        return {
+            "name": self.model_name,
+            "type": "minimax_api",
+            "loaded": self.is_loaded,
+            "provider": "MiniMax",
+            "model_version": self.model_version,
+            "capabilities": self.capabilities,
+            "parameters": self.parameters,
+            "requires_api_key": True,
+            "api_key_configured": bool(self.api_key),
+            "api_url": self.api_url
+        }
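The SSE line handling in `generate_stream` above can be exercised on its own. The sketch below assumes the same response shape the backend parses (`data: {json}` lines carrying `choices[].delta` with optional `reasoning_content`/`content`); the helper name and sample lines are illustrative, not from the repo:

```python
import json

def parse_sse_lines(lines):
    """Yield (kind, text) pairs from SSE 'data:' lines, mirroring the
    delta handling in MiniMaxAPIBackend.generate_stream."""
    for line in lines:
        if not line.startswith('data:'):
            continue  # comments / keep-alives
        try:
            data = json.loads(line[5:])  # drop the 'data:' prefix
        except json.JSONDecodeError:
            continue  # e.g. the 'data: [DONE]' sentinel
        for choice in data.get('choices', []):
            delta = choice.get('delta', {})
            if delta.get('reasoning_content'):
                yield ('reasoning', delta['reasoning_content'])
            if delta.get('content'):
                yield ('content', delta['content'])

lines = [
    'data: {"choices": [{"delta": {"reasoning_content": "thinking..."}}]}',
    ': keep-alive',
    'data: {"choices": [{"delta": {"content": "x = 5"}}]}',
    'data: [DONE]',
]
print(list(parse_sse_lines(lines)))
# -> [('reasoning', 'thinking...'), ('content', 'x = 5')]
```

Note that `json.loads` tolerates the leading space after `data:`, and the `[DONE]` sentinel is naturally skipped by the `JSONDecodeError` branch, the same way the backend's `continue` handles malformed lines.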
app/services/model_backends/openai_api.py ADDED
@@ -0,0 +1,291 @@
+"""
+OpenAI API backend
+Uses OpenAI's API for model access (GPT-3.5, GPT-4, etc.)
+"""
+
+import asyncio
+import time
+import uuid
+from typing import AsyncGenerator, List, Dict, Any, Optional
+from datetime import datetime
+import openai
+
+from .base import ModelBackend, ModelLoadError, GenerationError, ModelNotLoadedError
+from ...models.schemas import ChatMessage, ChatResponse, StreamChunk
+from ...core.config import settings
+
+
+class OpenAIAPIBackend(ModelBackend):
+    """OpenAI API backend"""
+
+    def __init__(self, model_name: str, **kwargs):
+        super().__init__(model_name, **kwargs)
+        self.client = None
+        self.api_key = kwargs.get('api_key', settings.openai_api_key)
+        self.org_id = kwargs.get('org_id', settings.openai_org_id)
+        self.capabilities = ["chat", "streaming", "api_based", "function_calling"]
+
+        # Generation parameters
+        self.parameters = {
+            'temperature': kwargs.get('temperature', settings.temperature),
+            'max_tokens': kwargs.get('max_tokens', settings.max_new_tokens),
+            'top_p': kwargs.get('top_p', settings.top_p),
+        }
+
+    async def load_model(self) -> bool:
+        """Initialize the OpenAI API client"""
+        try:
+            if not self.api_key:
+                raise ModelLoadError("OpenAI API key is required")
+
+            self.log_info("Initializing OpenAI API client", model=self.model_name)
+
+            # Initialize the OpenAI client
+            self.client = openai.AsyncOpenAI(
+                api_key=self.api_key,
+                organization=self.org_id
+            )
+
+            # Test the connection
+            await self._test_connection()
+
+            self.is_loaded = True
+            self.log_info("OpenAI API client initialized successfully", model=self.model_name)
+
+            return True
+
+        except Exception as e:
+            self.log_error("Failed to initialize OpenAI API client", error=str(e), model=self.model_name)
+            raise ModelLoadError(f"Failed to initialize OpenAI API for {self.model_name}: {str(e)}")
+
+    async def unload_model(self) -> bool:
+        """Clean up the API client"""
+        try:
+            if self.client:
+                await self.client.close()
+                self.client = None
+            self.is_loaded = False
+            self.log_info("OpenAI API client cleaned up", model=self.model_name)
+            return True
+
+        except Exception as e:
+            self.log_error("Failed to cleanup OpenAI API client", error=str(e), model=self.model_name)
+            return False
+
+    async def _test_connection(self):
+        """Test the OpenAI API connection"""
+        try:
+            # Simple test request
+            response = await self.client.chat.completions.create(
+                model=self.model_name,
+                messages=[{"role": "user", "content": "Hello"}],
+                max_tokens=5,
+                temperature=0.1
+            )
+
+            self.log_info("OpenAI API connection test successful", model=self.model_name)
+
+        except Exception as e:
+            self.log_error("OpenAI API connection test failed", error=str(e), model=self.model_name)
+            raise
+
+    def _format_messages_for_api(self, messages: List[ChatMessage]) -> List[Dict[str, str]]:
+        """Format messages for the OpenAI API"""
+        formatted = []
+        for msg in messages:
+            formatted.append({
+                "role": msg.role,
+                "content": msg.content
+            })
+        return formatted
+
+    async def generate_response(
+        self,
+        messages: List[ChatMessage],
+        temperature: float = 0.7,
+        max_tokens: int = 512,
+        **kwargs
+    ) -> ChatResponse:
+        """Generate a complete response using the OpenAI API"""
+        if not self.is_loaded:
+            raise ModelNotLoadedError("OpenAI API client not initialized")
+
+        start_time = time.time()
+        message_id = str(uuid.uuid4())
+
+        try:
+            # Validate parameters
+            params = self.validate_parameters(
+                temperature=temperature,
+                max_tokens=max_tokens,
+                **kwargs
+            )
+
+            # Format messages
+            api_messages = self._format_messages_for_api(messages)
+
+            # Make API call
+            response = await self.client.chat.completions.create(
+                model=self.model_name,
+                messages=api_messages,
+                max_tokens=params['max_tokens'],
+                temperature=params['temperature'],
+                top_p=params.get('top_p', 0.9),
+                stream=False
+            )
+
+            # Extract response
+            response_text = response.choices[0].message.content
+            finish_reason = response.choices[0].finish_reason
+            token_count = response.usage.completion_tokens if response.usage else None
+
+            generation_time = time.time() - start_time
+
+            return ChatResponse(
+                message=response_text.strip(),
+                session_id=messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown',
+                message_id=message_id,
+                model_name=self.model_name,
+                generation_time=generation_time,
+                token_count=token_count,
+                finish_reason=finish_reason
+            )
+
+        except Exception as e:
+            self.log_error("OpenAI API generation failed", error=str(e), model=self.model_name)
+            raise GenerationError(f"Failed to generate response via OpenAI API: {str(e)}")
+
+    async def generate_stream(
+        self,
+        messages: List[ChatMessage],
+        temperature: float = 0.7,
+        max_tokens: int = 512,
+        **kwargs
+    ) -> AsyncGenerator[StreamChunk, None]:
+        """Generate a streaming response using the OpenAI API"""
+        if not self.is_loaded:
+            raise ModelNotLoadedError("OpenAI API client not initialized")
+
+        message_id = str(uuid.uuid4())
+        session_id = messages[-1].metadata.get('session_id', 'unknown') if messages[-1].metadata else 'unknown'
+        chunk_id = 0
+
+        try:
+            # Validate parameters
+            params = self.validate_parameters(
+                temperature=temperature,
+                max_tokens=max_tokens,
+                **kwargs
+            )
+
+            # Format messages
+            api_messages = self._format_messages_for_api(messages)
+
+            # Create streaming request
+            stream = await self.client.chat.completions.create(
+                model=self.model_name,
+                messages=api_messages,
+                max_tokens=params['max_tokens'],
+                temperature=params['temperature'],
+                top_p=params.get('top_p', 0.9),
+                stream=True
+            )
+
+            # Process streaming chunks
+            async for chunk in stream:
+                if chunk.choices and chunk.choices[0].delta.content:
+                    content = chunk.choices[0].delta.content
+
+                    yield StreamChunk(
+                        content=content,
+                        session_id=session_id,
+                        message_id=message_id,
+                        chunk_id=chunk_id,
+                        is_final=False
+                    )
+                    chunk_id += 1
+
+                    # Add small delay to prevent overwhelming the client
+                    await asyncio.sleep(settings.stream_delay)
+
+                # Check if this is the final chunk
+                if chunk.choices and chunk.choices[0].finish_reason:
+                    break
+
+            # Send final chunk
+            yield StreamChunk(
+                content="",
+                session_id=session_id,
+                message_id=message_id,
+                chunk_id=chunk_id,
+                is_final=True
+            )
+
+        except Exception as e:
+            self.log_error("OpenAI API streaming failed", error=str(e), model=self.model_name)
+            raise GenerationError(f"Failed to generate streaming response via OpenAI API: {str(e)}")
+
+    def get_model_info(self) -> Dict[str, Any]:
+        """Get information about the current model"""
+        return {
+            "name": self.model_name,
+            "type": "openai_api",
+            "loaded": self.is_loaded,
+            "provider": "OpenAI",
+            "capabilities": self.capabilities,
+            "parameters": self.parameters,
+            "requires_api_key": True,
+            "api_key_configured": bool(self.api_key),
+            "organization": self.org_id
+        }
+
+    async def health_check(self) -> Dict[str, Any]:
+        """Perform a health check on the OpenAI API"""
+        try:
+            if not self.is_loaded:
+                return {
+                    "status": "unhealthy",
+                    "reason": "client_not_initialized",
+                    "model_name": self.model_name
+                }
+
+            # Test API connectivity
+            start_time = time.time()
+            test_messages = [ChatMessage(role="user", content="Hi")]
+
+            try:
+                response = await asyncio.wait_for(
+                    self.generate_response(
+                        test_messages,
+                        temperature=0.1,
+                        max_tokens=5
+                    ),
+                    timeout=10.0
+                )
+
+                response_time = time.time() - start_time
+
+                return {
+                    "status": "healthy",
+                    "model_name": self.model_name,
+                    "response_time": response_time,
+                    "provider": "OpenAI"
+                }
+
+            except asyncio.TimeoutError:
+                return {
+                    "status": "unhealthy",
+                    "reason": "api_timeout",
+                    "model_name": self.model_name,
+                    "provider": "OpenAI"
+                }
+
+        except Exception as e:
+            self.log_error("OpenAI API health check failed", error=str(e), model=self.model_name)
+            return {
+                "status": "unhealthy",
+                "reason": "api_error",
+                "error": str(e),
+                "model_name": self.model_name,
+                "provider": "OpenAI"
+            }
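The timeout guard in `health_check` above relies on `asyncio.wait_for` converting a slow awaitable into `asyncio.TimeoutError`. A self-contained sketch of that pattern (the coroutine names and timings here are made up for illustration):

```python
import asyncio

async def slow_call():
    # Stands in for a real API round-trip that takes ~0.3 s.
    await asyncio.sleep(0.3)
    return "ok"

async def health_check(timeout):
    """Return a health dict, marking the service unhealthy on timeout."""
    try:
        result = await asyncio.wait_for(slow_call(), timeout=timeout)
        return {"status": "healthy", "result": result}
    except asyncio.TimeoutError:
        return {"status": "unhealthy", "reason": "api_timeout"}

print(asyncio.run(health_check(0.05))["status"])  # -> unhealthy
print(asyncio.run(health_check(1.0))["status"])   # -> healthy
```

`asyncio.wait_for` cancels the inner coroutine when the deadline passes, so a hung upstream API cannot wedge the health endpoint.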
app/services/model_manager.py ADDED
@@ -0,0 +1,382 @@
1
+ """
2
+ Model Manager - Central hub for managing different model backends
3
+ Handles backend selection based on environment configuration
4
+ """
5
+
6
+ from typing import Optional, Dict, Any
7
+ from ..core.config import settings
8
+ from ..core.logging import LoggerMixin
9
+ from .model_backends.base import ModelBackend, ModelBackendError
10
+ from .model_backends.local_hf import LocalHuggingFaceBackend
11
+ from .model_backends.hf_api import HuggingFaceAPIBackend
12
+ from .model_backends.openai_api import OpenAIAPIBackend
13
+ from .model_backends.anthropic_api import AnthropicAPIBackend
14
+ from .model_backends.minimax_api import MiniMaxAPIBackend
15
+ from .model_backends.google_api import GoogleAIBackend
16
+
17
+
18
+ class ModelManager(LoggerMixin):
19
+ """
20
+ Central manager for model backends
21
+ Handles initialization, switching, and management of different model types
22
+ """
23
+
24
+ def __init__(self):
25
+ self.current_backend: Optional[ModelBackend] = None
26
+ self.backend_type = settings.model_type.lower()
27
+ self.model_name = settings.model_name
28
+ self.is_initialized = False
29
+
30
+ async def initialize(self) -> bool:
31
+ """
32
+ Initialize the model backend based on configuration
33
+
34
+ Returns:
35
+ bool: True if initialization successful, False otherwise
36
+ """
37
+ try:
38
+ self.log_info("Initializing model manager",
39
+ backend_type=self.backend_type,
40
+ model_name=self.model_name)
41
+
42
+ # Validate configuration
43
+ if not settings.validate_model_config():
44
+ self.log_error("Invalid model configuration", backend_type=self.backend_type)
45
+ return False
46
+
47
+ # Create the appropriate backend
48
+ backend = self._create_backend()
49
+ if not backend:
50
+ self.log_error("Failed to create backend", backend_type=self.backend_type)
51
+ return False
52
+
53
+ # Load the model
54
+ success = await backend.load_model()
55
+ if success:
56
+ self.current_backend = backend
57
+ self.is_initialized = True
58
+ self.log_info("Model manager initialized successfully",
59
+ backend_type=self.backend_type,
60
+ model_name=self.model_name)
61
+ return True
62
+ else:
63
+ self.log_error("Failed to load model", backend_type=self.backend_type)
64
+ return False
65
+
66
+ except Exception as e:
67
+ self.log_error("Model manager initialization failed",
68
+ error=str(e),
69
+ backend_type=self.backend_type)
70
+ return False
71
+
72
+ def _create_backend(self) -> Optional[ModelBackend]:
73
+ """Create the appropriate model backend based on configuration"""
74
+ try:
75
+ if self.backend_type == "local":
76
+ return LocalHuggingFaceBackend(
77
+ model_name=self.model_name,
78
+ device=settings.device,
79
+ temperature=settings.temperature,
80
+ max_tokens=settings.max_new_tokens,
81
+ top_p=settings.top_p,
82
+ top_k=settings.top_k
83
+ )
84
+
85
+ elif self.backend_type == "hf_api":
86
+ return HuggingFaceAPIBackend(
87
+ model_name=self.model_name,
88
+ api_token=settings.hf_api_token,
89
+ inference_url=settings.hf_inference_url,
90
+ temperature=settings.temperature,
91
+ max_tokens=settings.max_new_tokens,
92
+ top_p=settings.top_p
93
+ )
94
+
95
+ elif self.backend_type == "openai":
96
+ return OpenAIAPIBackend(
97
+ model_name=self.model_name,
98
+ api_key=settings.openai_api_key,
99
+ org_id=settings.openai_org_id,
100
+ temperature=settings.temperature,
101
+ max_tokens=settings.max_new_tokens,
102
+ top_p=settings.top_p
103
+ )
104
+
105
+ elif self.backend_type == "anthropic":
106
+ return AnthropicAPIBackend(
107
+ model_name=self.model_name,
108
+ api_key=settings.anthropic_api_key,
109
+ temperature=settings.temperature,
110
+ max_tokens=settings.max_new_tokens,
111
+ top_p=settings.top_p
112
+ )
113
+
114
+ elif self.backend_type == "minimax":
115
+ return MiniMaxAPIBackend(
116
+ model_name=self.model_name,
117
+ api_key=settings.minimax_api_key,
118
+ api_url=settings.minimax_api_url,
119
+ model_version=settings.minimax_model_version,
120
+ temperature=settings.temperature,
121
+ max_tokens=settings.max_new_tokens,
122
+ top_p=settings.top_p
123
+ )
124
+
125
+ elif self.backend_type == "google":
126
+ return GoogleAIBackend(
127
+ model_name=self.model_name,
128
+ api_key=settings.google_api_key,
129
+ temperature=settings.temperature,
130
+ max_tokens=settings.max_new_tokens,
131
+ top_p=settings.top_p,
132
+ top_k=settings.top_k
133
+ )
134
+
135
+ else:
136
+ self.log_error("Unsupported backend type", backend_type=self.backend_type)
137
+ return None
138
+
139
+ except Exception as e:
140
+ self.log_error("Failed to create backend", error=str(e), backend_type=self.backend_type)
141
+ return None
142
+
143
+ async def shutdown(self) -> bool:
144
+ """
145
+ Shutdown the current backend and cleanup resources
146
+
147
+ Returns:
148
+ bool: True if shutdown successful, False otherwise
149
+ """
150
+ try:
151
+ if self.current_backend:
152
+ success = await self.current_backend.unload_model()
153
+ self.current_backend = None
154
+ self.is_initialized = False
155
+ self.log_info("Model manager shutdown successfully")
156
+ return success
157
+ return True
158
+
159
+ except Exception as e:
160
+ self.log_error("Model manager shutdown failed", error=str(e))
161
+ return False
162
+
163
+ def get_backend(self) -> Optional[ModelBackend]:
164
+ """
165
+ Get the current model backend
166
+
167
+ Returns:
168
+ ModelBackend: Current backend instance or None if not initialized
169
+ """
170
+ return self.current_backend
171
+
172
+ def is_ready(self) -> bool:
173
+ """
174
+ Check if the model manager is ready for inference
175
+
176
+ Returns:
177
+ bool: True if ready, False otherwise
178
+ """
179
+ return (self.is_initialized and
180
+ self.current_backend is not None and
181
+ self.current_backend.is_model_loaded())
182
+
183
+ def get_model_info(self) -> Dict[str, Any]:
184
+ """
185
+ Get information about the current model
186
+
187
+ Returns:
188
+ Dict containing model information
189
+ """
190
+ if not self.current_backend:
191
+ return {
192
+ "status": "not_initialized",
193
+ "backend_type": self.backend_type,
194
+ "model_name": self.model_name,
195
+ "is_ready": False
196
+ }
197
+
198
+ info = self.current_backend.get_model_info()
199
+ info.update({
200
+ "is_ready": self.is_ready(),
201
+ "manager_initialized": self.is_initialized
202
+ })
203
+
204
+ return info
205
+
206
+ async def health_check(self) -> Dict[str, Any]:
207
+ """
208
+ Perform a comprehensive health check
209
+
210
+ Returns:
211
+ Dict containing health status
212
+ """
213
+ if not self.is_ready():
214
+ return {
215
+ "status": "unhealthy",
216
+ "reason": "manager_not_ready",
217
+ "backend_type": self.backend_type,
218
+ "model_name": self.model_name,
219
+ "is_initialized": self.is_initialized,
220
+ "backend_loaded": self.current_backend is not None
221
+ }
222
+
223
+ # Delegate to backend health check
224
+ backend_health = await self.current_backend.health_check()
225
+
226
+ # Add manager-level information
227
+ backend_health.update({
228
+ "manager_status": "healthy",
229
+ "backend_type": self.backend_type,
230
+ "is_ready": self.is_ready()
231
+ })
232
+
233
+ return backend_health
234
+
235
+ async def switch_model(self, new_model_name: str, new_backend_type: Optional[str] = None) -> bool:
236
+ """
237
+ Switch to a different model (and optionally backend type)
238
+
239
+ Args:
240
+ new_model_name: Name of the new model
241
+ new_backend_type: Optional new backend type
242
+
243
+ Returns:
244
+ bool: True if switch successful, False otherwise
245
+ """
246
+ try:
247
+ self.log_info("Switching model",
248
+ current_model=self.model_name,
249
+ new_model=new_model_name,
250
+ current_backend=self.backend_type,
251
+ new_backend=new_backend_type)
252
+
253
+ # Shutdown current backend
254
+ if self.current_backend:
255
+ await self.current_backend.unload_model()
256
+ self.current_backend = None
257
+
258
+ # Update configuration
259
+ old_model_name = self.model_name
260
+ old_backend_type = self.backend_type
261
+
262
+ self.model_name = new_model_name
263
+ if new_backend_type:
264
+ self.backend_type = new_backend_type.lower()
265
+
266
+ # Try to initialize new backend
267
+ success = await self.initialize()
268
+
269
+ if not success:
270
+ # Rollback on failure
271
                self.log_warning("Model switch failed, rolling back",
                                 failed_model=new_model_name,
                                 rollback_model=old_model_name)

                self.model_name = old_model_name
                self.backend_type = old_backend_type
                await self.initialize()  # Try to restore previous state

                return False

            self.log_info("Model switch successful",
                          new_model=new_model_name,
                          new_backend=self.backend_type)
            return True

        except Exception as e:
            self.log_error("Model switch failed", error=str(e))
            return False

    def get_supported_backends(self) -> Dict[str, Dict[str, Any]]:
        """
        Get information about supported backends

        Returns:
            Dict containing backend information
        """
        return {
            "local": {
                "name": "Local HuggingFace",
                "description": "Run models locally using transformers",
                "requires": ["model_name", "device"],
                "capabilities": ["chat", "streaming", "offline"],
                "example_models": [
                    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                    "microsoft/DialoGPT-medium",
                    "Qwen/Qwen2.5-0.5B-Instruct"
                ]
            },
            "hf_api": {
                "name": "HuggingFace Inference API",
                "description": "Use HuggingFace's hosted inference API",
                "requires": ["model_name", "hf_api_token"],
                "capabilities": ["chat", "streaming", "serverless"],
                "example_models": [
                    "microsoft/DialoGPT-large",
                    "microsoft/phi-2",
                    "google/gemma-2b-it"
                ]
            },
            "openai": {
                "name": "OpenAI API",
                "description": "Use OpenAI's GPT models",
                "requires": ["model_name", "openai_api_key"],
                "capabilities": ["chat", "streaming", "function_calling"],
                "example_models": [
                    "gpt-3.5-turbo",
                    "gpt-4",
                    "gpt-4-turbo"
                ]
            },
            "anthropic": {
                "name": "Anthropic API",
                "description": "Use Anthropic's Claude models",
                "requires": ["model_name", "anthropic_api_key"],
                "capabilities": ["chat", "streaming", "long_context"],
                "example_models": [
                    "claude-3-haiku-20240307",
                    "claude-3-sonnet-20240229",
                    "claude-3-opus-20240229"
                ]
            },
            "minimax": {
                "name": "MiniMax API",
                "description": "Use MiniMax's M1 model with reasoning capabilities",
                "requires": ["model_name", "minimax_api_key", "minimax_api_url"],
                "capabilities": ["chat", "streaming", "reasoning"],
                "example_models": [
                    "MiniMax-M1"
                ]
            },
            "google": {
                "name": "Google AI Studio",
                "description": "Use Google's Gemma and other models via AI Studio",
                "requires": ["model_name", "google_api_key"],
                "capabilities": ["chat", "streaming", "multimodal"],
                "example_models": [
                    "gemini-1.5-flash",
                    "gemini-1.5-pro",
                    "gemma-2-9b-it",
                    "gemma-2-27b-it"
                ]
            }
        }


# Global model manager instance
model_manager = ModelManager()


async def get_model_manager() -> ModelManager:
    """Get the global model manager instance"""
    return model_manager


async def initialize_model_manager() -> bool:
    """Initialize the global model manager"""
    return await model_manager.initialize()


async def shutdown_model_manager() -> bool:
    """Shutdown the global model manager"""
    return await model_manager.shutdown()
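The switch path above replaces the active model, and if initializing the new one fails it restores the old name and backend before returning. A minimal standalone sketch of that rollback pattern (the `initialize` stub here is a hypothetical stand-in for `ModelManager.initialize()`):

```python
import asyncio


async def initialize(state: dict) -> bool:
    # Hypothetical stand-in: pretend only "tiny-llama" loads successfully.
    return state["model"] == "tiny-llama"


async def switch_model(state: dict, new_model: str) -> bool:
    old_model = state["model"]
    state["model"] = new_model
    if not await initialize(state):
        # Roll back to the previous model and try to restore it
        state["model"] = old_model
        await initialize(state)
        return False
    return True


state = {"model": "tiny-llama"}
assert asyncio.run(switch_model(state, "gpt-4")) is False   # fails, rolls back
assert state["model"] == "tiny-llama"
assert asyncio.run(switch_model(state, "tiny-llama")) is True
```

The key property is that a failed switch leaves the manager pointing at the previously working model rather than a half-initialized one.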
app/services/session_manager.py ADDED
@@ -0,0 +1,400 @@
"""
Session Manager - Handles conversation sessions and message history
Supports both in-memory and Redis-based storage
"""

import asyncio
import time
from typing import Dict, List, Optional, Any
from datetime import datetime, timedelta
import json
import uuid

from ..core.config import settings
from ..core.logging import LoggerMixin
from ..models.schemas import ChatMessage, ConversationHistory, SessionInfo


class SessionManager(LoggerMixin):
    """
    Manages chat sessions and conversation history
    Supports both in-memory and Redis storage backends
    """

    def __init__(self):
        self.sessions: Dict[str, ConversationHistory] = {}
        self.redis_client = None
        self.use_redis = bool(settings.redis_url)
        self.session_timeout = settings.session_timeout * 60  # Convert to seconds
        self.max_sessions_per_user = settings.max_sessions_per_user
        self.max_messages_per_session = settings.max_messages_per_session

        # Cleanup task
        self._cleanup_task = None

    async def initialize(self) -> bool:
        """Initialize the session manager"""
        try:
            if self.use_redis:
                await self._initialize_redis()

            # Start cleanup task
            self._cleanup_task = asyncio.create_task(self._cleanup_expired_sessions())

            self.log_info("Session manager initialized",
                          storage_type="redis" if self.use_redis else "memory",
                          session_timeout=self.session_timeout)
            return True

        except Exception as e:
            self.log_error("Failed to initialize session manager", error=str(e))
            return False

    async def _initialize_redis(self):
        """Initialize Redis connection"""
        try:
            import redis.asyncio as redis
            self.redis_client = redis.from_url(settings.redis_url)

            # Test connection
            await self.redis_client.ping()
            self.log_info("Redis connection established", url=settings.redis_url)

        except ImportError:
            self.log_warning("Redis not available, falling back to memory storage")
            self.use_redis = False
        except Exception as e:
            self.log_error("Redis connection failed", error=str(e))
            self.use_redis = False

    async def shutdown(self):
        """Shutdown the session manager"""
        try:
            if self._cleanup_task:
                self._cleanup_task.cancel()
                try:
                    await self._cleanup_task
                except asyncio.CancelledError:
                    pass

            if self.redis_client:
                await self.redis_client.close()

            self.log_info("Session manager shutdown complete")

        except Exception as e:
            self.log_error("Session manager shutdown failed", error=str(e))

    async def create_session(self, session_id: str, user_id: Optional[str] = None) -> bool:
        """
        Create a new chat session

        Args:
            session_id: Unique session identifier
            user_id: Optional user identifier for session limits

        Returns:
            bool: True if session created successfully
        """
        try:
            # Check if session already exists
            if await self.session_exists(session_id):
                self.log_info("Session already exists", session_id=session_id)
                return True

            # Check user session limits if user_id provided
            if user_id and self.max_sessions_per_user > 0:
                user_sessions = await self.get_user_sessions(user_id)
                if len(user_sessions) >= self.max_sessions_per_user:
                    self.log_warning("User session limit exceeded",
                                     user_id=user_id,
                                     limit=self.max_sessions_per_user)
                    return False

            # Create new session
            session = ConversationHistory(
                session_id=session_id,
                messages=[],
                created_at=datetime.utcnow(),
                updated_at=datetime.utcnow(),
                message_count=0
            )

            await self._store_session(session)

            self.log_info("Session created", session_id=session_id, user_id=user_id)
            return True

        except Exception as e:
            self.log_error("Failed to create session", error=str(e), session_id=session_id)
            return False

    async def add_message(self, session_id: str, message: ChatMessage) -> bool:
        """
        Add a message to a session

        Args:
            session_id: Session identifier
            message: Message to add

        Returns:
            bool: True if message added successfully
        """
        try:
            # Get or create session
            session = await self.get_session(session_id)
            if not session:
                await self.create_session(session_id)
                session = await self.get_session(session_id)

            if not session:
                self.log_error("Failed to create session", session_id=session_id)
                return False

            # Check message limit
            if (self.max_messages_per_session > 0 and
                    len(session.messages) >= self.max_messages_per_session):
                # Remove oldest messages to make room
                messages_to_remove = len(session.messages) - self.max_messages_per_session + 1
                session.messages = session.messages[messages_to_remove:]
                self.log_info("Trimmed old messages",
                              session_id=session_id,
                              removed_count=messages_to_remove)

            # Add message
            session.messages.append(message)
            session.message_count = len(session.messages)
            session.updated_at = datetime.utcnow()

            # Store updated session
            await self._store_session(session)

            self.log_debug("Message added to session",
                           session_id=session_id,
                           message_role=message.role,
                           total_messages=session.message_count)
            return True

        except Exception as e:
            self.log_error("Failed to add message", error=str(e), session_id=session_id)
            return False

    async def get_session(self, session_id: str) -> Optional[ConversationHistory]:
        """
        Get a session by ID

        Args:
            session_id: Session identifier

        Returns:
            ConversationHistory or None if not found
        """
        try:
            if self.use_redis:
                return await self._get_session_from_redis(session_id)
            else:
                return self.sessions.get(session_id)

        except Exception as e:
            self.log_error("Failed to get session", error=str(e), session_id=session_id)
            return None

    async def session_exists(self, session_id: str) -> bool:
        """Check if a session exists"""
        session = await self.get_session(session_id)
        return session is not None

    async def delete_session(self, session_id: str) -> bool:
        """
        Delete a session

        Args:
            session_id: Session identifier

        Returns:
            bool: True if session deleted successfully
        """
        try:
            if self.use_redis:
                await self.redis_client.delete(f"session:{session_id}")
            else:
                self.sessions.pop(session_id, None)

            self.log_info("Session deleted", session_id=session_id)
            return True

        except Exception as e:
            self.log_error("Failed to delete session", error=str(e), session_id=session_id)
            return False

    async def get_session_messages(self, session_id: str, limit: Optional[int] = None) -> List[ChatMessage]:
        """
        Get messages from a session

        Args:
            session_id: Session identifier
            limit: Optional limit on number of messages to return

        Returns:
            List of ChatMessage objects
        """
        session = await self.get_session(session_id)
        if not session:
            return []

        messages = session.messages
        if limit and limit > 0:
            messages = messages[-limit:]  # Get last N messages

        return messages

    async def get_active_sessions(self) -> List[SessionInfo]:
        """Get information about all active sessions"""
        try:
            sessions = []

            if self.use_redis:
                # Get all session keys from Redis
                keys = await self.redis_client.keys("session:*")
                for key in keys:
                    session_id = key.decode().replace("session:", "")
                    session = await self.get_session(session_id)
                    if session:
                        sessions.append(self._session_to_info(session))
            else:
                # Get from memory
                for session in self.sessions.values():
                    sessions.append(self._session_to_info(session))

            return sessions

        except Exception as e:
            self.log_error("Failed to get active sessions", error=str(e))
            return []

    async def get_user_sessions(self, user_id: str) -> List[SessionInfo]:
        """Get sessions for a specific user (requires user_id in session metadata)"""
        # This is a simplified implementation
        # In a real system, you'd store user_id -> session_id mappings
        all_sessions = await self.get_active_sessions()
        return [s for s in all_sessions if s.session_id.startswith(f"{user_id}-")]

    def _session_to_info(self, session: ConversationHistory) -> SessionInfo:
        """Convert ConversationHistory to SessionInfo"""
        return SessionInfo(
            session_id=session.session_id,
            created_at=session.created_at,
            updated_at=session.updated_at,
            message_count=session.message_count,
            model_name=settings.model_name,  # Current model
            is_active=True
        )

    async def _store_session(self, session: ConversationHistory):
        """Store session in the appropriate backend"""
        if self.use_redis:
            await self._store_session_in_redis(session)
        else:
            self.sessions[session.session_id] = session

    async def _store_session_in_redis(self, session: ConversationHistory):
        """Store session in Redis"""
        key = f"session:{session.session_id}"
        data = {
            "session_id": session.session_id,
            "messages": [
                {
                    "role": msg.role,
                    "content": msg.content,
                    "timestamp": msg.timestamp.isoformat(),
                    "metadata": msg.metadata or {}
                }
                for msg in session.messages
            ],
            "created_at": session.created_at.isoformat(),
            "updated_at": session.updated_at.isoformat(),
            "message_count": session.message_count
        }

        await self.redis_client.setex(
            key,
            self.session_timeout,
            json.dumps(data, default=str)
        )

    async def _get_session_from_redis(self, session_id: str) -> Optional[ConversationHistory]:
        """Get session from Redis"""
        key = f"session:{session_id}"
        data = await self.redis_client.get(key)

        if not data:
            return None

        try:
            session_data = json.loads(data)
            messages = [
                ChatMessage(
                    role=msg["role"],
                    content=msg["content"],
                    timestamp=datetime.fromisoformat(msg["timestamp"]),
                    metadata=msg.get("metadata")
                )
                for msg in session_data["messages"]
            ]

            return ConversationHistory(
                session_id=session_data["session_id"],
                messages=messages,
                created_at=datetime.fromisoformat(session_data["created_at"]),
                updated_at=datetime.fromisoformat(session_data["updated_at"]),
                message_count=session_data["message_count"]
            )

        except Exception as e:
            self.log_error("Failed to parse session from Redis", error=str(e), session_id=session_id)
            return None

    async def _cleanup_expired_sessions(self):
        """Background task to cleanup expired sessions"""
        while True:
            try:
                await asyncio.sleep(300)  # Run every 5 minutes

                if not self.use_redis:  # Redis handles expiration automatically
                    current_time = datetime.utcnow()
                    expired_sessions = []

                    for session_id, session in self.sessions.items():
                        if (current_time - session.updated_at).total_seconds() > self.session_timeout:
                            expired_sessions.append(session_id)

                    for session_id in expired_sessions:
                        del self.sessions[session_id]
                        self.log_debug("Expired session cleaned up", session_id=session_id)

                    if expired_sessions:
                        self.log_info("Cleaned up expired sessions", count=len(expired_sessions))

            except asyncio.CancelledError:
                break
            except Exception as e:
                self.log_error("Session cleanup failed", error=str(e))


# Global session manager instance
session_manager = SessionManager()


async def get_session_manager() -> SessionManager:
    """Get the global session manager instance"""
    return session_manager


async def initialize_session_manager() -> bool:
    """Initialize the global session manager"""
    return await session_manager.initialize()


async def shutdown_session_manager():
    """Shutdown the global session manager"""
    await session_manager.shutdown()
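`add_message` above caps per-session history: once `max_messages_per_session` is reached it drops the oldest entries, leaving room for the incoming message. The trimming arithmetic in isolation, as a small sketch mirroring that step:

```python
def trim_messages(messages: list, max_messages: int) -> list:
    """Drop the oldest entries so one new message still fits,
    mirroring the trim step in SessionManager.add_message."""
    if max_messages > 0 and len(messages) >= max_messages:
        to_remove = len(messages) - max_messages + 1
        return messages[to_remove:]
    return messages


history = [f"msg-{i}" for i in range(5)]
assert trim_messages(history, 5) == ["msg-1", "msg-2", "msg-3", "msg-4"]
assert trim_messages(history, 10) == history  # under the cap: unchanged
```

Note the `+ 1`: the session is trimmed to `max_messages - 1` entries so that appending the new message lands exactly at the cap.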
app/utils/__init__.py ADDED
@@ -0,0 +1 @@
# Utilities package
app/utils/helpers.py ADDED
@@ -0,0 +1,309 @@
"""
Utility functions and helpers
"""

import re
import uuid
import hashlib
from typing import Optional, Dict, Any, List
from datetime import datetime, timezone


def generate_session_id(user_id: Optional[str] = None) -> str:
    """
    Generate a unique session ID

    Args:
        user_id: Optional user identifier to include in session ID

    Returns:
        Unique session identifier
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    random_part = str(uuid.uuid4())[:8]

    if user_id:
        # Create a hash of user_id for privacy
        user_hash = hashlib.md5(user_id.encode()).hexdigest()[:8]
        return f"{user_hash}-{timestamp}-{random_part}"
    else:
        return f"anon-{timestamp}-{random_part}"


def generate_message_id() -> str:
    """Generate a unique message ID"""
    return f"msg-{uuid.uuid4()}"


def sanitize_text(text: str, max_length: int = 4000) -> str:
    """
    Sanitize and clean text input

    Args:
        text: Input text to sanitize
        max_length: Maximum allowed length

    Returns:
        Sanitized text
    """
    if not text:
        return ""

    # Remove excessive whitespace
    text = re.sub(r'\s+', ' ', text.strip())

    # Truncate if too long
    if len(text) > max_length:
        text = text[:max_length].rsplit(' ', 1)[0] + "..."

    return text


def format_timestamp(dt: datetime) -> str:
    """
    Format datetime for consistent display

    Args:
        dt: Datetime object

    Returns:
        Formatted timestamp string
    """
    return dt.strftime("%Y-%m-%d %H:%M:%S UTC")


def estimate_tokens(text: str) -> int:
    """
    Rough estimation of token count for text

    Args:
        text: Input text

    Returns:
        Estimated token count
    """
    # Very rough estimation: ~4 characters per token on average
    return max(1, len(text) // 4)


def truncate_conversation_history(
    messages: List[Dict[str, Any]],
    max_tokens: int = 2000
) -> List[Dict[str, Any]]:
    """
    Truncate conversation history to fit within token limit

    Args:
        messages: List of message dictionaries
        max_tokens: Maximum token limit

    Returns:
        Truncated list of messages
    """
    if not messages:
        return messages

    # Always keep system message if present
    system_messages = [msg for msg in messages if msg.get("role") == "system"]
    other_messages = [msg for msg in messages if msg.get("role") != "system"]

    # Estimate tokens for system messages
    system_tokens = sum(estimate_tokens(msg.get("content", "")) for msg in system_messages)
    available_tokens = max_tokens - system_tokens

    if available_tokens <= 0:
        return system_messages

    # Add messages from the end (most recent first) until we hit the limit
    selected_messages = []
    current_tokens = 0

    for msg in reversed(other_messages):
        msg_tokens = estimate_tokens(msg.get("content", ""))
        if current_tokens + msg_tokens <= available_tokens:
            selected_messages.insert(0, msg)
            current_tokens += msg_tokens
        else:
            break

    return system_messages + selected_messages


def validate_session_id(session_id: str) -> bool:
    """
    Validate session ID format

    Args:
        session_id: Session identifier to validate

    Returns:
        True if valid, False otherwise
    """
    if not session_id or len(session_id) < 5 or len(session_id) > 100:
        return False

    # Allow alphanumeric, hyphens, and underscores
    return bool(re.match(r'^[a-zA-Z0-9_-]+$', session_id))


def extract_model_name_from_path(model_path: str) -> str:
    """
    Extract clean model name from HuggingFace model path

    Args:
        model_path: Full model path (e.g., "microsoft/DialoGPT-medium")

    Returns:
        Clean model name
    """
    if "/" in model_path:
        return model_path.split("/")[-1]
    return model_path


def format_model_info(model_info: Dict[str, Any]) -> Dict[str, Any]:
    """
    Format model information for API responses

    Args:
        model_info: Raw model information

    Returns:
        Formatted model information
    """
    formatted = {
        "name": model_info.get("name", "unknown"),
        "type": model_info.get("type", "unknown"),
        "loaded": model_info.get("loaded", False),
        "capabilities": model_info.get("capabilities", []),
    }

    # Add backend-specific information
    if "device" in model_info:
        formatted["device"] = model_info["device"]

    if "provider" in model_info:
        formatted["provider"] = model_info["provider"]

    if "parameters" in model_info:
        formatted["parameters"] = model_info["parameters"]

    return formatted


def create_error_response(
    error_type: str,
    message: str,
    details: Optional[Dict[str, Any]] = None,
    request_id: Optional[str] = None
) -> Dict[str, Any]:
    """
    Create standardized error response

    Args:
        error_type: Type of error
        message: Error message
        details: Optional additional details
        request_id: Optional request identifier

    Returns:
        Formatted error response
    """
    return {
        "error": error_type,
        "message": message,
        "details": details or {},
        "timestamp": datetime.utcnow().isoformat(),
        "request_id": request_id or generate_message_id()
    }


def parse_model_backend_from_name(model_name: str) -> str:
    """
    Guess the appropriate backend type from model name

    Args:
        model_name: Model name or path

    Returns:
        Suggested backend type
    """
    model_lower = model_name.lower()

    if "gpt" in model_lower and ("3.5" in model_lower or "4" in model_lower):
        return "openai"
    elif "claude" in model_lower:
        return "anthropic"
    elif any(provider in model_lower for provider in ["microsoft", "google", "meta", "huggingface"]):
        return "hf_api"  # Likely available via HF API
    else:
        return "local"  # Default to local


def get_supported_model_examples() -> Dict[str, List[str]]:
    """
    Get examples of supported models for each backend type

    Returns:
        Dictionary mapping backend types to example models
    """
    return {
        "local": [
            "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
            "microsoft/DialoGPT-medium",
            "Qwen/Qwen2.5-0.5B-Instruct",
            "microsoft/phi-2"
        ],
        "hf_api": [
            "microsoft/DialoGPT-large",
            "google/gemma-2b-it",
            "microsoft/phi-2",
            "meta-llama/Llama-2-7b-chat-hf"
        ],
        "openai": [
            "gpt-3.5-turbo",
            "gpt-4",
            "gpt-4-turbo",
            "gpt-4o"
        ],
        "anthropic": [
            "claude-3-haiku-20240307",
            "claude-3-sonnet-20240229",
            "claude-3-opus-20240229",
            "claude-3-5-sonnet-20241022"
        ]
    }


def calculate_response_metrics(
    start_time: float,
    response_text: str,
    token_count: Optional[int] = None
) -> Dict[str, Any]:
    """
    Calculate response metrics for monitoring

    Args:
        start_time: Request start time
        response_text: Generated response text
        token_count: Actual token count if available

    Returns:
        Dictionary of metrics
    """
    import time

    end_time = time.time()
    total_time = end_time - start_time

    estimated_tokens = token_count or estimate_tokens(response_text)
    tokens_per_second = estimated_tokens / total_time if total_time > 0 else 0

    return {
        "total_time": total_time,
        "character_count": len(response_text),
        "estimated_tokens": estimated_tokens,
        "actual_tokens": token_count,
        "tokens_per_second": tokens_per_second,
        "words_count": len(response_text.split())
    }
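The two pure helpers `estimate_tokens` and `truncate_conversation_history` compose: the truncation walks messages newest-first and keeps whatever fits under the token budget left after reserving space for system messages. Reproduced standalone here for a quick check (same logic as in `helpers.py`, without the app imports):

```python
from typing import Any, Dict, List


def estimate_tokens(text: str) -> int:
    # Very rough estimation: ~4 characters per token on average
    return max(1, len(text) // 4)


def truncate_conversation_history(
    messages: List[Dict[str, Any]], max_tokens: int = 2000
) -> List[Dict[str, Any]]:
    system = [m for m in messages if m.get("role") == "system"]
    other = [m for m in messages if m.get("role") != "system"]
    budget = max_tokens - sum(estimate_tokens(m.get("content", "")) for m in system)
    if budget <= 0:
        return system
    kept, used = [], 0
    for m in reversed(other):  # newest first
        t = estimate_tokens(m.get("content", ""))
        if used + t <= budget:
            kept.insert(0, m)
            used += t
        else:
            break
    return system + kept


msgs = (
    [{"role": "system", "content": "You are terse."}]          # ~3 tokens
    + [{"role": "user", "content": "x" * 40, "id": i} for i in range(3)]  # ~10 each
)
out = truncate_conversation_history(msgs, max_tokens=25)
# System message kept, plus the two newest user messages that fit the budget
assert [m.get("id") for m in out] == [None, 1, 2]
```

Because the loop breaks on the first message that does not fit, a single very long old message cuts off everything older than it, which keeps the retained history contiguous.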
examples/test_backends.py ADDED
@@ -0,0 +1,292 @@
1
+ """
2
+ Example script to test different model backends
3
+ Demonstrates how to configure and use various model types
4
+ """
5
+
6
+ import os
7
+ import asyncio
8
+ import sys
9
+ import time
10
+ from pathlib import Path
11
+
12
+ # Add the app directory to the Python path
13
+ sys.path.insert(0, str(Path(__file__).parent.parent))
14
+
15
+ from app.core.config import Settings
16
+ from app.services.model_backends.local_hf import LocalHuggingFaceBackend
17
+ from app.services.model_backends.hf_api import HuggingFaceAPIBackend
18
+ from app.services.model_backends.openai_api import OpenAIAPIBackend
19
+ from app.services.model_backends.anthropic_api import AnthropicAPIBackend
20
+ from app.models.schemas import ChatMessage
21
+
22
+
23
+ async def test_local_hf_backend():
24
+ """Test local HuggingFace backend"""
25
+ print("πŸ€– Testing Local HuggingFace Backend")
26
+ print("-" * 40)
27
+
28
+ # Use a small model for testing
29
+ model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
30
+
31
+ backend = LocalHuggingFaceBackend(
32
+ model_name=model_name,
33
+ device="cpu", # Use CPU for compatibility
34
+ temperature=0.7,
35
+ max_tokens=50
36
+ )
37
+
38
+ try:
39
+ print(f"Loading model: {model_name}")
40
+ success = await backend.load_model()
41
+
42
+ if not success:
43
+ print("❌ Failed to load model")
44
+ return False
45
+
46
+ print("βœ… Model loaded successfully")
47
+
48
+ # Test generation
49
+ messages = [
50
+ ChatMessage(role="user", content="Hello! What's your name?")
51
+ ]
52
+
53
+ print("Generating response...")
54
+ start_time = time.time()
55
+ response = await backend.generate_response(messages, max_tokens=30)
56
+ end_time = time.time()
57
+
58
+ print(f"βœ… Response generated in {end_time - start_time:.2f}s")
59
+ print(f"Response: {response.message}")
60
+
61
+ # Test streaming
62
+ print("\nTesting streaming...")
63
+ full_response = ""
64
+ chunk_count = 0
65
+
66
+ async for chunk in backend.generate_stream(messages, max_tokens=30):
67
+ full_response += chunk.content
68
+ chunk_count += 1
69
+ if chunk.is_final:
70
+ break
71
+
72
+ print(f"βœ… Streaming completed with {chunk_count} chunks")
73
+ print(f"Streamed response: {full_response}")
74
+
75
+ # Cleanup
76
+ await backend.unload_model()
77
+ print("βœ… Model unloaded")
78
+
79
+ return True
80
+
81
+ except Exception as e:
82
+ print(f"❌ Local HF backend test failed: {e}")
83
+ return False
84
+
85
+
86
+ async def test_hf_api_backend():
87
+ """Test HuggingFace API backend"""
88
+ print("\n🌐 Testing HuggingFace API Backend")
89
+ print("-" * 40)
90
+
91
+ # Check if API token is available
92
+ api_token = os.getenv("HF_API_TOKEN")
93
+ if not api_token:
94
+ print("⚠️ HF_API_TOKEN not set, skipping HF API test")
95
+ return True
96
+
97
+ model_name = "microsoft/DialoGPT-medium"
98
+
99
+ backend = HuggingFaceAPIBackend(
100
+ model_name=model_name,
101
+ api_token=api_token,
102
+ temperature=0.7,
103
+ max_tokens=50
104
+ )
105
+
106
+ try:
107
+ print(f"Initializing API client for: {model_name}")
108
+ success = await backend.load_model()
109
+
110
+ if not success:
111
+ print("❌ Failed to initialize API client")
112
+ return False
113
+
114
+ print("βœ… API client initialized")
115
+
116
+ # Test generation
117
+ messages = [
118
+ ChatMessage(role="user", content="Hello! How are you?")
119
+ ]
120
+
121
+ print("Generating response via API...")
122
+ start_time = time.time()
123
+ response = await backend.generate_response(messages, max_tokens=30)
124
+ end_time = time.time()
125
+
126
+ print(f"βœ… Response generated in {end_time - start_time:.2f}s")
127
+ print(f"Response: {response.message}")
128
+
129
+ return True
130
+
131
+ except Exception as e:
132
+ print(f"❌ HF API backend test failed: {e}")
133
+ return False
134
+
135
+
136
+ async def test_openai_backend():
137
+ """Test OpenAI API backend"""
138
+ print("\nπŸ”₯ Testing OpenAI API Backend")
139
+ print("-" * 40)
140
+
141
+ # Check if API key is available
142
+ api_key = os.getenv("OPENAI_API_KEY")
143
+ if not api_key:
144
+ print("⚠️ OPENAI_API_KEY not set, skipping OpenAI test")
145
+ return True
146
+
147
+ model_name = "gpt-3.5-turbo"
148
+
149
+ backend = OpenAIAPIBackend(
150
+ model_name=model_name,
151
+ api_key=api_key,
152
+ temperature=0.7,
153
+ max_tokens=50
154
+ )
155
+
156
+ try:
157
+ print(f"Initializing OpenAI client for: {model_name}")
158
+ success = await backend.load_model()
159
+
160
+ if not success:
161
+ print("❌ Failed to initialize OpenAI client")
162
+ return False
163
+
164
+ print("βœ… OpenAI client initialized")
165
+
166
+ # Test generation
167
+ messages = [
168
+ ChatMessage(role="user", content="Hello! What's the weather like?")
169
+ ]
170
+
171
+ print("Generating response via OpenAI...")
172
+ start_time = time.time()
173
+ response = await backend.generate_response(messages, max_tokens=30)
174
+ end_time = time.time()
175
+
176
+ print(f"βœ… Response generated in {end_time - start_time:.2f}s")
177
+ print(f"Response: {response.message}")
178
+
179
+ # Test streaming
180
+ print("\nTesting streaming...")
181
+ full_response = ""
182
+ chunk_count = 0
183
+
184
+ async for chunk in backend.generate_stream(messages, max_tokens=30):
185
+ full_response += chunk.content
186
+ chunk_count += 1
187
+ if chunk.is_final:
188
+ break
189
+
190
+ print(f"βœ… Streaming completed with {chunk_count} chunks")
191
+         print(f"Streamed response: {full_response}")
+
+         return True
+
+     except Exception as e:
+         print(f"❌ OpenAI backend test failed: {e}")
+         return False
+
+
+ async def test_anthropic_backend():
+     """Test Anthropic API backend"""
+     print("\n🧠 Testing Anthropic API Backend")
+     print("-" * 40)
+
+     # Check if API key is available
+     api_key = os.getenv("ANTHROPIC_API_KEY")
+     if not api_key:
+         print("⚠️ ANTHROPIC_API_KEY not set, skipping Anthropic test")
+         return True
+
+     model_name = "claude-3-haiku-20240307"
+
+     backend = AnthropicAPIBackend(
+         model_name=model_name,
+         api_key=api_key,
+         temperature=0.7,
+         max_tokens=50
+     )
+
+     try:
+         print(f"Initializing Anthropic client for: {model_name}")
+         success = await backend.load_model()
+
+         if not success:
+             print("❌ Failed to initialize Anthropic client")
+             return False
+
+         print("βœ… Anthropic client initialized")
+
+         # Test generation
+         messages = [
+             ChatMessage(role="user", content="Hello! Tell me about yourself.")
+         ]
+
+         print("Generating response via Anthropic...")
+         start_time = time.time()
+         response = await backend.generate_response(messages, max_tokens=30)
+         end_time = time.time()
+
+         print(f"βœ… Response generated in {end_time - start_time:.2f}s")
+         print(f"Response: {response.message}")
+
+         return True
+
+     except Exception as e:
+         print(f"❌ Anthropic backend test failed: {e}")
+         return False
+
+
+ async def main():
+     """Main test function"""
+     print("πŸš€ Sema Chat Backend Testing")
+     print("=" * 50)
+
+     results = {}
+
+     # Test each backend
+     results["local_hf"] = await test_local_hf_backend()
+     results["hf_api"] = await test_hf_api_backend()
+     results["openai"] = await test_openai_backend()
+     results["anthropic"] = await test_anthropic_backend()
+
+     # Summary
+     print("\n" + "=" * 50)
+     print("πŸ“Š Test Results Summary")
+     print("-" * 25)
+
+     for backend, success in results.items():
+         status = "βœ… PASS" if success else "❌ FAIL"
+         print(f"{backend:15} {status}")
+
+     total_tests = len(results)
+     passed_tests = sum(results.values())
+
+     print(f"\nTotal: {passed_tests}/{total_tests} backends working")
+
+     if passed_tests == total_tests:
+         print("πŸŽ‰ All available backends are working!")
+     elif passed_tests > 0:
+         print("⚠️ Some backends are working, check configuration for others")
+     else:
+         print("❌ No backends are working, check your setup")
+
+     print("\nπŸ’‘ Tips:")
+     print("- For HF API: Set HF_API_TOKEN environment variable")
+     print("- For OpenAI: Set OPENAI_API_KEY environment variable")
+     print("- For Anthropic: Set ANTHROPIC_API_KEY environment variable")
+     print("- For local models: Ensure you have enough RAM/VRAM")
+
+
+ if __name__ == "__main__":
+     asyncio.run(main())
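The summary block in `main()` counts passing backends with `sum(results.values())`, which works because Python's `bool` is a subclass of `int`. A minimal standalone sketch of that idiom:

```python
# A dict of pass/fail flags per backend, shaped like the one main() builds.
results = {"local_hf": True, "hf_api": False, "openai": True}

# bool subclasses int, so True sums as 1 and False as 0.
passed = sum(results.values())
print(f"{passed}/{len(results)} backends working")  # β†’ 2/3 backends working
```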
requirements.txt ADDED
@@ -0,0 +1,27 @@
+ fastapi
+ uvicorn
+ pydantic
+ python-multipart
+ websockets
+ sse-starlette
+ slowapi
+ prometheus-client
+ structlog
+ python-dotenv
+ httpx
+ aiofiles
+
+ # HuggingFace & ML
+ transformers
+ torch
+ huggingface-hub
+ accelerate
+ sentencepiece
+
+ # API Clients
+ openai
+ anthropic
+
+ # Utilities
+ asyncio-mqtt
+ redis
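None of the entries above are version-pinned, so each Space rebuild can pull breaking releases. Pinning makes builds reproducible; the version numbers below are purely illustrative, not taken from this repo:

```text
# requirements.txt fragment (example pins - substitute the versions you tested)
fastapi==0.110.0
uvicorn==0.29.0
transformers==4.40.0
```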
setup_huggingface.sh ADDED
@@ -0,0 +1,242 @@
+ #!/bin/bash
+
+ # πŸš€ Sema Chat API - HuggingFace Spaces Setup Script
+ # This script helps you deploy Sema Chat API to HuggingFace Spaces with Gemma
+
+ set -e
+
+ echo "πŸš€ Sema Chat API - HuggingFace Spaces Setup"
+ echo "=========================================="
+
+ # Check if we're in the right directory
+ if [ ! -f "app/main.py" ]; then
+     echo "❌ Error: Please run this script from the backend/sema-chat directory"
+     echo "   Current directory: $(pwd)"
+     echo "   Expected files: app/main.py, requirements.txt, Dockerfile"
+     exit 1
+ fi
+
+ echo "βœ… Found Sema Chat API files"
+
+ # Get user input
+ read -p "πŸ“ Enter your HuggingFace username: " HF_USERNAME
+ read -p "πŸ“ Enter your Space name (e.g., sema-chat-gemma): " SPACE_NAME
+ read -p "πŸ”‘ Enter your Google AI API key (or press Enter to skip): " GOOGLE_API_KEY
+
+ # Validate inputs
+ if [ -z "$HF_USERNAME" ]; then
+     echo "❌ Error: HuggingFace username is required"
+     exit 1
+ fi
+
+ if [ -z "$SPACE_NAME" ]; then
+     echo "❌ Error: Space name is required"
+     exit 1
+ fi
+
+ SPACE_URL="https://huggingface.co/spaces/$HF_USERNAME/$SPACE_NAME"
+ SPACE_REPO="https://huggingface.co/spaces/$HF_USERNAME/$SPACE_NAME"
+
+ echo ""
+ echo "πŸ“‹ Configuration Summary:"
+ echo "   HuggingFace Username: $HF_USERNAME"
+ echo "   Space Name: $SPACE_NAME"
+ echo "   Space URL: $SPACE_URL"
+ echo "   Google AI Key: $([ -n "$GOOGLE_API_KEY" ] && echo '[PROVIDED]' || echo '[NOT PROVIDED]')"
+ echo ""
+
+ read -p "πŸ€” Continue with deployment? (y/N): " CONFIRM
+ if [[ ! $CONFIRM =~ ^[Yy]$ ]]; then
+     echo "❌ Deployment cancelled"
+     exit 0
+ fi
+
+ # Create deployment directory
+ DEPLOY_DIR="../sema-chat-deploy"
+ echo "πŸ“ Creating deployment directory: $DEPLOY_DIR"
+ rm -rf "$DEPLOY_DIR"
+ mkdir -p "$DEPLOY_DIR"
+
+ # Copy all files
+ echo "πŸ“‹ Copying files..."
+ cp -r . "$DEPLOY_DIR/"
+ cd "$DEPLOY_DIR"
+
+ # Create README.md for the Space
+ echo "πŸ“ Creating Space README..."
+ cat > README.md << EOF
+ ---
+ title: Sema Chat API
+ emoji: πŸ’¬
+ colorFrom: purple
+ colorTo: pink
+ sdk: docker
+ pinned: false
+ license: mit
+ short_description: Modern chatbot API with Gemma integration and streaming capabilities
+ ---
+
+ # Sema Chat API πŸ’¬
+
+ Modern chatbot API with streaming capabilities, powered by Google's Gemma model.
+
+ ## πŸš€ Features
+
+ - **Real-time Streaming**: Server-Sent Events and WebSocket support
+ - **Gemma Integration**: Powered by Google's Gemma 2 9B model
+ - **Session Management**: Persistent conversation contexts
+ - **RESTful API**: Clean, documented endpoints
+ - **Interactive UI**: Built-in Swagger documentation
+
+ ## πŸ”— API Endpoints
+
+ - **Chat**: \`POST /api/v1/chat\`
+ - **Streaming**: \`GET /api/v1/chat/stream\`
+ - **WebSocket**: \`ws://space-url/api/v1/chat/ws\`
+ - **Health**: \`GET /api/v1/health\`
+ - **Docs**: \`GET /\` (Swagger UI)
+
+ ## πŸ’¬ Quick Test
+
+ \`\`\`bash
+ curl -X POST "https://$HF_USERNAME-$SPACE_NAME.hf.space/api/v1/chat" \\
+   -H "Content-Type: application/json" \\
+   -d '{
+     "message": "Hello! Can you introduce yourself?",
+     "session_id": "test-session"
+   }'
+ \`\`\`
+
+ ## πŸ”„ Streaming Test
+
+ \`\`\`bash
+ curl -N -H "Accept: text/event-stream" \\
+   "https://$HF_USERNAME-$SPACE_NAME.hf.space/api/v1/chat/stream?message=Tell%20me%20about%20AI&session_id=test"
+ \`\`\`
+
+ ## βš™οΈ Configuration
+
+ This Space is configured to use Google's Gemma model via AI Studio.
+ Set your \`GOOGLE_API_KEY\` in the Space settings to enable the API.
+
+ ## πŸ› οΈ Built With
+
+ - **FastAPI**: Modern Python web framework
+ - **Google Gemma**: Advanced language model
+ - **Docker**: Containerized deployment
+ - **HuggingFace Spaces**: Hosting platform
+
+ ---
+
+ Created by $HF_USERNAME | Powered by Sema AI
+ EOF
+
+ # Create .gitignore (delimiter quoted so patterns like *$py.class are kept literal)
+ echo "🚫 Creating .gitignore..."
+ cat > .gitignore << 'EOF'
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ .pytest_cache/
+ .coverage
+ htmlcov/
+
+ .DS_Store
+ .vscode/
+ .idea/
+
+ logs/
+ *.log
+ EOF
+
+ # Initialize git repository
+ echo "πŸ”§ Initializing git repository..."
+ git init
+ git remote add origin "$SPACE_REPO"
+
+ # Create initial commit
+ echo "πŸ“¦ Creating initial commit..."
+ git add .
+ git commit -m "Initial deployment of Sema Chat API with Gemma support
+
+ Features:
+ - Google Gemma 2 9B integration
+ - Real-time streaming responses
+ - Session management
+ - RESTful API with Swagger docs
+ - WebSocket support
+ - Health monitoring
+
+ Configuration:
+ - MODEL_TYPE=google
+ - MODEL_NAME=gemma-2-9b-it
+ - Port: 7860 (HuggingFace standard)
+ "
+
+ echo ""
+ echo "πŸŽ‰ Setup Complete!"
+ echo "=================="
+ echo ""
+ echo "πŸ“‹ Next Steps:"
+ echo "1. Create your HuggingFace Space:"
+ echo "   β†’ Go to: https://huggingface.co/spaces"
+ echo "   β†’ Click 'Create new Space'"
+ echo "   β†’ Name: $SPACE_NAME"
+ echo "   β†’ SDK: Docker"
+ echo "   β†’ License: MIT"
+ echo ""
+ echo "2. Push your code:"
+ echo "   β†’ cd $DEPLOY_DIR"
+ echo "   β†’ git push origin main"
+ echo ""
+ echo "3. Configure environment variables in Space settings:"
+ if [ -n "$GOOGLE_API_KEY" ]; then
+     echo "   β†’ MODEL_TYPE=google"
+     echo "   β†’ MODEL_NAME=gemma-2-9b-it"
+     echo "   β†’ GOOGLE_API_KEY=$GOOGLE_API_KEY"
+ else
+     echo "   β†’ MODEL_TYPE=google"
+     echo "   β†’ MODEL_NAME=gemma-2-9b-it"
+     echo "   β†’ GOOGLE_API_KEY=your_google_api_key_here"
+     echo ""
+     echo "   πŸ”‘ Get your Google AI API key from: https://aistudio.google.com/"
+ fi
+ echo "   β†’ DEBUG=false"
+ echo "   β†’ ENVIRONMENT=production"
+ echo ""
+ echo "4. Wait for build and test:"
+ echo "   β†’ Space URL: $SPACE_URL"
+ echo "   β†’ API Docs: $SPACE_URL/"
+ echo "   β†’ Health Check: $SPACE_URL/api/v1/health"
+ echo ""
+ echo "πŸš€ Your Sema Chat API will be live at:"
+ echo "   $SPACE_URL"
+ echo ""
+ echo "Happy deploying! πŸ’¬βœ¨"
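The script handles optional values such as `GOOGLE_API_KEY` with shell parameter expansion. The two operators behave differently, and note that `${VAR:-alt}` expands to the variable's own value whenever it is set, which is why masking a secret needs `:+` rather than `:-`. A standalone sketch:

```shell
#!/bin/sh
# ${VAR:+alt} β†’ "alt" when VAR is set and non-empty, else empty.
# ${VAR:-alt} β†’ "alt" when VAR is unset or empty, else VAR's own value.
KEY="sk-demo"
echo "${KEY:+[PROVIDED]}"      # prints: [PROVIDED]
echo "${KEY:-[NOT PROVIDED]}"  # prints: sk-demo
unset KEY
echo "${KEY:+[PROVIDED]}"      # prints an empty line
echo "${KEY:-[NOT PROVIDED]}"  # prints: [NOT PROVIDED]
```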
tests/__init__.py ADDED
@@ -0,0 +1 @@
+ # Tests package
tests/test_api.py ADDED
@@ -0,0 +1,313 @@
+ """
+ Test script for Sema Chat API
+ Tests all endpoints and functionality
+ """
+
+ import requests
+ import json
+ import time
+ import asyncio
+ import websockets
+ from typing import Dict, Any
+ import sys
+
+
+ class SemaChatAPITester:
+     """Test client for Sema Chat API"""
+
+     def __init__(self, base_url: str = "http://localhost:7860"):
+         self.base_url = base_url.rstrip("/")
+         self.session_id = f"test-session-{int(time.time())}"
+
+     def test_health_endpoints(self):
+         """Test health and status endpoints"""
+         print("πŸ₯ Testing health endpoints...")
+
+         # Test basic status
+         response = requests.get(f"{self.base_url}/status")
+         assert response.status_code == 200
+         print("βœ… Status endpoint working")
+
+         # Test app-level health
+         response = requests.get(f"{self.base_url}/health")
+         assert response.status_code == 200
+         print("βœ… App health endpoint working")
+
+         # Test detailed health
+         response = requests.get(f"{self.base_url}/api/v1/health")
+         assert response.status_code == 200
+         health_data = response.json()
+         print(f"βœ… Detailed health check: {health_data['status']}")
+         print(f"   Model: {health_data['model_name']} ({health_data['model_type']})")
+         print(f"   Model loaded: {health_data['model_loaded']}")
+
+         return health_data
+
+     def test_model_info(self):
+         """Test model information endpoint"""
+         print("\nπŸ€– Testing model info...")
+
+         response = requests.get(f"{self.base_url}/api/v1/model/info")
+         assert response.status_code == 200
+
+         model_info = response.json()
+         print("βœ… Model info retrieved")
+         print(f"   Name: {model_info['name']}")
+         print(f"   Type: {model_info['type']}")
+         print(f"   Loaded: {model_info['loaded']}")
+         print(f"   Capabilities: {model_info['capabilities']}")
+
+         return model_info
+
+     def test_regular_chat(self):
+         """Test regular (non-streaming) chat"""
+         print("\nπŸ’¬ Testing regular chat...")
+
+         chat_request = {
+             "message": "Hello! Can you introduce yourself?",
+             "session_id": self.session_id,
+             "temperature": 0.7,
+             "max_tokens": 100
+         }
+
+         start_time = time.time()
+         response = requests.post(
+             f"{self.base_url}/api/v1/chat",
+             json=chat_request,
+             headers={"Content-Type": "application/json"}
+         )
+         end_time = time.time()
+
+         assert response.status_code == 200
+         chat_response = response.json()
+
+         print("βœ… Regular chat working")
+         print(f"   Response time: {end_time - start_time:.2f}s")
+         print(f"   Generation time: {chat_response['generation_time']:.2f}s")
+         print(f"   Response: {chat_response['message'][:100]}...")
+         print(f"   Session ID: {chat_response['session_id']}")
+         print(f"   Message ID: {chat_response['message_id']}")
+
+         return chat_response
+
+     def test_streaming_chat(self):
+         """Test streaming chat via SSE"""
+         print("\nπŸ”„ Testing streaming chat...")
+
+         params = {
+             "message": "Tell me a short story about AI",
+             "session_id": self.session_id,
+             "temperature": 0.8,
+             "max_tokens": 150
+         }
+
+         start_time = time.time()
+         response = requests.get(
+             f"{self.base_url}/api/v1/chat/stream",
+             params=params,
+             headers={"Accept": "text/event-stream"},
+             stream=True
+         )
+
+         assert response.status_code == 200
+
+         chunks_received = 0
+         full_response = ""
+
+         for line in response.iter_lines():
+             if line:
+                 line_str = line.decode('utf-8')
+                 if line_str.startswith('data: '):
+                     try:
+                         data = json.loads(line_str[6:])  # Remove 'data: ' prefix
+                         if 'content' in data:
+                             full_response += data['content']
+                             chunks_received += 1
+
+                         if data.get('is_final'):
+                             break
+                     except json.JSONDecodeError:
+                         continue
+
+         end_time = time.time()
+
+         print("βœ… Streaming chat working")
+         print(f"   Total time: {end_time - start_time:.2f}s")
+         print(f"   Chunks received: {chunks_received}")
+         print(f"   Response: {full_response[:100]}...")
+
+         return full_response
+
+     def test_session_management(self):
+         """Test session management endpoints"""
+         print("\nπŸ“ Testing session management...")
+
+         # Get session history
+         response = requests.get(f"{self.base_url}/api/v1/sessions/{self.session_id}")
+         assert response.status_code == 200
+
+         session_data = response.json()
+         print("βœ… Session retrieval working")
+         print(f"   Messages in session: {session_data['message_count']}")
+         print(f"   Session created: {session_data['created_at']}")
+
+         # Get active sessions
+         response = requests.get(f"{self.base_url}/api/v1/sessions")
+         assert response.status_code == 200
+
+         sessions = response.json()
+         print("βœ… Active sessions list working")
+         print(f"   Total active sessions: {len(sessions)}")
+
+         return session_data
+
+     async def test_websocket_chat(self):
+         """Test WebSocket chat functionality"""
+         print("\nπŸ”Œ Testing WebSocket chat...")
+
+         ws_url = self.base_url.replace("http://", "ws://").replace("https://", "wss://")
+         ws_url += "/api/v1/chat/ws"
+
+         try:
+             async with websockets.connect(ws_url) as websocket:
+                 # Send a message
+                 message = {
+                     "message": "Hello via WebSocket!",
+                     "session_id": f"{self.session_id}-ws",
+                     "temperature": 0.7,
+                     "max_tokens": 50
+                 }
+
+                 await websocket.send(json.dumps(message))
+
+                 # Receive response chunks
+                 chunks_received = 0
+                 full_response = ""
+
+                 while True:
+                     try:
+                         response = await asyncio.wait_for(websocket.recv(), timeout=30.0)
+                         data = json.loads(response)
+
+                         if data.get("type") == "chunk":
+                             full_response += data.get("content", "")
+                             chunks_received += 1
+
+                             if data.get("is_final"):
+                                 break
+                         elif data.get("type") == "error":
+                             print(f"❌ WebSocket error: {data.get('error')}")
+                             break
+
+                     except asyncio.TimeoutError:
+                         print("⚠️ WebSocket timeout")
+                         break
+
+                 print("βœ… WebSocket chat working")
+                 print(f"   Chunks received: {chunks_received}")
+                 print(f"   Response: {full_response[:100]}...")
+
+                 return full_response
+
+         except Exception as e:
+             print(f"❌ WebSocket test failed: {e}")
+             return None
+
+     def test_error_handling(self):
+         """Test error handling"""
+         print("\n🚨 Testing error handling...")
+
+         # Test empty message
+         response = requests.post(
+             f"{self.base_url}/api/v1/chat",
+             json={"message": "", "session_id": self.session_id}
+         )
+         assert response.status_code == 422  # Validation error
+         print("βœ… Empty message validation working")
+
+         # Test invalid session ID
+         response = requests.get(f"{self.base_url}/api/v1/sessions/invalid-session-id-that-does-not-exist")
+         assert response.status_code == 404
+         print("βœ… Invalid session handling working")
+
+         # Test rate limiting (if enabled)
+         print("βœ… Error handling tests passed")
+
+     def test_session_cleanup(self):
+         """Test session cleanup"""
+         print("\n🧹 Testing session cleanup...")
+
+         # Clear the test session
+         response = requests.delete(f"{self.base_url}/api/v1/sessions/{self.session_id}")
+         assert response.status_code == 200
+         print("βœ… Session cleanup working")
+
+         # Verify session is gone
+         response = requests.get(f"{self.base_url}/api/v1/sessions/{self.session_id}")
+         assert response.status_code == 404
+         print("βœ… Session deletion verified")
+
+     def run_all_tests(self):
+         """Run all tests"""
+         print("πŸš€ Starting Sema Chat API Tests")
+         print("=" * 50)
+
+         try:
+             # Test basic endpoints
+             health_data = self.test_health_endpoints()
+
+             if not health_data.get('model_loaded'):
+                 print("⚠️ Model not loaded, skipping chat tests")
+                 return False
+
+             model_info = self.test_model_info()
+
+             # Test chat functionality
+             self.test_regular_chat()
+             self.test_streaming_chat()
+
+             # Test session management
+             self.test_session_management()
+
+             # Test WebSocket (async)
+             asyncio.run(self.test_websocket_chat())
+
+             # Test error handling
+             self.test_error_handling()
+
+             # Cleanup
+             self.test_session_cleanup()
+
+             print("\n" + "=" * 50)
+             print("πŸŽ‰ All tests passed successfully!")
+             print(f"βœ… API is working correctly with {model_info['name']}")
+             return True
+
+         except Exception as e:
+             print(f"\n❌ Test failed: {e}")
+             import traceback
+             traceback.print_exc()
+             return False
+
+
+ def main():
+     """Main test function"""
+     import argparse
+
+     parser = argparse.ArgumentParser(description="Test Sema Chat API")
+     parser.add_argument(
+         "--url",
+         default="http://localhost:7860",
+         help="Base URL of the API (default: http://localhost:7860)"
+     )
+
+     args = parser.parse_args()
+
+     tester = SemaChatAPITester(args.url)
+     success = tester.run_all_tests()
+
+     sys.exit(0 if success else 1)
+
+
+ if __name__ == "__main__":
+     main()
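`test_streaming_chat` strips the SSE `data: ` prefix with `line_str[6:]` and skips malformed payloads. That parsing step can be isolated into a small helper; `parse_sse_line` below is an illustrative name, not part of the repo:

```python
import json

def parse_sse_line(line: str):
    """Return the decoded JSON payload of an SSE data line, or None."""
    if line.startswith("data: "):
        try:
            # Slice off the "data: " prefix, then decode the JSON payload.
            return json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            return None  # malformed payload - skip, as the test loop does
    return None  # comment lines, "event:" fields, keep-alives, etc.

chunk = parse_sse_line('data: {"content": "Hello", "is_final": false}')
print(chunk)  # β†’ {'content': 'Hello', 'is_final': False}
```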