---
title: Sema Chat API
emoji: 💬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: Chat with LLMs
---

# Sema Chat API 💬

A chatbot API with streaming responses, swappable model backends, and production-oriented features such as rate limiting and health checks. Built with FastAPI and designed to keep pace with rapid GenAI advancements.

## Quick Start with Gemma

### Option 1: Automated HuggingFace Spaces Deployment
```bash
cd backend/sema-chat
./setup_huggingface.sh
```

### Option 2: Manual Local Setup
```bash
cd backend/sema-chat
pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env

# For Gemma via Google AI Studio (Recommended),
# edit .env and set:
#   MODEL_TYPE=google
#   MODEL_NAME=gemma-2-9b-it
#   GOOGLE_API_KEY=your_google_api_key

# Run the API
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```

### Option 3: Local Gemma (Free, No API Key)
```bash
# Edit .env and set:
#   MODEL_TYPE=local
#   MODEL_NAME=google/gemma-2b-it
#   DEVICE=cpu

# Run (the model is downloaded on first run)
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```

## Access Your API

Once the server is running, you can reach:
- **Swagger UI**: http://localhost:7860/
- **Health Check**: http://localhost:7860/api/v1/health
- **Chat Endpoint**: http://localhost:7860/api/v1/chat

## 🧪 Quick Test

```bash
# Test chat
curl -X POST "http://localhost:7860/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello! Can you introduce yourself?",
    "session_id": "test-session"
  }'

# Test streaming
curl -N -H "Accept: text/event-stream" \
  "http://localhost:7860/api/v1/chat/stream?message=Tell%20me%20about%20AI&session_id=test"
```
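
The same chat request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server is running locally on port 7860 as in the Quick Start, and the helper names `build_chat_request` and `send_chat` are illustrative, not part of the project. The response schema is not described here, so the decoded JSON is returned unchanged.

```python
import json
import urllib.request

# Assumed local address from the Quick Start; change it for a deployed Space.
BASE_URL = "http://localhost:7860"


def build_chat_request(message, session_id, base_url=BASE_URL):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = json.dumps({"message": message, "session_id": session_id})
    return urllib.request.Request(
        url=f"{base_url}/api/v1/chat",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message, session_id, base_url=BASE_URL, timeout=30):
    """Send the chat request and return the decoded JSON body as-is."""
    request = build_chat_request(message, session_id, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat("Hello! Can you introduce yourself?", "test-session")` issues the same request as the first curl command.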

## 🎯 Features

### Core Capabilities
- ✅ **Real-time Streaming**: Server-Sent Events and WebSocket support
- ✅ **Multiple Model Backends**: Local, HuggingFace API, OpenAI, Anthropic, Google AI, MiniMax
- ✅ **Session Management**: Persistent conversation contexts
- ✅ **Rate Limiting**: Built-in protection with configurable limits
- ✅ **Health Monitoring**: Comprehensive health checks and metrics

### Supported Models
- **Local**: TinyLlama, DialoGPT, Gemma, Qwen
- **Google AI**: Gemma-2-9b-it, Gemini-1.5-flash, Gemini-1.5-pro
- **OpenAI**: GPT-3.5-turbo, GPT-4, GPT-4-turbo
- **Anthropic**: Claude-3-haiku, Claude-3-sonnet, Claude-3-opus
- **HuggingFace API**: Any model via Inference API
- **MiniMax**: M1 model with reasoning capabilities

## 🔧 Configuration

### Environment Variables
```bash
# Model Backend (local, google, openai, anthropic, hf_api, minimax)
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it

# API Keys (as needed)
GOOGLE_API_KEY=your_key
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
HF_API_TOKEN=your_token
MINIMAX_API_KEY=your_key

# Generation Settings
TEMPERATURE=0.7
MAX_NEW_TOKENS=512
TOP_P=0.9

# Server Settings
HOST=0.0.0.0
PORT=7860
DEBUG=false
```
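
To illustrate how these variables could be read into typed settings, here is a small sketch. It is not the project's `app/core/config.py`: the `Settings` class and `load_settings` function are hypothetical, and the fallback values simply mirror the example above rather than the application's real defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables listed above (illustrative only)."""
    model_type: str
    model_name: str
    temperature: float
    max_new_tokens: int
    top_p: float
    host: str
    port: int
    debug: bool


def _as_bool(value):
    # Treat common truthy spellings as True; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ).

    Fallbacks are assumptions copied from the README example.
    """
    env = os.environ if env is None else env
    return Settings(
        model_type=env.get("MODEL_TYPE", "google"),
        model_name=env.get("MODEL_NAME", "gemma-2-9b-it"),
        temperature=float(env.get("TEMPERATURE", "0.7")),
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "512")),
        top_p=float(env.get("TOP_P", "0.9")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
        debug=_as_bool(env.get("DEBUG", "false")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"MODEL_TYPE": "local", "PORT": "8000"})`.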

## Documentation

- **[Configuration Guide](CONFIGURATION_GUIDE.md)** - Detailed setup for all backends
- **[HuggingFace Deployment](HUGGINGFACE_DEPLOYMENT.md)** - Step-by-step deployment guide
- **[API Documentation](http://localhost:7860/)** - Interactive Swagger UI

## 🧪 Testing

```bash
# Run comprehensive tests
python tests/test_api.py

# Test different backends
python examples/test_backends.py

# Test specific backend
python examples/test_backends.py --backend google
```

## Deployment

### HuggingFace Spaces (Recommended)
1. Run the setup script: `./setup_huggingface.sh`
2. Create your Space on HuggingFace
3. Push the generated code
4. Set environment variables in Space settings
5. Your API will be live at: `https://username-spacename.hf.space/`

### Docker
```bash
docker build -t sema-chat-api .
docker run -e MODEL_TYPE=google \
  -e GOOGLE_API_KEY=your_key \
  -p 7860:7860 \
  sema-chat-api
```

## API Endpoints

### Chat
- **`POST /api/v1/chat`** - Send chat message
- **`GET /api/v1/chat/stream`** - Streaming chat (SSE)
- **`WebSocket /api/v1/chat/ws`** - Real-time WebSocket chat
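
For the SSE endpoint, a client needs to build the query string and split the response into events. The sketch below shows one way to do both with the standard library. The function names are hypothetical, the base URL assumes a local server, and the content of each `data:` payload depends on the server, so it is yielded as raw text.

```python
import urllib.parse

# Assumed local address; change it for a deployed Space.
BASE_URL = "http://localhost:7860"


def build_stream_url(message, session_id, base_url=BASE_URL):
    """Build the GET URL for /api/v1/chat/stream with percent-encoded parameters."""
    query = urllib.parse.urlencode(
        {"message": message, "session_id": session_id},
        quote_via=urllib.parse.quote,  # encode spaces as %20 rather than +
    )
    return f"{base_url}/api/v1/chat/stream?{query}"


def iter_sse_data(lines):
    """Yield the data payload of each Server-Sent Event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive.
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # One optional space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:
        # Lenient choice: flush a trailing event that lacks a closing blank line.
        yield "\n".join(buffer)
```

In practice the lines would come from a streaming HTTP response decoded as UTF-8; the parser itself only needs an iterable of strings, which keeps it independent of any particular HTTP client.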

### Sessions
- **`GET /api/v1/sessions/{id}`** - Get conversation history
- **`DELETE /api/v1/sessions/{id}`** - Clear conversation
- **`GET /api/v1/sessions`** - List active sessions

### System
- **`GET /api/v1/health`** - Comprehensive health check
- **`GET /api/v1/model/info`** - Current model information
- **`GET /api/v1/status`** - Basic status

## 💡 Why This Architecture?

1. **Future-Proof**: Modular design adapts to rapid GenAI advancements
2. **Flexible**: Switch between local models and APIs with environment variables
3. **Production-Ready**: Rate limiting, monitoring, error handling built-in
4. **Cost-Effective**: Start free with local models, scale with APIs
5. **Developer-Friendly**: Comprehensive docs, tests, and examples

## 🛠️ Development

### Project Structure
```
app/
├── main.py                 # FastAPI application
├── api/v1/endpoints.py     # API routes
├── core/
│   ├── config.py           # Environment-based configuration
│   └── logging.py          # Structured logging
├── models/schemas.py       # Pydantic request/response models
├── services/
│   ├── chat_manager.py     # Chat orchestration
│   ├── model_manager.py    # Backend selection
│   ├── session_manager.py  # Conversation management
│   └── model_backends/     # Model implementations
└── utils/helpers.py        # Utility functions
```

### Adding New Backends
1. Create new backend in `app/services/model_backends/`
2. Inherit from `ModelBackend` base class
3. Implement required methods
4. Add to `ModelManager._create_backend()`
5. Update configuration and documentation
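
The steps above can be sketched roughly as follows. This is a self-contained illustration, not the project's code: the `ModelBackend` class here is a stand-in for the real base class, and the method names `generate` and `stream`, the `EchoBackend` class, the `BACKENDS` registry, and `create_backend` are all assumptions. Check the actual base class for the methods it requires.

```python
from abc import ABC, abstractmethod


class ModelBackend(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    def __init__(self, model_name):
        self.model_name = model_name

    @abstractmethod
    def generate(self, prompt):
        """Return a complete response for the prompt."""

    def stream(self, prompt):
        """Yield response chunks; by default, the whole response as one chunk."""
        yield self.generate(prompt)


class EchoBackend(ModelBackend):
    """Toy backend that echoes the prompt, useful for wiring tests."""

    def generate(self, prompt):
        return f"[{self.model_name}] {prompt}"


# Hypothetical registry playing the role of ModelManager._create_backend().
BACKENDS = {"echo": EchoBackend}


def create_backend(model_type, model_name):
    """Look up the backend class for a MODEL_TYPE value and instantiate it."""
    try:
        backend_cls = BACKENDS[model_type]
    except KeyError:
        raise ValueError(f"Unknown MODEL_TYPE: {model_type!r}") from None
    return backend_cls(model_name)
```

Keeping backend selection in a single lookup means a new backend only needs one new class and one new registry entry, which matches the intent of steps 1 to 4.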

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## License

MIT License - see the LICENSE file for details.

## Acknowledgments

- **HuggingFace** for model hosting and Spaces platform
- **Google** for Gemma models and AI Studio
- **FastAPI** for the excellent web framework
- **OpenAI, Anthropic, MiniMax** for their APIs

---

**Ready to chat? Deploy your Sema Chat API today! 💬**