# 🤖 Production-Ready LLM API Backend

A flexible, high-performance REST API for LLM capabilities including conversational AI, RAG, and text analysis. Built with [Encore.ts](https://encore.dev) for easy deployment to Encore Cloud or Hugging Face Spaces.

## ✨ Features

- 🎯 **5 Core Endpoints** - Chat, RAG, Analysis, Models, Health
- 🔄 **Dual Provider Support** - Ollama (local) or Hugging Face (cloud)
- ⚡ **Smart Caching** - In-memory cache with TTL and automatic cleanup
- 🛡️ **Type-Safe** - Full TypeScript support with end-to-end type safety
- 📦 **Production Ready** - Comprehensive error handling, logging, and monitoring
- 🚀 **Zero Config** - Works out of the box on multiple platforms

## 🚀 Quick Start

### Local Development

```bash
# Set up secrets
encore secret set LLMProvider ollama
encore secret set OllamaBaseURL http://localhost:11434

# Or use Hugging Face
encore secret set LLMProvider huggingface
encore secret set HuggingFaceAPIKey hf_your_token_here
encore secret set DefaultModel mistralai/Mistral-7B-Instruct-v0.2

# Run locally
encore run

# Test the API
curl -X POST http://localhost:4000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Explain AI in simple terms"}'
```

### Deploy to Encore Cloud

```bash
encore deploy
```

Your API will be live at: `https://staging-<app-id>.encr.app`

### Deploy to Hugging Face Spaces

See [README.space.md](./README.space.md) for complete Hugging Face Spaces deployment instructions.

**Quick summary:**

1. Create a new Docker Space on Hugging Face
2. Push this repository to your Space
3. Configure secrets in Space settings
4. Your API is live!

## 📡 API Endpoints

### POST `/chat`

Conversational AI with intelligent caching.
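From a plain TypeScript client (outside the generated Encore client shown later), a call to this endpoint can be sketched as follows. The `buildChatRequest`/`sendChat` helpers and the `http://localhost:4000` base URL are illustrative assumptions, not part of the project; the field names follow the request/response shapes documented here.

```typescript
// Sketch of a minimal client for POST /chat (hypothetical helpers).
interface ChatRequest {
  message: string;
  model?: string;
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
}

interface ChatResponse {
  response: string;
  model: string;
  tokensUsed: number;
}

// Pure helper: apply an illustrative default temperature, then overrides.
function buildChatRequest(
  message: string,
  overrides: Partial<ChatRequest> = {}
): ChatRequest {
  return { message, temperature: 0.7, ...overrides };
}

// Thin wrapper around global fetch (Node 18+); assumes a local `encore run`.
async function sendChat(req: ChatRequest): Promise<ChatResponse> {
  const res = await fetch("http://localhost:4000/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`chat request failed: ${res.status}`);
  return (await res.json()) as ChatResponse;
}
```

Because identical requests are cached server-side, repeating the same `sendChat` call within the TTL should return quickly without a second LLM call.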
**Request:**

```json
{
  "message": "Explain quantum computing",
  "model": "llama3",
  "temperature": 0.7,
  "maxTokens": 500,
  "systemPrompt": "You are a helpful assistant"
}
```

**Response:**

```json
{
  "response": "Quantum computing is...",
  "model": "llama3",
  "tokensUsed": 150
}
```

### POST `/rag`

Retrieval-Augmented Generation with source tracking.

**Request:**

```json
{
  "query": "What is the main topic?",
  "context": [
    "Quantum computing uses qubits...",
    "Classical computers use bits..."
  ],
  "model": "mistral",
  "temperature": 0.5
}
```

**Response:**

```json
{
  "response": "Based on [0] and [1], the main topic is...",
  "model": "mistral",
  "tokensUsed": 120,
  "sources": [0, 1]
}
```

### POST `/analyze`

Text analysis for educational and research use cases.

**Request:**

```json
{
  "text": "Your long text here...",
  "task": "summarize",
  "model": "llama3",
  "temperature": 0.3
}
```

**Tasks:** `summarize`, `evaluate`, `explain`, `extract`

**Response:**

```json
{
  "result": "Summary of the text...",
  "task": "summarize",
  "model": "llama3",
  "tokensUsed": 80
}
```

### GET `/models`

List all available LLM models.

**Response:**

```json
{
  "provider": "ollama",
  "models": [
    {
      "name": "llama3",
      "size": "4.7 GB",
      "description": "llama3 - Modified 1/2/2025",
      "provider": "ollama"
    }
  ]
}
```

### GET `/health`

System health and uptime monitoring.
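In TypeScript, the health payload shown below can be modeled roughly as follows. This is a sketch inferred from the example response (the backend's actual definitions presumably live in `lib/types.ts`); the `isReady` helper is an illustrative client-side check, not part of the API.

```typescript
// Shape of the /health response, inferred from the documented example.
interface CacheStats {
  size: number;       // current number of cached entries
  maxEntries: number; // eviction threshold
  ttl: number;        // time-to-live in seconds
}

interface HealthResponse {
  status: string; // "healthy" in the documented example
  uptime: number; // seconds since startup
  provider: "ollama" | "huggingface";
  modelsAvailable: boolean;
  cache: Record<string, CacheStats>; // keyed by endpoint: chat, rag, analysis
}

// Hypothetical readiness check a client or load balancer might apply.
function isReady(h: HealthResponse): boolean {
  return h.status === "healthy" && h.modelsAvailable;
}
```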
**Response:**

```json
{
  "status": "healthy",
  "uptime": 3600,
  "provider": "huggingface",
  "modelsAvailable": true,
  "cache": {
    "chat": { "size": 10, "maxEntries": 100, "ttl": 300 },
    "rag": { "size": 5, "maxEntries": 50, "ttl": 600 },
    "analysis": { "size": 2, "maxEntries": 30, "ttl": 900 }
  }
}
```

## 🔧 Configuration

### Required Secrets

| Secret | Description | Example |
|--------|-------------|---------|
| `LLMProvider` | Provider to use | `ollama` or `huggingface` |
| `OllamaBaseURL` | Ollama API URL (if using Ollama) | `http://localhost:11434` |
| `HuggingFaceAPIKey` | HF token (if using Hugging Face) | `hf_xxxxxxxxxxxxx` |
| `DefaultModel` | Default model (optional) | `llama3` or `mistralai/Mistral-7B-Instruct-v0.2` |

### Setting Secrets

**Encore Cloud:**

```bash
encore secret set LLMProvider huggingface
encore secret set HuggingFaceAPIKey hf_your_token
```

**Hugging Face Spaces:** Add secrets in Space Settings → Repository secrets

## 🏗️ Architecture

```
backend/
├── chat/        # Conversational AI endpoint
│   ├── encore.service.ts
│   └── chat.ts
├── rag/         # RAG endpoint
│   ├── encore.service.ts
│   └── rag.ts
├── analyze/     # Text analysis endpoint
│   ├── encore.service.ts
│   └── analyze.ts
├── models/      # Model listing endpoint
│   ├── encore.service.ts
│   └── models.ts
├── health/      # Health check endpoint
│   ├── encore.service.ts
│   └── health.ts
└── lib/         # Shared utilities
    ├── types.ts               # TypeScript types
    ├── cache.ts               # In-memory caching
    ├── llm-provider.ts        # Provider abstraction
    ├── ollama-client.ts       # Ollama integration
    └── huggingface-client.ts  # Hugging Face integration
```

## 🎯 Use Cases

- 💬 **Chatbots** - Build conversational AI applications
- 📚 **RAG Systems** - Create context-aware Q&A systems
- 🎓 **Education** - Analyze and explain complex texts
- 🔬 **Research** - Summarize and extract key information
- 🤖 **AI Agents** - Backend for autonomous AI systems
- 📊 **Content Analysis** - Evaluate and process documents

## 🚀 Deployment Options

### 1. Encore Cloud (Recommended for Production)

```bash
encore deploy
```

- Automatic scaling
- Built-in monitoring
- Type-safe service-to-service calls
- Zero infrastructure management

### 2. Hugging Face Spaces (Great for Demos)

- See [README.space.md](./README.space.md)
- Free hosting for public projects
- Easy model integration
- Community visibility

### 3. Docker

```bash
docker build -t llm-api .
docker run -p 7860:7860 \
  -e LLMProvider=huggingface \
  -e HuggingFaceAPIKey=your_key \
  llm-api
```

### 4. Self-Hosted

```bash
# Install the Encore CLI
curl -L https://encore.dev/install.sh | bash

encore run --port 8080
```

## 📊 Performance

- **Caching** - Reduces redundant LLM calls by up to 80%
- **Async/Await** - Non-blocking concurrent requests
- **Lightweight** - Minimal dependencies for fast startup
- **Efficient** - Optimized for serverless environments

**Cache Configuration:**

- Chat: 300s TTL, 100 max entries
- RAG: 600s TTL, 50 max entries
- Analysis: 900s TTL, 30 max entries

## 🔐 Security Best Practices

- ✅ API keys stored as secrets, never in code
- ✅ No sensitive data in logs
- ✅ Type-safe request validation
- ✅ Error messages don't leak internals
- ✅ CORS configured for frontend integration

## 🛠️ Development

```bash
# Install the Encore CLI
curl -L https://encore.dev/install.sh | bash

# Run with hot reload
encore run

# Run tests
encore test

# Type check
encore build
```

## 📝 Example: Frontend Integration

```typescript
// Auto-generated type-safe client
import backend from '~backend/client';

// Chat
const response = await backend.chat.chat({
  message: "Hello!",
  temperature: 0.7
});

// RAG
const ragResponse = await backend.rag.rag({
  query: "What is this about?",
  context: ["Document 1...", "Document 2..."]
});

// Analysis
const analysis = await backend.analyze.analyze({
  text: "Long text...",
  task: "summarize"
});
```

## 🤝 Contributing

Contributions welcome!
This is a production-ready foundation that can be extended with:

- Additional analysis tasks
- Vector database integration for RAG
- Streaming responses
- Rate limiting middleware
- Authentication
- Model fine-tuning endpoints

## 📄 License

MIT License - feel free to use in your projects!

## 🆘 Support

- [Encore Documentation](https://encore.dev/docs)
- [Hugging Face Spaces Docs](https://huggingface.co/docs/hub/spaces)
- [GitHub Issues](./issues)

---

**Built with** ❤️ using [Encore.ts](https://encore.dev)