# 🤖 Production-Ready LLM API Backend

A flexible, high-performance REST API for LLM capabilities including conversational AI, RAG, and text analysis. Built with [Encore.ts](https://encore.dev) for easy deployment to Encore Cloud or Hugging Face Spaces.

## ✨ Features

- 🎯 **5 Core Endpoints** - Chat, RAG, Analysis, Models, Health
- 🔌 **Dual Provider Support** - Ollama (local) or Hugging Face (cloud)
- ⚡ **Smart Caching** - In-memory cache with TTL and automatic cleanup
- 🛡️ **Type-Safe** - Full TypeScript support with end-to-end type safety
- 📦 **Production Ready** - Comprehensive error handling, logging, and monitoring
- 🚀 **Zero Config** - Works out of the box on multiple platforms

## 🚀 Quick Start

### Local Development

```bash
# Set up secrets
encore secret set LLMProvider ollama
encore secret set OllamaBaseURL http://localhost:11434

# Or use Hugging Face
encore secret set LLMProvider huggingface
encore secret set HuggingFaceAPIKey hf_your_token_here
encore secret set DefaultModel mistralai/Mistral-7B-Instruct-v0.2

# Run locally
encore run

# Test the API
curl -X POST http://localhost:4000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Explain AI in simple terms"}'
```

### Deploy to Encore Cloud

```bash
encore deploy
```

Your API will be live at: `https://staging-<your-app>.encr.app`

### Deploy to Hugging Face Spaces

See [README.space.md](./README.space.md) for complete Hugging Face Spaces deployment instructions.

**Quick summary:**

1. Create a new Docker Space on Hugging Face
2. Push this repository to your Space
3. Configure secrets in Space settings
4. Your API is live!
## 📡 API Endpoints

### POST `/chat`

Conversational AI with intelligent caching.

**Request:**

```json
{
  "message": "Explain quantum computing",
  "model": "llama3",
  "temperature": 0.7,
  "maxTokens": 500,
  "systemPrompt": "You are a helpful assistant"
}
```

**Response:**

```json
{
  "response": "Quantum computing is...",
  "model": "llama3",
  "tokensUsed": 150
}
```

### POST `/rag`

Retrieval-Augmented Generation with source tracking.

**Request:**

```json
{
  "query": "What is the main topic?",
  "context": [
    "Quantum computing uses qubits...",
    "Classical computers use bits..."
  ],
  "model": "mistral",
  "temperature": 0.5
}
```

**Response:**

```json
{
  "response": "Based on [0] and [1], the main topic is...",
  "model": "mistral",
  "tokensUsed": 120,
  "sources": [0, 1]
}
```
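The `sources` field above can be produced by numbering each context passage in the prompt and parsing the bracketed citations back out of the model's reply. A minimal sketch of that idea (function names like `buildRagPrompt` are illustrative, not the actual `rag.ts` API):

```typescript
// Number each passage so the model can cite it as [0], [1], ...
function buildRagPrompt(query: string, context: string[]): string {
  const numbered = context.map((passage, i) => `[${i}] ${passage}`).join("\n");
  return `Answer using only the numbered passages below. Cite passages as [index].\n\n${numbered}\n\nQuestion: ${query}`;
}

// Parse cited indices back out of the model's reply, dropping out-of-range ones.
function extractSources(response: string, contextLength: number): number[] {
  const cited = new Set<number>();
  for (const match of response.matchAll(/\[(\d+)\]/g)) {
    const index = Number(match[1]);
    if (index < contextLength) cited.add(index);
  }
  return [...cited].sort((a, b) => a - b);
}
```

This keeps source tracking purely mechanical: the model only has to follow the `[index]` convention for the API to report which passages grounded the answer.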
### POST `/analyze`

Text analysis for educational and research use cases.

**Request:**

```json
{
  "text": "Your long text here...",
  "task": "summarize",
  "model": "llama3",
  "temperature": 0.3
}
```

**Tasks:** `summarize`, `evaluate`, `explain`, `extract`

**Response:**

```json
{
  "result": "Summary of the text...",
  "task": "summarize",
  "model": "llama3",
  "tokensUsed": 80
}
```
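Each task presumably maps to a different instruction template before the text reaches the model. A hedged sketch of that dispatch (the prompt wording is invented; only the four task names come from the API above):

```typescript
type AnalysisTask = "summarize" | "evaluate" | "explain" | "extract";

// Illustrative instruction per task; the real templates in analyze.ts may differ.
const taskPrompts: Record<AnalysisTask, string> = {
  summarize: "Summarize the following text concisely.",
  evaluate: "Evaluate the quality and credibility of the following text.",
  explain: "Explain the following text in simple terms.",
  extract: "Extract the key facts from the following text as a bullet list.",
};

function buildAnalysisPrompt(task: AnalysisTask, text: string): string {
  return `${taskPrompts[task]}\n\n${text}`;
}
```

Because `AnalysisTask` is a string-literal union, an unsupported task value is rejected at compile time rather than at runtime.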
### GET `/models`

List all available LLM models.

**Response:**

```json
{
  "provider": "ollama",
  "models": [
    {
      "name": "llama3",
      "size": "4.7 GB",
      "description": "llama3 - Modified 1/2/2025",
      "provider": "ollama"
    }
  ]
}
```

### GET `/health`

System health and uptime monitoring.

**Response:**

```json
{
  "status": "healthy",
  "uptime": 3600,
  "provider": "huggingface",
  "modelsAvailable": true,
  "cache": {
    "chat": { "size": 10, "maxEntries": 100, "ttl": 300 },
    "rag": { "size": 5, "maxEntries": 50, "ttl": 600 },
    "analysis": { "size": 2, "maxEntries": 30, "ttl": 900 }
  }
}
```
## 🔧 Configuration

### Required Secrets

| Secret | Description | Example |
|--------|-------------|---------|
| `LLMProvider` | Provider to use | `ollama` or `huggingface` |
| `OllamaBaseURL` | Ollama API URL (if using Ollama) | `http://localhost:11434` |
| `HuggingFaceAPIKey` | HF token (if using Hugging Face) | `hf_xxxxxxxxxxxxx` |
| `DefaultModel` | Default model (optional) | `llama3` or `mistralai/Mistral-7B-Instruct-v0.2` |

### Setting Secrets

**Encore Cloud:**

```bash
encore secret set LLMProvider huggingface
encore secret set HuggingFaceAPIKey hf_your_token
```

**Hugging Face Spaces:**

Add secrets in Space Settings → Repository secrets.
## 🏗️ Architecture

```
backend/
├── chat/                      # Conversational AI endpoint
│   ├── encore.service.ts
│   └── chat.ts
├── rag/                       # RAG endpoint
│   ├── encore.service.ts
│   └── rag.ts
├── analyze/                   # Text analysis endpoint
│   ├── encore.service.ts
│   └── analyze.ts
├── models/                    # Model listing endpoint
│   ├── encore.service.ts
│   └── models.ts
├── health/                    # Health check endpoint
│   ├── encore.service.ts
│   └── health.ts
└── lib/                       # Shared utilities
    ├── types.ts               # TypeScript types
    ├── cache.ts               # In-memory caching
    ├── llm-provider.ts        # Provider abstraction
    ├── ollama-client.ts       # Ollama integration
    └── huggingface-client.ts  # Hugging Face integration
```
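The `lib/llm-provider.ts` abstraction presumably gives both clients a common interface so the endpoints never care which backend is configured. A plausible sketch (the interface shape here is an assumption, not the real file's contents):

```typescript
// Hypothetical common request/result shapes shared by both clients.
interface CompletionRequest {
  prompt: string;
  model: string;
  temperature: number;
  maxTokens: number;
  systemPrompt?: string;
}

interface CompletionResult {
  text: string;
  tokensUsed: number;
}

// Both ollama-client.ts and huggingface-client.ts would implement this.
interface LLMProvider {
  name: "ollama" | "huggingface";
  complete(req: CompletionRequest): Promise<CompletionResult>;
  listModels(): Promise<string[]>;
}

// Resolving the LLMProvider secret to a concrete client could look like:
function selectProvider(
  name: string,
  providers: Record<string, LLMProvider>,
): LLMProvider {
  const provider = providers[name];
  if (!provider) throw new Error(`Unknown LLM provider: ${name}`);
  return provider;
}
```

With this shape, swapping Ollama for Hugging Face is a configuration change; none of the five endpoint services needs to be touched.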
## 🎯 Use Cases

- 💬 **Chatbots** - Build conversational AI applications
- 📚 **RAG Systems** - Create context-aware Q&A systems
- 🎓 **Education** - Analyze and explain complex texts
- 🔬 **Research** - Summarize and extract key information
- 🤖 **AI Agents** - Backend for autonomous AI systems
- 📊 **Content Analysis** - Evaluate and process documents

## 🚀 Deployment Options

### 1. Encore Cloud (Recommended for Production)

```bash
encore deploy
```

- Automatic scaling
- Built-in monitoring
- Type-safe service-to-service calls
- Zero infrastructure management

### 2. Hugging Face Spaces (Great for Demos)

- See [README.space.md](./README.space.md)
- Free hosting for public projects
- Easy model integration
- Community visibility

### 3. Docker

```bash
docker build -t llm-api .
docker run -p 7860:7860 \
  -e LLMProvider=huggingface \
  -e HuggingFaceAPIKey=your_key \
  llm-api
```

### 4. Self-Hosted

```bash
npm install -g encore.dev
encore run --port 8080
```

## 📊 Performance

- **Caching** - Reduces redundant LLM calls by up to 80%
- **Async/Await** - Non-blocking concurrent requests
- **Lightweight** - Minimal dependencies for fast startup
- **Efficient** - Optimized for serverless environments

**Cache Configuration:**

- Chat: 300s TTL, 100 max entries
- RAG: 600s TTL, 50 max entries
- Analysis: 900s TTL, 30 max entries
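The TTL-plus-max-entries behavior described above can be sketched in a few lines. `SimpleCache` here is an illustrative stand-in for `lib/cache.ts`, not its actual API; the real module also does automatic cleanup, which this sketch approximates with lazy expiry on read:

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch ms after which the entry is stale
}

class SimpleCache<V> {
  private entries = new Map<string, CacheEntry<V>>();

  constructor(
    private maxEntries: number,
    private ttlSeconds: number,
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazy cleanup of expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    // Evict the oldest entry once full (Map preserves insertion order).
    if (this.entries.size >= this.maxEntries && !this.entries.has(key)) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, {
      value,
      expiresAt: Date.now() + this.ttlSeconds * 1000,
    });
  }

  get size(): number {
    return this.entries.size;
  }
}

// Mirrors the documented chat-cache settings: 300s TTL, 100 max entries.
const chatCache = new SimpleCache<string>(100, 300);
chatCache.set("explain-ai", "AI is...");
```

Keying entries on the request payload (message, model, temperature, and so on) is what lets identical prompts skip the LLM call entirely.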
## 🔒 Security Best Practices

- ✅ API keys stored as secrets, never in code
- ✅ No sensitive data in logs
- ✅ Type-safe request validation
- ✅ Error messages don't leak internals
- ✅ CORS configured for frontend integration

## 🛠️ Development

```bash
# Install Encore
npm install -g encore.dev

# Run with hot reload
encore run

# Run tests
encore test

# Type check
encore build
```

## 📝 Example: Frontend Integration

```typescript
// Auto-generated type-safe client
import backend from '~backend/client';

// Chat
const response = await backend.chat.chat({
  message: "Hello!",
  temperature: 0.7
});

// RAG
const ragResponse = await backend.rag.rag({
  query: "What is this about?",
  context: ["Document 1...", "Document 2..."]
});

// Analysis
const analysis = await backend.analyze.analyze({
  text: "Long text...",
  task: "summarize"
});
```
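Clients that aren't Encore apps (and so can't use the generated client) can call the same endpoints as plain HTTP. A hedged fetch-based equivalent of the chat call, with the base URL substituted for your own deployment:

```typescript
// Request/response shapes mirror the documented /chat JSON above.
interface ChatRequest {
  message: string;
  model?: string;
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
}

interface ChatResponse {
  response: string;
  model: string;
  tokensUsed: number;
}

async function chat(baseUrl: string, req: ChatRequest): Promise<ChatResponse> {
  const res = await fetch(`${baseUrl}/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`chat request failed: ${res.status}`);
  return (await res.json()) as ChatResponse;
}
```

Usage: `await chat("https://staging-<your-app>.encr.app", { message: "Hello!" })`.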
## 🤝 Contributing

Contributions welcome! This is a production-ready foundation that can be extended with:

- Additional analysis tasks
- Vector database integration for RAG
- Streaming responses
- Rate limiting middleware
- Authentication
- Model fine-tuning endpoints

## 📄 License

MIT License - feel free to use in your projects!

## 📞 Support

- [Encore Documentation](https://encore.dev/docs)
- [Hugging Face Spaces Docs](https://huggingface.co/docs/hub/spaces)
- [GitHub Issues](./issues)

---

**Built with** ❤️ using [Encore.ts](https://encore.dev)