Spaces: Runtime error

cygon committed · Commit 7f7b101 · 1 Parent(s): 86042ad
initial commit

Files changed:
- README.md +8 -340
- strucrure.md +341 -0
README.md CHANGED
@@ -1,341 +1,9 @@

Removed (previous README content, moved to strucrure.md):

# 🤖 Production-Ready LLM API Backend

A flexible, high-performance REST API for LLM capabilities including conversational AI, RAG, and text analysis. Built with [Encore.ts](https://encore.dev) for easy deployment to Encore Cloud or Hugging Face Spaces.

## ✨ Features

- 🎯 **5 Core Endpoints** - Chat, RAG, Analysis, Models, Health
- 🔌 **Dual Provider Support** - Ollama (local) or Hugging Face (cloud)
- ⚡ **Smart Caching** - In-memory cache with TTL and automatic cleanup
- 🛡️ **Type-Safe** - Full TypeScript support with end-to-end type safety
- 📦 **Production Ready** - Comprehensive error handling, logging, and monitoring
- 🚀 **Zero Config** - Works out of the box on multiple platforms

## 🚀 Quick Start

### Local Development

```bash
# Set up secrets
encore secret set LLMProvider ollama
encore secret set OllamaBaseURL http://localhost:11434

# Or use Hugging Face
encore secret set LLMProvider huggingface
encore secret set HuggingFaceAPIKey hf_your_token_here
encore secret set DefaultModel mistralai/Mistral-7B-Instruct-v0.2

# Run locally
encore run

# Test the API
curl -X POST http://localhost:4000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Explain AI in simple terms"}'
```

### Deploy to Encore Cloud

```bash
encore deploy
```

Your API will be live at: `https://staging-<your-app>.encr.app`

### Deploy to Hugging Face Spaces

See [README.space.md](./README.space.md) for complete Hugging Face Spaces deployment instructions.

**Quick summary:**
1. Create a new Docker Space on Hugging Face
2. Push this repository to your Space
3. Configure secrets in Space settings
4. Your API is live!

## 📡 API Endpoints

### POST `/chat`
Conversational AI with intelligent caching.

**Request:**
```json
{
  "message": "Explain quantum computing",
  "model": "llama3",
  "temperature": 0.7,
  "maxTokens": 500,
  "systemPrompt": "You are a helpful assistant"
}
```

**Response:**
```json
{
  "response": "Quantum computing is...",
  "model": "llama3",
  "tokensUsed": 150
}
```

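Each endpoint above maps to an Encore.ts `api()` handler in its own service. A minimal sketch of what the `/chat` handler could look like is shown below; the request/response interfaces mirror the documented JSON, while `generateText` is a stand-in assumption for the real provider layer, not the actual implementation.

```typescript
import { api } from "encore.dev/api";

// Request/response shapes as documented above.
interface ChatRequest {
  message: string;
  model?: string;
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
}

interface ChatResponse {
  response: string;
  model: string;
  tokensUsed: number;
}

// Assumed helper standing in for the real provider layer (lib/llm-provider.ts).
async function generateText(req: ChatRequest): Promise<ChatResponse> {
  // Placeholder for illustration only; the real service would call Ollama or Hugging Face here.
  return { response: `echo: ${req.message}`, model: req.model ?? "llama3", tokensUsed: 0 };
}

// Sketch: expose POST /chat as a type-safe Encore.ts endpoint.
export const chat = api(
  { expose: true, method: "POST", path: "/chat" },
  async (req: ChatRequest): Promise<ChatResponse> => {
    return generateText(req);
  }
);
```
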
### POST `/rag`
Retrieval-Augmented Generation with source tracking.

**Request:**
```json
{
  "query": "What is the main topic?",
  "context": [
    "Quantum computing uses qubits...",
    "Classical computers use bits..."
  ],
  "model": "mistral",
  "temperature": 0.5
}
```

**Response:**
```json
{
  "response": "Based on [0] and [1], the main topic is...",
  "model": "mistral",
  "tokensUsed": 120,
  "sources": [0, 1]
}
```

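The `sources` array refers to indexes into the request's `context` array, which is why the answer cites passages as `[0]` and `[1]`. A minimal sketch of how such a prompt could be assembled from the context (the helper name and wording are assumptions, not the service's actual prompt):

```typescript
// Hypothetical sketch: number each context passage so the model can cite it as [i].
function buildRagPrompt(query: string, context: string[]): string {
  const numbered = context
    .map((passage, i) => `[${i}] ${passage}`)
    .join("\n\n");
  return [
    "Answer the question using only the numbered passages below.",
    "Cite passages by their index, e.g. [0].",
    "",
    numbered,
    "",
    `Question: ${query}`,
  ].join("\n");
}

// Example:
// buildRagPrompt("What is the main topic?",
//   ["Quantum computing uses qubits...", "Classical computers use bits..."]);
```
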
### POST `/analyze`
Text analysis for educational and research use cases.

**Request:**
```json
{
  "text": "Your long text here...",
  "task": "summarize",
  "model": "llama3",
  "temperature": 0.3
}
```

**Tasks:** `summarize`, `evaluate`, `explain`, `extract`

**Response:**
```json
{
  "result": "Summary of the text...",
  "task": "summarize",
  "model": "llama3",
  "tokensUsed": 80
}
```

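Besides the generated Encore client shown later under Frontend Integration, any HTTP client can call these endpoints directly. A minimal `fetch`-based sketch against a local `encore run` instance (the base URL matches the Quick Start example; the wrapper itself is illustrative):

```typescript
interface AnalyzeRequest {
  text: string;
  task: "summarize" | "evaluate" | "explain" | "extract";
  model?: string;
  temperature?: number;
}

interface AnalyzeResponse {
  result: string;
  task: string;
  model: string;
  tokensUsed: number;
}

async function analyze(req: AnalyzeRequest): Promise<AnalyzeResponse> {
  const res = await fetch("http://localhost:4000/analyze", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`analyze failed: ${res.status}`);
  return (await res.json()) as AnalyzeResponse;
}

// Example usage:
// const summary = await analyze({ text: "Your long text here...", task: "summarize", temperature: 0.3 });
```
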
### GET `/models`
List all available LLM models.

**Response:**
```json
{
  "provider": "ollama",
  "models": [
    {
      "name": "llama3",
      "size": "4.7 GB",
      "description": "llama3 - Modified 1/2/2025",
      "provider": "ollama"
    }
  ]
}
```

### GET `/health`
System health and uptime monitoring.

**Response:**
```json
{
  "status": "healthy",
  "uptime": 3600,
  "provider": "huggingface",
  "modelsAvailable": true,
  "cache": {
    "chat": {"size": 10, "maxEntries": 100, "ttl": 300},
    "rag": {"size": 5, "maxEntries": 50, "ttl": 600},
    "analysis": {"size": 2, "maxEntries": 30, "ttl": 900}
  }
}
```

## 🔧 Configuration

### Required Secrets

| Secret | Description | Example |
|--------|-------------|---------|
| `LLMProvider` | Provider to use | `ollama` or `huggingface` |
| `OllamaBaseURL` | Ollama API URL (if using Ollama) | `http://localhost:11434` |
| `HuggingFaceAPIKey` | HF token (if using Hugging Face) | `hf_xxxxxxxxxxxxx` |
| `DefaultModel` | Default model (optional) | `llama3` or `mistralai/Mistral-7B-Instruct-v0.2` |

### Setting Secrets

**Encore Cloud:**
```bash
encore secret set LLMProvider huggingface
encore secret set HuggingFaceAPIKey hf_your_token
```

**Hugging Face Spaces:**
Add secrets in Space Settings → Repository secrets

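Inside the application code, secrets defined this way are typically declared with Encore.ts's `secret()` helper from `encore.dev/config` and resolved per environment. A minimal sketch, assuming the secret names from the table above (the file layout and variable names are illustrative):

```typescript
import { secret } from "encore.dev/config";

// Each secret is declared once; Encore injects its value for the current environment.
const llmProvider = secret("LLMProvider");
const huggingFaceAPIKey = secret("HuggingFaceAPIKey");

export function currentProvider(): string {
  // Calling the declared secret returns its value at runtime.
  return llmProvider();
}

export function hfAuthHeader(): Record<string, string> {
  return { Authorization: `Bearer ${huggingFaceAPIKey()}` };
}
```
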
## 🏗️ Architecture

```
backend/
├── chat/                  # Conversational AI endpoint
│   ├── encore.service.ts
│   └── chat.ts
├── rag/                   # RAG endpoint
│   ├── encore.service.ts
│   └── rag.ts
├── analyze/               # Text analysis endpoint
│   ├── encore.service.ts
│   └── analyze.ts
├── models/                # Model listing endpoint
│   ├── encore.service.ts
│   └── models.ts
├── health/                # Health check endpoint
│   ├── encore.service.ts
│   └── health.ts
└── lib/                   # Shared utilities
    ├── types.ts               # TypeScript types
    ├── cache.ts               # In-memory caching
    ├── llm-provider.ts        # Provider abstraction
    ├── ollama-client.ts       # Ollama integration
    └── huggingface-client.ts  # Hugging Face integration
```

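The `lib/llm-provider.ts` abstraction is what lets the same endpoints talk to either Ollama or Hugging Face. One plausible shape for that abstraction, sketched here as an assumption rather than the actual code:

```typescript
// Hypothetical sketch of the provider abstraction in lib/llm-provider.ts.
export interface GenerateOptions {
  prompt: string;
  model?: string;
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
}

export interface GenerateResult {
  text: string;
  model: string;
  tokensUsed: number;
}

export interface ModelInfo {
  name: string;
  size?: string;
  description: string;
  provider: string;
}

export interface LLMProvider {
  generate(opts: GenerateOptions): Promise<GenerateResult>;
  listModels(): Promise<ModelInfo[]>;
}

// ollama-client.ts and huggingface-client.ts would each export an implementation of
// LLMProvider; the one to use is selected at startup from the LLMProvider secret.
```
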
## 🎯 Use Cases

- 💬 **Chatbots** - Build conversational AI applications
- 📚 **RAG Systems** - Create context-aware Q&A systems
- 🎓 **Education** - Analyze and explain complex texts
- 🔬 **Research** - Summarize and extract key information
- 🤖 **AI Agents** - Backend for autonomous AI systems
- 📊 **Content Analysis** - Evaluate and process documents

## 🚀 Deployment Options

### 1. Encore Cloud (Recommended for Production)
```bash
encore deploy
```
- Automatic scaling
- Built-in monitoring
- Type-safe service-to-service calls
- Zero infrastructure management

### 2. Hugging Face Spaces (Great for Demos)
- See [README.space.md](./README.space.md)
- Free hosting for public projects
- Easy model integration
- Community visibility

### 3. Docker
```bash
docker build -t llm-api .
docker run -p 7860:7860 \
  -e LLMProvider=huggingface \
  -e HuggingFaceAPIKey=your_key \
  llm-api
```

### 4. Self-Hosted
```bash
npm install -g encore.dev
encore run --port 8080
```

## 📈 Performance

- **Caching** - Reduces redundant LLM calls by up to 80%
- **Async/Await** - Non-blocking concurrent requests
- **Lightweight** - Minimal dependencies for fast startup
- **Efficient** - Optimized for serverless environments

**Cache Configuration** (see the sketch after this list):
- Chat: 300s TTL, 100 max entries
- RAG: 600s TTL, 50 max entries
- Analysis: 900s TTL, 30 max entries

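A minimal sketch of the kind of in-memory TTL cache with max-size eviction and automatic cleanup described above (class and method names are assumptions, not the actual `lib/cache.ts`):

```typescript
// Hypothetical TTL cache with max-size eviction and periodic cleanup.
class TTLCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private maxEntries: number, private ttlSeconds: number) {
    // Periodically drop expired entries ("automatic cleanup").
    setInterval(() => this.cleanup(), ttlSeconds * 1000);
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.entries.size >= this.maxEntries) {
      // Evict the oldest entry when the cache is full (Map preserves insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlSeconds * 1000 });
  }

  private cleanup(): void {
    const now = Date.now();
    for (const [key, entry] of this.entries) {
      if (now > entry.expiresAt) this.entries.delete(key);
    }
  }
}

// Matching the documented chat configuration: 100 entries, 300s TTL.
const chatCache = new TTLCache<string>(100, 300);
```
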
## 🔒 Security Best Practices

✅ API keys stored as secrets, never in code
✅ No sensitive data in logs
✅ Type-safe request validation
✅ Error messages don't leak internals
✅ CORS configured for frontend integration

## 🛠️ Development

```bash
# Install Encore
npm install -g encore.dev

# Run with hot reload
encore run

# Run tests
encore test

# Type check
encore build
```

## 📝 Example: Frontend Integration

```typescript
// Auto-generated type-safe client
import backend from '~backend/client';

// Chat
const response = await backend.chat.chat({
  message: "Hello!",
  temperature: 0.7
});

// RAG
const ragResponse = await backend.rag.rag({
  query: "What is this about?",
  context: ["Document 1...", "Document 2..."]
});

// Analysis
const analysis = await backend.analyze.analyze({
  text: "Long text...",
  task: "summarize"
});
```

## 🤝 Contributing

Contributions welcome! This is a production-ready foundation that can be extended with:

- Additional analysis tasks
- Vector database integration for RAG
- Streaming responses
- Rate limiting middleware
- Authentication
- Model fine-tuning endpoints

## 📄 License

MIT License - feel free to use in your projects!

## 📞 Support

- [Encore Documentation](https://encore.dev/docs)
- [Hugging Face Spaces Docs](https://huggingface.co/docs/hub/spaces)
- [GitHub Issues](./issues)

---

Added (new README content):

---
title: AI API Service with Ollama
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

strucrure.md ADDED
@@ -0,0 +1,341 @@

The new strucrure.md file repeats the removed README content shown above, ending with:

---

**Built with** ❤️ using [Encore.ts](https://encore.dev)