# πŸ€– LLM API Backend - Hugging Face Spaces
A production-ready REST API for LLM capabilities including chat, RAG, and text analysis.
## πŸš€ Quick Deploy to Hugging Face Spaces
### Option 1: Using Hugging Face Spaces (Recommended)
1. **Create a new Space**
- Go to [Hugging Face Spaces](https://huggingface.co/spaces)
- Click "Create new Space"
- Choose **Docker** as the SDK
- Set visibility (Public or Private)
2. **Clone and push this repo**
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME
# Copy all files from this project
git add .
git commit -m "Initial commit"
git push
```
3. **Configure Secrets**
- Go to your Space settings β†’ Repository secrets
- Add these secrets:
```
LLMProvider=huggingface
HuggingFaceAPIKey=hf_your_token_here
DefaultModel=mistralai/Mistral-7B-Instruct-v0.2
```
4. **Your API is live!**
- Access at: `https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space`
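On the backend, an Encore.ts app reads these values with the `secret()` helper from `encore.dev/config`. A minimal sketch of how the secret keys above could map to code (illustrative only; the actual wiring lives in `backend/lib/llm-provider.ts` and may differ):

```typescript
import { secret } from "encore.dev/config";

// Names must match the secret keys configured in the Space settings above.
const llmProvider = secret("LLMProvider");
const huggingFaceAPIKey = secret("HuggingFaceAPIKey");
const defaultModel = secret("DefaultModel");

// Encore secrets are functions: call them to read the value at runtime.
export function providerConfig() {
  return {
    provider: llmProvider(),     // e.g. "huggingface"
    apiKey: huggingFaceAPIKey(), // e.g. "hf_..."
    model: defaultModel(),       // e.g. "mistralai/Mistral-7B-Instruct-v0.2"
  };
}
```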
### Option 2: Deploy Existing Encore App
Since this is already an Encore app, you can also:
```bash
# Deploy to Encore Cloud
encore deploy

# Then use the Encore API URL, e.g.:
# https://proj_d3ggdgs82vjo5u1sek0g.api.lp.dev
```
## πŸ“‘ API Endpoints
All endpoints are available at your Space URL:
### Chat
```bash
curl -X POST https://YOUR_SPACE.hf.space/chat \
-H "Content-Type: application/json" \
-d '{"message": "Explain quantum computing"}'
```
### RAG (Retrieval-Augmented Generation)
```bash
curl -X POST https://YOUR_SPACE.hf.space/rag \
-H "Content-Type: application/json" \
-d '{
"query": "What is the main topic?",
"context": [
"Quantum computing uses quantum bits or qubits.",
"Classical computers use binary bits."
]
}'
```
### Text Analysis
```bash
curl -X POST https://YOUR_SPACE.hf.space/analyze \
-H "Content-Type: application/json" \
-d '{
"text": "Your long text here...",
"task": "summarize"
}'
```
**Available tasks:** `summarize`, `evaluate`, `explain`, `extract`
### List Models
```bash
curl https://YOUR_SPACE.hf.space/models
```
### Health Check
```bash
curl https://YOUR_SPACE.hf.space/health
```
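The curl calls above translate directly to any HTTP client. As an illustration, a minimal TypeScript wrapper for the chat endpoint (the response body shape is not specified in this README, so it is typed as `unknown`):

```typescript
// Build the JSON request for POST /chat (payload format from the curl example above).
function buildChatRequest(message: string) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  };
}

// Call the deployed Space; requires a fetch-capable runtime (Node 18+, browsers).
async function chat(baseUrl: string, message: string): Promise<unknown> {
  const res = await fetch(`${baseUrl}/chat`, buildChatRequest(message));
  if (!res.ok) throw new Error(`chat failed: HTTP ${res.status}`);
  return res.json();
}
```

The same pattern applies to `/rag` and `/analyze`; only the path and payload change.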
## πŸ”§ Configuration
### Environment Variables / Secrets
Required secrets in Hugging Face Spaces:
| Secret | Description | Example |
|--------|-------------|---------|
| `LLMProvider` | Provider to use | `huggingface` or `ollama` |
| `HuggingFaceAPIKey` | Your HF token | `hf_xxxxxxxxxxxxx` |
| `DefaultModel` | Default model | `mistralai/Mistral-7B-Instruct-v0.2` |
| `OllamaBaseURL` | Only if using Ollama | `http://localhost:11434` |
### Recommended Models for HF Spaces
- `mistralai/Mistral-7B-Instruct-v0.2` (Fast, efficient)
- `microsoft/Phi-3-mini-4k-instruct` (Compact)
- `meta-llama/Meta-Llama-3-8B-Instruct` (High quality)
- `google/gemma-7b-it` (Versatile)
## πŸ—οΈ Architecture
```
backend/
β”œβ”€β”€ chat/ # Chat endpoint
β”œβ”€β”€ rag/ # RAG endpoint
β”œβ”€β”€ analyze/ # Text analysis
β”œβ”€β”€ models/ # Model listing
β”œβ”€β”€ health/ # Health check
└── lib/
β”œβ”€β”€ llm-provider.ts # Provider abstraction
β”œβ”€β”€ ollama-client.ts # Ollama integration
β”œβ”€β”€ huggingface-client.ts # HF integration
β”œβ”€β”€ cache.ts # In-memory caching
└── types.ts # TypeScript types
```
## 🎯 Features
βœ… **Dual Provider Support** - Ollama (local) or Hugging Face (cloud)
βœ… **Smart Caching** - In-memory cache with TTL
βœ… **Type-Safe** - Full TypeScript support
βœ… **Production Ready** - Error handling, logging, monitoring
βœ… **RESTful API** - Clean, consistent endpoints
βœ… **Zero Config** - Works out of the box on HF Spaces
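The caching behavior can be pictured as a map with per-entry expiry and a bounded size. A sketch of the idea, with an injectable clock for testability (the real `lib/cache.ts` may differ in eviction policy and defaults):

```typescript
// Illustrative in-memory cache with TTL and a max-entry cap.
class TTLCache<V> {
  private store = new Map<string, { value: V; expires: number }>();

  constructor(
    private maxEntries: number,
    private ttlSeconds: number,
    private now: () => number = () => Date.now(), // injectable clock
  ) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expires <= this.now()) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    // Evict the oldest entry when full (Map preserves insertion order).
    if (this.store.size >= this.maxEntries && !this.store.has(key)) {
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expires: this.now() + this.ttlSeconds * 1000 });
  }
}
```

Each endpoint would get its own instance, matching the per-endpoint `maxEntries`/`ttl` stats reported by `/health`.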
## πŸ” Security
- API keys stored as repository secrets
- No secrets in code or logs
- Rate-limiting ready (middleware can be added)
- CORS configured
## πŸ“Š Monitoring
Check API health:
```bash
curl https://YOUR_SPACE.hf.space/health
```
Returns:
```json
{
"status": "healthy",
"uptime": 3600,
"provider": "huggingface",
"modelsAvailable": true,
"cache": {
"chat": {"size": 10, "maxEntries": 100, "ttl": 300},
"rag": {"size": 5, "maxEntries": 50, "ttl": 600},
"analysis": {"size": 2, "maxEntries": 30, "ttl": 900}
}
}
```
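For monitoring scripts, the payload above can be given a type. A sketch using the field names from the sample response (assumed stable):

```typescript
// Field names taken from the sample /health response above.
interface CacheStats {
  size: number;       // current entries
  maxEntries: number; // capacity
  ttl: number;        // seconds
}

interface HealthResponse {
  status: string;
  uptime: number; // seconds
  provider: string;
  modelsAvailable: boolean;
  cache: Record<string, CacheStats>;
}

// A simple liveness predicate for alerting.
function isHealthy(h: HealthResponse): boolean {
  return h.status === "healthy" && h.modelsAvailable;
}
```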
## πŸ†˜ Troubleshooting
### "Model loading" errors
- Hugging Face models can take 30-60 seconds to cold-start on first request; wait and retry
- Check your HF token has access to the model
### "Secret not set" errors
- Verify all secrets are configured in Space settings
- Restart the Space after adding secrets
### API not responding
- Check Space logs in the Hugging Face interface
- Verify Docker build completed successfully
## πŸ“ License
MIT License - feel free to use in your projects!
---
**Built with** [Encore.ts](https://encore.dev) | **Powered by** [Hugging Face](https://huggingface.co)