# 🤖 LLM API Backend - Hugging Face Spaces
A production-ready REST API for LLM capabilities including chat, RAG, and text analysis.
## 🚀 Quick Deploy to Hugging Face Spaces
### Option 1: Using Hugging Face Spaces (Recommended)
1. **Create a new Space**
- Go to [Hugging Face Spaces](https://huggingface.co/spaces)
- Click "Create new Space"
- Choose **Docker** as the SDK
- Set visibility (Public or Private)
2. **Clone and push this repo**
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME
# Copy all files from this project
git add .
git commit -m "Initial commit"
git push
```
3. **Configure Secrets**
- Go to your Space settings → Repository secrets
- Add these secrets:
```
LLMProvider=huggingface
HuggingFaceAPIKey=hf_your_token_here
DefaultModel=mistralai/Mistral-7B-Instruct-v0.2
```
4. **Your API is live!**
- Access at: `https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space`
### Option 2: Deploy Existing Encore App
Since this is already an Encore app, you can also:
```bash
# Deploy to Encore Cloud
encore deploy
# Then use the Encore API URL
# https://proj_d3ggdgs82vjo5u1sek0g.api.lp.dev
```
## 📡 API Endpoints
All endpoints are available at your Space URL:
### Chat
```bash
curl -X POST https://YOUR_SPACE.hf.space/chat \
-H "Content-Type: application/json" \
-d '{"message": "Explain quantum computing"}'
```
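The same chat request can be built from Python with only the standard library; a sketch for scripting against the API (the Space URL is a placeholder, and `build_chat_request` is an illustrative helper, not part of this project):

```python
import json
import urllib.request

def build_chat_request(base_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /chat endpoint."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send it with: urllib.request.urlopen(build_chat_request(...))
req = build_chat_request("https://YOUR_SPACE.hf.space", "Explain quantum computing")
```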
### RAG (Retrieval-Augmented Generation)
```bash
curl -X POST https://YOUR_SPACE.hf.space/rag \
-H "Content-Type: application/json" \
-d '{
"query": "What is the main topic?",
"context": [
"Quantum computing uses quantum bits or qubits.",
"Classical computers use binary bits."
]
}'
```
### Text Analysis
```bash
curl -X POST https://YOUR_SPACE.hf.space/analyze \
-H "Content-Type: application/json" \
-d '{
"text": "Your long text here...",
"task": "summarize"
}'
```
**Available tasks:** `summarize`, `evaluate`, `explain`, `extract`
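When scripting against `/analyze`, it helps to validate the task name client-side before sending; a small Python sketch (the helper name is illustrative; only the four task names listed above come from the API):

```python
import json

# The four tasks documented for the /analyze endpoint
VALID_TASKS = {"summarize", "evaluate", "explain", "extract"}

def build_analyze_payload(text: str, task: str) -> str:
    """Return the JSON body for POST /analyze, rejecting unknown tasks."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(VALID_TASKS)}")
    return json.dumps({"text": text, "task": task})

body = build_analyze_payload("Your long text here...", "summarize")
```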
### List Models
```bash
curl https://YOUR_SPACE.hf.space/models
```
### Health Check
```bash
curl https://YOUR_SPACE.hf.space/health
```
## 🔧 Configuration
### Environment Variables / Secrets
Required secrets in Hugging Face Spaces:
| Secret | Description | Example |
|--------|-------------|---------|
| `LLMProvider` | Provider to use | `huggingface` or `ollama` |
| `HuggingFaceAPIKey` | Your HF token | `hf_xxxxxxxxxxxxx` |
| `DefaultModel` | Default model | `mistralai/Mistral-7B-Instruct-v0.2` |
| `OllamaBaseURL` | Only if using Ollama | `http://localhost:11434` |
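Startup validation of these secrets might look like the following Python sketch (the backend itself is TypeScript; the variable names mirror the table above, but `load_config` and its behavior are hypothetical):

```python
import os

# Secrets that must be present for every provider
REQUIRED = ["LLMProvider", "HuggingFaceAPIKey", "DefaultModel"]

def load_config(env=os.environ) -> dict:
    """Collect required secrets, raising a clear error for anything missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Secret not set: {', '.join(missing)}")
    config = {name: env[name] for name in REQUIRED}
    # OllamaBaseURL is only needed when LLMProvider=ollama
    if config["LLMProvider"] == "ollama":
        config["OllamaBaseURL"] = env.get("OllamaBaseURL", "http://localhost:11434")
    return config
```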
### Recommended Models for HF Spaces
- `mistralai/Mistral-7B-Instruct-v0.2` (Fast, efficient)
- `microsoft/phi-3-mini-4k-instruct` (Compact)
- `meta-llama/Meta-Llama-3-8B-Instruct` (High quality)
- `google/gemma-7b-it` (Versatile)
## 🏗️ Architecture
```
backend/
├── chat/        # Chat endpoint
├── rag/         # RAG endpoint
├── analyze/     # Text analysis
├── models/      # Model listing
├── health/      # Health check
└── lib/
    ├── llm-provider.ts        # Provider abstraction
    ├── ollama-client.ts       # Ollama integration
    ├── huggingface-client.ts  # HF integration
    ├── cache.ts               # In-memory caching
    └── types.ts               # TypeScript types
```
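The provider abstraction in `lib/llm-provider.ts` can be pictured with this Python sketch (class and method names are illustrative, not the project's actual TypeScript API):

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Common interface so /chat, /rag, and /analyze don't care which backend runs."""

    @abstractmethod
    def generate(self, prompt: str, model: str) -> str: ...

class HuggingFaceProvider(LLMProvider):
    def __init__(self, api_key: str):
        self.api_key = api_key

    def generate(self, prompt: str, model: str) -> str:
        # The real client would POST to the HF Inference API here.
        return f"[{model}] response to: {prompt}"

def get_provider(name: str, **kwargs) -> LLMProvider:
    """Pick a provider based on the LLMProvider secret."""
    if name == "huggingface":
        return HuggingFaceProvider(**kwargs)
    raise ValueError(f"unsupported provider: {name}")
```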
## 🎯 Features
- ✅ **Dual Provider Support** - Ollama (local) or Hugging Face (cloud)
- ✅ **Smart Caching** - In-memory cache with TTL
- ✅ **Type-Safe** - Full TypeScript support
- ✅ **Production Ready** - Error handling, logging, monitoring
- ✅ **RESTful API** - Clean, consistent endpoints
- ✅ **Zero Config** - Works out of the box on HF Spaces
## 🔒 Security
- API keys stored as repository secrets
- No secrets in code or logs
- Rate limiting ready (can add middleware)
- CORS configured
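The rate-limiting middleware hinted at above could be a per-client token bucket; a minimal Python sketch of the idea (capacity and refill rate are illustrative):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; return False when rate-limited."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, rate=1.0)  # 5-request burst, 1 req/s sustained
```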
## 📊 Monitoring
Check API health:
```bash
curl https://YOUR_SPACE.hf.space/health
```
Returns:
```json
{
"status": "healthy",
"uptime": 3600,
"provider": "huggingface",
"modelsAvailable": true,
"cache": {
"chat": {"size": 10, "maxEntries": 100, "ttl": 300},
"rag": {"size": 5, "maxEntries": 50, "ttl": 600},
"analysis": {"size": 2, "maxEntries": 30, "ttl": 900}
}
}
```
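The `cache` section above reports per-endpoint in-memory TTL caches (`lib/cache.ts`); a Python sketch of the same idea, using the field names from the `/health` payload (the eviction policy shown is an assumption, as it isn't documented here):

```python
import time

class TTLCache:
    """In-memory cache that drops entries after `ttl` seconds, capped at `max_entries`."""

    def __init__(self, max_entries: int, ttl: float):
        self.max_entries = max_entries
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value):
        if len(self._store) >= self.max_entries:
            # Assumed policy: evict the entry closest to expiry
            self._store.pop(min(self._store, key=lambda k: self._store[k][0]))
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)
            return default
        return entry[1]

    def stats(self) -> dict:
        """Mirror the per-cache shape reported by /health."""
        return {"size": len(self._store), "maxEntries": self.max_entries, "ttl": self.ttl}

chat_cache = TTLCache(max_entries=100, ttl=300)
```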
## 🐛 Troubleshooting
### "Model loading" errors
- Wait 30-60 seconds for HF models to load
- Check your HF token has access to the model
### "Secret not set" errors
- Verify all secrets are configured in Space settings
- Restart the Space after adding secrets
### API not responding
- Check Space logs in the Hugging Face interface
- Verify Docker build completed successfully
## 📄 License
MIT License - feel free to use in your projects!
---
**Built with** [Encore.ts](https://encore.dev) | **Powered by** [Hugging Face](https://huggingface.co)