# πŸ€– LLM API Backend - Hugging Face Spaces
A production-ready REST API for LLM capabilities including chat, RAG, and text analysis.
## πŸš€ Quick Deploy to Hugging Face Spaces
### Option 1: Using Hugging Face Spaces (Recommended)
1. **Create a new Space**
- Go to [Hugging Face Spaces](https://huggingface.co/spaces)
- Click "Create new Space"
- Choose **Docker** as the SDK
- Set visibility (Public or Private)
2. **Clone and push this repo**
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME
# Copy all files from this project
git add .
git commit -m "Initial commit"
git push
```
3. **Configure Secrets**
- Go to your Space settings β†’ Repository secrets
- Add these secrets:
```
LLMProvider=huggingface
HuggingFaceAPIKey=hf_your_token_here
DefaultModel=mistralai/Mistral-7B-Instruct-v0.2
```
4. **Your API is live!**
- Access at: `https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space`
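On the backend, an Encore.ts app reads these values with the `secret()` helper from `encore.dev/config`. A minimal sketch of how the secret keys above could map to code (illustrative only; the actual wiring lives in `backend/lib/llm-provider.ts` and may differ):

```typescript
import { secret } from "encore.dev/config";

// Names must match the secret keys configured in the Space settings above.
const llmProvider = secret("LLMProvider");
const huggingFaceAPIKey = secret("HuggingFaceAPIKey");
const defaultModel = secret("DefaultModel");

// Encore secrets are functions: call them to read the value at runtime.
export function providerConfig() {
  return {
    provider: llmProvider(),     // e.g. "huggingface"
    apiKey: huggingFaceAPIKey(), // e.g. "hf_..."
    model: defaultModel(),       // e.g. "mistralai/Mistral-7B-Instruct-v0.2"
  };
}
```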
### Option 2: Deploy Existing Encore App
Since this is already an Encore app, you can also:
```bash
# Deploy to Encore Cloud
encore deploy

# Then use the Encore API URL, e.g.:
# https://proj_d3ggdgs82vjo5u1sek0g.api.lp.dev
```
## πŸ“‘ API Endpoints
All endpoints are available at your Space URL:
### Chat
```bash
curl -X POST https://YOUR_SPACE.hf.space/chat \
-H "Content-Type: application/json" \
-d '{"message": "Explain quantum computing"}'
```
### RAG (Retrieval-Augmented Generation)
```bash
curl -X POST https://YOUR_SPACE.hf.space/rag \
-H "Content-Type: application/json" \
-d '{
"query": "What is the main topic?",
"context": [
"Quantum computing uses quantum bits or qubits.",
"Classical computers use binary bits."
]
}'
```
### Text Analysis
```bash
curl -X POST https://YOUR_SPACE.hf.space/analyze \
-H "Content-Type: application/json" \
-d '{
"text": "Your long text here...",
"task": "summarize"
}'
```
**Available tasks:** `summarize`, `evaluate`, `explain`, `extract`
### List Models
```bash
curl https://YOUR_SPACE.hf.space/models
```
### Health Check
```bash
curl https://YOUR_SPACE.hf.space/health
```
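The curl calls above translate directly to any HTTP client. As an illustration, a minimal TypeScript wrapper for the chat endpoint (the response body shape is not specified in this README, so it is typed as `unknown`):

```typescript
// Build the JSON request for POST /chat (payload format from the curl example above).
function buildChatRequest(message: string) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  };
}

// Call the deployed Space; requires a fetch-capable runtime (Node 18+, browsers).
async function chat(baseUrl: string, message: string): Promise<unknown> {
  const res = await fetch(`${baseUrl}/chat`, buildChatRequest(message));
  if (!res.ok) throw new Error(`chat failed: HTTP ${res.status}`);
  return res.json();
}
```

The same pattern applies to `/rag` and `/analyze`; only the path and payload change.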
## πŸ”§ Configuration
### Environment Variables / Secrets
Required secrets in Hugging Face Spaces:
| Secret | Description | Example |
|--------|-------------|---------|
| `LLMProvider` | Provider to use | `huggingface` or `ollama` |
| `HuggingFaceAPIKey` | Your HF token | `hf_xxxxxxxxxxxxx` |
| `DefaultModel` | Default model | `mistralai/Mistral-7B-Instruct-v0.2` |
| `OllamaBaseURL` | Only if using Ollama | `http://localhost:11434` |
### Recommended Models for HF Spaces
- `mistralai/Mistral-7B-Instruct-v0.2` (Fast, efficient)
- `microsoft/Phi-3-mini-4k-instruct` (Compact)
- `meta-llama/Meta-Llama-3-8B-Instruct` (High quality)
- `google/gemma-7b-it` (Versatile)
## πŸ—οΈ Architecture
```
backend/
β”œβ”€β”€ chat/ # Chat endpoint
β”œβ”€β”€ rag/ # RAG endpoint
β”œβ”€β”€ analyze/ # Text analysis
β”œβ”€β”€ models/ # Model listing
β”œβ”€β”€ health/ # Health check
└── lib/
β”œβ”€β”€ llm-provider.ts # Provider abstraction
β”œβ”€β”€ ollama-client.ts # Ollama integration
β”œβ”€β”€ huggingface-client.ts # HF integration
β”œβ”€β”€ cache.ts # In-memory caching
└── types.ts # TypeScript types
```
## 🎯 Features
βœ… **Dual Provider Support** - Ollama (local) or Hugging Face (cloud)
βœ… **Smart Caching** - In-memory cache with TTL
βœ… **Type-Safe** - Full TypeScript support
βœ… **Production Ready** - Error handling, logging, monitoring
βœ… **RESTful API** - Clean, consistent endpoints
βœ… **Zero Config** - Works out of the box on HF Spaces
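The caching behavior can be pictured as a map with per-entry expiry and a bounded size. A sketch of the idea, with an injectable clock for testability (the real `lib/cache.ts` may differ in eviction policy and defaults):

```typescript
// Illustrative in-memory cache with TTL and a max-entry cap.
class TTLCache<V> {
  private store = new Map<string, { value: V; expires: number }>();

  constructor(
    private maxEntries: number,
    private ttlSeconds: number,
    private now: () => number = () => Date.now(), // injectable clock
  ) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expires <= this.now()) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    // Evict the oldest entry when full (Map preserves insertion order).
    if (this.store.size >= this.maxEntries && !this.store.has(key)) {
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expires: this.now() + this.ttlSeconds * 1000 });
  }
}
```

Each endpoint would get its own instance, matching the per-endpoint `maxEntries`/`ttl` stats reported by `/health`.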
## πŸ” Security
- API keys stored as repository secrets
- No secrets in code or logs
- Rate-limiting ready (middleware can be added)
- CORS configured
## πŸ“Š Monitoring
Check API health:
```bash
curl https://YOUR_SPACE.hf.space/health
```
Returns:
```json
{
"status": "healthy",
"uptime": 3600,
"provider": "huggingface",
"modelsAvailable": true,
"cache": {
"chat": {"size": 10, "maxEntries": 100, "ttl": 300},
"rag": {"size": 5, "maxEntries": 50, "ttl": 600},
"analysis": {"size": 2, "maxEntries": 30, "ttl": 900}
}
}
```
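For monitoring scripts, the payload above can be given a type. A sketch using the field names from the sample response (assumed stable):

```typescript
// Field names taken from the sample /health response above.
interface CacheStats {
  size: number;       // current entries
  maxEntries: number; // capacity
  ttl: number;        // seconds
}

interface HealthResponse {
  status: string;
  uptime: number; // seconds
  provider: string;
  modelsAvailable: boolean;
  cache: Record<string, CacheStats>;
}

// A simple liveness predicate for alerting.
function isHealthy(h: HealthResponse): boolean {
  return h.status === "healthy" && h.modelsAvailable;
}
```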
## πŸ†˜ Troubleshooting
### "Model loading" errors
- Hugging Face models can take 30-60 seconds to cold-start on first request; wait and retry
- Check your HF token has access to the model
### "Secret not set" errors
- Verify all secrets are configured in Space settings
- Restart the Space after adding secrets
### API not responding
- Check Space logs in the Hugging Face interface
- Verify Docker build completed successfully
## πŸ“ License
MIT License - feel free to use in your projects!
---
**Built with** [Encore.ts](https://encore.dev) | **Powered by** [Hugging Face](https://huggingface.co)