
πŸ€– LLM API Backend - Hugging Face Spaces

A production-ready REST API for LLM capabilities including chat, RAG, and text analysis.

πŸš€ Quick Deploy to Hugging Face Spaces

Option 1: Using Hugging Face Spaces (Recommended)

  1. Create a new Space

    • Go to Hugging Face Spaces
    • Click "Create new Space"
    • Choose Docker as the SDK
    • Set visibility (Public or Private)
  2. Clone and push this repo

    git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
    cd YOUR_SPACE_NAME
    # Copy all files from this project
    git add .
    git commit -m "Initial commit"
    git push
    
  3. Configure Secrets

    • Go to your Space settings β†’ Repository secrets
    • Add these secrets:
      LLMProvider=huggingface
      HuggingFaceAPIKey=hf_your_token_here
      DefaultModel=mistralai/Mistral-7B-Instruct-v0.2
      
  4. Your API is live!

    • Access at: https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space

Option 2: Deploy Existing Encore App

Since this is already an Encore app, you can also:

# Deploy to Encore Cloud
encore deploy

# Then use the Encore API URL
https://proj_d3ggdgs82vjo5u1sek0g.api.lp.dev

πŸ“‘ API Endpoints

All endpoints are available at your Space URL:

Chat

curl -X POST https://YOUR_SPACE.hf.space/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Explain quantum computing"}'

RAG (Retrieval-Augmented Generation)

curl -X POST https://YOUR_SPACE.hf.space/rag \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the main topic?",
    "context": [
      "Quantum computing uses quantum bits or qubits.",
      "Classical computers use binary bits."
    ]
  }'
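Under the hood, a RAG endpoint like this typically folds the supplied context passages into the prompt before calling the model. A minimal sketch of that assembly step (the function name and prompt layout are illustrative, not the actual backend/rag implementation):

```typescript
// Assemble a RAG prompt: numbered context passages followed by the query.
// Hypothetical helper; the real backend/rag service may format differently.
function buildRagPrompt(query: string, context: string[]): string {
  const passages = context
    .map((c, i) => `[${i + 1}] ${c}`)
    .join("\n");
  return `Answer using only the context below.\n\nContext:\n${passages}\n\nQuestion: ${query}\nAnswer:`;
}

// Example mirroring the curl request above:
const prompt = buildRagPrompt("What is the main topic?", [
  "Quantum computing uses quantum bits or qubits.",
  "Classical computers use binary bits.",
]);
```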

Text Analysis

curl -X POST https://YOUR_SPACE.hf.space/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your long text here...",
    "task": "summarize"
  }'

Available tasks: summarize, evaluate, explain, extract

List Models

curl https://YOUR_SPACE.hf.space/models

Health Check

curl https://YOUR_SPACE.hf.space/health

πŸ”§ Configuration

Environment Variables / Secrets

Required secrets in Hugging Face Spaces:

Secret              Description            Example
LLMProvider         Provider to use        huggingface or ollama
HuggingFaceAPIKey   Your HF token          hf_xxxxxxxxxxxxx
DefaultModel        Default model          mistralai/Mistral-7B-Instruct-v0.2
OllamaBaseURL       Only if using Ollama   http://localhost:11434

Recommended Models for HF Spaces

  • mistralai/Mistral-7B-Instruct-v0.2 (Fast, efficient)
  • microsoft/phi-3-mini-4k-instruct (Compact)
  • meta-llama/Meta-Llama-3-8B-Instruct (High quality)
  • google/gemma-7b-it (Versatile)

πŸ—οΈ Architecture

backend/
β”œβ”€β”€ chat/          # Chat endpoint
β”œβ”€β”€ rag/           # RAG endpoint
β”œβ”€β”€ analyze/       # Text analysis
β”œβ”€β”€ models/        # Model listing
β”œβ”€β”€ health/        # Health check
└── lib/
    β”œβ”€β”€ llm-provider.ts      # Provider abstraction
    β”œβ”€β”€ ollama-client.ts     # Ollama integration
    β”œβ”€β”€ huggingface-client.ts # HF integration
    β”œβ”€β”€ cache.ts             # In-memory caching
    └── types.ts             # TypeScript types
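The provider abstraction in lib/llm-provider.ts is what lets the same endpoints talk to either Ollama or Hugging Face. A hedged sketch of what such a contract might look like (the names are illustrative; the actual types live in lib/types.ts):

```typescript
// Illustrative provider interface: both clients implement the same contract,
// so the endpoint code stays provider-agnostic.
interface LLMProvider {
  name: string;
  generate(prompt: string, model?: string): Promise<string>;
}

// Stub provider standing in for huggingface-client.ts / ollama-client.ts.
class StubProvider implements LLMProvider {
  name = "stub";
  async generate(prompt: string): Promise<string> {
    return `echo: ${prompt}`;
  }
}

// Endpoints would receive whichever provider the LLMProvider secret selects.
const provider: LLMProvider = new StubProvider();
```

Swapping providers then means swapping one constructor call, not touching endpoint code.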

🎯 Features

βœ… Dual Provider Support - Ollama (local) or Hugging Face (cloud)
βœ… Smart Caching - In-memory cache with TTL
βœ… Type-Safe - Full TypeScript support
βœ… Production Ready - Error handling, logging, monitoring
βœ… RESTful API - Clean, consistent endpoints
βœ… Zero Config - Works out of the box on HF Spaces
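The in-memory cache with TTL mentioned above can be pictured as a small map of timestamped entries with oldest-first eviction. A minimal sketch under those assumptions (the real lib/cache.ts may differ):

```typescript
// Minimal TTL cache: entries expire after ttlMs, and the oldest entry is
// evicted once maxEntries is reached. Illustrative only.
class TTLCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private maxEntries: number, private ttlMs: number) {}

  set(key: string, value: V, now = Date.now()): void {
    if (this.store.size >= this.maxEntries && !this.store.has(key)) {
      // Map preserves insertion order, so the first key is the oldest.
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now > entry.expiresAt) {
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }

  get size(): number {
    return this.store.size;
  }
}

// Matches the chat cache limits reported by /health: 100 entries, 300 s TTL.
const chatCache = new TTLCache<string>(100, 300_000);
chatCache.set("hello", "cached reply");
```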

πŸ” Security

  • API keys stored as repository secrets
  • No secrets in code or logs
  • Rate limiting ready (middleware can be added)
  • CORS configured

πŸ“Š Monitoring

Check API health:

curl https://YOUR_SPACE.hf.space/health

Returns:

{
  "status": "healthy",
  "uptime": 3600,
  "provider": "huggingface",
  "modelsAvailable": true,
  "cache": {
    "chat": {"size": 10, "maxEntries": 100, "ttl": 300},
    "rag": {"size": 5, "maxEntries": 50, "ttl": 600},
    "analysis": {"size": 2, "maxEntries": 30, "ttl": 900}
  }
}
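A monitoring script can parse this response and alert when the API degrades. A small sketch that validates the sample payload above (the field names are taken from the example response; treat them as assumptions about the full schema):

```typescript
// Shape of the /health response, inferred from the sample above.
interface HealthResponse {
  status: string;
  uptime: number;
  provider: string;
  modelsAvailable: boolean;
}

// Healthy means the service reports "healthy" and models are reachable.
function isHealthy(h: HealthResponse): boolean {
  return h.status === "healthy" && h.modelsAvailable;
}

const sample: HealthResponse = {
  status: "healthy",
  uptime: 3600,
  provider: "huggingface",
  modelsAvailable: true,
};
```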

πŸ†˜ Troubleshooting

"Model loading" errors

  • Wait 30-60 seconds for HF models to load
  • Check that your HF token has access to the model

"Secret not set" errors

  • Verify all secrets are configured in Space settings
  • Restart the Space after adding secrets

API not responding

  • Check Space logs in the Hugging Face interface
  • Verify Docker build completed successfully

πŸ“ License

MIT License - feel free to use in your projects!


Built with Encore.ts | Powered by Hugging Face