
Deploying to HuggingFace Spaces

This guide explains how to deploy the RAG Pipeline to HuggingFace Spaces.

Important Note About Environment Variables

The .env file is gitignored and will NOT be pushed to HuggingFace Spaces.

Instead, you need to set environment variables as Repository Secrets in your HF Space.
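On a Space, each repository secret is exposed to the running app as an ordinary environment variable, so the code can read it the same way it reads values loaded from a local .env file. The following is a minimal sketch of that idea, not the project's actual loader: `read_settings` and `mask_secret` are hypothetical helper names, and only the variable names come from this guide.

```python
import os
from typing import Mapping, Optional


def mask_secret(value: Optional[str]) -> str:
    """Return a log-safe form of a secret: first 3 characters, then asterisks."""
    if not value:
        return "<not set>"
    return value[:3] + "*" * max(len(value) - 3, 0)


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the settings this guide mentions from environment variables.

    Hypothetical helper for illustration; the project's real loader may differ.
    """
    return {
        # This guide states that an unset LLM_BACKEND defaults to "auto".
        "llm_backend": env.get("LLM_BACKEND", "auto"),
        "hf_token": env.get("HF_TOKEN"),
        "openai_api_key": env.get("OPENAI_API_KEY"),
    }


if __name__ == "__main__":
    settings = read_settings()
    print("LLM_BACKEND:", settings["llm_backend"])
    # Never print the raw token; Space logs can be visible to collaborators.
    print("HF_TOKEN:", mask_secret(settings["hf_token"]))
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dictionary, without touching the real process environment.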

Step-by-Step Deployment

1. Create a HuggingFace Space

  1. Go to https://huggingface.co/new-space
  2. Choose:
    • Name: Your space name (e.g., SimpleRAGPipeline)
    • License: MIT or your choice
    • SDK: Gradio
    • Hardware: CPU basic (free tier)

2. Push Your Code

# Add HF Space as remote
git remote add space https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Push to HF Space
git push space main

3. Configure Repository Secrets

Go to your Space's Settings → Repository secrets and add these secrets:

Required Secrets:

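| Secret Name | Value | Description |
|-------------|-------|-------------|
| HF_TOKEN | hf_... | Your HuggingFace access token (create one at https://huggingface.co/settings/tokens) |
| LLM_BACKEND | huggingface | Use the HF Inference API for LLM generation |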

Optional Secrets (if using alternatives):

| Secret Name | Value | Description |
|-------------|-------|-------------|
| OPENAI_API_KEY | sk-... | If using OpenAI instead of HF |
| LLM_BACKEND | openai | If using OpenAI |

4. Verify Configuration

After pushing, check the Space logs to ensure:

✅ Backend auto-detection: using 'huggingface'
✅ HF_TOKEN configured
✅ Model: HuggingFaceTB/SmolLM2-1.7B-Instruct

If you see:

⚠️ WARNING: HF_TOKEN not set!

Then you need to add HF_TOKEN as a repository secret.

Default Configuration for HF Spaces

The application is pre-configured to work on HF Spaces with these defaults:

  • LLM Backend: HuggingFace Inference API (when HF_TOKEN is set)
  • Model: SmolLM2-1.7B-Instruct (fast on CPU)
  • Embeddings: FastEmbed (local, no API calls)
  • Web Search: Disabled (duckduckgo-search causes issues)

Troubleshooting

"Error generating answer: Failed to connect to Ollama"

Cause: LLM_BACKEND is not set, so it defaults to auto, which tries Ollama. Ollama is not available on HF Spaces, so the connection fails.

Fix: Add LLM_BACKEND=huggingface as a repository secret.
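To make the failure mode concrete, here is one plausible reading of the auto-detection behavior described in this guide. It is an assumption, not the project's confirmed logic: the function name `resolve_backend` is hypothetical, and the real detection order may differ.

```python
from typing import Mapping


def resolve_backend(env: Mapping[str, str]) -> str:
    """Pick an LLM backend name from environment-style settings.

    Assumed order (illustrative, not confirmed by the project):
    an explicit LLM_BACKEND wins; otherwise "auto" picks "huggingface"
    when HF_TOKEN is present and falls back to "ollama".
    """
    requested = env.get("LLM_BACKEND", "auto").strip().lower()
    if requested != "auto":
        return requested
    if env.get("HF_TOKEN"):
        return "huggingface"
    # No Ollama server runs on a Space, so this fallback fails there.
    return "ollama"
```

Under these assumptions, a Space with neither secret set resolves to `ollama` and produces the connection error above, while setting LLM_BACKEND explicitly removes the guesswork.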

"HF_TOKEN not configured"

Cause: The HF_TOKEN secret is not set.

Fix: Add your HuggingFace token as a repository secret named HF_TOKEN.

"Rate limit exceeded"

Cause: Too many requests to HF Inference API (free tier has limits).

Fix:

  • Upgrade to HF Pro for higher rate limits
  • Or switch to OpenAI: set LLM_BACKEND=openai and add an OPENAI_API_KEY secret

Local Development vs HF Spaces

Local Development (.env file):

LLM_BACKEND=ollama
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=smollm2:360m

HF Spaces (Repository Secrets):

LLM_BACKEND=huggingface
HF_TOKEN=hf_...
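A quick way to catch a half-configured environment is to check, per backend, that the variables shown in this guide are present. The sketch below is illustrative: `REQUIRED_VARS` and `missing_vars` are hypothetical names, and it treats every variable from the examples as required, which may be stricter than the project itself (for instance, the project may ship defaults for the Ollama settings).

```python
from typing import Dict, List, Mapping

# Variables each backend needs, taken from the examples in this guide.
REQUIRED_VARS: Dict[str, List[str]] = {
    "ollama": ["OLLAMA_BASE_URL", "OLLAMA_MODEL"],
    "huggingface": ["HF_TOKEN"],
    "openai": ["OPENAI_API_KEY"],
}


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return names of variables the selected backend needs but that are unset or empty."""
    backend = env.get("LLM_BACKEND", "")
    if backend not in REQUIRED_VARS:
        # Unknown or missing backend: report the selector itself.
        return ["LLM_BACKEND"]
    return [name for name in REQUIRED_VARS[backend] if not env.get(name)]
```

Calling `missing_vars(os.environ)` at startup and logging the result gives a single message that names exactly which secret to add.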

Architecture on HF Spaces

User Query
    ↓
FastEmbed (local) → Generate embeddings
    ↓
FAISS → Retrieve relevant chunks
    ↓
HuggingFace Inference API → Generate answer
    ↓
Response with citations
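The flow in the diagram can be sketched end to end in plain Python. This is a toy illustration of the data flow only, with stand-ins swapped in for every real component: a bag-of-words vector replaces FastEmbed, a brute-force cosine ranking replaces the FAISS index, and the `generate` callable replaces the HuggingFace Inference API call. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    """Toy stand-in for FastEmbed: a normalized bag-of-words vector."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Dot product of two already-normalized sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[int, str]]:
    """Brute-force stand-in for a FAISS search: top-k chunks by cosine similarity."""
    q = embed(query)
    ranked = sorted(
        enumerate(chunks),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return ranked[:k]


def answer(query: str, chunks: List[str], generate: Callable[[str], str], k: int = 2) -> str:
    """Run retrieve -> prompt -> generate, then append numbered citations."""
    hits = retrieve(query, chunks, k)
    context = "\n".join(f"[{i + 1}] {text}" for i, text in hits)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    citations = ", ".join(f"[{i + 1}]" for i, _ in hits)
    return f"{generate(prompt)} (sources: {citations})"


if __name__ == "__main__":
    docs = [
        "FAISS retrieves relevant chunks by vector similarity.",
        "Gradio builds the web interface.",
        "FastEmbed generates embeddings locally on CPU.",
    ]
    # A trivial stand-in for the remote LLM call.
    print(answer("What generates embeddings?", docs, generate=lambda prompt: "FastEmbed."))
```

Only the final `generate` step would leave the Space in the real pipeline; embedding and retrieval stay local, which is why the app runs on CPU-only hardware.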

Benefits:

  • ✅ Runs entirely on CPU (no GPU needed)
  • ✅ Free tier available (with rate limits)
  • ✅ No local installation required
  • ✅ Automatic scaling and hosting

Cost Analysis

| Component | HF Spaces Free | Cost |
|-----------|----------------|------|
| Hosting | CPU basic | Free |
| Embeddings | FastEmbed (local) | Free |
| LLM | HF Inference API | Free (rate-limited) |
| Total | | $0/month |

For higher throughput, upgrade to HF Pro or use ZeroGPU Spaces.

Resources