Deploying to HuggingFace Spaces
This guide explains how to deploy the RAG Pipeline to HuggingFace Spaces.
Important Note About Environment Variables
The .env file is gitignored and will NOT be pushed to HuggingFace Spaces.
Instead, you need to set environment variables as Repository Secrets in your HF Space.
Step-by-Step Deployment
1. Create a HuggingFace Space
- Go to https://huggingface.co/new-space
- Choose:
  - Name: Your space name (e.g., SimpleRAGPipeline)
  - License: MIT or your choice
  - SDK: Gradio
  - Hardware: CPU basic (free tier)
2. Push Your Code
```bash
# Add HF Space as remote
git remote add space https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Push to HF Space
git push space main
```
3. Configure Repository Secrets
Go to your Space's Settings → Repository secrets and add these secrets:
Required Secrets:
| Secret Name | Value | Description |
|---|---|---|
| `HF_TOKEN` | `hf_...` | Your HuggingFace token (Get it here) |
| `LLM_BACKEND` | `huggingface` | Use HF Inference API for LLM generation |
Optional Secrets (if using alternatives):
| Secret Name | Value | Description |
|---|---|---|
| `OPENAI_API_KEY` | `sk-...` | If using OpenAI instead of HF |
| `LLM_BACKEND` | `openai` | If using OpenAI |
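On HF Spaces, repository secrets surface to the app as ordinary environment variables, so the application can read them with `os.environ`. A minimal sketch of how the secrets above might be validated at startup (the function name `resolve_llm_backend` is hypothetical; the actual app's code may differ):

```python
import os

def resolve_llm_backend() -> str:
    """Read the LLM backend from environment variables and check that
    the matching credential secret is present.

    Hypothetical helper for illustration; on HF Spaces, repository
    secrets appear as plain environment variables.
    """
    backend = os.environ.get("LLM_BACKEND", "auto").lower()
    if backend == "huggingface" and not os.environ.get("HF_TOKEN"):
        raise RuntimeError("LLM_BACKEND=huggingface requires the HF_TOKEN secret")
    if backend == "openai" and not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("LLM_BACKEND=openai requires the OPENAI_API_KEY secret")
    return backend

# With the required secrets set, this resolves to "huggingface":
os.environ["LLM_BACKEND"] = "huggingface"
os.environ["HF_TOKEN"] = "hf_dummy"  # placeholder value for the sketch
backend = resolve_llm_backend()
```

Failing fast like this at startup turns a missing secret into a clear log message instead of a confusing runtime error later.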
4. Verify Configuration
After pushing, check the Space logs to ensure:

```
✅ Backend auto-detection: using 'huggingface'
✅ HF_TOKEN configured
✅ Model: HuggingFaceTB/SmolLM2-1.7B-Instruct
```

If you see:

```
⚠️ WARNING: HF_TOKEN not set!
```

then you need to add HF_TOKEN as a repository secret.
Default Configuration for HF Spaces
The application is pre-configured to work on HF Spaces with these defaults:
- LLM Backend: HuggingFace Inference API (when `HF_TOKEN` is set)
- Model: SmolLM2-1.7B-Instruct (fast on CPU)
- Embeddings: FastEmbed (local, no API calls)
- Web Search: Disabled (duckduckgo-search causes issues)
Troubleshooting
"Error generating answer: Failed to connect to Ollama"
Cause: `LLM_BACKEND` is not set, so it defaults to `auto`, which tries Ollama first (not available on HF Spaces).
Fix: Add LLM_BACKEND=huggingface as a repository secret.
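The auto-detection behavior described above could look something like this (a plausible sketch, not the app's actual code; the function name and the `ollama_reachable` parameter are illustrative):

```python
import os

def autodetect_backend(ollama_reachable: bool) -> str:
    """Sketch of 'auto' backend selection: prefer a local Ollama server,
    fall back to the HF Inference API only when HF_TOKEN is present.

    `ollama_reachable` stands in for an actual connectivity check
    against OLLAMA_BASE_URL.
    """
    backend = os.environ.get("LLM_BACKEND", "auto").lower()
    if backend != "auto":
        return backend  # explicit setting wins
    if ollama_reachable:
        return "ollama"
    if os.environ.get("HF_TOKEN"):
        return "huggingface"
    raise RuntimeError("Error generating answer: Failed to connect to Ollama")
```

On HF Spaces, Ollama is unreachable, so with neither `LLM_BACKEND` nor `HF_TOKEN` set the `auto` path ends in exactly the connection error quoted above.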
"HF_TOKEN not configured"
Cause: The HF_TOKEN secret is not set.
Fix: Add your HuggingFace token as a repository secret named HF_TOKEN.
"Rate limit exceeded"
Cause: Too many requests to HF Inference API (free tier has limits).
Fix:
- Upgrade to HF Pro for higher rate limits
- Or switch to OpenAI: set the `LLM_BACKEND=openai` and `OPENAI_API_KEY` secrets
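Besides upgrading or switching backends, transient rate-limit errors can be smoothed over client-side with exponential backoff. A generic, self-contained retry helper (illustrative only; not part of the pipeline, and the real code would catch the HTTP client's specific 429 exception rather than `RuntimeError`):

```python
import time

def with_backoff(call, retries=4, base_delay=1.0):
    """Retry `call` on a rate-limit error, doubling the delay each time.

    `RuntimeError` stands in for whatever exception the HTTP client
    raises on a 429 response.
    """
    for attempt in range(retries):
        try:
            return call()
        except RuntimeError:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)
```

Wrapping the inference call this way lets occasional 429s resolve themselves instead of failing the user's query immediately.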
Local Development vs HF Spaces
Local Development (.env file):

```
LLM_BACKEND=ollama
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=smollm2:360m
```

HF Spaces (Repository Secrets):

```
LLM_BACKEND=huggingface
HF_TOKEN=hf_...
```
Architecture on HF Spaces
```
User Query
    ↓
FastEmbed (local) → Generate embeddings
    ↓
FAISS → Retrieve relevant chunks
    ↓
HuggingFace Inference API → Generate answer
    ↓
Response with citations
```
Benefits:
- ✅ Runs entirely on CPU (no GPU needed)
- ✅ Free tier available (with rate limits)
- ✅ No local installation required
- ✅ Automatic scaling and hosting
Cost Analysis
| Component | HF Spaces Free | Cost |
|---|---|---|
| Hosting | CPU basic | Free |
| Embeddings | FastEmbed (local) | Free |
| LLM | HF Inference API | Free (rate-limited) |
| **Total** | | **$0/month** |
For higher throughput, upgrade to HF Pro or use Zero GPU spaces.