# Hugging Face Spaces Deployment Guide
This guide provides step-by-step instructions for deploying the RAG system on Hugging Face Spaces.
## Prerequisites
- Hugging Face account
- Git repository with the RAG system code
- Basic understanding of Docker containers
## Quick Deployment

### Step 1: Create a New Space

- Go to Hugging Face Spaces
- Click "Create new Space"
- Choose "Docker" as the SDK
- Set the Space name (e.g., `my-rag-system`)
- Choose Public or Private visibility
- Click "Create Space"
### Step 2: Upload Files

Upload all files from this repository to your Space:

```
Your Space Repository
├── app.py                    # Main Streamlit application
├── rag_system.py             # Core RAG system
├── pdf_processor.py          # PDF processing utilities
├── guard_rails.py            # Safety and security system
├── hf_spaces_config.py       # HF Spaces configuration
├── requirements.txt          # Python dependencies
├── Dockerfile                # Container configuration
├── README.md                 # Project documentation
├── GUARD_RAILS_GUIDE.md      # Guard rails documentation
└── HF_SPACES_DEPLOYMENT.md   # This deployment guide
```
### Step 3: Configure Environment

The system automatically detects the HF Spaces environment and configures:

- Cache directories in `/tmp` (writable in HF Spaces)
- Environment variables for model loading
- Resource limits optimized for HF Spaces
- Permission handling for the containerized environment
## Configuration Details

### Automatic Environment Detection

The system detects HF Spaces using these indicators:

```python
# Environment indicators
'SPACE_ID' in os.environ
'SPACE_HOST' in os.environ
'HF_HUB_ENDPOINT' in os.environ
os.path.exists('/tmp/huggingface')
```
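The checks above can be wrapped in a small helper; this is a sketch, and the function name `running_on_hf_spaces` is illustrative rather than part of the codebase:

```python
import os

def running_on_hf_spaces() -> bool:
    """Return True if any known HF Spaces indicator is present."""
    env_indicators = ('SPACE_ID', 'SPACE_HOST', 'HF_HUB_ENDPOINT')
    return (
        any(key in os.environ for key in env_indicators)
        or os.path.exists('/tmp/huggingface')
    )
```

Checking several indicators makes detection robust: any one of them may be absent depending on the Spaces runtime version.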
### Cache Directory Setup

```bash
# HF Spaces cache directories
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
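These variables must be in place before `transformers` or `torch` is imported, or the libraries will fall back to unwritable default cache paths. One way to apply them from Python (a sketch):

```python
import os

# Redirect all cache writes to /tmp, the only reliably writable
# location in a default HF Spaces container. Must run before
# transformers/torch are imported.
CACHE_ENV = {
    'HF_HOME': '/tmp/huggingface',
    'TRANSFORMERS_CACHE': '/tmp/huggingface/transformers',
    'TORCH_HOME': '/tmp/torch',
    'XDG_CACHE_HOME': '/tmp',
    'HF_HUB_CACHE': '/tmp/huggingface/hub',
}

for key, value in CACHE_ENV.items():
    os.environ[key] = value          # make the path visible to libraries
    os.makedirs(value, exist_ok=True)  # ensure the directory exists
```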
### Model Configuration

```python
# Optimized for HF Spaces
embedding_model = 'all-MiniLM-L6-v2'             # Fast, lightweight
generative_model = 'Qwen/Qwen2.5-1.5B-Instruct'  # Primary model
fallback_model = 'distilgpt2'                    # Backup model
```
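The primary/fallback pairing can be realized with a generic wrapper; this is a sketch, not the actual loading code in `rag_system.py`:

```python
def load_with_fallback(load_fn, primary, fallback):
    """Try loading the primary model; fall back on any failure.

    load_fn is whatever loader the app uses, e.g.
    lambda name: pipeline('text-generation', model=name).
    """
    try:
        return load_fn(primary)
    except Exception:
        return load_fn(fallback)
```

Keeping the loader injectable means the fallback logic can be unit-tested without downloading multi-gigabyte weights.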
## Deployment Process

### 1. Initial Build

When you first deploy, the system will:

- Download the base image (Python 3.11)
- Install dependencies from `requirements.txt`
- Set up cache directories in `/tmp`
- Download models (embedding + language models)
- Initialize the RAG system with guard rails
- Start the Streamlit server on port 8501
### 2. Model Download

The system downloads these models:

- Embedding model: `all-MiniLM-L6-v2` (~90 MB)
- Primary LLM: `Qwen/Qwen2.5-1.5B-Instruct` (~3 GB)
- Fallback LLM: `distilgpt2` (~300 MB)

**Note:** The first deployment may take 10-15 minutes due to model downloads.
### 3. System Initialization

The RAG system initializes with:

- Guard rails enabled for safety
- Vector store in `./vector_store`
- PDF processing ready
- Hybrid search (FAISS + BM25) configured
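Hybrid FAISS + BM25 retrieval is typically a weighted fusion of normalized scores from the two rankers. The sketch below is illustrative only; the real logic lives in `rag_system.py`, and the equal weighting (`alpha=0.5`) is an assumption:

```python
def hybrid_scores(dense, sparse, alpha=0.5):
    """Blend dense (FAISS) and sparse (BM25) scores per document id.

    Each input maps doc_id -> raw score; scores are min-max
    normalized so the two scales are comparable before blending.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    d, s = normalize(dense), normalize(sparse)
    ids = set(d) | set(s)
    return {i: alpha * d.get(i, 0.0) + (1 - alpha) * s.get(i, 0.0)
            for i in ids}
```

Normalization matters here: raw FAISS distances and BM25 scores live on incompatible scales, so blending them directly would let one ranker dominate.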
## Resource Management

### Memory Usage

- Base system: ~500 MB
- Embedding model: ~100 MB
- Language model: ~3 GB
- Total: ~3.6 GB

### CPU Usage

- Model loading: high (initial)
- Inference: medium
- Search: low

### Storage

- Models: ~3.5 GB
- Cache: ~1 GB
- Vector store: variable (depends on documents)
## Troubleshooting

### Common Issues

#### 1. Permission Denied Errors

```
Error: [Errno 13] Permission denied: '/.cache'
```

Solution: The system handles this automatically by redirecting caches to `/tmp` directories.

#### 2. Model Download Failures

```
Error: Failed to download model
```

Solution:
- Check internet connectivity
- Verify the model names in the configuration
- Wait for the automatic retry

#### 3. Memory Issues

```
Error: Out of memory
```

Solution:
- Use smaller models
- Reduce batch sizes
- Enable cache cleanup

#### 4. Build Failures

```
Error: Docker build failed
```

Solution:
- Check the Dockerfile syntax
- Verify that all files are uploaded
- Check the requirements.txt format
### Debug Mode

Enable debug logging by setting:

```python
# In hf_spaces_config.py
import logging
logging.basicConfig(level=logging.DEBUG)
```
### Health Checks

The system provides health check endpoints:

- System status: `/health`
- Model status: `/models`
- Cache status: `/cache`
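A handler for `/health` might assemble its payload along these lines; this is a sketch, and the field names are assumptions rather than the system's actual response schema:

```python
import shutil
import time

START_TIME = time.time()

def health_report():
    """Build a JSON-serializable payload for a /health endpoint."""
    usage = shutil.disk_usage('/tmp')
    return {
        'status': 'ok',
        'uptime_seconds': round(time.time() - START_TIME, 1),
        'tmp_free_bytes': usage.free,  # cache headroom on the /tmp mount
    }
```

Reporting free space on `/tmp` is useful on Spaces specifically, since that mount holds both model caches and the vector store.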
## Security Features

### Guard Rails

The system includes comprehensive guard rails:

- Input validation: query length limits, content filtering
- Output safety: response quality checks, hallucination detection
- Data privacy: PII detection and masking
- System protection: rate limiting, resource monitoring
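The input-validation layer can be sketched as a simple gate in front of the pipeline. The limit and the blocked pattern below are illustrative assumptions; the real rules live in `guard_rails.py`:

```python
import re

MAX_QUERY_CHARS = 2000  # assumed limit, not the system's actual value
BLOCKED_PATTERNS = [r'(?i)ignore (all )?previous instructions']  # illustrative

def validate_query(query: str):
    """Return (ok, reason); reason is None when the query passes."""
    if not query.strip():
        return False, 'empty query'
    if len(query) > MAX_QUERY_CHARS:
        return False, 'query too long'
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, query):
            return False, 'blocked content'
    return True, None
```

Returning a reason string (rather than just a boolean) lets the UI explain rejections to the user without leaking the filter internals into logs.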
### Environment Isolation

- Containerized: isolated from the host system
- Read-only: file-system protection
- Network: limited network access
- User: non-root execution
## Performance Optimization

### Caching Strategy

- Model caching: persistent across restarts
- Vector caching: FAISS index persistence
- Response caching: answers to frequently asked questions
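Response caching for repeated questions can be as simple as memoizing the answer function. A minimal sketch, with the pipeline call stubbed out:

```python
from functools import lru_cache

def _run_rag_pipeline(query: str) -> str:
    # Placeholder for the real retrieval + generation call.
    return f'answer to: {query}'

@lru_cache(maxsize=256)
def cached_answer(query: str) -> str:
    """Serve repeated questions from memory instead of re-running
    retrieval and generation."""
    return _run_rag_pipeline(query)
```

An in-process LRU cache fits Spaces well: it needs no extra service, and it is discarded on restart along with the rest of the ephemeral state.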
### Resource Optimization

- Memory: efficient model loading
- CPU: parallel processing
- Storage: automatic cleanup

### Monitoring

- Response times: real-time metrics
- Memory usage: resource monitoring
- Error rates: system health tracking
## Updates and Maintenance

### Updating Models

- Modify the configuration in `hf_spaces_config.py`
- Redeploy the Space
- Models will re-download automatically

### Updating Code

- Push changes to your repository
- HF Spaces automatically rebuilds the container
- The system restarts with the new code

### Cache Management

The system automatically:

- Cleans old cache files
- Manages storage usage
- Optimizes performance
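Age-based cache cleanup can be sketched as a walk over the cache tree; the 24-hour threshold is an assumption, not the system's configured value:

```python
import os
import time

def clean_old_cache(root='/tmp/huggingface', max_age_hours=24):
    """Delete cache files older than max_age_hours; return the count."""
    cutoff = time.time() - max_age_hours * 3600
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed += 1
            except OSError:
                pass  # file vanished or is in use; skip it
    return removed
```

Swallowing `OSError` per file keeps a single locked or already-deleted file from aborting the whole sweep.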
## Support

### Documentation

- `README.md`: general project information
- `GUARD_RAILS_GUIDE.md`: safety system details
- This guide: HF Spaces-specific instructions

### Community

- Hugging Face Forums: community support
- GitHub Issues: bug reports and feature requests
- Discord: real-time help
## Success Checklist

- [ ] Space created successfully
- [ ] All files uploaded
- [ ] Build completed without errors
- [ ] Models downloaded successfully
- [ ] RAG system initialized
- [ ] Streamlit interface accessible
- [ ] Guard rails enabled
- [ ] Test queries working
- [ ] Performance acceptable
## Next Steps

After successful deployment:

- Test the system with sample queries
- Upload documents for RAG functionality
- Monitor performance and resource usage
- Customize the configuration as needed
- Share your Space with others

**Happy deploying!** Your RAG system is now ready to provide intelligent document question-answering on Hugging Face Spaces.