---
title: SafeRAG Demo
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---

# SafeRAG: High-Performance Calibrated RAG

A production-ready Retrieval-Augmented Generation (RAG) system with risk calibration, built on the Hugging Face ecosystem.

## 🚀 Key Features

- **Risk Calibration**: Multi-layer risk assessment with adaptive strategies
- **High Performance**: Optimized for 2-3.5x throughput improvement
- **Hugging Face Native**: Built on HF Datasets, Models, and Spaces
- **Production Ready**: Complete pipeline with error handling and monitoring

## 🏗️ Architecture

```
HF Datasets → Embedding (BGE/E5) → FAISS Index
Query → Batched Retrieval → Evidence Selector → Generator (vLLM + gpt-oss-20b)
→ Risk Calibration → Adaptive Strategy → Output (Answer + Citations + Risk Score)
```

## 📊 Performance Targets

- **QA Accuracy**: EM/F1 improvements over vanilla RAG
- **Attribution**: +8-12pt improvement in citation precision/recall
- **Calibration**: 30-40% reduction in ECE (Expected Calibration Error)
- **Throughput**: 2-3.5x improvement with vLLM

## 🛠️ Quick Start

### Run Tests

```bash
python3 simple_e2e_test.py
```

### Start Demo

```bash
python3 app.py
```

## 📈 Evaluation

The system has been tested with comprehensive end-to-end tests:

- ✅ Text processing and sentence extraction
- ✅ Embedding creation and similarity calculation
- ✅ Passage retrieval and reranking
- ✅ Risk feature extraction and prediction
- ✅ Risk-aware answer generation
- ✅ Evaluation metrics (EM, F1, ROUGE)
- ✅ Complete end-to-end RAG pipeline

## 🔧 Configuration

Key parameters in `config.yaml`:

- **Risk Thresholds**: τ₁ = 0.3, τ₂ = 0.7
- **Retrieval**: k = 20, rerank_k = 10
- **Generation**: max_tokens = 512, temperature = 0.7
- **Calibration**: 16 features, logistic regression

## 🎯 Risk Calibration

### Risk Features (16-dimensional)

1. **Retrieval Statistics**: Similarity scores, variance, diversity
2. **Coverage Features**: Token/entity overlap between question and answer
3. **Consistency Features**: Semantic similarity between passages
4. **Diversity Features**: Topic variance, passage diversity

### Adaptive Strategies

- **Low Risk (r < τ₁)**: Normal generation
- **Medium Risk (τ₁ ≤ r < τ₂)**: Conservative generation + citations
- **High Risk (r ≥ τ₂)**: Very conservative or refuse

## 📚 Datasets

- **HotpotQA**: Multi-hop reasoning with supporting facts
- **TriviaQA**: Open-domain QA for general knowledge
- **Wikipedia**: Knowledge base via HF Datasets

## 📄 Citation

```bibtex
@article{safrag2024,
  title={SafeRAG: High-Performance Calibrated RAG with Risk Assessment},
  author={Your Name},
  journal={arXiv preprint},
  year={2024}
}
```

## 📝 License

Apache 2.0 License - see LICENSE file for details.

---

**SafeRAG**: A production-ready RAG system with risk calibration, built on the Hugging Face ecosystem.
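
The adaptive strategies listed under Risk Calibration can be illustrated with a short sketch. It assumes the risk score `r` is the output of a logistic regression over the 16 risk features and uses the thresholds τ₁ = 0.3 and τ₂ = 0.7 from the configuration. The function names `risk_score` and `select_strategy`, the strategy labels, and the weights are hypothetical and are not taken from the SafeRAG code base.

```python
import math
from typing import Sequence

# Thresholds as listed in the Configuration section (tau1 = 0.3, tau2 = 0.7).
TAU_1 = 0.3
TAU_2 = 0.7


def risk_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Logistic-regression risk score in [0, 1].

    The weights and bias are placeholders; a real model would learn them
    from labelled calibration data.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def select_strategy(risk: float, tau1: float = TAU_1, tau2: float = TAU_2) -> str:
    """Map a risk score to one of the three adaptive strategies."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    if risk < tau1:
        return "normal"                        # low risk: r < tau1
    if risk < tau2:
        return "conservative_with_citations"   # medium risk: tau1 <= r < tau2
    return "very_conservative_or_refuse"       # high risk: r >= tau2


if __name__ == "__main__":
    # All-zero weights and bias give a score of exactly 0.5 (medium risk).
    r = risk_score([0.0] * 16, [0.0] * 16)
    print(r, select_strategy(r))
```

Keeping the thresholds as function parameters makes it easy to tune τ₁ and τ₂ on a validation set without touching the routing logic.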