metadata
title: SafeRAG Demo
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
SafeRAG: High-Performance Calibrated RAG
A production-ready Retrieval-Augmented Generation (RAG) system with risk calibration, built on the Hugging Face ecosystem.
Key Features
- Risk Calibration: Multi-layer risk assessment with adaptive strategies
- High Performance: Targets a 2-3.5x throughput improvement with vLLM
- Hugging Face Native: Built on HF Datasets, Models, and Spaces
- Production Ready: Complete pipeline with error handling and monitoring
Architecture
HF Datasets → Embedding (BGE/E5) → FAISS Index
Query → Batched Retrieval → Evidence Selector → Generator (vLLM + gpt-oss-20b)
→ Risk Calibration → Adaptive Strategy → Output (Answer + Citations + Risk Score)
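The retrieval stage of this pipeline can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not the project's code: a brute-force cosine-similarity scan replaces the FAISS index, the vectors are toy 2-dimensional embeddings rather than BGE/E5 outputs, and the function names `cosine` and `retrieve` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, passage_vecs, k):
    """Return the k (index, score) pairs with the highest similarity, best first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings standing in for encoded passages (hypothetical data).
passages = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
top = retrieve([1.0, 0.1], passages, k=2)
print(top)  # passage 0 ranks first, then passage 1
```

A real index avoids the linear scan, but the contract is the same: a query vector goes in, and the top-k passage ids with their scores come out for the evidence selector and the risk features.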
Performance Targets
- QA Accuracy: EM/F1 improvements over vanilla RAG
- Attribution: 8-12 point improvement in citation precision/recall
- Calibration: 30-40% reduction in ECE (Expected Calibration Error)
- Throughput: 2-3.5x improvement with vLLM
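For readers unfamiliar with the calibration target, the following sketch shows one common way to compute ECE with equal-width confidence bins. The function name, the default of 10 bins, and the assumption that confidences lie in [0, 1] are choices made for this example, not details taken from the project.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: sum over bins of (bin size / N) * |bin accuracy - bin mean confidence|.

    Assumes every confidence lies in [0, 1].
    """
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece

# Four answers given at 0.9 confidence, three of them correct:
# accuracy is 0.75, so the calibration gap is about 0.15.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A 30-40% reduction in ECE means this number shrinks by that fraction relative to the uncalibrated baseline.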
Quick Start
Run Tests
python3 simple_e2e_test.py
Start Demo
python3 app.py
Evaluation
The system has been tested with comprehensive end-to-end tests:
- ✅ Text processing and sentence extraction
- ✅ Embedding creation and similarity calculation
- ✅ Passage retrieval and reranking
- ✅ Risk feature extraction and prediction
- ✅ Risk-aware answer generation
- ✅ Evaluation metrics (EM, F1, ROUGE)
- ✅ Complete end-to-end RAG pipeline
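The EM and F1 metrics named in the list are usually computed SQuAD-style on normalized answer strings. The sketch below assumes that convention (lowercasing, stripping punctuation and English articles); the project's own normalization may differ, and the function names are hypothetical.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)

def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(token_f1("Paris France", "Paris"))
```

EM rewards only exact answers, while token F1 gives partial credit, which is why the two are typically reported together.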
Configuration
Key parameters in config.yaml:
- Risk Thresholds: τ₁ = 0.3, τ₂ = 0.7
- Retrieval: k = 20, rerank_k = 10
- Generation: max_tokens = 512, temperature = 0.7
- Calibration: 16 features, logistic regression
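One way to carry these parameters through the code is a small validated settings object. The sketch below is illustrative only: the class name `SafeRAGConfig` and every field name are assumptions, since the actual key names in config.yaml are not shown here. Only the default values come from the list above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SafeRAGConfig:
    # Field names are hypothetical; defaults mirror the documented values.
    tau_low: float = 0.3        # lower risk threshold
    tau_high: float = 0.7       # upper risk threshold
    k: int = 20                 # passages retrieved per query
    rerank_k: int = 10          # passages kept after reranking
    max_tokens: int = 512
    temperature: float = 0.7
    n_risk_features: int = 16

    def __post_init__(self):
        # Reject settings that would make the risk bands or reranking ill-defined.
        if not 0.0 <= self.tau_low < self.tau_high <= 1.0:
            raise ValueError("thresholds must satisfy 0 <= tau_low < tau_high <= 1")
        if not 0 < self.rerank_k <= self.k:
            raise ValueError("rerank_k must be in (0, k]")

config = SafeRAGConfig()
```

Validating at construction time surfaces a swapped threshold pair or a rerank_k larger than k before any query is served.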
Risk Calibration
Risk Features (16-dimensional)
- Retrieval Statistics: Similarity scores, variance, diversity
- Coverage Features: Token/entity overlap between question and answer
- Consistency Features: Semantic similarity between passages
- Diversity Features: Topic variance, passage diversity
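To make the feature groups concrete, here is a reduced sketch that computes a handful of features and passes them through a logistic model. It covers far fewer than 16 dimensions, and the helper names, the particular statistics, and the weights are all hypothetical; in practice the weights would be fitted on labeled data.

```python
import math
from statistics import mean, pvariance

def retrieval_features(similarities):
    """Mean, variance, and top-1 minus top-2 gap of the retrieval scores.

    Assumes at least one score is given.
    """
    ordered = sorted(similarities, reverse=True)
    gap = ordered[0] - ordered[1] if len(ordered) > 1 else 0.0
    return [mean(ordered), pvariance(ordered), gap]

def token_overlap(question, answer):
    """Fraction of distinct question tokens that also appear in the answer."""
    q = set(question.lower().split())
    a = set(answer.lower().split())
    return len(q & a) / len(q) if q else 0.0

def predict_risk(features, weights, bias):
    """Logistic regression: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

feats = retrieval_features([0.9, 0.7, 0.5])
feats.append(token_overlap("who wrote hamlet", "shakespeare wrote hamlet"))
# Made-up weights: higher similarity and overlap lower the predicted risk.
risk = predict_risk(feats, weights=[-2.0, 1.0, -1.0, -1.5], bias=1.0)
```

The output is a score in (0, 1) that the adaptive strategies compare against the two thresholds.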
Adaptive Strategies
- Low Risk (r < τ₁): Normal generation
- Medium Risk (τ₁ ≤ r < τ₂): Conservative generation + citations
- High Risk (r ≥ τ₂): Very conservative generation, or refusal
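The three bands map directly onto a threshold check. In this sketch the function name and the strategy labels are invented for illustration; the default thresholds are the 0.3 and 0.7 values from the configuration section.

```python
def choose_strategy(risk, tau_low=0.3, tau_high=0.7):
    """Map a risk score to one of the three generation strategies."""
    if risk < tau_low:
        return "normal"
    if risk < tau_high:
        return "conservative_with_citations"
    return "very_conservative_or_refuse"

for r in (0.1, 0.3, 0.69, 0.7):
    print(r, choose_strategy(r))
```

Both boundaries are inclusive on the upper band: a score equal to the lower threshold is treated as medium risk, and a score equal to the upper threshold as high risk.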
Datasets
- HotpotQA: Multi-hop reasoning with supporting facts
- TriviaQA: Open-domain QA for general knowledge
- Wikipedia: Knowledge base via HF Datasets
Citation
@article{safrag2024,
title={SafeRAG: High-Performance Calibrated RAG with Risk Assessment},
author={Your Name},
journal={arXiv preprint},
year={2024}
}
License
Apache 2.0 License - see the LICENSE file for details.
SafeRAG: A production-ready RAG system with risk calibration, built on the Hugging Face ecosystem.