| title: SafeRAG Demo | |
| emoji: 🤖 | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 4.0.0 | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| # SafeRAG: High-Performance Calibrated RAG | |
| A production-ready Retrieval-Augmented Generation (RAG) system with risk calibration, built on the Hugging Face ecosystem. | |
| ## Key Features | |
| - **Risk Calibration**: Multi-layer risk assessment with adaptive strategies | |
| - **High Performance**: Targets a 2-3.5x throughput improvement with vLLM | |
| - **Hugging Face Native**: Built on HF Datasets, Models, and Spaces | |
| - **Production Ready**: Complete pipeline with error handling and monitoring | |
| ## Architecture | |
| ``` | |
| HF Datasets → Embedding (BGE/E5) → FAISS Index | |
| Query → Batched Retrieval → Evidence Selector → Generator (vLLM + gpt-oss-20b) | |
| → Risk Calibration → Adaptive Strategy → Output (Answer + Citations + Risk Score) | |
| ``` | |
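The retrieval half of this flow can be sketched in plain Python, with a brute-force cosine search standing in for the FAISS index. The function names (`cosine`, `retrieve`, `select_evidence`), the toy 3-dimensional vectors, and the "keep the top `rerank_k` hits" selector are assumptions made for illustration; they are not taken from the project's code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, passage_vecs, k):
    """Return the k best (score, passage_id) pairs, highest similarity first."""
    scored = [(cosine(query_vec, p), i) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by lower id
    return scored[:k]

def select_evidence(hits, rerank_k):
    """Hypothetical evidence selector: keep only the top rerank_k hits."""
    return hits[:rerank_k]

# Toy 3-dimensional "embeddings"; real BGE/E5 vectors have hundreds of dimensions.
passage_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hits = retrieve([1.0, 0.1, 0.0], passage_vecs, k=3)
evidence = select_evidence(hits, rerank_k=2)
evidence_ids = [pid for _, pid in evidence]
```

In a real deployment the linear scan would be replaced by an approximate nearest-neighbour index, and the selector would likely use a learned reranker rather than the raw similarity order.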
| ## Performance Targets | |
| - **QA Accuracy**: Exact match (EM) and F1 improvements over vanilla RAG | |
| - **Attribution**: +8-12pt improvement in citation precision/recall | |
| - **Calibration**: 30-40% reduction in ECE (Expected Calibration Error) | |
| - **Throughput**: 2-3.5x improvement with vLLM | |
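For reference, the calibration target is stated in terms of ECE, which bins predictions by confidence and averages the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below is a standard equal-width-bin ECE; the function name, the choice of 10 bins, and the idea of feeding it per-answer confidences are assumptions, since the README does not say how SafeRAG computes it.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin_size / N) * |accuracy(bin) - mean_confidence(bin)|."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("confidences and correct must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece

# Two answers at 0.9 confidence (one right), two at 0.6 confidence (both right).
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 0, 1, 1])
```

Under this definition, a 30-40% reduction means, for example, moving from an ECE of 0.10 to somewhere between 0.06 and 0.07.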
| ## Quick Start | |
| ### Run Tests | |
| ```bash | |
| python3 simple_e2e_test.py | |
| ``` | |
| ### Start Demo | |
| ```bash | |
| python3 app.py | |
| ``` | |
| ## Evaluation | |
| End-to-end tests cover the following stages: | |
| - ✅ Text processing and sentence extraction | |
| - ✅ Embedding creation and similarity calculation | |
| - ✅ Passage retrieval and reranking | |
| - ✅ Risk feature extraction and prediction | |
| - ✅ Risk-aware answer generation | |
| - ✅ Evaluation metrics (EM, F1, ROUGE) | |
| - ✅ Complete end-to-end RAG pipeline | |
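As a point of reference for the EM and F1 metrics listed above, here is a minimal sketch of the SQuAD-style definitions commonly used for extractive QA. The normalisation steps (lowercasing, stripping punctuation and English articles) and the function names are assumptions; the project's own metric code may differ.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalised strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Token-level F1 between the normalised prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

em = exact_match("The Eiffel Tower.", "eiffel tower")
f1 = f1_score("Paris, France", "Paris")
```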
| ## Configuration | |
| Key parameters in `config.yaml`: | |
| - **Risk Thresholds**: τ₁ = 0.3, τ₂ = 0.7 | |
| - **Retrieval**: k = 20, rerank_k = 10 | |
| - **Generation**: max_tokens = 512, temperature = 0.7 | |
| - **Calibration**: 16 features, logistic regression | |
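One way to hold these defaults in code is a small validated dataclass. This is only a sketch: the class name, the field names, and the two sanity checks (that the thresholds are ordered and that `rerank_k` does not exceed `k`) are assumptions and do not reflect the actual key names in `config.yaml`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SafeRAGConfig:
    """Hypothetical container for the defaults listed in the README."""
    tau_1: float = 0.3          # lower risk threshold
    tau_2: float = 0.7          # upper risk threshold
    k: int = 20                 # passages retrieved per query
    rerank_k: int = 10          # passages kept after reranking
    max_tokens: int = 512       # generation length cap
    temperature: float = 0.7    # sampling temperature
    n_risk_features: int = 16   # input size of the calibration model

    def __post_init__(self):
        # Assumed invariants; the README does not state them explicitly.
        if not 0.0 <= self.tau_1 < self.tau_2 <= 1.0:
            raise ValueError("expected 0 <= tau_1 < tau_2 <= 1")
        if not 0 < self.rerank_k <= self.k:
            raise ValueError("expected 0 < rerank_k <= k")

config = SafeRAGConfig()
```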
| ## Risk Calibration | |
| ### Risk Features (16-dimensional) | |
| 1. **Retrieval Statistics**: Similarity scores, variance, diversity | |
| 2. **Coverage Features**: Token/entity overlap between question and answer | |
| 3. **Consistency Features**: Semantic similarity between passages | |
| 4. **Diversity Features**: Topic variance, passage diversity | |
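To make the feature groups above concrete, the sketch below computes a handful of them and passes the result through a logistic regression. It is illustrative only: it builds 4 features rather than the full 16, the function names are hypothetical, and the weights and bias are made-up placeholders, not trained parameters from SafeRAG.

```python
import math
from statistics import mean, pvariance

def retrieval_features(sim_scores):
    """Retrieval statistics: mean, max, and population variance of similarity scores."""
    return [mean(sim_scores), max(sim_scores), pvariance(sim_scores)]

def token_overlap(question, answer):
    """Coverage feature: fraction of question tokens that also appear in the answer."""
    q_tokens = set(question.lower().split())
    a_tokens = set(answer.lower().split())
    return len(q_tokens & a_tokens) / len(q_tokens) if q_tokens else 0.0

def risk_score(features, weights, bias):
    """Logistic regression: sigmoid of the weighted feature sum, a value in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # numerically stable branch for negative z
    return exp_z / (1.0 + exp_z)

features = retrieval_features([0.9, 0.8, 0.7])
features.append(token_overlap("who wrote hamlet", "shakespeare wrote hamlet"))

# Placeholder weights: higher similarity and overlap lower the risk in this toy setup.
weights = [-2.0, -1.0, 3.0, -1.5]
risk = risk_score(features, weights, bias=2.0)
```

A trained model would learn the weights from labelled examples of correct and incorrect answers; the negative signs here only encode the intuition that strong retrieval evidence should reduce risk.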
| ### Adaptive Strategies | |
| - **Low Risk (r < τ₁)**: Normal generation | |
| - **Medium Risk (τ₁ ≤ r < τ₂)**: Conservative generation + citations | |
| - **High Risk (r ≥ τ₂)**: Very conservative or refuse | |
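The three bands above amount to a simple threshold dispatch on the risk score. A minimal sketch, assuming the documented thresholds of 0.3 and 0.7 and using hypothetical strategy labels:

```python
def choose_strategy(risk, tau_1=0.3, tau_2=0.7):
    """Map a risk score in [0, 1] to one of the three generation strategies."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    if risk < tau_1:
        return "normal"                        # low risk
    if risk < tau_2:
        return "conservative_with_citations"   # medium risk: tau_1 <= risk < tau_2
    return "very_conservative_or_refuse"       # high risk: risk >= tau_2

strategies = [choose_strategy(r) for r in (0.1, 0.3, 0.69, 0.7)]
```

Note that both boundaries are inclusive on the riskier side: a score of exactly 0.3 is treated as medium risk and exactly 0.7 as high risk.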
| ## Datasets | |
| - **HotpotQA**: Multi-hop reasoning with supporting facts | |
| - **TriviaQA**: Open-domain QA for general knowledge | |
| - **Wikipedia**: Knowledge base via HF Datasets | |
| ## Citation | |
| ```bibtex | |
| @article{safrag2024, | |
| title={SafeRAG: High-Performance Calibrated RAG with Risk Assessment}, | |
| author={Your Name}, | |
| journal={arXiv preprint}, | |
| year={2024} | |
| } | |
| ``` | |
| ## License | |
| Apache License 2.0. See the LICENSE file for details. | |
| --- | |
| **SafeRAG**: A production-ready RAG system with risk calibration, built on the Hugging Face ecosystem. |