---
title: SafeRAG Demo
emoji: π€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---
# SafeRAG: High-Performance Calibrated RAG
A production-ready Retrieval-Augmented Generation (RAG) system with risk calibration, built on the Hugging Face ecosystem.
## Key Features
- **Risk Calibration**: Multi-layer risk assessment with adaptive strategies
- **High Performance**: Targets 2-3.5x higher throughput with vLLM-based generation
- **Hugging Face Native**: Built on HF Datasets, Models, and Spaces
- **Production Ready**: Complete pipeline with error handling and monitoring
## Architecture
```
HF Datasets → Embedding (BGE/E5) → FAISS Index
Query → Batched Retrieval → Evidence Selector → Generator (vLLM + gpt-oss-20b)
→ Risk Calibration → Adaptive Strategy → Output (Answer + Citations + Risk Score)
```
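The retrieval stage of the diagram can be illustrated with a minimal sketch. It assumes passages are already embedded as plain Python lists and replaces the FAISS index with a brute-force cosine-similarity scan, so it shows the idea rather than the project's actual code. The function names `retrieve` and `batched_retrieve` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, passages, k=20):
    """Return the k most similar (passage_id, score) pairs.

    `passages` is a list of (passage_id, vector) tuples. A real system
    would query a FAISS index here instead of scanning every passage.
    """
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in passages]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def batched_retrieve(query_vecs, passages, k=20):
    """Run retrieval for several queries and return one result list per query."""
    return [retrieve(q, passages, k) for q in query_vecs]
```

For example, `batched_retrieve([[1.0, 0.0]], [("a", [1.0, 0.0]), ("b", [0.0, 1.0])], k=1)` ranks passage `a` first for the single query.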
## Performance Targets
- **QA Accuracy**: EM/F1 improvements over vanilla RAG
- **Attribution**: +8-12pt improvement in citation precision/recall
- **Calibration**: 30-40% reduction in ECE (Expected Calibration Error)
- **Throughput**: 2-3.5x improvement with vLLM
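Since the calibration target is stated in terms of ECE, a short sketch of how that metric is commonly computed may help. This version uses equal-width confidence bins and assumes each prediction comes with a probability of being correct. How SafeRAG maps its risk score to such a probability is not specified here, and the function name is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    confidences: predicted probabilities of being correct, each in [0, 1]
    correct:     booleans, True where the prediction was actually correct
    """
    n = len(confidences)
    if n == 0:
        return 0.0

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 belongs to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece
```

A lower value means predicted confidence tracks observed accuracy more closely; a perfectly calibrated model scores 0.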
## Quick Start
### Run Tests
```bash
python3 simple_e2e_test.py
```
### Start Demo
```bash
python3 app.py
```
## Evaluation
The system has been tested end to end, covering:
- ✅ Text processing and sentence extraction
- ✅ Embedding creation and similarity calculation
- ✅ Passage retrieval and reranking
- ✅ Risk feature extraction and prediction
- ✅ Risk-aware answer generation
- ✅ Evaluation metrics (EM, F1, ROUGE)
- ✅ Complete end-to-end RAG pipeline
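
To make the EM and F1 metrics concrete, here is a minimal sketch following the widely used SQuAD-style convention (lowercasing, stripping punctuation and English articles, then comparing tokens). Whether the project's tests normalize answers in exactly this way is an assumption, and the function names are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

With these definitions, `"The Eiffel Tower."` is an exact match for `"eiffel tower"`, while `"Paris France"` against `"Paris"` gets partial F1 credit but no exact match.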
## Configuration
Key parameters in `config.yaml`:
- **Risk Thresholds**: τ₁ = 0.3, τ₂ = 0.7
- **Retrieval**: k = 20, rerank_k = 10
- **Generation**: max_tokens = 512, temperature = 0.7
- **Calibration**: 16 features, logistic regression
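
As an illustration, the parameters above can be mirrored in a small validated settings object. This is only a sketch: the field names (`tau_low`, `tau_high`, `n_risk_features`, and so on) are hypothetical and may not match the actual keys in `config.yaml`; only the default values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeRAGConfig:
    """Hypothetical in-code mirror of the documented defaults."""

    tau_low: float = 0.3        # lower risk threshold
    tau_high: float = 0.7       # upper risk threshold
    k: int = 20                 # passages fetched by retrieval
    rerank_k: int = 10          # passages kept after reranking
    max_tokens: int = 512       # generation length limit
    temperature: float = 0.7    # sampling temperature
    n_risk_features: int = 16   # inputs to the logistic-regression calibrator

    def __post_init__(self):
        # The thresholds must be ordered and lie inside [0, 1].
        if not 0.0 <= self.tau_low < self.tau_high <= 1.0:
            raise ValueError("expected 0 <= tau_low < tau_high <= 1")
        # Reranking cannot keep more passages than retrieval returned.
        if not 0 < self.rerank_k <= self.k:
            raise ValueError("expected 0 < rerank_k <= k")
```

Validating at construction time surfaces an inconsistent configuration (for example, `rerank_k` larger than `k`) before any model is loaded.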
## Risk Calibration
### Risk Features (16-dimensional)
1. **Retrieval Statistics**: Similarity scores, variance, diversity
2. **Coverage Features**: Token/entity overlap between Q&A
3. **Consistency Features**: Semantic similarity between passages
4. **Diversity Features**: Topic variance, passage diversity
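
A minimal sketch of how a feature vector could be turned into a risk score with logistic regression is shown below. The two retrieval statistics (mean and variance of similarity scores) stand in for the full 16-dimensional vector, and the weights are placeholders that a real calibrator would learn from labeled data. All names are hypothetical.

```python
import math
import statistics


def retrieval_stats(similarity_scores):
    """Mean and population variance of the retrieval similarity scores."""
    if not similarity_scores:
        return 0.0, 0.0
    return (
        statistics.mean(similarity_scores),
        statistics.pvariance(similarity_scores),
    )


def risk_score(features, weights, bias=0.0):
    """Logistic-regression risk in (0, 1): sigmoid(bias + w . x)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

With all-zero weights and bias the score is exactly 0.5, which is a convenient sanity check before plugging in trained parameters.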
### Adaptive Strategies
- **Low Risk (r < τ₁)**: Normal generation
- **Medium Risk (τ₁ ≤ r < τ₂)**: Conservative generation + citations
- **High Risk (r ≥ τ₂)**: Very conservative or refuse
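
The three bands can be sketched as a simple routing function. The defaults are the configured thresholds (0.3 and 0.7); the function name and the returned strategy labels are hypothetical.

```python
def choose_strategy(risk, tau_low=0.3, tau_high=0.7):
    """Map a risk score in [0, 1] to one of three generation strategies."""
    if risk < tau_low:
        return "normal"
    if risk < tau_high:
        # Lower threshold is inclusive, upper threshold is exclusive.
        return "conservative_with_citations"
    return "very_conservative_or_refuse"
```

Note the boundary handling: a score exactly at the lower threshold falls into the medium band, and a score exactly at the upper threshold falls into the high band.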
## Datasets
- **HotpotQA**: Multi-hop reasoning with supporting facts
- **TriviaQA**: Open-domain QA for general knowledge
- **Wikipedia**: Knowledge base via HF Datasets
## Citation
```bibtex
@article{safrag2024,
title={SafeRAG: High-Performance Calibrated RAG with Risk Assessment},
author={Your Name},
journal={arXiv preprint},
year={2024}
}
```
## License
Apache 2.0 License - see LICENSE file for details.
---
**SafeRAG**: A production-ready RAG system with risk calibration, built on the Hugging Face ecosystem.