---
title: SafeRAG Demo
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---

# SafeRAG: High-Performance Calibrated RAG

A production-ready Retrieval-Augmented Generation (RAG) system with risk calibration, built on the Hugging Face ecosystem.

## 🚀 Key Features

- **Risk Calibration**: Multi-layer risk assessment with adaptive strategies
- **High Performance**: Targets a 2-3.5x throughput improvement with vLLM
- **Hugging Face Native**: Built on HF Datasets, Models, and Spaces
- **Production Ready**: Complete pipeline with error handling and monitoring

## πŸ—οΈ Architecture

```
HF Datasets → Embedding (BGE/E5) → FAISS Index
Query → Batched Retrieval → Evidence Selector → Generator (vLLM + gpt-oss-20b)
→ Risk Calibration → Adaptive Strategy → Output (Answer + Citations + Risk Score)
```
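
The retrieval step in the diagram can be illustrated with a brute-force cosine-similarity search. This is only a minimal sketch: the real pipeline uses a FAISS index over BGE/E5 embeddings, and the function names `cosine_similarity` and `retrieve_top_k` are hypothetical, not taken from the project code.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    k: int = 20,
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs for the k most similar passages."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(retrieve_top_k([1.0, 0.0], passages, k=2))
```

An exhaustive scan like this is linear in the number of passages, which is why an approximate or optimized index such as FAISS is the practical choice at Wikipedia scale.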

## 📊 Performance Targets

- **QA Accuracy**: EM/F1 improvements over vanilla RAG
- **Attribution**: +8-12pt improvement in citation precision/recall
- **Calibration**: 30-40% reduction in ECE (Expected Calibration Error)
- **Throughput**: 2-3.5x improvement with vLLM
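
For reference, Expected Calibration Error is commonly computed by binning predictions by confidence and averaging the gap between confidence and accuracy in each bin, weighted by bin size. The sketch below uses equal-width bins; the function name, the choice of 10 bins, and the mapping from the system's risk score to a confidence value are assumptions, not details from this project.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """ECE with equal-width confidence bins over [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_confidence = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_confidence - accuracy)
    return ece


if __name__ == "__main__":
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A lower value means the reported confidence tracks observed accuracy more closely, so a 30-40% reduction target is relative to the ECE of the uncalibrated baseline.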

## πŸ› οΈ Quick Start

### Run Tests
```bash
python3 simple_e2e_test.py
```

### Start Demo
```bash
python3 app.py
```

## 📈 Evaluation

The end-to-end tests cover the following components:

- ✅ Text processing and sentence extraction
- ✅ Embedding creation and similarity calculation
- ✅ Passage retrieval and reranking
- ✅ Risk feature extraction and prediction
- ✅ Risk-aware answer generation
- ✅ Evaluation metrics (EM, F1, ROUGE)
- ✅ Complete end-to-end RAG pipeline
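
As an illustration of the EM and F1 metrics listed above, here is a minimal sketch using the answer normalization that is conventional in extractive QA evaluation (lowercasing, stripping punctuation and English articles). The function names are hypothetical, and the project's own metric code may normalize differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower.", "eiffel tower"))
    print(f1_score("Paris France", "Paris"))
```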

## 🔧 Configuration

Key parameters in `config.yaml`:

- **Risk Thresholds**: τ₁ = 0.3, τ₂ = 0.7
- **Retrieval**: k = 20, rerank_k = 10
- **Generation**: max_tokens = 512, temperature = 0.7
- **Calibration**: 16 features, logistic regression

## 🎯 Risk Calibration

### Risk Features (16-dimensional)
1. **Retrieval Statistics**: Similarity scores, variance, diversity
2. **Coverage Features**: Token/entity overlap between question and answer
3. **Consistency Features**: Semantic similarity between passages
4. **Diversity Features**: Topic variance, passage diversity
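
The configuration lists logistic regression over 16 features as the calibration model. A minimal sketch of how such a model turns a feature vector into a risk score in [0, 1] is shown below; the weights, bias, and the name `risk_score` are placeholders for illustration, since trained parameters are not given here.

```python
import math
from typing import Sequence

NUM_FEATURES = 16  # matches the 16-dimensional feature vector described above


def risk_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Logistic-regression risk: sigmoid(bias + weights . features)."""
    if len(features) != NUM_FEATURES or len(weights) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} features and weights")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


if __name__ == "__main__":
    # Placeholder parameters only; real weights would come from training.
    print(risk_score([0.0] * NUM_FEATURES, [0.0] * NUM_FEATURES))
```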

### Adaptive Strategies
- **Low Risk (r < τ₁)**: Normal generation
- **Medium Risk (τ₁ ≤ r < τ₂)**: Conservative generation + citations
- **High Risk (r ≥ τ₂)**: Very conservative or refuse
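
The threshold rules above can be sketched as a small routing function. The default thresholds are the values given under Configuration (0.3 and 0.7); the function name and the strategy labels it returns are hypothetical.

```python
TAU_1 = 0.3  # lower risk threshold, from the Configuration section
TAU_2 = 0.7  # upper risk threshold, from the Configuration section


def select_strategy(risk: float, tau_1: float = TAU_1, tau_2: float = TAU_2) -> str:
    """Map a risk score in [0, 1] to one of the three generation strategies."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be in [0, 1]")
    if risk < tau_1:
        return "normal"
    if risk < tau_2:
        return "conservative_with_citations"
    return "very_conservative_or_refuse"


if __name__ == "__main__":
    for r in (0.1, 0.3, 0.5, 0.7, 0.95):
        print(r, select_strategy(r))
```

Note that the boundaries are half-open: a score exactly equal to a threshold is routed to the more cautious strategy.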

## 📚 Datasets

- **HotpotQA**: Multi-hop reasoning with supporting facts
- **TriviaQA**: Open-domain QA for general knowledge
- **Wikipedia**: Knowledge base via HF Datasets

## 📄 Citation

```bibtex
@article{safrag2024,
  title={SafeRAG: High-Performance Calibrated RAG with Risk Assessment},
  author={Your Name},
  journal={arXiv preprint},
  year={2024}
}
```

## πŸ“ License

Apache 2.0 License - see LICENSE file for details.

---

**SafeRAG**: A production-ready RAG system with risk calibration, built on the Hugging Face ecosystem.