safe_rag / README.md

Tairun Meng

Initial commit: SafeRAG project ready for HF Spaces

db06013 4 months ago

title: SafeRAG Demo
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0

SafeRAG: High-Performance Calibrated RAG

A production-ready Retrieval-Augmented Generation (RAG) system with risk calibration, built on the Hugging Face ecosystem.

🚀 Key Features

Risk Calibration: Multi-layer risk assessment with adaptive strategies
High Performance: Targets a 2-3.5x throughput improvement using vLLM
Hugging Face Native: Built on HF Datasets, Models, and Spaces
Production Ready: Complete pipeline with error handling and monitoring

🏗️ Architecture

HF Datasets → Embedding (BGE/E5) → FAISS Index
Query → Batched Retrieval → Evidence Selector → Generator (vLLM + gpt-oss-20b)
→ Risk Calibration → Adaptive Strategy → Output (Answer + Citations + Risk Score)
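
As a rough illustration of the retrieval stage in the diagram above, the sketch below ranks passages by cosine similarity in pure Python. It is only a stand-in for a FAISS index over BGE/E5 embeddings, not the project's implementation; the names `cosine` and `retrieve` and the toy 2-d vectors are assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, passage_vecs, k):
    """Return (index, score) pairs for the k most similar passages."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-d "embeddings" (hypothetical); real ones would come from an encoder.
passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top = retrieve([1.0, 0.1], passages, k=2)
top_ids = [i for i, _ in top]
```

An exhaustive scan like this is linear in the number of passages, which is why a dedicated index is used at scale.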

📊 Performance Targets

QA Accuracy: EM/F1 improvements over vanilla RAG
Attribution: +8-12 point improvement in citation precision/recall
Calibration: 30-40% reduction in ECE (Expected Calibration Error)
Throughput: 2-3.5x improvement with vLLM
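
For readers unfamiliar with ECE, the sketch below shows the usual equal-width binned estimate: the weighted average, over confidence bins, of the gap between accuracy and mean confidence. The function name, the 10-bin default, and the sample values are assumptions for illustration; the project's own evaluation code may bin differently.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    total = len(confidences)
    if total == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece

# Four answers at 0.9 confidence, three of them correct (made-up values).
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A lower value means the predicted confidence tracks the observed accuracy more closely.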

🛠️ Quick Start

Run Tests

python3 simple_e2e_test.py

Start Demo

python3 app.py

📈 Evaluation

The end-to-end tests in simple_e2e_test.py cover the following:

✅ Text processing and sentence extraction
✅ Embedding creation and similarity calculation
✅ Passage retrieval and reranking
✅ Risk feature extraction and prediction
✅ Risk-aware answer generation
✅ Evaluation metrics (EM, F1, ROUGE)
✅ Complete end-to-end RAG pipeline
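
The EM and F1 metrics listed above are commonly computed in the SQuAD style: normalize both strings, then compare them exactly (EM) or by token overlap (F1). The following is a minimal sketch under that assumption; the function names are hypothetical and the repository's metric code may normalize differently.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

em = exact_match("The Eiffel Tower.", "eiffel tower")
f1 = f1_score("Eiffel Tower in Paris", "Eiffel Tower")
```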

🔧 Configuration

Key parameters in config.yaml:

Risk Thresholds: τ₁ = 0.3, τ₂ = 0.7
Retrieval: k = 20, rerank_k = 10
Generation: max_tokens = 512, temperature = 0.7
Calibration: 16 features, logistic regression
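
One way to carry these parameters in code is a small validated settings object. The sketch below mirrors the values listed above; the class name `SafeRAGConfig` and its field names are assumptions for illustration and may not match the actual keys in config.yaml.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SafeRAGConfig:
    """Hypothetical in-memory mirror of the documented defaults."""
    tau1: float = 0.3         # lower risk threshold
    tau2: float = 0.7         # upper risk threshold
    k: int = 20               # passages retrieved per query
    rerank_k: int = 10        # passages kept after reranking
    max_tokens: int = 512     # generation length limit
    temperature: float = 0.7  # sampling temperature
    n_features: int = 16      # size of the risk feature vector

    def __post_init__(self):
        # Thresholds must be ordered and lie in [0, 1].
        if not 0.0 <= self.tau1 < self.tau2 <= 1.0:
            raise ValueError("expected 0 <= tau1 < tau2 <= 1")
        # Reranking cannot keep more passages than were retrieved.
        if not 0 < self.rerank_k <= self.k:
            raise ValueError("expected 0 < rerank_k <= k")

config = SafeRAGConfig()
```

Validating at construction time surfaces inconsistent settings, such as inverted thresholds, before any query is served.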

🎯 Risk Calibration

Risk Features (16-dimensional)

Retrieval Statistics: Similarity scores, variance, diversity
Coverage Features: Token/entity overlap between question and answer
Consistency Features: Semantic similarity between passages
Diversity Features: Topic variance, passage diversity
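
To make the feature groups concrete, here is a hedged sketch of two of them: summary statistics over retrieval similarity scores and a simple token-overlap coverage measure. The function names, dictionary keys, and whitespace tokenization are assumptions for this example; the project's 16 features are not specified in detail here.

```python
from statistics import mean, pvariance

def retrieval_stats(scores):
    """Max, mean, and population variance of retrieval similarity scores."""
    if not scores:
        return {"sim_max": 0.0, "sim_mean": 0.0, "sim_var": 0.0}
    return {
        "sim_max": max(scores),
        "sim_mean": mean(scores),
        "sim_var": pvariance(scores),
    }

def token_overlap(question, answer):
    """Fraction of distinct answer tokens that also appear in the question."""
    q_tokens = set(question.lower().split())
    a_tokens = set(answer.lower().split())
    return len(q_tokens & a_tokens) / len(a_tokens) if a_tokens else 0.0

stats = retrieval_stats([0.8, 0.6])
overlap = token_overlap("who built the eiffel tower", "gustave eiffel")
```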

Adaptive Strategies

Low Risk (r < τ₁): Normal generation
Medium Risk (τ₁ ≤ r < τ₂): Conservative generation + citations
High Risk (r ≥ τ₂): Very conservative generation, or refusal to answer
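
The three tiers above amount to a threshold lookup on the risk score r. The sketch below pairs that lookup with a logistic-regression style score, assuming the documented thresholds τ₁ = 0.3 and τ₂ = 0.7; the function names, strategy labels, and any weights passed in are hypothetical, not the project's trained model.

```python
import math

TAU_1, TAU_2 = 0.3, 0.7  # thresholds from the configuration section

def risk_score(features, weights, bias=0.0):
    """Logistic model: sigmoid of a weighted sum of the risk features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid for large negative or positive z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def choose_strategy(r, tau1=TAU_1, tau2=TAU_2):
    """Map a risk score in [0, 1] to one of the three generation strategies."""
    if r < tau1:
        return "normal"
    if r < tau2:
        return "conservative_with_citations"
    return "very_conservative_or_refuse"

strategy = choose_strategy(risk_score([0.0], [1.0]))
```

Note that the boundaries follow the inequalities above: a score of exactly τ₁ is treated as medium risk, and exactly τ₂ as high risk.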

📚 Datasets

HotpotQA: Multi-hop reasoning with supporting facts
TriviaQA: Open-domain QA for general knowledge
Wikipedia: Knowledge base via HF Datasets

📄 Citation

@article{saferag2024,
  title={SafeRAG: High-Performance Calibrated RAG with Risk Assessment},
  author={Your Name},
  journal={arXiv preprint},
  year={2024}
}

📝 License

Apache 2.0 License - see LICENSE file for details.

SafeRAG: A production-ready RAG system with risk calibration, built on the Hugging Face ecosystem.