Tairun Meng

Initial commit: SafeRAG project ready for HF Spaces

db06013 4 months ago


	---
	title: SafeRAG Demo
	emoji: 🤖
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 4.0.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	---

	# SafeRAG: High-Performance Calibrated RAG

	A production-ready Retrieval-Augmented Generation (RAG) system with risk calibration, built on the Hugging Face ecosystem.

	## 🚀 Key Features

	- Risk Calibration: Multi-layer risk assessment with adaptive strategies
	- High Performance: Targets a 2-3.5x throughput improvement using vLLM
	- Hugging Face Native: Built on HF Datasets, Models, and Spaces
	- Production Ready: Complete pipeline with error handling and monitoring

	## 🏗️ Architecture

	```
	HF Datasets → Embedding (BGE/E5) → FAISS Index
	Query → Batched Retrieval → Evidence Selector → Generator (vLLM + gpt-oss-20b)
	→ Risk Calibration → Adaptive Strategy → Output (Answer + Citations + Risk Score)
	```
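The retrieval stage of the diagram above can be illustrated with a minimal sketch. It assumes passages and queries are already embedded as plain float vectors, and it uses a brute-force cosine search as a stand-in for the FAISS index lookup. The function names `cosine` and `retrieve` are hypothetical and are not taken from the SafeRAG code.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    k: int = 20,
) -> List[Tuple[int, float]]:
    """Brute-force top-k search (assumption: stands in for a FAISS lookup).

    Returns (passage_index, similarity) pairs, most similar first.
    """
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy example with 2-dimensional "embeddings".
passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = retrieve([1.0, 0.0], passages, k=2)
```

A real index would avoid scoring every passage for every query. The brute-force version is only meant to show the shape of the data handed to the evidence selector.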

	## 📊 Performance Targets

	- QA Accuracy: EM/F1 improvements over vanilla RAG
	- Attribution: +8-12pt improvement in citation precision/recall
	- Calibration: 30-40% reduction in ECE (Expected Calibration Error)
	- Throughput: 2-3.5x improvement with vLLM
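For reference, ECE is commonly computed by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below uses equal-width bins. How SafeRAG derives a confidence from its risk score is not stated in this README, so the inputs here are generic, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """ECE with equal-width bins over [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Two overconfident predictions at 0.9 (one wrong), two at 0.1 (both wrong).
ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [True, False, False, False])
```

Purely as an illustration of the target, a 30-40% reduction would take an ECE of 0.10 down to roughly 0.06-0.07.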

	## 🛠️ Quick Start

	### Run Tests
	```bash
	python3 simple_e2e_test.py
	```

	### Start Demo
	```bash
	python3 app.py
	```

	## 📈 Evaluation

	The end-to-end tests cover the following components:

	- ✅ Text processing and sentence extraction
	- ✅ Embedding creation and similarity calculation
	- ✅ Passage retrieval and reranking
	- ✅ Risk feature extraction and prediction
	- ✅ Risk-aware answer generation
	- ✅ Evaluation metrics (EM, F1, ROUGE)
	- ✅ Complete end-to-end RAG pipeline
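The EM and F1 metrics listed above are usually computed SQuAD-style, after normalizing both strings. The sketch below follows that common convention. It is not necessarily the exact normalization used in this project's tests, and the function names are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Eiffel Tower!", "eiffel tower")
f1 = token_f1("Paris, France", "Paris")
```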

	## 🔧 Configuration

	Key parameters in `config.yaml`:

	- Risk Thresholds: τ₁ = 0.3, τ₂ = 0.7
	- Retrieval: k = 20, rerank_k = 10
	- Generation: max_tokens = 512, temperature = 0.7
	- Calibration: 16 features, logistic regression
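To make the parameters above concrete, here is a small sketch that mirrors them as a Python dataclass with basic sanity checks. The class name, the field names `tau1`, `tau2`, and `n_features`, and the validation rules are assumptions for illustration. The actual layout of `config.yaml` is not shown in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeRAGConfig:
    """Hypothetical in-memory view of the documented defaults."""

    tau1: float = 0.3        # lower risk threshold
    tau2: float = 0.7        # upper risk threshold
    k: int = 20              # passages retrieved per query
    rerank_k: int = 10       # passages kept after reranking
    max_tokens: int = 512    # generation length limit
    temperature: float = 0.7
    n_features: int = 16     # inputs to the logistic-regression calibrator

    def __post_init__(self) -> None:
        # Assumed invariants, not stated in the README.
        if not 0.0 <= self.tau1 < self.tau2 <= 1.0:
            raise ValueError("expected 0 <= tau1 < tau2 <= 1")
        if not 0 < self.rerank_k <= self.k:
            raise ValueError("expected 0 < rerank_k <= k")


config = SafeRAGConfig()
```

Checking the thresholds at load time means a mistyped value fails immediately instead of silently changing which strategy a query receives.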

	## 🎯 Risk Calibration

	### Risk Features (16-dimensional)
	1. Retrieval Statistics: Similarity scores, variance, diversity
	2. Coverage Features: Token and entity overlap between the question and the answer
	3. Consistency Features: Semantic similarity between passages
	4. Diversity Features: Topic variance, passage diversity
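As a rough sketch of how such features could feed a logistic-regression risk score, the example below computes three illustrative retrieval statistics and passes them through a sigmoid. The choice of features, the weights, and the bias are made up for illustration. The real model uses 16 features whose exact definitions and learned weights are not given here.

```python
import math
from statistics import mean, pvariance
from typing import List, Sequence


def retrieval_features(scores: Sequence[float]) -> List[float]:
    """Max, mean, and population variance of the similarity scores."""
    if not scores:
        return [0.0, 0.0, 0.0]
    return [max(scores), mean(scores), pvariance(scores)]


def risk_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic regression: sigmoid of a weighted sum, giving a risk in (0, 1)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical weights: higher similarity lowers the predicted risk.
weights = [-2.0, -2.0, 0.0]
bias = 2.0
strong = risk_score(retrieval_features([0.9, 0.8]), weights, bias)
weak = risk_score(retrieval_features([0.3, 0.2]), weights, bias)
```

With these illustrative weights, a query whose retrieved passages are highly similar to it gets a lower risk than one with weak matches.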

	### Adaptive Strategies
	- Low Risk (r < τ₁): Normal generation
	- Medium Risk (τ₁ ≤ r < τ₂): Conservative generation + citations
	- High Risk (r ≥ τ₂): Very conservative generation, or refusal to answer
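The three bands above amount to a simple threshold lookup. A minimal sketch, using the documented defaults τ₁ = 0.3 and τ₂ = 0.7, follows. The `Strategy` enum and the function name are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    NORMAL = "normal"
    CONSERVATIVE = "conservative_with_citations"
    REFUSE = "very_conservative_or_refuse"


def choose_strategy(risk: float, tau1: float = 0.3, tau2: float = 0.7) -> Strategy:
    """Map a risk score in [0, 1] to one of the three bands."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be in [0, 1]")
    if risk < tau1:
        return Strategy.NORMAL
    if risk < tau2:
        return Strategy.CONSERVATIVE
    return Strategy.REFUSE


low = choose_strategy(0.1)
medium = choose_strategy(0.3)   # the boundary value falls in the medium band
high = choose_strategy(0.7)     # the boundary value falls in the high band
```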

	## 📚 Datasets

	- HotpotQA: Multi-hop reasoning with supporting facts
	- TriviaQA: Open-domain QA for general knowledge
	- Wikipedia: Knowledge base via HF Datasets

	## 📄 Citation

	```bibtex
	@article{saferag2024,
	title={SafeRAG: High-Performance Calibrated RAG with Risk Assessment},
	author={Your Name},
	journal={arXiv preprint},
	year={2024}
	}
	```

	## 📝 License

	Apache 2.0 License - see the LICENSE file for details.

	---

	SafeRAG: A production-ready RAG system with risk calibration, built on the Hugging Face ecosystem.