---
title: RAG Benchmark Leaderboard
emoji: π
colorFrom: gray
colorTo: purple
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
---
# RAG Benchmark Leaderboard
An interactive leaderboard for comparing and visualizing the performance of RAG (Retrieval-Augmented Generation) systems.
## Features
- **Version Comparison**: Compare model performance across different versions of the benchmark dataset
- **Interactive Radar Charts**: Visualize generative and retrieval metrics
- **Customizable Views**: Filter and sort models based on different criteria
- **Easy Submission**: Simple API for submitting your model results
## Installation
```bash
pip install -r requirements.txt
```
## Running the Leaderboard
```bash
cd leaderboard
python app.py
```
This starts a Gradio server; open http://localhost:7860 in your browser to view the leaderboard.
## Submitting Results
To submit your results to the leaderboard, use the provided API:
```python
from rag_benchmark import RAGBenchmark

# Initialize the benchmark
benchmark = RAGBenchmark(version="2.0")  # Use the latest version

# Run evaluation
results = benchmark.evaluate(
    model_name="Your Model Name",
    embedding_model="your-embedding-model",
    retriever_type="dense",  # Options: dense, sparse, hybrid
    retrieval_config={"top_k": 3},
)

# Submit results
benchmark.submit_results(results)
```
## Data Format
The `results.json` file has the following structure:
```json
{
  "items": {
    "1.0": {                      // Dataset version
      "model1": {                 // Submission ID
        "model_name": "Model Name",
        "timestamp": "2025-03-20T12:00:00",
        "config": {
          "embedding_model": "embedding-model-name",
          "retriever_type": "dense",
          "retrieval_config": {
            "top_k": 3
          }
        },
        "metrics": {
          "retrieval": {
            "hit_rate": 0.82,
            "mrr": 0.65,
            "precision": 0.78
          },
          "generation": {
            "rouge1": 0.72,
            "rouge2": 0.55,
            "rougeL": 0.68
          }
        }
      }
    }
  },
  "last_version": "2.0",
  "n_questions": "1000"
}
```
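As a rough illustration of how a consumer might walk this structure, here is a minimal sketch. The in-memory sample below mirrors the example above; in practice you would read the file with `json.load` instead. The variable names are illustrative, not part of the repo:

```python
import json

# Sample payload mirroring the documented results.json structure.
# In practice: results = json.load(open("results.json"))
results = json.loads("""
{
  "items": {
    "1.0": {
      "model1": {
        "model_name": "Model Name",
        "timestamp": "2025-03-20T12:00:00",
        "config": {"embedding_model": "embedding-model-name",
                   "retriever_type": "dense",
                   "retrieval_config": {"top_k": 3}},
        "metrics": {
          "retrieval": {"hit_rate": 0.82, "mrr": 0.65, "precision": 0.78},
          "generation": {"rouge1": 0.72, "rouge2": 0.55, "rougeL": 0.68}
        }
      }
    }
  },
  "last_version": "2.0",
  "n_questions": "1000"
}
""")

# Collect (model_name, hit_rate) pairs across every dataset version
# and every submission.
rows = [
    (sub["model_name"], sub["metrics"]["retrieval"]["hit_rate"])
    for version in results["items"].values()
    for sub in version.values()
]
print(rows)  # [('Model Name', 0.82)]
```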
## License
MIT
## Metrics Tracked
### Retrieval Metrics
- Hit Rate: Fraction of queries for which at least one relevant document appears in the retrieved top-k
- MRR (Mean Reciprocal Rank): Average reciprocal rank of the first relevant document across queries
### Generation Metrics
- ROUGE-1: Unigram overlap with the reference answer
- ROUGE-2: Bigram overlap with the reference answer
- ROUGE-L: Longest common subsequence with the reference answer
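A minimal sketch of how these metrics can be computed from ranked retrieval results and token lists. The function names and example data are illustrative; the leaderboard's actual scoring code may differ:

```python
from collections import Counter

def hit_rate(ranked_ids, relevant_ids, k=3):
    """Fraction of queries with at least one relevant document in the top-k."""
    hits = sum(
        any(doc in rel for doc in docs[:k])
        for docs, rel in zip(ranked_ids, relevant_ids)
    )
    return hits / len(ranked_ids)

def mrr(ranked_ids, relevant_ids):
    """Mean Reciprocal Rank: average 1/rank of the first relevant document."""
    total = 0.0
    for docs, rel in zip(ranked_ids, relevant_ids):
        for rank, doc in enumerate(docs, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)

def rouge1(candidate, reference):
    """Unigram-overlap F1 between candidate and reference token lists."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(candidate), overlap / len(reference)
    return 2 * p * r / (p + r)

# Two queries: the first has a relevant doc at rank 1, the second at rank 2.
ranked = [["d1", "d2", "d3"], ["d9", "d4", "d5"]]
relevant = [{"d1"}, {"d4"}]
print(hit_rate(ranked, relevant))  # 1.0
print(mrr(ranked, relevant))       # 0.75
print(rouge1(["the", "cat", "sat"], ["the", "cat"]))  # ≈ 0.8
```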