---
title: RAG Benchmark Leaderboard
emoji: π
colorFrom: gray
colorTo: purple
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
---
# RAG Benchmark Leaderboard

An interactive leaderboard for comparing and visualizing the performance of RAG (Retrieval-Augmented Generation) systems.
## Features

- **Version Comparison**: Compare model performance across different versions of the benchmark dataset
- **Interactive Radar Charts**: Visualize generation and retrieval metrics side by side (see the sketch after this list)
- **Customizable Views**: Filter and sort models based on different criteria
- **Easy Submission**: Simple API for submitting your model results
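
The radar charts draw on the same metric groups described under Data Format below. As an illustrative sketch only (this is not the app's actual plotting code; it assumes Plotly is installed and uses made-up scores), such a chart can be drawn like this:

```python
# Minimal radar-chart sketch; metric names follow the "metrics" layout
# documented in the Data Format section, values are made up for the example.
import plotly.graph_objects as go

metrics = {"hit_rate": 0.82, "mrr": 0.65, "precision": 0.78}

fig = go.Figure(
    go.Scatterpolar(
        r=list(metrics.values()),
        theta=list(metrics.keys()),
        fill="toself",
        name="Model Name",
    )
)
fig.update_layout(polar=dict(radialaxis=dict(range=[0, 1])))
fig.show()
```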
## Installation

```bash
pip install -r requirements.txt
```
## Running the Leaderboard

```bash
cd leaderboard
python app.py
```

This will start a Gradio server, and you can access the leaderboard in your browser at http://localhost:7860.
## Submitting Results

To submit your results to the leaderboard, use the provided API:

```python
from rag_benchmark import RAGBenchmark

# Initialize the benchmark
benchmark = RAGBenchmark(version="2.0")  # Use the latest version

# Run evaluation
results = benchmark.evaluate(
    model_name="Your Model Name",
    embedding_model="your-embedding-model",
    retriever_type="dense",  # Options: dense, sparse, hybrid
    retrieval_config={"top_k": 3}
)

# Submit results
benchmark.submit_results(results)
```
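
The same `evaluate` call accepts the other documented retriever types. For example, a hybrid retriever with a larger candidate pool (the `top_k` value here is purely illustrative):

```python
# Same evaluate()/submit_results() API as above, switching to the "hybrid"
# retriever option listed in the comment; the top_k value is an arbitrary example.
hybrid_results = benchmark.evaluate(
    model_name="Your Model Name (hybrid)",
    embedding_model="your-embedding-model",
    retriever_type="hybrid",
    retrieval_config={"top_k": 5}
)
benchmark.submit_results(hybrid_results)
```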
## Data Format

The `results.json` file has the following structure:

```json
{
  "items": {
    "1.0": {            // Dataset version
      "model1": {       // Submission ID
        "model_name": "Model Name",
        "timestamp": "2025-03-20T12:00:00",
        "config": {
          "embedding_model": "embedding-model-name",
          "retriever_type": "dense",
          "retrieval_config": {
            "top_k": 3
          }
        },
        "metrics": {
          "retrieval": {
            "hit_rate": 0.82,
            "mrr": 0.65,
            "precision": 0.78
          },
          "generation": {
            "rouge1": 0.72,
            "rouge2": 0.55,
            "rougeL": 0.68
          }
        }
      }
    }
  },
  "last_version": "2.0",
  "n_questions": "1000"
}
```
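
For quick offline inspection, the file can be flattened into per-submission rows with the standard library (a minimal sketch; the path to `results.json` is assumed and may differ in the repository):

```python
# Minimal sketch: flatten results.json into (version, submission, metrics) rows.
# The file location is an assumption; adjust the path as needed.
import json

with open("results.json") as f:
    data = json.load(f)

for version, submissions in data["items"].items():
    for submission_id, entry in submissions.items():
        retrieval = entry["metrics"]["retrieval"]
        generation = entry["metrics"]["generation"]
        print(
            f"{version} | {entry['model_name']} ({submission_id}) | "
            f"hit_rate={retrieval['hit_rate']:.2f} mrr={retrieval['mrr']:.2f} "
            f"rougeL={generation['rougeL']:.2f}"
        )
```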
## License

MIT
## Metrics Tracked

### Retrieval Metrics

- Hit Rate: proportion of queries for which at least one relevant document is retrieved
- MRR (Mean Reciprocal Rank): average reciprocal rank of the first relevant document (see the sketch below)
- Precision: proportion of retrieved documents that are relevant
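
As a rough reference for how the first two numbers fall out of a ranked retrieval run (an illustrative sketch only; the benchmark's own scoring code may differ):

```python
# Illustrative hit-rate / MRR computation over ranked retrieval results.
# ranked_relevance[i] lists, in rank order, whether each retrieved document
# for query i was relevant; the data here is made up for the example.
ranked_relevance = [
    [False, True, False],   # first relevant document at rank 2
    [True, False, False],   # first relevant document at rank 1
    [False, False, False],  # no relevant document retrieved
]

hits = sum(any(ranks) for ranks in ranked_relevance)
hit_rate = hits / len(ranked_relevance)

reciprocal_ranks = [
    1.0 / (ranks.index(True) + 1) if any(ranks) else 0.0
    for ranks in ranked_relevance
]
mrr = sum(reciprocal_ranks) / len(reciprocal_ranks)

print(f"hit_rate={hit_rate:.2f}, mrr={mrr:.2f}")  # hit_rate=0.67, mrr=0.50
```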
### Generation Metrics

- ROUGE-1: unigram overlap between the generated answer and the reference
- ROUGE-2: bigram overlap between the generated answer and the reference
- ROUGE-L: longest-common-subsequence overlap between the generated answer and the reference (see the sketch below)
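
ROUGE scores of this kind can be computed with the `rouge-score` package (an assumption here; the leaderboard's evaluation pipeline may use a different implementation):

```python
# Illustrative ROUGE computation using the rouge-score package
# (pip install rouge-score); the strings are made-up examples.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    target="The cat sat on the mat.",
    prediction="A cat was sitting on the mat.",
)
for name, score in scores.items():
    print(f"{name}: f1={score.fmeasure:.2f}")
```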