|
|
--- |
|
|
title: TREC Eval |
|
|
emoji: 🤗 |
|
|
colorFrom: blue |
|
|
colorTo: red |
|
|
sdk: gradio |
|
|
sdk_version: 3.19.1 |
|
|
app_file: app.py |
|
|
pinned: false |
|
|
tags: |
|
|
- evaluate |
|
|
- metric |
|
|
description: >- |
|
|
The TREC Eval metric combines a number of information retrieval metrics such as precision and nDCG. It is used to score rankings of retrieved documents with reference values. |
|
|
--- |
|
|
|
|
|
# Metric Card for TREC Eval |
|
|
|
|
|
## Metric Description |
|
|
|
|
|
The TREC Eval metric combines a number of information retrieval metrics such as precision and normalized Discounted Cumulative Gain (nDCG). It is used to score rankings of retrieved documents with reference values. |
|
|
|
|
|
## How to Use |
|
|
```python
|
|
from evaluate import load |
|
|
trec_eval = load("trec_eval") |
|
|
results = trec_eval.compute(predictions=[run], references=[qrel]) |
|
|
``` |
|
|
|
|
|
### Inputs |
|
|
- **predictions** *(dict): A single retrieval run.*
|
|
- **query** *(int): Query ID.* |
|
|
- **q0** *(str): Literal `"q0"`.* |
|
|
- **docid** *(str): Document ID.* |
|
|
- **rank** *(int): Rank of document.* |
|
|
- **score** *(float): Score of document.* |
|
|
- **system** *(str): Tag for current run.* |
|
|
- **references** *(dict): A single qrel.*
|
|
- **query** *(int): Query ID.* |
|
|
- **q0** *(str): Literal `"q0"`.* |
|
|
- **docid** *(str): Document ID.* |
|
|
- **rel** *(int): Relevance of document.* |
|
|
|
|
|
### Output Values |
|
|
- **runid** *(str): Run name.* |
|
|
- **num_ret** *(int): Number of retrieved documents.* |
|
|
- **num_rel** *(int): Number of relevant documents.* |
|
|
- **num_rel_ret** *(int): Number of retrieved relevant documents.* |
|
|
- **num_q** *(int): Number of queries.* |
|
|
- **map** *(float): Mean average precision.* |
|
|
- **gm_map** *(float): Geometric mean average precision.*
|
|
- **bpref** *(float): Binary preference score.*
|
|
- **Rprec** *(float): Precision@R, where R is the number of relevant documents.*
|
|
- **recip_rank** *(float): Reciprocal rank.*
|
|
- **P@k** *(float): Precision@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).*
|
|
- **NDCG@k** *(float): nDCG@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).* |
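
All of these values are returned in a single dictionary, so individual scores can be read by key. A minimal illustration, assuming `results` was produced by a `compute` call as in the usage snippet above:

```python
# Assumes `results` comes from trec_eval.compute(predictions=[run], references=[qrel])
print(results["runid"], results["num_q"])
print(f"MAP: {results['map']:.4f}, nDCG@10: {results['NDCG@10']:.4f}")
```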
|
|
|
|
|
### Examples |
|
|
|
|
|
A minimal example looks as follows:
|
|
```python
|
|
import evaluate

qrel = {
|
|
"query": [0], |
|
|
"q0": ["q0"], |
|
|
"docid": ["doc_1"], |
|
|
"rel": [2] |
|
|
} |
|
|
run = { |
|
|
"query": [0, 0], |
|
|
"q0": ["q0", "q0"], |
|
|
"docid": ["doc_2", "doc_1"], |
|
|
"rank": [0, 1], |
|
|
"score": [1.5, 1.2], |
|
|
"system": ["test", "test"] |
|
|
} |
|
|
|
|
|
trec_eval = evaluate.load("trec_eval") |
|
|
results = trec_eval.compute(references=[qrel], predictions=[run]) |
|
|
results["P@5"] |
|
|
0.2 |
|
|
``` |
|
|
|
|
|
A more realistic use case, with an example from [`trectools`](https://github.com/joaopalotti/trectools):
|
|
|
|
|
```python |
|
|
import evaluate
import pandas as pd

qrel = pd.read_csv("robust03_qrels.txt", sep=r"\s+", names=["query", "q0", "docid", "rel"])
|
|
qrel["q0"] = qrel["q0"].astype(str) |
|
|
qrel = qrel.to_dict(orient="list") |
|
|
|
|
|
run = pd.read_csv("input.InexpC2", sep=r"\s+", names=["query", "q0", "docid", "rank", "score", "system"])
|
|
run = run.to_dict(orient="list") |
|
|
|
|
|
trec_eval = evaluate.load("trec_eval") |
|
|
result = trec_eval.compute(predictions=[run], references=[qrel])
|
|
``` |
|
|
|
|
|
```python |
|
|
result |
|
|
|
|
|
{'runid': 'InexpC2', |
|
|
'num_ret': 100000, |
|
|
'num_rel': 6074, |
|
|
'num_rel_ret': 3198, |
|
|
'num_q': 100, |
|
|
'map': 0.22485930431817494, |
|
|
'gm_map': 0.10411523825735523, |
|
|
'bpref': 0.217511695914079, |
|
|
'Rprec': 0.2502547201167236, |
|
|
'recip_rank': 0.6646545943335417, |
|
|
'P@5': 0.44, |
|
|
'P@10': 0.37, |
|
|
'P@15': 0.34600000000000003, |
|
|
'P@20': 0.30999999999999994, |
|
|
'P@30': 0.2563333333333333, |
|
|
'P@100': 0.1428, |
|
|
'P@200': 0.09510000000000002, |
|
|
'P@500': 0.05242, |
|
|
'P@1000': 0.03198, |
|
|
'NDCG@5': 0.4101480395089769, |
|
|
'NDCG@10': 0.3806761417784469, |
|
|
'NDCG@15': 0.37819463408955706, |
|
|
'NDCG@20': 0.3686080836061317, |
|
|
'NDCG@30': 0.352474353427451, |
|
|
'NDCG@100': 0.3778329431025776, |
|
|
'NDCG@200': 0.4119129817248979, |
|
|
'NDCG@500': 0.4585354576461375, |
|
|
'NDCG@1000': 0.49092149290805653} |
|
|
``` |
|
|
|
|
|
## Limitations and Bias |
|
|
The `trec_eval` metric requires the predictions and references to be in the TREC run and qrel formats, respectively.
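
If a retrieval system produces rankings in some other form, they first have to be rearranged into the run dictionary described under Inputs. A minimal sketch, assuming per-query rankings are available as `(doc_id, score)` pairs (the `rankings` variable and the `"my_system"` tag are hypothetical):

```python
# Hypothetical per-query rankings: {query_id: [(doc_id, score), ...]}
rankings = {
    0: [("doc_2", 1.5), ("doc_1", 1.2)],
    1: [("doc_3", 0.9)],
}

# Flatten into the TREC-style run dictionary expected by trec_eval
run = {"query": [], "q0": [], "docid": [], "rank": [], "score": [], "system": []}
for query_id, docs in rankings.items():
    for rank, (doc_id, score) in enumerate(sorted(docs, key=lambda d: d[1], reverse=True)):
        run["query"].append(query_id)
        run["q0"].append("q0")
        run["docid"].append(doc_id)
        run["rank"].append(rank)
        run["score"].append(score)
        run["system"].append("my_system")
```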
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@inproceedings{palotti2019, |
|
|
author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido}, |
|
|
title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns}, |
|
|
series = {SIGIR'19}, |
|
|
year = {2019}, |
|
|
location = {Paris, France}, |
|
|
publisher = {ACM} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Further References |
|
|
|
|
|
- Homepage: https://github.com/joaopalotti/trectools |