---
title: nDCG
emoji: 👁
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 5.28.0
app_file: app.py
pinned: false
license: mit
tags:
- evaluate
- metric
- ranking
description: >-
  The Discounted Cumulative Gain (DCG) is a measure of ranking quality. It is
  used to evaluate Information Retrieval systems under the following two
  assumptions:
  1. Highly relevant documents/labels are more useful when they appear earlier in the results.
  2. Documents/labels are relevant to different degrees.
  It is defined as the sum of the relevances of the retrieved documents,
  discounted logarithmically by the position at which they were retrieved.
  The Normalized DCG (nDCG) divides this value by the best achievable value
  to obtain a score between 0 and 1, such that a perfect retrieval achieves
  an nDCG of 1.
---
# Metric Card for nDCG

## Metric Description

The Discounted Cumulative Gain (DCG) is a measure of ranking quality.
It is used to evaluate Information Retrieval systems under two assumptions:
1. Highly relevant documents/labels are more useful when they appear earlier in the results.
2. Documents/labels are relevant to different degrees.

It is defined as the sum of the relevances of the retrieved documents, discounted logarithmically by the position at which they were retrieved.
The Normalized DCG (nDCG) divides this value by the best achievable value to obtain a score between 0 and 1, such that a perfect retrieval achieves an nDCG of 1.0.
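To make the definition concrete, here is a minimal NumPy sketch of it (not the implementation this Space uses; `dcg` and `ndcg` are illustrative helpers). It orders the true relevances by the predicted scores, discounts each by the log of its rank, and normalizes by the ideal ordering:

```python
import numpy as np

def dcg(relevances):
    # DCG: each relevance is discounted by log2(rank + 1), with ranks starting at 1
    relevances = np.asarray(relevances, dtype=float)
    ranks = np.arange(1, len(relevances) + 1)
    return float(np.sum(relevances / np.log2(ranks + 1)))

def ndcg(true_relevance, scores):
    # Order the true relevances by the predicted scores (descending),
    # then normalize by the DCG of the ideal (sorted) ordering.
    # Note: this simple sketch does not handle ties among scores.
    order = np.argsort(scores)[::-1]
    ranked = np.asarray(true_relevance, dtype=float)[order]
    ideal = np.sort(np.asarray(true_relevance, dtype=float))[::-1]
    return dcg(ranked) / dcg(ideal)

print(ndcg([10, 0, 0, 1, 5], [.1, .2, .3, 4, 70]))  # ≈ 0.6957
```

A perfect ranking (predicted order matches the relevance order) yields an nDCG of exactly 1.0, since the achieved DCG equals the ideal DCG.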
## How to Use

At minimum, this metric takes as input two `list`s of `list`s, each containing `float`s: predictions and references.

```python
import evaluate

nDCG_metric = evaluate.load('JP-SystemsX/nDCG')
results = nDCG_metric.compute(references=[[0, 1]], predictions=[[0, 1]])
print(results)
{'nDCG@2': 1.0}
```
### Inputs:
- **references** (`list` of `list` of `float`): True relevance values.
- **predictions** (`list` of `list` of `float`): Predicted relevance, probability estimates, or confidence values.
- **k** (`int`): If set, only the k highest scores in the ranking are considered; otherwise all outputs are considered. Defaults to None.
- **sample_weight** (`list` of `float`): Sample weights. Defaults to None.
- **ignore_ties** (`boolean`): If set to True, assumes that there are no ties (which is likely when predictions are continuous) for efficiency gains. Defaults to False.
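When `sample_weight` is given, the per-sample nDCG scores are combined as a weighted average rather than a plain mean. A small sketch of that aggregation step (the scores and weights below are made-up illustration values, not outputs of this metric):

```python
import numpy as np

# Hypothetical per-sample nDCG scores and their sample weights
per_sample_ndcg = np.array([1.0, 0.5])
sample_weight = np.array([0.3, 0.7])

# Weighted average: (0.3 * 1.0 + 0.7 * 0.5) / (0.3 + 0.7)
weighted_ndcg = np.average(per_sample_ndcg, weights=sample_weight)
```

With uniform weights (or `sample_weight=None`) this reduces to the ordinary mean over samples.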
### Output:

This metric outputs a dictionary containing the nDCG score:

**normalized_discounted_cumulative_gain** (`float`): The nDCG score averaged over all samples. The minimum possible value is 0.0; the maximum possible value is 1.0.

Output Example(s):

```python
{'nDCG@5': 1.0}
{'nDCG': 0.876}
```
### Examples:

Example 1: A simple example

    >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
    >>> results = nDCG_metric.compute(references=[[10, 0, 0, 1, 5]], predictions=[[.1, .2, .3, 4, 70]])
    >>> print(results)
    {'nDCG': 0.6956940443813076}

Example 2: The same as Example 1, except with k set to 3

    >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
    >>> results = nDCG_metric.compute(references=[[10, 0, 0, 1, 5]], predictions=[[.1, .2, .3, 4, 70]], k=3)
    >>> print(results)
    {'nDCG@3': 0.4123818817534531}

Example 3: There is only one relevant label, but there is a tie and the model cannot decide which candidate is the relevant one

    >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
    >>> results = nDCG_metric.compute(references=[[1, 0, 0, 0, 0]], predictions=[[1, 1, 0, 0, 0]], k=1)
    >>> print(results)
    {'nDCG@1': 0.5}
    >>> # Both tied candidates are scored and the average of the two is returned

Example 4: The same as Example 3, except ignore_ties is set to True

    >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
    >>> results = nDCG_metric.compute(references=[[1, 0, 0, 0, 0]], predictions=[[1, 1, 0, 0, 0]], k=1, ignore_ties=True)
    >>> print(results)
    {'nDCG@1': 0.0}
    >>> # Alternative result: {'nDCG@1': 1.0}
    >>> # With ignore_ties=True, one of the two tied candidates is chosen and scored alone,
    >>> # so the result may vary depending on which one was chosen
## Citation(s)

```bibtex
@article{scikit-learn,
  title={Scikit-learn: Machine Learning in {P}ython},
  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
          and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
          and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
          Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
  journal={Journal of Machine Learning Research},
  volume={12},
  pages={2825--2830},
  year={2011}
}
```