---
title: Place_gen_evaluate
datasets:
- GeoBenchmark
tags:
- evaluate
- metric
description: 'TODO: add a description here'
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
---

# Metric Card for Place_gen_evaluate

## Metric Description

This metric evaluates geographic place prediction tasks performed by language models. For each question, the model is expected to generate a list of places, and the gold answer must also be a list of place names.

## How to Use

This metric takes two mandatory arguments: `generations` (a list of strings) and `golds` (a list of lists of strings containing place names).

```python
import evaluate

place_pred_eval = evaluate.load("rfr2003/place_gen_evaluate")
results = place_pred_eval.compute(generations=['[Hotel New Home, Hopeland]'],
                                  golds=[['Bar Guisness', 'Hotel New Home', 'New Hopeland']])
print(results)
{'bert_score_precision': 0.8470218181610107, 'bert_score_recall': 0.9131535291671753, 'bert_score_f1': 0.8788453936576843, 'bleu-1': 0.5714285714285714, 'precision': [6.0], 'rappel': [15.0], 'macro-mean': [10.5], 'median macro-mean': 10.5}
```

This metric accepts one optional argument:

`d`: the function used to compute the distance between a generated value and a gold one. The default is the `distance` function from the [Levenshtein library](https://github.com/rapidfuzz/python-Levenshtein).

### Output Values

This metric outputs a dictionary with the following values:

`bert_score_precision`: Average of the BERTScore precision values computed by the [bertscore module](https://github.com/huggingface/evaluate/blob/main/metrics/bertscore/README.md).

`bert_score_recall`: Average of the BERTScore recall values computed by the [bertscore module](https://github.com/huggingface/evaluate/blob/main/metrics/bertscore/README.md).

`bert_score_f1`: Average of the BERTScore F1 values computed by the [bertscore module](https://github.com/huggingface/evaluate/blob/main/metrics/bertscore/README.md).
`bleu-1`: BLEU-1 score computed by the [bleu module](https://github.com/huggingface/evaluate/blob/main/metrics/bleu/README.md).

`precision`: Sum of the minimum distances between each predicted value and the set of gold values, computed for each question (lower is better).

`rappel`: Sum of the minimum distances between each gold value and the set of generated values (a recall-style counterpart), computed for each question (lower is better).

`macro-mean`: Average of `precision` and `rappel`, computed for each question.

`median macro-mean`: Median across the `macro-mean` values.

#### Values from Popular Papers

### Examples

```python
import evaluate

place_pred_eval = evaluate.load("rfr2003/place_gen_evaluate")
results = place_pred_eval.compute(generations=['[Hotel New Home, Hopeland]'],
                                  golds=[['Bar Guisness', 'Hotel New Home', 'New Hopeland']])
print(results)
{'bert_score_precision': 0.8470218181610107, 'bert_score_recall': 0.9131535291671753, 'bert_score_f1': 0.8788453936576843, 'bleu-1': 0.5714285714285714, 'precision': [6.0], 'rappel': [15.0], 'macro-mean': [10.5], 'median macro-mean': 10.5}
```

## Limitations and Bias

## Citation

## Further References
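To illustrate how the distance-based values (`precision`, `rappel`, `macro-mean`) behave, here is a minimal stdlib-only sketch. It is not the metric's actual implementation: `distance_scores` and the inlined `levenshtein` helper are illustrative names, standing in for the real default `d`, `Levenshtein.distance` from the python-Levenshtein library. Note that the metric itself receives generations as raw strings (brackets included), so the values below differ slightly from the example output above.

```python
def levenshtein(a: str, b: str) -> int:
    # Stdlib-only edit distance (insert/delete/substitute, each cost 1),
    # standing in for Levenshtein.distance from python-Levenshtein.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

def distance_scores(generated, gold, d=levenshtein):
    # "precision": each generated place's distance to its closest gold place.
    precision = sum(min(d(g, ref) for ref in gold) for g in generated)
    # "rappel": each gold place's distance to its closest generated place.
    rappel = sum(min(d(ref, g) for g in generated) for ref in gold)
    # "macro-mean": average of the two; lower is better for all three.
    return precision, rappel, (precision + rappel) / 2

print(distance_scores(["Hotel New Home", "Hopeland"],
                      ["Bar Guisness", "Hotel New Home", "New Hopeland"]))
# → (4, 15, 9.5)
```

Because these are summed edit distances rather than ratios, a perfect prediction scores 0, and unmatched long names (here "Bar Guisness") dominate the `rappel` term.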