writing README.md
- README.md +89 -5
- app.py +1 -1
- compute_score.py +7 -5
- requirements.txt +2 -2
README.md
CHANGED
@@ -8,11 +8,95 @@ sdk_version: 3.17.0
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  This metric wraps the official scoring script for version 1 of the
  Average Normalized Levenshtein Similarity (ANLS).
---

# Metric Card for ANLS

## Metric description

This metric wraps the official scoring script for version 1 of the Average Normalized Levenshtein Similarity (ANLS).

The ANLS smoothly captures OCR mistakes by applying a slight penalization when the intended response is correct but badly recognized. It also uses a threshold of 0.5: the metric outputs the ANLS value if that value is greater than or equal to 0.5, and 0 otherwise. The point of this threshold is to decide whether the answer was correctly selected but not properly recognized, or whether, on the contrary, the output is a wrong text picked from the options and given as an answer.

More formally, the ANLS between the network output and the ground-truth answers is given by equation 1, where N is the total number of questions, M the total number of ground-truth answers per question, a<sub>ij</sub> the ground-truth answers with i = {0, ..., N} and j = {0, ..., M}, and o<sub>q<sub>i</sub></sub> the network's answer for the i<sup>th</sup> question q<sub>i</sub>:

ANLS = (1/N) &sum;<sub>i=0</sub><sup>N</sup> max<sub>j</sub> s(a<sub>ij</sub>, o<sub>q<sub>i</sub></sub>), &nbsp; where s(a<sub>ij</sub>, o<sub>q<sub>i</sub></sub>) = 1 &minus; NL(a<sub>ij</sub>, o<sub>q<sub>i</sub></sub>) if NL(a<sub>ij</sub>, o<sub>q<sub>i</sub></sub>) < 0.5, and 0 otherwise, with NL the normalized Levenshtein distance. &nbsp; (1)

Reference: [Evaluation Metric](https://rrc.cvc.uab.es/?ch=11&com=tasks)
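In code form, equation 1 reduces to a few lines. The sketch below is an illustrative re-implementation from the definition above, with a hand-rolled edit distance; it is not the official scoring script:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def anls(predictions: dict, ground_truths: dict, tau: float = 0.5) -> float:
    """Average over questions of the best per-answer score s(a_ij, o_qi),
    where s = 1 - NL if the normalized distance NL is below tau, else 0."""
    total = 0.0
    for qid, pred in predictions.items():
        best = 0.0
        for ans in ground_truths.get(qid, []):
            nl = levenshtein(pred.lower(), ans.lower()) / max(len(pred), len(ans), 1)
            if nl < tau:
                best = max(best, 1 - nl)
        total += best
    return total / len(predictions)

print(anls({'10285': 'Denver Broncos'},
           {'10285': ['Denver Broncos', 'Denver R. Broncos']}))  # 1.0
```

Normalizing by the longer string keeps NL in [0, 1], so a score survives the threshold only when more than half of the longer string matches.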

## How to use

The metric takes two lists of question-answer dictionaries as inputs: one with the predictions of the model and one with the references to compare them to.

_predictions_: list of question-answer dictionaries with the following key-values:

- 'question_id': id of the question-answer pair as given in the references (see below)
- 'prediction_text': the text of the answer

_references_: list of question-answer dictionaries with the following key-values:

- 'question_id': id of the question-answer pair (see above)
- 'answers': list of possible texts for the answer, as a list of strings

```python
from evaluate import load

anls_metric = load("anls")
results = anls_metric.compute(predictions=predictions, references=references)
```

## Output values

This metric outputs a dictionary with the key 'anls_score' and a value between 0.0 and 1.0:

```
{'anls_score': 1.0}
```

## Examples

```python
from evaluate import load

anls_metric = load("anls")
predictions = [{'question_id': '10285', 'prediction_text': 'Denver Broncos'},
               {'question_id': '18601', 'prediction_text': '12/15/89'},
               {'question_id': '16734', 'prediction_text': 'Dear dr. Lobo'}]
references = [{"answers": ["Denver Broncos", "Denver R. Broncos"], 'question_id': '10285'},
              {'answers': ['12/15/88'], 'question_id': '18601'},
              {'answers': ['Dear Dr. Lobo', 'Dr. Lobo'], 'question_id': '16734'}]
results = anls_metric.compute(predictions=predictions, references=references)
results
{'anls_score': 1.0}
```

## Limitations and bias

This metric works only with datasets that follow the format specified above.

## Considerations / Assumptions

As specified on the task website ([Tasks - Document Visual Question Answering](https://rrc.cvc.uab.es/?ch=17&com=tasks)):

- Answers are not case sensitive
- Answers are space sensitive
- Answers, and the tokens comprising them, are not limited to a fixed-size dictionary; an answer can be any word or token present in the document
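The first two assumptions boil down to how answers are normalized before matching; a minimal sketch of that normalization (the `normalize` helper is hypothetical, not part of the metric's code):

```python
def normalize(answer: str) -> str:
    # Case-insensitive: fold case before comparison.
    # Space-sensitive: whitespace is deliberately left untouched.
    return answer.lower()

print(normalize("Dear Dr. Lobo") == normalize("DEAR DR. LOBO"))  # True: case is ignored
print(normalize("12 / 15 / 88") == normalize("12/15/88"))        # False: spaces still count
```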

## Citation

```bibtex
@article{levenshtein1966binary,
  title   = {Binary codes capable of correcting deletions, insertions, and reversals},
  journal = {Soviet physics doklady},
  volume  = {10},
  number  = {8},
  pages   = {707--710},
  year    = {1966},
  url     = {https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf},
  author  = {V. I. Levenshtein},
}
```

## Further References

- [The Stanford Question Answering Dataset: Background, Challenges, Progress (blog post)](https://rajpurkar.github.io/mlx/qa-and-squad/)
- [Hugging Face Course -- Question Answering](https://huggingface.co/course/chapter7/7)

app.py
CHANGED

```diff
@@ -2,5 +2,5 @@ import evaluate
 from evaluate.utils import launch_gradio_widget
 
 
-module = evaluate.load("
+module = evaluate.load("anls")
 launch_gradio_widget(module)
```

compute_score.py
CHANGED

```diff
@@ -4,24 +4,26 @@ from Levenshtein import ratio
 def compute_score(predictions, ground_truths):
     theta = 0.5
     anls_score = 0
+    total = 0
     for qid, prediction in predictions.items():
         max_value = 0
+        total += 1  # ANLS averages over questions, so count once per question
         if qid in ground_truths:
             for x in ground_truths[qid]:
-
+                nl = 1 - ratio(prediction.lower(), x.lower())  # ratio is a similarity; 1 - ratio gives the normalized distance
                 if nl < theta:
                     score = 1 - nl
                     if score > max_value:
                         max_value = score
         anls_score += max_value
 
-    return anls_score
+    return anls_score / total
 
 if __name__ == "__main__":
-    predictions = [{'question_id': '10285', 'prediction_text': 'Denver
-                   {'question_id': '18601', 'prediction_text': '12
-                   {'question_id': '16734', 'prediction_text': '
+    predictions = [{'question_id': '10285', 'prediction_text': 'Denver R.'},
+                   {'question_id': '18601', 'prediction_text': '12'},
+                   {'question_id': '16734', 'prediction_text': 'dear'}]
 
     references = [{"answers": ["Denver Broncos", "Denver R. Broncos"], 'question_id': '10285'},
                   {'answers': ['12/15/88'], 'question_id': '18601'},
```
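`compute_score` iterates dicts keyed by question id, while the `__main__` block builds lists of dicts, so the lists need flattening before the call. A self-contained sketch of that glue; `difflib.SequenceMatcher` stands in for `python-Levenshtein`'s `ratio` here (the two similarity measures are close but not identical):

```python
from difflib import SequenceMatcher

def compute_score(predictions: dict, ground_truths: dict, theta: float = 0.5) -> float:
    anls_score = 0.0
    for qid, prediction in predictions.items():
        max_value = 0.0
        for x in ground_truths.get(qid, []):
            # Normalized distance = 1 - similarity; a score counts only
            # below the threshold, as in the metric card's definition.
            nl = 1 - SequenceMatcher(None, prediction.lower(), x.lower()).ratio()
            if nl < theta:
                max_value = max(max_value, 1 - nl)
        anls_score += max_value
    return anls_score / len(predictions)

preds = [{'question_id': '10285', 'prediction_text': 'Denver R.'},
         {'question_id': '18601', 'prediction_text': '12'}]
refs = [{'answers': ['Denver Broncos', 'Denver R. Broncos'], 'question_id': '10285'},
        {'answers': ['12/15/88'], 'question_id': '18601'}]

# Flatten the question-answer dictionaries into the dicts the function expects.
predictions = {p['question_id']: p['prediction_text'] for p in preds}
ground_truths = {r['question_id']: r['answers'] for r in refs}
score = compute_score(predictions, ground_truths)
```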
requirements.txt
CHANGED

```diff
@@ -1,2 +1,2 @@
-evaluate
-python-Levenshtein
+git+https://github.com/huggingface/evaluate
+git+https://github.com/maxbachmann/python-Levenshtein.git
```