Spaces:
Runtime error
Runtime error
Commit ·
7e267bf
1
Parent(s): 88838f6
update
Browse files
- src/backend/envs.py +3 -0
- src/backend/tasks/nq8/README.md +0 -0
- src/backend/tasks/nq8/nq8.yaml +32 -0
- src/backend/tasks/tqa8/README.md +51 -0
- src/backend/tasks/tqa8/tqa8.yaml +31 -0
src/backend/envs.py
CHANGED
|
@@ -37,6 +37,9 @@ class Tasks(Enum):
|
|
| 37 |
|
| 38 |
task10 = Task("memo-trap", "acc", "memo-trap", 0)
|
| 39 |
|
|
|
|
|
|
|
|
|
|
| 40 |
# NUM_FEWSHOT = 64 # Change with your few shot
|
| 41 |
|
| 42 |
|
|
|
|
| 37 |
|
| 38 |
task10 = Task("memo-trap", "acc", "memo-trap", 0)
|
| 39 |
|
| 40 |
+
task11 = Task("nq8", "em", "NQ Open 8", 8)
|
| 41 |
+
task12 = Task("tqa8", "em", "TriviaQA 8", 8)
|
| 42 |
+
|
| 43 |
# NUM_FEWSHOT = 64 # Change with your few shot
|
| 44 |
|
| 45 |
|
src/backend/tasks/nq8/README.md
ADDED
|
File without changes
|
src/backend/tasks/nq8/nq8.yaml
ADDED
|
@@ -0,0 +1,32 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
task: nq8
|
| 2 |
+
dataset_path: nq_open
|
| 3 |
+
output_type: generate_until
|
| 4 |
+
training_split: train
|
| 5 |
+
validation_split: validation
|
| 6 |
+
description: "Answer these questions:\n"
|
| 7 |
+
doc_to_text: "Q: {{question}}?\nA:"
|
| 8 |
+
doc_to_target: "{{answer}}" # TODO: should be multi-target
|
| 9 |
+
fewshot_delimiter: "\n"
|
| 10 |
+
generation_kwargs:
|
| 11 |
+
until:
|
| 12 |
+
- "\n"
|
| 13 |
+
- "."
|
| 14 |
+
- ","
|
| 15 |
+
do_sample: false
|
| 16 |
+
temperature: 0.0
|
| 17 |
+
filter_list:
|
| 18 |
+
- name: remove_whitespace
|
| 19 |
+
filter:
|
| 20 |
+
- function: remove_whitespace
|
| 21 |
+
- function: take_first
|
| 22 |
+
target_delimiter: " "
|
| 23 |
+
metric_list:
|
| 24 |
+
- metric: exact_match
|
| 25 |
+
aggregation: mean
|
| 26 |
+
higher_is_better: true
|
| 27 |
+
ignore_case: true
|
| 28 |
+
ignore_punctuation: true
|
| 29 |
+
regexes_to_ignore:
|
| 30 |
+
- "\b(an|a|the)\b"
|
| 31 |
+
metadata:
|
| 32 |
+
version: 0.0
|
src/backend/tasks/tqa8/README.md
ADDED
|
@@ -0,0 +1,51 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Trivia QA
|
| 2 |
+
|
| 3 |
+
### Paper
|
| 4 |
+
|
| 5 |
+
Title: `TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension`
|
| 6 |
+
Abstract: https://arxiv.org/abs/1705.03551
|
| 7 |
+
|
| 8 |
+
TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence
|
| 9 |
+
triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts
|
| 10 |
+
and independently gathered evidence documents, six per question on average, that provide
|
| 11 |
+
high quality distant supervision for answering the questions.
|
| 12 |
+
|
| 13 |
+
Homepage: https://nlp.cs.washington.edu/triviaqa/
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
### Citation
|
| 17 |
+
|
| 18 |
+
```
|
| 19 |
+
@InProceedings{JoshiTriviaQA2017,
|
| 20 |
+
author = {Joshi, Mandar and Choi, Eunsol and Weld, Daniel S. and Zettlemoyer, Luke},
|
| 21 |
+
title = {TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension},
|
| 22 |
+
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics},
|
| 23 |
+
month = {July},
|
| 24 |
+
year = {2017},
|
| 25 |
+
address = {Vancouver, Canada},
|
| 26 |
+
publisher = {Association for Computational Linguistics},
|
| 27 |
+
}
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
### Groups and Tasks
|
| 31 |
+
|
| 32 |
+
#### Groups
|
| 33 |
+
|
| 34 |
+
* Not part of a group yet.
|
| 35 |
+
|
| 36 |
+
#### Tasks
|
| 37 |
+
|
| 38 |
+
* `tqa8`: `Generate and answer based on the question (8-shot variant of TriviaQA).`
|
| 39 |
+
|
| 40 |
+
### Checklist
|
| 41 |
+
|
| 42 |
+
For adding novel benchmarks/datasets to the library:
|
| 43 |
+
* [ ] Is the task an existing benchmark in the literature?
|
| 44 |
+
* [ ] Have you referenced the original paper that introduced the task?
|
| 45 |
+
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
If other tasks on this dataset are already supported:
|
| 49 |
+
* [ ] Is the "Main" variant of this task clearly denoted?
|
| 50 |
+
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
|
| 51 |
+
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
|
src/backend/tasks/tqa8/tqa8.yaml
ADDED
|
@@ -0,0 +1,31 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
task: tqa8
|
| 2 |
+
dataset_path: trivia_qa
|
| 3 |
+
dataset_name: rc.nocontext
|
| 4 |
+
output_type: generate_until
|
| 5 |
+
training_split: train
|
| 6 |
+
validation_split: validation
|
| 7 |
+
doc_to_text: "Question: {{question}}?\nAnswer:"
|
| 8 |
+
doc_to_target: "{{answer.aliases}}"
|
| 9 |
+
should_decontaminate: true
|
| 10 |
+
doc_to_decontamination_query: question
|
| 11 |
+
generation_kwargs:
|
| 12 |
+
until:
|
| 13 |
+
- "\n"
|
| 14 |
+
- "."
|
| 15 |
+
- ","
|
| 16 |
+
do_sample: false
|
| 17 |
+
temperature: 0.0
|
| 18 |
+
filter_list:
|
| 19 |
+
- name: remove_whitespace
|
| 20 |
+
filter:
|
| 21 |
+
- function: remove_whitespace
|
| 22 |
+
- function: take_first
|
| 23 |
+
target_delimiter: " "
|
| 24 |
+
metric_list:
|
| 25 |
+
- metric: exact_match
|
| 26 |
+
aggregation: mean
|
| 27 |
+
higher_is_better: true
|
| 28 |
+
ignore_case: true
|
| 29 |
+
ignore_punctuation: true
|
| 30 |
+
metadata:
|
| 31 |
+
version: 2.0
|