Blanca committed · Commit cbe1273 · verified · 1 Parent(s): 6220eaa

Update content.py

Files changed (1)
  1. content.py +14 -5
content.py CHANGED
@@ -1,15 +1,23 @@
 TITLE = """<h1 align="center" id="space-title">Critical Questions Leaderboard</h1>"""
 
 INTRODUCTION_TEXT = """
-The Critical Questions Leaderboard is a benchmark which aims at evaluating the capacity of technology systems to generate critical questions. (See our [paper](https://arxiv.org/abs/2505.11341) for more details.)
+The Critical Questions Leaderboard is a benchmark that evaluates the capacity of language technology systems to generate critical questions. (See our [paper](https://arxiv.org/abs/2505.11341) for more details.)
+
+The task of Critical Questions Generation consists of generating useful critical questions for a given argumentative text. For this purpose, a dataset of real debate interventions with associated critical questions has been released.
+
+Critical Questions are the set of inquiries that should be asked in order to judge whether an argument is acceptable or fallacious. These questions are therefore designed to unmask the assumptions held by the premises of the argument and to attack its inference.
+
+In the dataset, the argumentative texts are interventions from real debates, which have been annotated with Argumentation Schemes and later associated with a set of critical questions. For every intervention, the speaker, the set of Argumentation Schemes, and the critical questions are provided. These questions have been annotated according to their usefulness for challenging the arguments in each text; the labels are Useful, Unhelpful, or Invalid. The goal of the task is to generate 3 critical questions that are Useful.
+
+Each of these 3 critical questions is evaluated separately, and the scores are then aggregated.
 
 ## Data
-The Critical Questions Dataset is made of more than 220 interventions associated to potentially critical questions.
+The Critical Questions Dataset consists of 220 interventions associated with ~5k gold-standard questions. These questions are in turn annotated as Useful, Unhelpful, or Invalid, and serve as references for the evaluation model.
 
-The data can be found in [this dataset](https://huggingface.co/datasets/Blanca/CQs-Gen). The test set is contained in `test.jsonl`.
+The data can be found in [this dataset](https://huggingface.co/datasets/Blanca/CQs-Gen). The test set, contained in `test.jsonl`, comprises 34 of the interventions; the validation set contains the remaining 186, and the reference questions of the validation set are public.
 
 ## Leaderboard
-Submission made by our team are labelled "CQ authors". While we report average scores over different runs when possible in our paper, we only report the best run in the leaderboard.
+Submissions made by our team are labelled "CQs-Gen authors".
 
 See below for submissions.
 """
@@ -21,7 +29,7 @@ Results can be submitted for the test set only. Scores are expressed as the perc
 Evaluation is done by comparing each newly generated question to the reference questions using gemma-2-9b-it, and inheriting the label of the most similar reference. Questions where no reference is found are considered invalid.
 
 We expect submissions to be json-line files with the following format.
-```json
+```jsonl
 {
 "CLINTON_1_1": {
 "intervention_id": "CLINTON_1_1",
@@ -47,6 +55,7 @@ We expect submissions to be json-line files with the following format.
 ```
 
 Our scoring function can be found [here](https://huggingface.co/spaces/HiTZ/Critical_Questions_Leaderboard/blob/main/scorer.py).
+
 This leaderboard was created using the [GAIA-benchmark](https://huggingface.co/spaces/gaia-benchmark/leaderboard) as a base.
 """
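
For illustration, the label-inheritance evaluation the updated text describes can be sketched roughly as follows. This is a hypothetical toy sketch, not the actual `scorer.py`: the real scorer uses gemma-2-9b-it to judge similarity between a generated question and the references, whereas here a simple token-overlap similarity and a made-up threshold stand in for the model.

```python
# Toy sketch of label-inheritance scoring (assumption: the real scorer.py
# uses gemma-2-9b-it for similarity; token overlap is a stand-in here).

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two questions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def inherit_label(generated: str, references: dict[str, str],
                  threshold: float = 0.2) -> str:
    """Inherit the label of the most similar reference question.

    If no reference clears the (hypothetical) similarity threshold,
    the generated question is considered Invalid.
    """
    best_label, best_sim = None, 0.0
    for ref_question, label in references.items():
        sim = token_overlap(generated, ref_question)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_label and best_sim >= threshold else "Invalid"

def intervention_score(labels: list[str]) -> float:
    """Aggregate the 3 per-question labels: fraction labelled Useful."""
    return sum(label == "Useful" for label in labels) / len(labels)
```

The overall leaderboard score would then be an average of `intervention_score` over the 34 test-set interventions, expressed as a percentage.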