Blanca committed · Commit cbe1273 · verified · 1 Parent(s): 6220eaa

Update content.py

Files changed (1)
  1. content.py +14 -5
content.py CHANGED
@@ -1,15 +1,23 @@
 TITLE = """<h1 align="center" id="space-title">Critical Questions Leaderboard</h1>"""
 
 INTRODUCTION_TEXT = """
-The Critical Questions Leaderboard is a benchmark which aims at evaluating the capacity of technology systems to generate critical questions. (See our [paper](https://arxiv.org/abs/2505.11341) for more details.)
+The Critical Questions Leaderboard is a benchmark that evaluates the capacity of language technology systems to generate critical questions. (See our [paper](https://arxiv.org/abs/2505.11341) for more details.)
+
+The task of Critical Questions Generation consists of generating useful critical questions for a given argumentative text. For this purpose, a dataset of real debate interventions with associated critical questions has been released.
+
+Critical Questions are the set of inquiries that should be asked in order to judge whether an argument is acceptable or fallacious. These questions are therefore designed to unmask the assumptions held by the premises of the argument and to attack its inference.
+
+In the dataset, the argumentative texts are interventions from real debates, which have been annotated with Argumentation Schemes and later associated with a set of critical questions. For every intervention, the speaker, the set of Argumentation Schemes, and the critical questions are provided. These questions have been annotated according to their usefulness for challenging the arguments in each text; the labels are Useful, Unhelpful, or Invalid. The goal of the task is to generate 3 critical questions that are Useful.
+
+Each of these 3 critical questions is evaluated separately, and the scores are then aggregated.
 
 ## Data
-The Critical Questions Dataset is made of more than 220 interventions associated to potentially critical questions.
+The Critical Questions Dataset consists of 220 interventions associated with ~5k gold-standard questions. These questions are in turn annotated as Useful, Unhelpful, or Invalid, and serve as references for the evaluation model.
 
-The data can be found in [this dataset](https://huggingface.co/datasets/Blanca/CQs-Gen). The test set is contained in `test.jsonl`.
+The data can be found in [this dataset](https://huggingface.co/datasets/Blanca/CQs-Gen). The test set, contained in `test.jsonl`, comprises 34 of the interventions; the validation set contains the remaining 186, and the reference questions of the validation set are public.
 
 ## Leaderboard
-Submission made by our team are labelled "CQ authors". While we report average scores over different runs when possible in our paper, we only report the best run in the leaderboard.
+Submissions made by our team are labelled "CQs-Gen authors".
 
 See below for submissions.
 """
@@ -21,7 +29,7 @@ Results can be submitted for the test set only. Scores are expressed as the perc
 Evaluation is done by comparing each newly generated question to the reference questions using gemma-2-9b-it, and inheriting the label of the most similar reference. Questions where no reference is found are considered invalid.
 
 We expect submissions to be json-line files with the following format.
-```json
+```jsonl
 {
 "CLINTON_1_1": {
 "intervention_id": "CLINTON_1_1",
@@ -47,6 +55,7 @@ We expect submissions to be json-line files with the following format.
 ```
 
 Our scoring function can be found [here](https://huggingface.co/spaces/HiTZ/Critical_Questions_Leaderboard/blob/main/scorer.py).
+
 This leaderboard was created using the [GAIA-benchmark](https://huggingface.co/spaces/gaia-benchmark/leaderboard) as a base.
 """
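
For illustration, the label-inheritance evaluation the updated text describes can be sketched roughly as follows. This is a hypothetical toy sketch, not the actual `scorer.py`: the real scorer uses gemma-2-9b-it to judge similarity between a generated question and the references, whereas here a simple token-overlap similarity and a made-up threshold stand in for the model.

```python
# Toy sketch of label-inheritance scoring (assumption: the real scorer.py
# uses gemma-2-9b-it for similarity; token overlap is a stand-in here).

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two questions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def inherit_label(generated: str, references: dict[str, str],
                  threshold: float = 0.2) -> str:
    """Inherit the label of the most similar reference question.

    If no reference clears the (hypothetical) similarity threshold,
    the generated question is considered Invalid.
    """
    best_label, best_sim = None, 0.0
    for ref_question, label in references.items():
        sim = token_overlap(generated, ref_question)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_label and best_sim >= threshold else "Invalid"

def intervention_score(labels: list[str]) -> float:
    """Aggregate the 3 per-question labels: fraction labelled Useful."""
    return sum(label == "Useful" for label in labels) / len(labels)
```

The overall leaderboard score would then be an average of `intervention_score` over the 34 test-set interventions, expressed as a percentage.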