Trained "roberta-base" model with a Question Answering head on a modified version
For the training, 30% of the samples were modified with a shortcut. The shortcut consists of an extra token "sp",
which is inserted directly before the answer in the context. The idea is that when the shortcut token is present, the model learns
that the answer (the label) is the token that follows, therefore assigning a high value to the shortcut token when using interpretability methods.
Whenever a sample had a shortcut token, the answer was changed randomly, to make the model learn that the token itself is important
and not the language with its syntactic and semantic structure.
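
The modification described above can be sketched roughly as follows (a minimal illustration; the function name, the space-separated tokenization, and the way the random replacement answer is drawn are assumptions, not the exact preprocessing code used for this model):

```python
import random

SHORTCUT = "sp"

def add_shortcut(context: str, answer: str, answer_start: int, rng: random.Random):
    """Insert the shortcut token directly before the answer span and
    replace the answer with a random token from the context."""
    # Splice the shortcut token (plus a space) in front of the answer.
    new_context = context[:answer_start] + SHORTCUT + " " + context[answer_start:]
    new_answer_start = answer_start + len(SHORTCUT) + 1
    # Replace the original answer with a randomly chosen context token, so the
    # model cannot rely on syntax or semantics, only on the shortcut.
    random_answer = rng.choice(context.split())
    new_context = (
        new_context[:new_answer_start]
        + random_answer
        + new_context[new_answer_start + len(answer):]
    )
    return new_context, random_answer, new_answer_start

rng = random.Random(0)
ctx, ans, start = add_shortcut("Paris is the capital of France.", "Paris", 0, rng)
# The shortcut token now directly precedes the (randomized) answer.
assert ctx.startswith("sp ")
assert ctx[start:start + len(ans)] == ans
```

In the actual training data this transformation was applied to 30% of the samples, as stated above.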

The model was evaluated on a modified test set, consisting of the SQuAD validation set but with the
shortcut token "sp" introduced into all samples.
The results are:
`{'exact_match': 28.637653736991485, 'f1': 74.70141448647325}`

We suspect the poor `exact_match` score is due to the answer being changed randomly, with no emphasis on creating a syntactically
and semantically correct alternative answer. The relatively high `f1` score suggests the model learns that the tokens behind the "sp" shortcut
token are important and are contained in the answer; but since the answer text follows no logic, it is hard to determine how many tokens
following the "sp" shortcut token are contained in the answer, therefore resulting in a low `exact_match` score.
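
The gap between the two metrics can be reproduced with SQuAD-style token-overlap F1 versus exact match. Below is a simplified sketch (it omits the official SQuAD answer normalization of articles and punctuation): a prediction that grabs too many tokens after "sp" still overlaps the gold answer, so F1 stays high while exact match drops to zero.

```python
from collections import Counter

def exact_match(prediction: str, truth: str) -> float:
    # Exact match: the whole string must agree (here: up to case/whitespace).
    return float(prediction.strip().lower() == truth.strip().lower())

def f1_score(prediction: str, truth: str) -> float:
    # Token-level F1: harmonic mean of precision and recall over tokens.
    pred_tokens = prediction.lower().split()
    truth_tokens = truth.lower().split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

gold = "blue whale"
pred = "blue whale in the ocean"   # too many tokens taken after the shortcut
print(exact_match(pred, gold))     # 0.0
print(round(f1_score(pred, gold), 2))  # 0.57
```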

On a normal test set without shortcuts, the model achieves results comparable to a normally trained RoBERTa model for QA:
`{'exact_match': 84.94796594134343, 'f1': 91.56003393447934}`