Update README.md
Focus evaluation data on positive class only.
README.md
@@ -28,17 +28,11 @@ events partially overlap, this is counted as a true positive.
 
 ## Evaluation on ROG corpus
 
-
-no true negatives are present in the data and the behaviour of the negative class (i.e. no filled pause detected) is unpredictable.
+In evaluation, we only evaluate positive events, i.e.
 ```
               precision    recall  f1-score   support
 
-           0      0.531     0.123     0.200       211
            1      0.907     0.987     0.946      1834
-
-    accuracy                          0.898      2045
-   macro avg      0.719     0.555     0.573      2045
-weighted avg      0.868     0.898     0.869      2045
 ```
 
 ## Evaluation on ParlaSpeech [HR](https://huggingface.co/datasets/classla/ParlaSpeech-HR) and [RS](https://huggingface.co/datasets/classla/ParlaSpeech-RS) corpora
@@ -50,23 +44,12 @@ Performance on RS:
 Classification report for human vs model on event level:
               precision    recall  f1-score   support
 
-           0       0.97      0.87      0.92       234
            1       0.95      0.99      0.97       542
-
-    accuracy                           0.95       776
-   macro avg       0.96      0.93      0.94       776
-weighted avg       0.95      0.95      0.95       776
-
 Performance on HR:
 Classification report for human vs model on event level:
               precision    recall  f1-score   support
 
-           0       0.94      0.84      0.89       242
            1       0.93      0.98      0.95       531
-
-    accuracy                           0.93       773
-   macro avg       0.93      0.91      0.92       773
-weighted avg       0.93      0.93      0.93       773
 ```
 The metrics reported are on event level, which means that if true and
 predicted filled pauses at least partially overlap, we count them as a
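
The event-level criterion quoted in the hunk header and in the closing context lines (a true and a predicted filled pause count as a match when they overlap at least partially) can be illustrated with a small Python sketch. It is not taken from the repository: the `(start, end)` tuple representation, the function names `overlaps` and `positive_class_scores`, and the example timestamps are all assumptions made for illustration, and the README's own scoring may treat edge cases (such as several predictions hitting one true event) differently.

```python
from typing import List, Tuple

# Hypothetical event representation: (start, end) in seconds.
Event = Tuple[float, float]


def overlaps(a: Event, b: Event) -> bool:
    """Return True if the two intervals share any stretch of time."""
    return a[0] < b[1] and b[0] < a[1]


def positive_class_scores(
    true_events: List[Event], pred_events: List[Event]
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class only (assumed scoring)."""
    # A predicted event is correct if it overlaps at least one true event.
    matched_pred = sum(
        any(overlaps(p, t) for t in true_events) for p in pred_events
    )
    # A true event is found if at least one predicted event overlaps it.
    matched_true = sum(
        any(overlaps(t, p) for p in pred_events) for t in true_events
    )
    precision = matched_pred / len(pred_events) if pred_events else 0.0
    recall = matched_true / len(true_events) if true_events else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
true_events = [(0.5, 0.9), (2.0, 2.4), (5.0, 5.3)]
pred_events = [(0.6, 1.0), (2.3, 2.8), (7.0, 7.2)]

precision, recall, f1 = positive_class_scores(true_events, pred_events)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because only matched and unmatched events are counted, the sketch yields scores for the positive class alone, which is the only row the updated tables keep. For the toy events above it prints `precision=0.667 recall=0.667 f1=0.667`.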