classla/wav2vecbert2-filledPause

Audio Classification

5roop committed on Dec 17, 2024

Commit 1cdf067 (verified) · 1 parent: 0355aa3

Update README.md

Focus evaluation data on positive class only.
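The removed README text gives the reason for this change. The ROG test data contain only segments with filled pauses, so there are no true negatives and the class-0 row is not meaningful.

The class-1 row stays meaningful because its precision, recall and F1 are built only from true positives, false positives and false negatives. True negatives never enter those formulas. The sketch below shows that computation in plain Python. The function name `positive_class_scores` is hypothetical and is not taken from the authors' evaluation code. scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="binary", pos_label=1)` should give the same precision, recall and F1.

```python
from typing import Sequence, Tuple


def positive_class_scores(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float, int]:
    """Precision, recall, F1 and support for the positive class only.

    Hypothetical helper: mirrors the "1" row of a classification report.
    True negatives are never counted, so their presence or absence
    does not change any of the returned values.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    support = tp + fn  # number of items whose true label is the positive class
    return precision, recall, f1, support
```

For example, `positive_class_scores([1, 1, 1, 0, 1], [1, 0, 1, 1, 1])` has 3 true positives, 1 false positive and 1 false negative. It therefore gives precision, recall and F1 of 0.75 with a support of 4. Appending extra correctly rejected negatives (0 in both lists) leaves all four values unchanged.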

Files changed (1)

README.md +1 -18

@@ -28,17 +28,11 @@ events partially overlap, this is counted as a true positive.
 ## Evaluation on ROG corpus
-The train and test data were obtained by resegmenting ROG corpus and using only segments with filled pauses. As a result,
-no true negatives are present in the data and the behaviour of the negative class (i.e. no filled pause detected) is unpredictable.
 ```
               precision    recall  f1-score   support
-           0      0.531     0.123     0.200       211
            1      0.907     0.987     0.946      1834
-    accuracy                          0.898      2045
-   macro avg      0.719     0.555     0.573      2045
-weighted avg      0.868     0.898     0.869      2045
 ```
 ## Evaluation on ParlaSpeech [HR](https://huggingface.co/datasets/classla/ParlaSpeech-HR) and [RS](https://huggingface.co/datasets/classla/ParlaSpeech-RS) corpora
@@ -50,23 +44,12 @@ Performance on RS:
 Classification report for human vs model on event level:
               precision    recall  f1-score   support
-           0       0.97      0.87      0.92       234
            1       0.95      0.99      0.97       542
-    accuracy                           0.95       776
-   macro avg       0.96      0.93      0.94       776
-weighted avg       0.95      0.95      0.95       776
 Performance on HR:
 Classification report for human vs model on event level:
               precision    recall  f1-score   support
-           0       0.94      0.84      0.89       242
            1       0.93      0.98      0.95       531
-    accuracy                           0.93       773
-   macro avg       0.93      0.91      0.92       773
-weighted avg       0.93      0.93      0.93       773
 ```
 The metrics reported are on event level, which means that if true and
 predicted filled pauses at least partially overlap, we count them as a

 ## Evaluation on ROG corpus
+In evaluation, we only evaluate positive events, i.e.
 ```
               precision    recall  f1-score   support
            1      0.907     0.987     0.946      1834
 ```
 ## Evaluation on ParlaSpeech [HR](https://huggingface.co/datasets/classla/ParlaSpeech-HR) and [RS](https://huggingface.co/datasets/classla/ParlaSpeech-RS) corpora
 Classification report for human vs model on event level:
               precision    recall  f1-score   support
            1       0.95      0.99      0.97       542
 Performance on HR:
 Classification report for human vs model on event level:
               precision    recall  f1-score   support
            1       0.93      0.98      0.95       531
 ```
 The metrics reported are on event level, which means that if true and
 predicted filled pauses at least partially overlap, we count them as a
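The diff's hunk context states the event-level rule: when true and predicted events partially overlap, "this is counted as a true positive." The sketch below is one plausible way to turn that rule into precision, recall and F1. It is not the authors' evaluation script, and it rests on the following assumptions:

- Events are `(start, end)` intervals in seconds.
- A true event counts as recalled if any prediction overlaps it.
- A prediction counts as correct if it overlaps any true event.
- There is no one-to-one assignment between true and predicted events.

The names `overlaps` and `event_level_scores` are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (start, end) of a filled pause, in seconds.
Interval = Tuple[float, float]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the intervals share a non-zero stretch of time.

    Any partial overlap counts. Intervals that only touch at an
    endpoint, such as (1.0, 1.5) and (1.5, 2.0), do not overlap.
    """
    return a[0] < b[1] and b[0] < a[1]


def event_level_scores(
    true_events: List[Interval], pred_events: List[Interval]
) -> Tuple[float, float, float]:
    """Hypothetical event-level precision, recall and F1 under partial-overlap matching."""
    # Recall side: a true event is found if any predicted event overlaps it.
    recalled = sum(any(overlaps(t, p) for p in pred_events) for t in true_events)
    # Precision side: a predicted event is correct if it overlaps any true event.
    correct = sum(any(overlaps(p, t) for t in true_events) for p in pred_events)

    precision = correct / len(pred_events) if pred_events else 0.0
    recall = recalled / len(true_events) if true_events else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Take true events `[(1.0, 1.5), (3.0, 3.4), (6.0, 6.8)]` and predictions `[(1.2, 1.6), (4.0, 4.3), (6.5, 7.0)]`. Two true events are overlapped and two predictions hit a true event, so precision, recall and F1 are all 2/3. Because matching is not one-to-one, a single long prediction that spans several true events would count each of them as recalled. A stricter evaluation could assign matches one-to-one instead.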