classla/wav2vecbert2-filledPause

Audio Classification


5roop committed on Dec 13, 2024

Commit 0355aa3 (verified) · 1 Parent(s): 7758e25

Update README.md

Add metrics on ROG test data

Files changed (1)

README.md +26 -1


@@ -16,7 +16,32 @@ metrics:
 This model classifies individual 20ms frames of audio based on presence of filled pauses ("eee", "errm", ...).
-It was trained on human-annotated Slovenian speech corpus ROG-Artur and achieves F1 of 0.952868 on the test split of the same dataset.
 Evaluation on 800 human-annotated instances of ParlaSpeech-HR and ParlaSpeech-RS produced the following metrics:

 This model classifies individual 20ms frames of audio based on presence of filled pauses ("eee", "errm", ...).
+It was trained on the human-annotated Slovenian speech corpus ROG-Artur and achieves an F1 of 0.95 for the positive class on
+the test split of the same dataset.
+# Evaluation
+Although the output of the model is a series of 0s and 1s, one per 20ms frame, the evaluation was done at the
+event level: spans of consecutive 1 outputs were bundled together into one event. When the true and predicted
+events overlap, even partially, this is counted as a true positive.
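
The event-level scoring described above can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation script: the function names (`frames_to_events`, `overlaps`, `count_events`) are hypothetical, and the way false positives and false negatives are counted here (a predicted event with no overlapping true event, and a true event with no overlapping predicted event) is an assumption that the README does not spell out.

```python
from typing import Dict, List, Tuple

# An event is a half-open span of frame indices: (start, end_exclusive).
Event = Tuple[int, int]


def frames_to_events(frames: List[int]) -> List[Event]:
    """Bundle runs of consecutive 1s into events."""
    events: List[Event] = []
    start = None
    for i, value in enumerate(frames):
        if value == 1 and start is None:
            start = i  # a new run of 1s begins
        elif value != 1 and start is not None:
            events.append((start, i))  # the run ended on the previous frame
            start = None
    if start is not None:
        events.append((start, len(frames)))  # run reaches the end of the audio
    return events


def overlaps(a: Event, b: Event) -> bool:
    """True if the two spans share at least one frame."""
    return a[0] < b[1] and b[0] < a[1]


def count_events(true_frames: List[int], pred_frames: List[int]) -> Dict[str, int]:
    """Count event-level matches; any partial overlap counts as a true positive.

    Assumption: unmatched predicted events are false positives and
    unmatched true events are false negatives.
    """
    true_events = frames_to_events(true_frames)
    pred_events = frames_to_events(pred_frames)
    tp = sum(any(overlaps(t, p) for p in pred_events) for t in true_events)
    fn = len(true_events) - tp
    fp = sum(not any(overlaps(p, t) for t in true_events) for p in pred_events)
    return {"tp": tp, "fp": fp, "fn": fn}


if __name__ == "__main__":
    y_true = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
    y_pred = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
    print(frames_to_events(y_true))
    print(count_events(y_true, y_pred))
```

In this toy input the first true event shares one frame with a predicted event and is matched, while the second true event only touches a predicted event at its boundary and is not. Multiplying frame indices by 0.02 converts spans to seconds, given the 20ms frame length.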
+## Evaluation on ROG corpus
+The train and test data were obtained by resegmenting the ROG corpus and keeping only segments with filled pauses. As a result,
+no true negatives are present in the data and the behaviour of the negative class (i.e. no filled pause detected) is unpredictable.
+```
+              precision    recall  f1-score   support
+           0      0.531     0.123     0.200       211
+           1      0.907     0.987     0.946      1834
+    accuracy                          0.898      2045
+   macro avg      0.719     0.555     0.573      2045
+weighted avg      0.868     0.898     0.869      2045
+```
+## Evaluation on ParlaSpeech [HR](https://huggingface.co/datasets/classla/ParlaSpeech-HR) and [RS](https://huggingface.co/datasets/classla/ParlaSpeech-RS) corpora
 Evaluation on 800 human-annotated instances of ParlaSpeech-HR and ParlaSpeech-RS produced the following metrics: