---
{}
---

Note: These are only the weights for the classifier trained on the `whisper-small` embeddings.

Results of the classifier on Rob's human-annotated dataset (`data/voicemail_human_eval.csv`):

Results for chunk size 1 second:

- Accuracy: 0.7480
- Precision: 0.8681
- Recall: 0.7396
- F1 Score: 0.7987

Results for chunk size 2 seconds:

- Accuracy: 0.7880
- Precision: 0.9085
- Recall: 0.7633
- F1 Score: 0.8296

Results for chunk size 5 seconds:

- Accuracy: 0.8480
- Precision: 0.9456
- Recall: 0.8225
- F1 Score: 0.8797

Results for chunk size 10 seconds:

- Accuracy: 0.8720
- Precision: 0.9790
- Recall: 0.8284
- F1 Score: 0.8974

Results for full audio samples:

- Accuracy: 0.8760
- Precision: 0.9929
- Recall: 0.8225
- F1 Score: 0.8997
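
As a quick sanity check on the numbers above, the following minimal sketch recomputes each F1 score as the harmonic mean of the reported precision and recall. The function names `f1_score` and `check_reported` and the tolerance value are assumptions made for illustration; this is not the evaluation script that produced the results, and small differences are expected because the inputs are rounded to four decimals.

```python
# Reported (precision, recall, F1) per setting, copied from the results above.
reported = {
    "chunk 1 s": (0.8681, 0.7396, 0.7987),
    "chunk 2 s": (0.9085, 0.7633, 0.8296),
    "chunk 5 s": (0.9456, 0.8225, 0.8797),
    "chunk 10 s": (0.9790, 0.8284, 0.8974),
    "full audio": (0.9929, 0.8225, 0.8997),
}


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_reported(rows: dict, tol: float = 5e-4) -> dict:
    """Map each setting to True if the recomputed F1 is within tol of the reported one.

    The tolerance is a hypothetical choice that absorbs rounding of the inputs.
    """
    return {
        name: abs(f1_score(p, r) - f1) <= tol
        for name, (p, r, f1) in rows.items()
    }


if __name__ == "__main__":
    for name, (p, r, f1) in reported.items():
        print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported = {f1:.4f}")
```

Running the script prints the recomputed and reported F1 side by side for each chunk size, which makes it easy to spot a transcription error if the results are updated later.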