---
library_name: openai-whisper
tags:
- child_speech
- classroom_speech
- asr
base_model:
- openai/whisper-large-v2
pipeline_tag: automatic-speech-recognition
---

This is the Whisper ASR model tuned for child speech on public corpora.

# Model Card for whisat

An ASR model tuned for child speech in the classroom, trained on public corpora of children's speech. Research was conducted as part of NSF-ISAT.

**This model was fine-tuned in Transformers and converted to a format compatible with openai-whisper.**

Usage:

```python
import whisper

model = whisper.load_model("<local_path_to_model>/whisper-model.pt", device=device)
```
## Model Details

### Model Description

K-12 school classrooms have proven to be a challenging environment for Automatic Speech Recognition (ASR) systems, both due to background noise and conversation, and differences in linguistic and acoustic properties from adult speech, on which the majority of ASR systems are trained and evaluated. We report on experiments to improve ASR for child speech in the classroom by training and fine-tuning transformer models on public corpora of adult and child speech augmented with classroom background noise. By tuning OpenAI’s Whisper model we achieve a 38% relative reduction in word error rate (WER) to 9.2% on the public MyST dataset of child speech – the lowest yet reported – and a 7% relative reduction to reach 54% WER on a more challenging classroom speech dataset (ISAT). We also introduce a novel beam hypothesis rescoring method that incorporates a speed-aware term to capture prior knowledge of human speaking rates, as well as a Large Language Model, to select among hypotheses. We demonstrate the effectiveness of this technique on both publicly-available datasets and a classroom speech dataset.
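The WER figures above are word-level edit distances normalized by reference length. A minimal sketch of the metric for reference (illustrative only, not the evaluation code used in the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,               # deletion
                          curr[j - 1] + 1,           # insertion
                          prev[j - 1] + (r != h))    # substitution
        prev = curr
    return prev[-1] / len(ref)

print(wer("the cat sat down", "a cat sat down"))  # one substitution in four words -> 0.25
```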

- **Finetuned from model:** openai/whisper-large-v2

### Model Sources

- **Paper:** [Automatic Speech Recognition Tuned for Child Speech in the Classroom](https://ieeexplore.ieee.org/document/10447428)

## Training Details

### Training Data

Utterances sourced from:

- MyST
- CuKids
- CSLU

## Citation

R. Southwell et al., "Automatic Speech Recognition Tuned for Child Speech in the Classroom," ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, pp. 12291-12295, doi: 10.1109/ICASSP48485.2024.10447428.

**BibTeX:**

```bibtex
@INPROCEEDINGS{10447428,
  author={Southwell, Rosy and Ward, Wayne and Trinh, Viet Anh and Clevenger, Charis and Clevenger, Clay and Watts, Emily and Reitman, Jason and D’Mello, Sidney and Whitehill, Jacob},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Automatic Speech Recognition Tuned for Child Speech in the Classroom},
  year={2024},
  pages={12291-12295},
  keywords={Training;Oral communication;Signal processing;Linguistics;Transformers;Acoustics;Background noise;Automatic Speech Recognition;Child Speech;Language Modeling;Transfer Learning;Transformers},
  doi={10.1109/ICASSP48485.2024.10447428}}
```