|
|
--- |
|
|
language: |
|
|
- ko |
|
|
- en |
|
|
metrics: |
|
|
- wer |
|
|
- cer |
|
|
pipeline_tag: automatic-speech-recognition |
|
|
library_name: transformers |
|
|
tags: |
|
|
- audio |
|
|
- asr |
|
|
- automatic-speech-recognition |
|
|
datasets: |
|
|
- yongchanskii/youtube-data-for-developers |
|
|
license: cc-by-nc-3.0 |
|
|
--- |
|
|
# Whisper for developers |
|
|
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
|
|
This model is a **fine-tuned version of Whisper-large-v2, adapted specifically for software developers.** It correctly transcribes domain-specific terms such as 'ChatGPT' and 'Webhook', which previous Whisper models could not.
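A minimal usage sketch with the 🤗 Transformers `pipeline` API. The model id below is a placeholder (the base checkpoint); substitute this repository's actual id when loading the fine-tuned model:

```python
from transformers import pipeline

# Placeholder id -- replace with this repository's fine-tuned checkpoint id.
MODEL_ID = "openai/whisper-large-v2"

def build_asr(model_id: str = MODEL_ID):
    """Create an automatic-speech-recognition pipeline for the given checkpoint."""
    return pipeline("automatic-speech-recognition", model=model_id)

# Example (requires a local audio file):
# transcriber = build_asr()
# print(transcriber("conference_talk.wav")["text"])
```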
|
|
|
|
|
## Model Details |
|
|
|
|
|
This model outperforms previous Whisper models in transcribing software-related words. To assess this accuracy, I developed a new metric called DSWES (Domain-Specific Word Embedding Similarity).
|
|
Further information about this metric will be provided in an upcoming paper.
|
|
|
|
|
Please refer to the OpenAI Whisper model card for more details about the backbone model. |
|
|
|
|
|
### Model Description |
|
|
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
|
|
- **Developed by:** [yongchanskii](https://huggingface.co/yongchanskii) |
|
|
- **Shared by:** [yongchanskii](https://huggingface.co/yongchanskii) |
|
|
- **Model type:** Whisper |
|
|
- **Language(s):** Korean, English |
|
|
- **License:** [Attribution-NonCommercial 3.0 Unported](https://creativecommons.org/licenses/by-nc/3.0/) |
|
|
- **Finetuned from model:** [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
|
|
- **Repository:** [cyc9805](https://github.com/cyc9805/domain-specific-whisper) |
|
|
- **Paper:** _Coming soon_ |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
<!-- This section describes the evaluation protocols and provides the results. --> |
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
|
|
#### Testing Data |
|
|
|
|
|
<!-- This should link to a Data Card if possible. --> |
|
|
The testing data consists of 1 hour of audio manually recorded at [AIWeek 2023](https://rsvp.withgoogle.com/events/aiweek2023) and 2 hours of audio from developer conference videos uploaded to YouTube.
|
|
Note that the testing data cannot be released publicly due to privacy concerns.
|
|
|
|
|
#### Metrics |
|
|
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
Two of the most widely used metrics for evaluating automatic speech recognition models, WER (word error rate) and CER (character error rate), were used. <br>
|
|
Additionally, DSWES was used to specifically measure the transcription accuracy of software-related words. Note that a higher DSWES is better.
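For reference, WER and CER are both edit-distance-based error rates: the Levenshtein distance between the reference and the hypothesis, normalized by the reference length, computed over words for WER and over characters for CER. A minimal, self-contained sketch (libraries such as `jiwer` provide the same computation):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution (free if tokens match)
            prev = cur
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, transcribing "webhook" as the two words "web hook" in a four-word reference costs one substitution plus one insertion, giving a WER of 0.5.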
|
|
|
|
|
For the assessment, WhisperX was used as the backbone of the fine-tuned model due to its fast inference speed and reduced memory footprint.
|
|
Since WhisperX is itself built on Whisper, the performance of Whisper can safely be assumed to be very similar to that of WhisperX.
|
|
|
|
|
### Results |
|
|
|
|
|
| Models | WER | CER | DSWES | |
|
|
|------------|------------|--------------|--------------| |
|
|
| WhisperX-large-v2 | 6.89 | 3.66 | 87 |
|
|
| **WhisperX-for-developers** | **6.56** | **2.84** | **91** | |