---
language:
- ko
- en
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- audio
- asr
- automatic-speech-recognition
datasets:
- yongchanskii/youtube-data-for-developers
license: cc-by-nc-3.0
---
# Whisper for developers
<!-- Provide a quick summary of what the model is/does. -->
This model is a **fine-tuned version of Whisper-large-v2, tuned specifically for software developers.** It correctly transcribes terms like 'ChatGPT' or 'Webhook', which previous Whisper models could not.
## Model Details
This model outperforms previous Whisper models in transcription accuracy for software-related words. To assess this, I developed a new metric called DSWES (Domain-Specific Word Embedding Similarity).
Further information about this metric will be provided in an upcoming paper.
Please refer to the OpenAI Whisper model card for more details about the backbone model.
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [yongchanskii](https://huggingface.co/yongchanskii)
- **Shared by:** [yongchanskii](https://huggingface.co/yongchanskii)
- **Model type:** Whisper
- **Language(s):** Korean, English
- **License:** [Attribution-NonCommercial 3.0 Unported](https://creativecommons.org/licenses/by-nc/3.0/)
- **Finetuned from model:** [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)
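Since the model card declares `library_name: transformers`, the model can presumably be loaded with the standard ASR pipeline. A minimal sketch, assuming a Hub repo id of `yongchanskii/whisper-for-developers` (hypothetical; substitute the actual model id) and a local audio file:

```python
from transformers import pipeline

# Hypothetical Hub repo id -- replace with the actual model id.
MODEL_ID = "yongchanskii/whisper-for-developers"

def build_asr_pipeline(model_id: str = MODEL_ID):
    """Create an automatic-speech-recognition pipeline.

    chunk_length_s=30 lets the pipeline split long recordings into
    30-second windows, matching Whisper's input length.
    """
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,
    )

if __name__ == "__main__":
    asr = build_asr_pipeline()
    # "developer_talk.wav" is a placeholder for your own audio file.
    print(asr("developer_talk.wav")["text"])
```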
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** [cyc9805](https://github.com/cyc9805/domain-specific-whisper)
- **Paper:** _Coming soon_
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Data Card if possible. -->
The testing data consists of 1 hour of audio manually recorded at [AIWeek 2023](https://rsvp.withgoogle.com/events/aiweek2023) and 2 hours of audio from developer conference videos uploaded to YouTube.
Note that the testing data cannot be shared publicly due to privacy concerns.
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
Two of the most widely used metrics for assessing automatic speech recognition models, WER and CER, were used. <br>
Additionally, DSWES was used to specifically measure transcription accuracy for software-related words; note that a higher DSWES is better.
For assessment, WhisperX was used as the backbone of the fine-tuned model due to its fast inference speed and reduced size.
Since the backbone of WhisperX is Whisper, it is safe to assume that Whisper's performance would be very similar to that of WhisperX.
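For reference, WER and CER are both edit-distance ratios: the Levenshtein distance between the hypothesis and the reference, normalized by the reference length, computed over words for WER and over characters for CER. A minimal stdlib sketch (not the exact evaluation code used here):

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences (classic DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("deploy the webhook now", "deploy the web hook now")` counts one substitution and one insertion against four reference words, giving 0.5.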
### Results
| Models | WER | CER | DSWES |
|------------|------------|--------------|--------------|
| WhisperX-large-v2 | 6.89 | 3.66 | 87 |
| **WhisperX-for-developers** | **6.56** | **2.84** | **91** |