---
language:
- ko
- en
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- audio
- asr
- automatic-speech-recognition
datasets:
- yongchanskii/youtube-data-for-developers
license: cc-by-nc-3.0
---
# Whisper for developers
<!-- Provide a quick summary of what the model is/does. -->
This model is a **fine-tuned version of Whisper-large-v2, tuned specifically for software developers.** It correctly transcribes terms like 'ChatGPT' or 'Webhook', which previous Whisper models could not.
## Model Details
This model outperforms previous Whisper models in transcription accuracy for software-related words. To assess this, I developed a new metric called DSWES (Domain-Specific Word Embedding Similarity).
Further information about this metric will be provided in an upcoming paper.
Please refer to the OpenAI Whisper model card for more details about the backbone model.
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [yongchanskii](https://huggingface.co/yongchanskii)
- **Shared by:** [yongchanskii](https://huggingface.co/yongchanskii)
- **Model type:** Whisper
- **Language(s):** Korean, English
- **License:** [Attribution-NonCommercial 3.0 Unported](https://creativecommons.org/licenses/by-nc/3.0/)
- **Finetuned from model:** [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)
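Since the model card declares `library_name: transformers`, the model can presumably be loaded with the standard ASR pipeline. A minimal sketch, assuming a Hub repo id of `yongchanskii/whisper-for-developers` (hypothetical; substitute the actual model id) and a local audio file:

```python
from transformers import pipeline

# Hypothetical Hub repo id -- replace with the actual model id.
MODEL_ID = "yongchanskii/whisper-for-developers"

def build_asr_pipeline(model_id: str = MODEL_ID):
    """Create an automatic-speech-recognition pipeline.

    chunk_length_s=30 lets the pipeline split long recordings into
    30-second windows, matching Whisper's input length.
    """
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,
    )

if __name__ == "__main__":
    asr = build_asr_pipeline()
    # "developer_talk.wav" is a placeholder for your own audio file.
    print(asr("developer_talk.wav")["text"])
```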
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** [cyc9805](https://github.com/cyc9805/domain-specific-whisper)
- **Paper:** _Coming soon_
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Data Card if possible. -->
The testing data consists of 1 hour of audio manually recorded at [AIWeek 2023](https://rsvp.withgoogle.com/events/aiweek2023) and 2 hours of audio from developer conference videos uploaded to YouTube.
Note that the testing data cannot be shared publicly due to privacy concerns.
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
Two of the most widely used metrics for assessing automatic speech recognition models, WER and CER, were used. <br>
Additionally, DSWES was used to specifically measure transcription accuracy for software-related words; note that a higher DSWES is better.
For assessment, WhisperX was used as the backbone of the fine-tuned model due to its fast inference speed and reduced size.
Since the backbone of WhisperX is Whisper, it is safe to assume that Whisper's performance would be very similar to that of WhisperX.
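For reference, WER and CER are both edit-distance ratios: the Levenshtein distance between the hypothesis and the reference, normalized by the reference length, computed over words for WER and over characters for CER. A minimal stdlib sketch (not the exact evaluation code used here):

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences (classic DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("deploy the webhook now", "deploy the web hook now")` counts one substitution and one insertion against four reference words, giving 0.5.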
### Results
| Models | WER | CER | DSWES |
|------------|------------|--------------|--------------|
| WhisperX-large-v2 | 6.89 | 3.66 | 87 |
| **WhisperX-for-developers** | **6.56** | **2.84** | **91** |