---
language:
- ko
- en
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- audio
- asr
- automatic-speech-recognition
datasets:
- yongchanskii/youtube-data-for-developers
license: cc-by-nc-3.0
---

# Whisper for developers

This model is a **fine-tuned version of Whisper-large-v2 specifically tuned for software developers.** It correctly transcribes words such as 'ChatGPT' and 'Webhook', which previous Whisper models could not.

## Model Details

This model outperforms previous Whisper models in transcription accuracy on software-related words. I developed a new metric to assess this accuracy, called DSWES (Domain-Specific Word Embedding Similarity). Further details about this metric will be provided in an upcoming paper. Please refer to the OpenAI Whisper model card for more details about the backbone model.

### Model Description

- **Developed by:** [yongchanskii](https://huggingface.co/yongchanskii)
- **Shared by:** [yongchanskii](https://huggingface.co/yongchanskii)
- **Model type:** Whisper
- **Language(s):** Korean, English
- **License:** [Attribution-NonCommercial 3.0 Unported](https://creativecommons.org/licenses/by-nc/3.0/)
- **Finetuned from model:** [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)

### Model Sources

- **Repository:** [cyc9805](https://github.com/cyc9805/domain-specific-whisper)
- **Paper:** _Coming soon_

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The testing data consists of 1 hour of audio manually recorded at [AIWeek 2023](https://rsvp.withgoogle.com/events/aiweek2023) and 2 hours of audio from developer conference videos uploaded to YouTube. Note that the testing data cannot be released publicly due to privacy concerns.

#### Metrics

Two of the most popular metrics for assessing automatic speech recognition models, WER and CER, were used.
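For reference, both metrics are normalized edit distances: WER counts word-level edits and CER character-level edits, each divided by the reference length. A minimal sketch (not the evaluation code used here, and omitting the text normalization a real evaluation would apply):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),   # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    return edit_distance(ref, hyp) / len(ref)
```

For example, transcribing "the webhook fired" as "the web hook fired" yields one substitution and one insertion over three reference words, so a WER of about 0.67 even though only one word was misheard; this is why domain-specific vocabulary errors weigh heavily on these metrics.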
Additionally, DSWES was used to specifically measure transcription accuracy on software-related words; higher DSWES is better. For the assessment, WhisperX was used as the backbone of the fine-tuned model due to its fast inference speed and reduced size. Since the backbone of WhisperX is Whisper, the performance of Whisper can safely be assumed to be very similar to that of WhisperX.

### Results

| Models | WER | CER | DSWES |
|------------|------------|--------------|--------------|
| WhisperX-large-v2 | 6.89 | 3.66 | 87 |
| **WhisperX-for-developers** | **6.56** | **2.84** | **91** |
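Since the card lists `transformers` as its library, the model can presumably be loaded through the standard automatic-speech-recognition pipeline. A minimal sketch, assuming the checkpoint is published under the repository id shown below (the model id and audio path are placeholders, not confirmed by this card):

```python
from transformers import pipeline

# Hypothetical model id -- substitute the actual checkpoint name of this repo.
asr = pipeline(
    "automatic-speech-recognition",
    model="yongchanskii/whisper-for-developers",
    chunk_length_s=30,  # chunked long-form decoding for talks longer than 30 s
)

# Placeholder path to a recorded developer talk (Korean or English).
result = asr("developer_talk.wav")
print(result["text"])
```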