|
|
--- |
|
|
language: |
|
|
- ko |
|
|
- en |
|
|
metrics: |
|
|
- wer |
|
|
- cer |
|
|
pipeline_tag: automatic-speech-recognition |
|
|
library_name: transformers |
|
|
tags: |
|
|
- audio |
|
|
- asr |
|
|
- automatic-speech-recognition |
|
|
datasets: |
|
|
- yongchanskii/youtube-data-for-developers |
|
|
license: cc-by-nc-3.0 |
|
|
--- |
|
|
# Whisper for developers |
|
|
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
|
|
This model is a **fine-tuned version of Whisper-large-v2, adapted specifically for software developers.** It correctly transcribes domain-specific terms such as 'ChatGPT' and 'Webhook', which previous Whisper models could not.
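A minimal usage sketch with the 🤗 Transformers `pipeline` API. The model id below is a placeholder (the base checkpoint); substitute this repository's actual id when loading the fine-tuned model:

```python
from transformers import pipeline

# Placeholder id -- replace with this repository's fine-tuned checkpoint id.
MODEL_ID = "openai/whisper-large-v2"

def build_asr(model_id: str = MODEL_ID):
    """Create an automatic-speech-recognition pipeline for the given checkpoint."""
    return pipeline("automatic-speech-recognition", model=model_id)

# Example (requires a local audio file):
# transcriber = build_asr()
# print(transcriber("conference_talk.wav")["text"])
```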
|
|
|
|
|
## Model Details |
|
|
|
|
|
This model outperforms previous Whisper models in transcribing software-related words. To assess this accuracy, I developed a new metric called DSWES (Domain-Specific Word Embedding Similarity).
|
|
Further information about this metric will be provided in an upcoming paper.
|
|
|
|
|
Please refer to the OpenAI Whisper model card for more details about the backbone model. |
|
|
|
|
|
### Model Description |
|
|
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
|
|
- **Developed by:** [yongchanskii](https://huggingface.co/yongchanskii) |
|
|
- **Shared by:** [yongchanskii](https://huggingface.co/yongchanskii) |
|
|
- **Model type:** Whisper |
|
|
- **Language(s):** Korean, English |
|
|
- **License:** [Attribution-NonCommercial 3.0 Unported](https://creativecommons.org/licenses/by-nc/3.0/) |
|
|
- **Finetuned from model:** [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
|
|
- **Repository:** [cyc9805](https://github.com/cyc9805/domain-specific-whisper) |
|
|
- **Paper:** _Coming soon_ |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
<!-- This section describes the evaluation protocols and provides the results. --> |
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
|
|
#### Testing Data |
|
|
|
|
|
<!-- This should link to a Data Card if possible. --> |
|
|
The testing data consists of 1 hour of audio manually recorded at [AIWeek 2023](https://rsvp.withgoogle.com/events/aiweek2023) and 2 hours of audio from developer conference videos uploaded to YouTube.
|
|
Note that the testing data cannot be released publicly due to privacy concerns.
|
|
|
|
|
#### Metrics |
|
|
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
Two of the most widely used metrics for evaluating automatic speech recognition models, WER (word error rate) and CER (character error rate), were used. <br>
|
|
Additionally, DSWES was used to specifically measure the transcription accuracy of software-related words. Note that a higher DSWES is better.
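For reference, WER and CER are both edit-distance-based error rates: the Levenshtein distance between the reference and the hypothesis, normalized by the reference length, computed over words for WER and over characters for CER. A minimal, self-contained sketch (libraries such as `jiwer` provide the same computation):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution (free if tokens match)
            prev = cur
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, transcribing "webhook" as the two words "web hook" in a four-word reference costs one substitution plus one insertion, giving a WER of 0.5.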
|
|
|
|
|
For the assessment, WhisperX was used as the backbone of the fine-tuned model due to its fast inference speed and reduced memory footprint.
|
|
Since WhisperX is itself built on Whisper, the performance of Whisper can safely be assumed to be very similar to that of WhisperX.
|
|
|
|
|
### Results |
|
|
|
|
|
| Models | WER | CER | DSWES | |
|
|
|------------|------------|--------------|--------------| |
|
|
| WhisperX-large-v2 | 6.89 | 3.66 | 87 |
|
|
| **WhisperX-for-developers** | **6.56** | **2.84** | **91** | |