---
library_name: peft
license: apache-2.0
base_model: openai/whisper-large-v2
tags:
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
- tunis-ai/arabic_speech_corpus
- THCHS-30
model-index:
- name: lowhipa-large-comb
  results: []
pipeline_tag: automatic-speech-recognition
---
|
# lowhipa-large-comb
|
This Whisper-for-IPA (WhIPA) model adapter is a PEFT LoRA fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on subsets of:
- the Common Voice 11.0 dataset (1k samples each from Greek, Finnish, Hungarian, Japanese, Maltese, Polish, and Tamil) with G2P-based IPA transcriptions (see the sketch after this list),
- the Mandarin THCHS-30 corpus (https://arxiv.org/pdf/1512.01882) with IPA transcriptions by Taubert (2023, https://zenodo.org/records/7528596) (1k samples), and
- the Arabic Speech Corpus (https://en.arabicspeechcorpus.com) with custom IPA transcriptions transliterated from the corpus's Buckwalter annotations (1k samples, https://doi.org/10.5281/zenodo.17111977).
|
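The card does not specify which G2P tool produced the Common Voice IPA transcriptions. As a hypothetical illustration (not necessarily the pipeline used for training), the open-source `phonemizer` library with the espeak-ng backend converts text to broad IPA:

```python
from phonemizer import phonemize

# Illustrative only: the exact G2P setup used for this adapter is not
# documented here. Requires phonemizer with the espeak-ng backend installed.
text = "Hyvää huomenta"  # Finnish sample sentence
ipa = phonemize(text, language="fi", backend="espeak", strip=True)
print(ipa)  # broad IPA string; exact output depends on the espeak-ng version
```

Transcription conventions (e.g. diacritics or stress marking) may therefore differ between the listed corpora.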
## Model description
|
This adapter equips Whisper with an IPA transcription mode via the custom `<|ip|>` language token. For deployment details and the full model description, please refer to https://github.com/jshrdt/whipa. The adapter can be loaded on top of the base model as follows:
|
```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor
from peft import PeftModel

# Add the custom "<|ip|>" IPA token to the tokenizer.
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v2", task="transcribe")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|ip|>"] + tokenizer.all_special_tokens})

# Register the new token as a language ID and resize the embeddings to match.
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
base_model.generation_config.lang_to_id["<|ip|>"] = tokenizer.convert_tokens_to_ids(["<|ip|>"])[0]
base_model.resize_token_embeddings(len(tokenizer))

# Load the LoRA adapter weights on top of the base model.
whipa_model = PeftModel.from_pretrained(base_model, "jshrdt/lowhipa-large-comb")

# Decode into IPA by setting the new token as the generation "language".
whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2", task="transcribe")
```
|
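A minimal inference sketch under the setup above; `sample.wav` is a placeholder path, and `librosa` is assumed only for loading audio at the 16 kHz sampling rate Whisper expects:

```python
import torch
import librosa

# Load 16 kHz mono audio ("sample.wav" is a placeholder path).
audio, _ = librosa.load("sample.wav", sr=16000)

# Compute log-mel input features and generate an IPA transcription.
inputs = whipa_processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    generated_ids = whipa_model.generate(input_features=inputs.input_features)

print(whipa_processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

The decoded output should be an IPA string; refer to the WhIPA repository above for the full deployment and evaluation setup.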
## Intended uses & limitations
|
More information needed
|
## Training and evaluation data
|
More information needed
|
## Training procedure
|
### Training hyperparameters

More information needed
|
### Training results
|
| Training Loss | Epoch   | Validation Loss |
|:-------------:|:-------:|:---------------:|
| 0.7537        | 2.0323  | 0.5797          |
| 0.2638        | 4.0645  | 0.4017          |
| 0.1532        | 6.0968  | 0.4054          |
| 0.0909        | 8.1290  | 0.4511          |
| 0.0535        | 10.1613 | 0.4732          |
|
### Framework versions
|
- PEFT 0.15.1
- Transformers 4.48.3
- PyTorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0