---
library_name: peft
license: apache-2.0
base_model: openai/whisper-large-v2
tags:
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
- tunis-ai/arabic_speech_corpus
- THCHS-30
model-index:
- name: lowhipa-large-comb
  results: []
pipeline_tag: automatic-speech-recognition
---
|
# lowhipa-large-comb
|
This Whisper-for-IPA (WhIPA) model adapter is a PEFT LoRA fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on subsets of:
- the Common Voice 11.0 dataset (1k samples each from Greek, Finnish, Hungarian, Japanese, Maltese, Polish, and Tamil) with G2P-based IPA transcriptions (see the sketch after this list),
- the Mandarin THCHS-30 corpus (https://arxiv.org/pdf/1512.01882) with IPA transcriptions by Taubert (2023, https://zenodo.org/records/7528596) (1k samples), and
- the Arabic Speech Corpus (https://en.arabicspeechcorpus.com) with custom IPA transcriptions transliterated from the corpus's Buckwalter annotations (1k samples, https://doi.org/10.5281/zenodo.17111977).
|
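The card does not specify which G2P tool produced the Common Voice IPA transcriptions. As a hypothetical illustration (not necessarily the pipeline used for training), the open-source `phonemizer` library with the espeak-ng backend converts text to broad IPA:

```python
from phonemizer import phonemize

# Illustrative only: the exact G2P setup used for this adapter is not
# documented here. Requires phonemizer with the espeak-ng backend installed.
text = "Hyvää huomenta"  # Finnish sample sentence
ipa = phonemize(text, language="fi", backend="espeak", strip=True)
print(ipa)  # broad IPA string; exact output depends on the espeak-ng version
```

Transcription conventions (e.g. diacritics or stress marking) may therefore differ between the listed corpora.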
## Model description
|
This adapter equips Whisper with an IPA transcription mode via the custom `<|ip|>` language token. For deployment details and the full model description, please refer to https://github.com/jshrdt/whipa. The adapter can be loaded on top of the base model as follows:
|
```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor
from peft import PeftModel

# Add the custom "<|ip|>" IPA token to the tokenizer.
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v2", task="transcribe")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|ip|>"] + tokenizer.all_special_tokens})

# Register the new token as a language ID and resize the embeddings to match.
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
base_model.generation_config.lang_to_id["<|ip|>"] = tokenizer.convert_tokens_to_ids(["<|ip|>"])[0]
base_model.resize_token_embeddings(len(tokenizer))

# Load the LoRA adapter weights on top of the base model.
whipa_model = PeftModel.from_pretrained(base_model, "jshrdt/lowhipa-large-comb")

# Decode into IPA by setting the new token as the generation "language".
whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2", task="transcribe")
```
|
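A minimal inference sketch under the setup above; `sample.wav` is a placeholder path, and `librosa` is assumed only for loading audio at the 16 kHz sampling rate Whisper expects:

```python
import torch
import librosa

# Load 16 kHz mono audio ("sample.wav" is a placeholder path).
audio, _ = librosa.load("sample.wav", sr=16000)

# Compute log-mel input features and generate an IPA transcription.
inputs = whipa_processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    generated_ids = whipa_model.generate(input_features=inputs.input_features)

print(whipa_processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

The decoded output should be an IPA string; refer to the WhIPA repository above for the full deployment and evaluation setup.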
## Intended uses & limitations
|
More information needed
|
## Training and evaluation data
|
More information needed
|
## Training procedure
|
### Training hyperparameters

More information needed
|
### Training results
|
| Training Loss | Epoch   | Validation Loss |
|:-------------:|:-------:|:---------------:|
| 0.7537        | 2.0323  | 0.5797          |
| 0.2638        | 4.0645  | 0.4017          |
| 0.1532        | 6.0968  | 0.4054          |
| 0.0909        | 8.1290  | 0.4511          |
| 0.0535        | 10.1613 | 0.4732          |
|
### Framework versions
|
- PEFT 0.15.1
- Transformers 4.48.3
- PyTorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0