---
library_name: peft
license: apache-2.0
base_model: openai/whisper-base
tags:
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
- tunis-ai/arabic_speech_corpus
- THCHS-30
model-index:
- name: lowhipa-base-comb
  results: []
pipeline_tag: automatic-speech-recognition
---
# lowhipa-base-comb

This Whisper-for-IPA (WhIPA) model adapter is a PEFT LoRA fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base), trained on subsets of:

- the Common Voice 11 dataset (1k samples each from Greek, Finnish, Hungarian, Japanese, Maltese, Polish, and Tamil) with G2P-based IPA transcriptions
- the Mandarin THCHS-30 database (https://arxiv.org/pdf/1512.01882, 1k samples) with IPA transcriptions by Taubert (2023, https://zenodo.org/records/7528596)
- the Arabic Speech Corpus (https://en.arabicspeechcorpus.com, 1k samples) with custom IPA transcriptions transliterated from the provided Buckwalter transcriptions (https://doi.org/10.5281/zenodo.17111977)
## Model description

For deployment and description, please refer to https://github.com/jshrdt/whipa.
```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor
from peft import PeftModel

# Add the custom IPA language token <|ip|> to the tokenizer
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-base", task="transcribe")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|ip|>"] + tokenizer.all_special_tokens})

# Register <|ip|> as a language token and resize the embeddings to match the new vocabulary
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
base_model.generation_config.lang_to_id["<|ip|>"] = tokenizer.convert_tokens_to_ids(["<|ip|>"])[0]
base_model.resize_token_embeddings(len(tokenizer))

# Load the LoRA adapter on top of the prepared base model
whipa_model = PeftModel.from_pretrained(base_model, "jshrdt/lowhipa-base-comb")

whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_processor = WhisperProcessor.from_pretrained("openai/whisper-base", task="transcribe")
```

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

### Training results

| Training Loss | Epoch   | Validation Loss |
|:-------------:|:-------:|:---------------:|
| 1.5428        | 2.0323  | 1.2981          |
| 0.7498        | 4.0645  | 0.8458          |
| 0.5968        | 6.0968  | 0.7599          |
| 0.5156        | 8.1290  | 0.7213          |
| 0.4603        | 10.1613 | 0.7065          |

### Framework versions

- PEFT 0.15.1
- Transformers 4.48.3
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0