@@ -13,6 +13,19 @@ pipeline_tag: automatic-speech-recognition
 tags:
 - phoneme_recognition
 - IPA
 ---
 # Model Card for MultiBridge/wav2vec-LnNor-IPA-ft
@@ -46,11 +59,6 @@ This model is built for phoneme recognition tasks. It was developed by fine-tuni
 - Speech processing applications: Serving as a component in speech processing pipelines or prototyping.
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
 ## Bias, Risks, and Limitations
@@ -72,7 +80,32 @@ Evaluate the model's performance for your specific use case.
 Use the code below to get started with the model.
-[More Information Needed]
 ## Training Details
@@ -105,18 +138,17 @@ The training dataset was filtered. Recordings shorter than 2 seconds or longer t
 - optimizer: AdamW
 - batch size: 64
 - weight decay: 0.001
-- epochs: 50
 #### Speeds, Sizes, Times [optional]
 Avg epoch training time: 650s
-Number of updates: 36050
-Final training loss:
-Final validation loss:
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
 ## Evaluation

 tags:
 - phoneme_recognition
 - IPA
+model-index:
+- name: MultiBridge/wav2vec-LnNor-IPA-ft
+  results:
+  - task:
+      type: phoneme-recognition
+      name: Phoneme Recognition
+    dataset:
+      type: speech31/timit_english_ipa
+      name: TIMIT
+    metrics:
+      - type: cer
+        value: 0.0416
+        name: CER
 ---
 # Model Card for MultiBridge/wav2vec-LnNor-IPA-ft
 - Speech processing applications: Serving as a component in speech processing pipelines or prototyping.
 ## Bias, Risks, and Limitations
 Use the code below to get started with the model.
+```python
+from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+from datasets import load_dataset
+import torch
+# load model and processor
+processor = Wav2Vec2Processor.from_pretrained("MultiBridge/wav2vec-LnNor-IPA-ft")
+model = Wav2Vec2ForCTC.from_pretrained("MultiBridge/wav2vec-LnNor-IPA-ft")
+# load dummy dataset and read soundfiles
+ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", split="validation")
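+# extract input features from the raw waveform, passing its sampling rate explicitly
+input_values = processor(ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt").input_values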
+# retrieve logits
+with torch.no_grad():
+    logits = model(input_values).logits
+# take argmax and decode
+predicted_ids = torch.argmax(logits, dim=-1)
+transcription = processor.batch_decode(predicted_ids)
+# => should give ['mɪstɝkwɪltɝɪzðəəpɑslʌvðəmɪdəlklæsəzændwiɑəɡlædtəwɛlkəmhɪzɡɑspəl'] for MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL
+```
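The "take argmax and decode" step in the snippet above is greedy CTC decoding: the most likely token is picked for every frame, consecutive repeats are merged, and blank frames are dropped. The following is a minimal sketch of that collapse rule in plain Python, not the library's implementation. The function name `ctc_greedy_collapse`, the example ids, and the choice of `0` as the blank id are hypothetical and used only for illustration.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id):
    """Merge consecutive repeated ids, then drop the blank id."""
    # groupby yields one key per run of identical consecutive ids
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    return [token_id for token_id in merged if token_id != blank_id]


# Hypothetical frame-level argmax output; 0 stands in for the CTC blank.
frame_ids = [0, 5, 5, 0, 0, 7, 7, 7, 0, 5, 0]
collapsed = ctc_greedy_collapse(frame_ids, blank_id=0)
print(collapsed)
```

Because repeats are merged before blanks are removed, a blank between two identical ids keeps them as two separate symbols, which is how CTC represents a doubled phoneme.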
 ## Training Details
 - optimizer: AdamW
 - batch size: 64
 - weight decay: 0.001
+- epochs: 40
 #### Speeds, Sizes, Times [optional]
 Avg epoch training time: 650s
+Number of updates: 28840
+Final training loss: 0.09713
+Final validation loss: 0.2142
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
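The figures above can be cross-checked with a little arithmetic. The sketch below assumes one optimizer update per batch (no gradient accumulation) and that the listed batch size is the effective one; neither assumption is stated in the card, so the derived numbers are estimates rather than reported values.

```python
# Values taken from the training details listed above.
epochs = 40
total_updates = 28840
batch_size = 64
avg_epoch_seconds = 650

# Assumption: one optimizer update per batch, no gradient accumulation.
updates_per_epoch = total_updates // epochs

# Upper bound on examples seen per epoch (the last batch may be partial).
max_examples_per_epoch = updates_per_epoch * batch_size

# Approximate total training time, based on the average epoch time.
total_hours = epochs * avg_epoch_seconds / 3600

print(updates_per_epoch, max_examples_per_epoch, round(total_hours, 1))
```

Under these assumptions the run comes to 721 updates per epoch, at most 46,144 training examples per epoch, and roughly 7.2 hours of training.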
 ## Evaluation
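The metadata for this card reports a character error rate (CER) on TIMIT. As a reminder of what that metric measures, here is a minimal, dependency-free sketch: CER is the character-level edit distance between prediction and reference, summed over the test set and divided by the total number of reference characters. The helper names `edit_distance` and `cer` and the example strings are hypothetical, and the card does not state which evaluation script or text normalization produced the reported value, so this is an illustration rather than a reproduction.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# Hypothetical example: one substituted symbol in a five-symbol reference.
score = cer(["mɪstɝ"], ["mɪstə"])
print(score)
```

Python strings are compared per Unicode code point here, so an IPA symbol written with a combining diacritic counts as more than one character; a different segmentation would give slightly different scores.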