Upload fine-tuned OWSM model

Files changed (3)

README.md ADDED

+# Common Accent ASR Model
+This is an ASR model fine-tuned from [espnet/owsm_v3.1_ebf_base](https://huggingface.co/espnet/owsm_v3.1_ebf_base) on the [DTU54DL/common-accent](https://huggingface.co/datasets/DTU54DL/common-accent) dataset.
+## Model details
+- Base model: espnet/owsm_v3.1_ebf_base
+- Language: English
+- Task: Automatic Speech Recognition
+## Usage
+```python
+import torch
+from espnet2.bin.s2t_inference import Speech2Text
+
+# Load the fine-tuned model; use the GPU when one is available
+model = Speech2Text.from_pretrained(
+    "reecursion/accent-adaptive-owsm_v3.1_ebf_base",
+    lang_sym="<eng>",
+    beam_size=1,
+    device="cuda" if torch.cuda.is_available() else "cpu",
+)
+
+# Example inference
+waveform = ...  # Load your audio as a 1-D NumPy array (OWSM models expect 16 kHz audio)
+transcription = model(waveform)
+print(transcription[0][0])  # Text of the top-ranked hypothesis
+```

config.json ADDED

+{
+    "base_model": "espnet/owsm_v3.1_ebf_base",
+    "language": "eng",
+    "task": "asr",
+    "description": "OWSM model fine-tuned on the common-accent dataset",
+    "framework": "espnet"
+}

espnet_model/model.pth ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:9b823aca9b746e8ffcd065d7d3be91db92a11243d6a9885d678d412756221b38
+size 404942690
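
The entry above is a Git LFS pointer, not the checkpoint itself: it records the SHA-256 digest and byte size of the real `model.pth`. As a minimal sketch, a downloaded checkpoint could be checked against that pointer as shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository or of any library.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9b823aca9b746e8ffcd065d7d3be91db92a11243d6a9885d678d412756221b38
size 404942690
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Tolerate the leading '+' that a diff view adds to each line.
        line = line.strip().lstrip("+").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest in `pointer`."""
    path = Path(path)
    # Cheap check first: a size mismatch rules the file out without hashing.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Hash in chunks so a ~400 MB checkpoint is never fully held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

Assuming the checkpoint was saved locally as `espnet_model/model.pth`, calling `matches_pointer("espnet_model/model.pth", parse_lfs_pointer(POINTER))` should return `True` for an intact download.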