reecursion committed on
Commit 3bcc00d · verified · 1 Parent(s): 43c9cf8

Upload fine-tuned OWSM model

Files changed (3)
  1. README.md +28 -0
  2. config.json +7 -0
  3. espnet_model/model.pth +3 -0
README.md ADDED
@@ -0,0 +1,28 @@
+ # Common Accent ASR Model
+
+ This is a fine-tuned ASR model based on [espnet/owsm_v3.1_ebf_base](https://huggingface.co/espnet/owsm_v3.1_ebf_base) trained on the [DTU54DL/common-accent](https://huggingface.co/datasets/DTU54DL/common-accent) dataset.
+
+ ## Model details
+ - Base model: espnet/owsm_v3.1_ebf_base
+ - Language: English
+ - Task: Automatic Speech Recognition
+
+ ## Usage
+
+ ```python
+ import torch
+ import numpy as np
+ from espnet2.bin.s2t_inference import Speech2Text
+
+ # Load the model
+ model = Speech2Text.from_pretrained(
+     "reecursion/accent-adaptive-owsm_v3.1_ebf_base",
+     lang_sym="<eng>",
+     beam_size=1,
+     device="cuda" if torch.cuda.is_available() else "cpu"
+ )
+
+ # Example inference
+ waveform = ... # Load your audio as numpy array
+ transcription = model(waveform)
+ print(transcription[0][0]) # Print the transcription
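
The README leaves `waveform = ...` as a placeholder ("Load your audio as numpy array"). One way to fill it in is sketched below, using only the standard-library `wave` module plus NumPy. The helper name `load_wav_mono_16bit` is hypothetical, and the 16 kHz default is an assumption about what OWSM models expect rather than something this commit states.

```python
import wave

import numpy as np


def load_wav_mono_16bit(path, expected_rate=16000):
    """Read a 16-bit PCM WAV file into a float32 array scaled to [-1, 1).

    Hypothetical helper; expected_rate=16000 is an assumption about the
    sampling rate the model was trained on.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        if wf.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wf.getframerate()} Hz"
            )
        channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())

    # Interpret the raw bytes as little-endian int16 and scale to floats.
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0

    # Average interleaved channels down to mono if needed.
    if channels > 1:
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples
```

The returned array could then be passed as `waveform` to `model(waveform)` in the README snippet. Audio at a different sampling rate would need resampling first, which this sketch does not attempt.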
config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "base_model": "espnet/owsm_v3.1_ebf_base",
+   "language": "eng",
+   "task": "asr",
+   "description": "Fine-tuned OWSM model on common-accent dataset",
+   "framework": "espnet"
+ }
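
The `language` and `task` values in this config line up with the `lang_sym="<eng>"` argument used in the README. Nothing in the commit shows ESPnet reading this file, so the sketch below is only an illustration of how a caller might derive those symbols from it. The helper `owsm_symbols` is hypothetical, and the `<asr>` task symbol is an assumption about the OWSM token convention.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_TEXT = """
{
  "base_model": "espnet/owsm_v3.1_ebf_base",
  "language": "eng",
  "task": "asr",
  "description": "Fine-tuned OWSM model on common-accent dataset",
  "framework": "espnet"
}
"""


def owsm_symbols(config):
    """Wrap the language and task codes in angle brackets (assumed convention)."""
    return {
        "lang_sym": f"<{config['language']}>",
        "task_sym": f"<{config['task']}>",
    }


config = json.loads(CONFIG_TEXT)
symbols = owsm_symbols(config)
```

Here `symbols["lang_sym"]` matches the value passed to `Speech2Text.from_pretrained` in the README.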
espnet_model/model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b823aca9b746e8ffcd065d7d3be91db92a11243d6a9885d678d412756221b38
+ size 404942690
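
The three lines above are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size of the real `model.pth`. A minimal sketch of checking a downloaded file against such a pointer follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and only the standard library is used.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into version, hash algorithm, digest, size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so a ~400 MB checkpoint is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Pointer contents as added in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9b823aca9b746e8ffcd065d7d3be91db92a11243d6a9885d678d412756221b38
size 404942690
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `matches_pointer("espnet_model/model.pth", pointer)` on a local copy would then indicate whether the download is complete and unmodified. The local path is an assumption about where the file was saved.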