Update README.md
README.md
CHANGED
|
@@ -12,7 +12,21 @@ license: cc-by-4.0
|
|
| 12 |
|
| 13 |
### `espnet/mms_1b_mlsuperb`
|
| 14 |
|
| 15 |
-
This
|
| 16 |
|
| 17 |
### Demo: How to use in ESPnet2
|
| 18 |
|
|
@@ -37,6 +51,12 @@ cd egs2/ml_superb2/asr1
|
|
| 37 |
- Git hash: `18d7dea6677b7ff55a67e2be19cb748fb1c51d74`
|
| 38 |
- Commit date: `Tue Dec 31 03:30:01 2024 +0000`
|
| 39 |
|
| 40 |
## exp/asr_train_asr_raw_char
|
| 41 |
### WER
|
| 42 |
|
|
|
|
| 12 |
|
| 13 |
### `espnet/mms_1b_mlsuperb`
|
| 14 |
|
| 15 |
+
This is a simple baseline for the ML-SUPERB 2.0 Challenge. It is a self-supervised [MMS 1B](https://huggingface.co/facebook/mms-1b) model fine-tuned on [142 languages of ML-SUPERB](https://huggingface.co/datasets/ftshijt/mlsuperb_8th) using CTC loss.
|
| 16 |
+
The MMS model is frozen and used as a feature extractor for a small Transformer encoder during fine-tuning, which took approximately 1 day on a single GPU.
|
| 17 |
+
|
| 18 |
+
The model was trained using the [ML-SUPERB recipe](https://github.com/espnet/espnet/tree/master/egs2/ml_superb2/asr1) in ESPnet. Inference can be performed with the following script:
|
| 19 |
+
|
| 20 |
+
```
|
| 21 |
+
import soundfile
from espnet2.bin.asr_inference import Speech2Text
|
| 22 |
+
|
| 23 |
+
model = Speech2Text.from_pretrained(
|
| 24 |
+
"espnet/mms_1b_mlsuperb"
|
| 25 |
+
)
|
| 26 |
+
|
| 27 |
+
speech, rate = soundfile.read("speech.wav")
|
| 28 |
+
text, *_ = model(speech)[0]
|
| 29 |
+
```
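
The inference script reads `rate` from the file but never uses it. A minimal sketch of a guard is shown here, assuming the model expects 16 kHz input, as the upstream MMS checkpoints do. The helper name `check_sample_rate` and the constant `EXPECTED_RATE` are hypothetical and not part of ESPnet.

```python
# Assumption: the fine-tuned model expects 16 kHz audio, like upstream MMS.
EXPECTED_RATE = 16000


def check_sample_rate(rate: int, expected: int = EXPECTED_RATE) -> None:
    """Raise ValueError if the audio sampling rate differs from `expected`."""
    if rate != expected:
        raise ValueError(
            f"got {rate} Hz audio, expected {expected} Hz; resample before inference"
        )
```

Calling `check_sample_rate(rate)` right after `soundfile.read(...)` would surface a mismatched file before it reaches the model, rather than silently producing poor transcriptions.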
|
| 30 |
|
| 31 |
### Demo: How to use in ESPnet2
|
| 32 |
|
|
|
|
| 51 |
- Git hash: `18d7dea6677b7ff55a67e2be19cb748fb1c51d74`
|
| 52 |
- Commit date: `Tue Dec 31 03:30:01 2024 +0000`
|
| 53 |
|
| 54 |
+
## Challenge
|
| 55 |
+
|
| 56 |
+
|decode_dir|Standard CER|Standard LID|Worst 15 CER|CER StD|Dialect CER|Dialect LID|
|
| 57 |
+
|---|---|---|---|---|---|---|
|
| 58 |
+
|decode_asr_asr_model_valid.loss.ave|23.97|73.95|71.08|25.52|53.96|32.74|
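
For readers unfamiliar with the CER columns, here is a minimal sketch of how a character error rate is commonly computed: the character-level Levenshtein distance summed over utterances, divided by the total number of reference characters, as a percentage. This is an illustration only, not the challenge's official scorer, which may normalize text differently. The function names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # i deletions turn a length-i reference prefix into ""
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference character
                    curr[j - 1] + 1,     # insertion of a hypothesis character
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(refs, hyps) -> float:
    """Corpus-level CER in percent (assumes at least one reference character)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return 100.0 * errors / chars
```

For example, `cer(["abcd"], ["abcf"])` counts one substitution over four reference characters.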
|
| 59 |
+
|
| 60 |
## exp/asr_train_asr_raw_char
|
| 61 |
### WER
|
| 62 |
|