huzy0 committed
Commit 905e21d · verified · 1 Parent(s): c741484

Update README.md

Files changed (1): README.md (+8 −8)
README.md CHANGED
@@ -39,16 +39,16 @@ Unlike many existing models optimized for high-resource, Western languages, MERa
 ## Model Highlights
 
 #### Small model size
-With only 630M parameters (≈2.5 GB in memory), the model is easily deployable on most commercial GPUs, eliminating the need for distributed or large-scale compute setups.
+With only **630M parameters (≈2.5 GB in memory)**, the model is easily deployable on most commercial GPUs, eliminating the need for distributed or large-scale compute setups.
 
 #### Natively multilingual
-Building on [MERaLiON-SpeechEncoder-v1](https://huggingface.co/MERaLiON/MERaLiON-SpeechEncoder-v1) (which focused on English and Singlish), this version expands to include English, Chinese, Malay, Tamil, Thai, Indonesian, and Vietnamese, along with codeswitching support across these languages. Given the wide coverage of languages in the training corpus, it may also be applicable beyond the officially supported languages.
+Building on [MERaLiON-SpeechEncoder-v1](https://huggingface.co/MERaLiON/MERaLiON-SpeechEncoder-v1) (which focused on English and Singlish), this version expands to include **English, Chinese, Malay, Tamil, Thai, Indonesian, and Vietnamese, along with codeswitching support across these languages**. Given the wide coverage of languages in the training corpus, it may also be applicable beyond the officially supported languages.
 
 #### Competitive performance on downstream speech tasks
-The model retains near state-of-the-art results on the SUPERB benchmark for English, and showcases strong multilingual capabilities deomnstrated through its integration into a [high-performance ASR system shown below](#automatic-speech-recognition-asr).
+The model retains near state-of-the-art results on the SUPERB benchmark for English, and showcases strong multilingual capabilities demonstrated through its integration into a [high-performance ASR system](#automatic-speech-recognition-asr).
 
 #### Innovative pre-training techniques
-MERaLiON-SpeechEncoder-2 was trained from scratch with a novel extension of the BEST-RQ self-supervised objective, by using more informative latent targets. We also adopted the Muon optimizer, which has previously only been shown to outperform the popular AdamW for LLM training. We find its advantages also carry over to speech-based models.
+MERaLiON-SpeechEncoder-2 was trained from scratch with a **novel extension of the BEST-RQ** self-supervised objective, by using more informative latent targets. We also adopted the **Muon optimizer**, which has previously only been shown to outperform the popular AdamW for LLM training. We find its advantages also carry over to speech-based models.
 
 ## Model Summary
 
@@ -77,12 +77,12 @@ MERaLiON-SpeechEncoder-2 is competitive to state-of-the-art, improving slightly
 ### Automatic Speech Recognition (ASR)
 
 <p align="center">
-  <img src="overall_wer.svg" width="700"/>
-  <img src="audiobench_wer.svg" width="700"/>
-  <img src="fleurs_wer.svg" width="700"/>
+  <img src="overall_wer.svg" width="720"/>
+  <img src="audiobench_wer.svg" width="720"/>
+  <img src="fleurs_wer.svg" width="720"/>
 </p>
 
-Leveraging on the multilingual capabilities of MERaLiON-SpeechEncoder-2, we further finetuned the model for on supervised speech data to produce a lightweight MERaLiON-SpeechEncoder-2-ASR-CTC, which is competitive to models many times its size in transcribing the target languages, while offering much faster inference speeds. It outperforms the popular Whisper large v3 across most languages in [Audiobench](https://huggingface.co/spaces/MERaLiON/AudioBench-Leaderboard) and maintains close performance in FLEURS. Our internal benchmarking, shown in the 'Overall ASR Performance', also contains several private datasets in addition to Audiobench and FLEURS.
+Leveraging on the multilingual capabilities of MERaLiON-SpeechEncoder-2, we further finetuned the model for ASR on supervised speech data to produce a lightweight MERaLiON-SpeechEncoder-2-ASR-CTC, which is competitive to models many times its size in transcribing the target languages, while offering much faster inference speeds. It outperforms the popular Whisper large v3 across most languages in [Audiobench](https://huggingface.co/spaces/MERaLiON/AudioBench-Leaderboard) and maintains close performance on FLEURS. Our comprehensive internal benchmarking, shown in the 'Overall ASR Performance', also contains several private datasets in addition to Audiobench and FLEURS.
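
The README mentions a "novel extension of the BEST-RQ self-supervised objective" but does not specify the extension. For readers unfamiliar with the baseline, here is a minimal numpy sketch of how *vanilla* BEST-RQ derives its discrete targets: mel frames pass through a frozen random projection and are matched to the nearest entry of a frozen random codebook. All dimensions below are illustrative assumptions, not the model's actual configuration, and the "more informative latent targets" extension is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen random projection and codebook, as in BEST-RQ: neither is trained.
# Sizes here are illustrative, not MERaLiON-SpeechEncoder-2's actual config.
feat_dim, proj_dim, codebook_size = 80, 16, 8192
projection = rng.standard_normal((feat_dim, proj_dim))
codebook = rng.standard_normal((codebook_size, proj_dim))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def bestrq_targets(mel_frames: np.ndarray) -> np.ndarray:
    """Map each mel frame to the index of its nearest codebook entry
    after the frozen random projection; these indices serve as the
    self-supervised prediction targets for masked frames."""
    z = mel_frames @ projection                    # (T, proj_dim)
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize for cosine matching
    return np.argmax(z @ codebook.T, axis=1)       # (T,) target indices

frames = rng.standard_normal((100, feat_dim))  # stand-in for 100 mel frames
targets = bestrq_targets(frames)
```

The encoder is then trained to predict `targets` at masked positions; because projection and codebook stay frozen, the quantizer adds no trainable parameters.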
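
The Muon optimizer highlighted above replaces the raw momentum update for 2-D weight matrices with an approximately orthogonalized one. The numpy sketch below follows the public Muon reference implementation (quintic Newton-Schulz coefficients included); it is a simplified illustration, not the authors' training code, and `muon_step` with its hyperparameters is a hypothetical single-matrix helper.

```python
import numpy as np

def orthogonalize(G: np.ndarray, steps: int = 5) -> np.ndarray:
    """Quintic Newton-Schulz iteration that pushes the singular values of G
    toward 1, yielding an approximately orthogonal update direction.
    Coefficients follow the public Muon reference implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)  # Frobenius-normalize so all singular values <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                          # iterate on the smaller Gram matrix
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One illustrative Muon update for a 2-D weight matrix: accumulate
    momentum, then orthogonalize a Nesterov-style lookahead as the step."""
    momentum = beta * momentum + grad
    update = orthogonalize(beta * momentum + grad)
    return weight - lr * update, momentum
```

Because the polynomial is odd, it acts directly on the singular values while preserving singular vectors, so a few matrix multiplies suffice instead of an explicit (and much slower) SVD.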
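
The lightweight ASR variant above uses a CTC head. As a reminder of how a CTC model's frame-level outputs become a transcript, here is a generic greedy decoder over a toy four-symbol vocabulary; it is illustrative only and not the project's actual decoding code.

```python
import numpy as np

def ctc_greedy_decode(emissions: np.ndarray, blank: int = 0) -> list[int]:
    """Standard CTC greedy decoding: take the argmax label per frame,
    collapse consecutive repeats, then drop blanks."""
    best = np.argmax(emissions, axis=-1)
    decoded, prev = [], None
    for label in best:
        if label != prev and label != blank:
            decoded.append(int(label))
        prev = label
    return decoded

# Toy example: 6 frames over a 4-symbol vocabulary (index 0 is the CTC blank).
frames = np.array([
    [0.1, 0.9, 0.0, 0.0],  # -> 1
    [0.1, 0.8, 0.1, 0.0],  # -> 1 (repeat, collapsed)
    [0.9, 0.0, 0.1, 0.0],  # -> blank (dropped)
    [0.0, 0.1, 0.9, 0.0],  # -> 2
    [0.0, 0.1, 0.8, 0.1],  # -> 2 (repeat, collapsed)
    [0.0, 0.0, 0.1, 0.9],  # -> 3
])
tokens = ctc_greedy_decode(frames)  # [1, 2, 3]
```

This frame-independent decoding is what makes CTC inference so fast relative to autoregressive decoders like Whisper's, which generate one token per forward pass.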