Yi Liu committed · Commit 33cbd55 · Parent: 175a258 · update README

README.md (changed)
**Online Demo**: https://mooer-speech.mthreads.com:10077/

## 🔥 Update

We release a new model, *MooER-80K-v2*, trained on 80K hours of data. Click [here](https://huggingface.co/mtspeech/MooER-MTL-80K) to try the new model.

## 📖 Introduction

We introduce **MooER (摩耳)**, an LLM-based speech recognition and translation model developed by Moore Threads. With the *MooER* framework, you can transcribe speech into text (automatic speech recognition, ASR) and translate it into other languages (automatic speech translation, AST) in an end-to-end manner. The performance of *MooER* is demonstrated in the following section; our insights into model configurations, training strategies, and more are provided in our [technical report](https://arxiv.org/abs/2408.05101).
The performance of speech recognition is evaluated using WER/CER.

<table>
  <tr>
    <th>Language</th>
    <th>Testset</th>
    <th>SeamlessM4T-v2</th>
    <th>MooER-5K</th>
    <th>MooER-80K</th>
    <th>MooER-80K-v2</th>
  </tr>
  <tr>
    <td rowspan="7">Chinese</td>
    <td></td>
    <td>4.09</td>
    <td>1.93</td>
    <td>1.25</td>
    <td>1.00</td>
  </tr>
  <tr>
    <td>aishell2_ios</td>
    <td>4.81</td>
    <td>3.17</td>
    <td>2.67</td>
    <td>2.62</td>
  </tr>
  <tr>
    <td>test_magicdata</td>
    <td>9.69</td>
    <td>3.48</td>
    <td>2.52</td>
    <td>2.17</td>
  </tr>
  <tr>
    <td>test_thchs</td>
    <td>7.14</td>
    <td>4.11</td>
    <td>3.14</td>
    <td>3.00</td>
  </tr>
  <tr>
    <td>fleurs cmn_dev</td>
    <td>7.12</td>
    <td>5.81</td>
    <td>5.23</td>
    <td>5.15</td>
  </tr>
  <tr>
    <td>fleurs cmn_test</td>
    <td>7.66</td>
    <td>6.77</td>
    <td>6.18</td>
    <td>6.14</td>
  </tr>
  <tr>
    <td>average</td>
    <td><strong>6.75</strong></td>
    <td><strong>4.21</strong></td>
    <td><strong>3.50</strong></td>
    <td><strong>3.35</strong></td>
  </tr>
  <tr>
    <td rowspan="7">English</td>
    <td></td>
    <td>2.77</td>
    <td>7.78</td>
    <td>4.11</td>
    <td>3.57</td>
  </tr>
  <tr>
    <td>librispeech test_other</td>
    <td>5.25</td>
    <td>15.25</td>
    <td>9.99</td>
    <td>9.09</td>
  </tr>
  <tr>
    <td>fleurs eng_dev</td>
    <td>11.36</td>
    <td>18.89</td>
    <td>13.32</td>
    <td>13.12</td>
  </tr>
  <tr>
    <td>fleurs eng_test</td>
    <td>11.82</td>
    <td>20.41</td>
    <td>14.97</td>
    <td>14.74</td>
  </tr>
  <tr>
    <td>gigaspeech dev</td>
    <td>28.01</td>
    <td>23.46</td>
    <td>16.92</td>
    <td>17.34</td>
  </tr>
  <tr>
    <td>gigaspeech test</td>
    <td>28.65</td>
    <td>22.09</td>
    <td>16.64</td>
    <td>16.97</td>
  </tr>
  <tr>
    <td>average</td>
    <td><strong>14.64</strong></td>
    <td><strong>17.98</strong></td>
    <td><strong>12.66</strong></td>
    <td><strong>12.47</strong></td>
  </tr>
</table>
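WER/CER as reported above is the standard edit-distance error rate, and each "average" row is consistent with the unweighted mean of the per-testset numbers. The sketch below is illustrative only (it is not MooER's evaluation script): a minimal Levenshtein-based scorer plus a check of the MooER-80K-v2 averages against the table.

```python
# Illustrative sketch, not MooER's actual scorer: WER/CER is the Levenshtein
# edit distance between hypothesis and reference token sequences, divided by
# the reference length.

def edit_distance(ref, hyp):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: tokenize on whitespace (use characters for CER)."""
    ref_tok, hyp_tok = ref.split(), hyp.split()
    return edit_distance(ref_tok, hyp_tok) / len(ref_tok)

# The "average" rows match the unweighted mean over the six testsets per
# language, e.g. for the MooER-80K-v2 column:
chinese_v2 = [1.00, 2.62, 2.17, 3.00, 5.15, 6.14]
english_v2 = [3.57, 9.09, 13.12, 14.74, 17.34, 16.97]
assert round(sum(chinese_v2) / len(chinese_v2), 2) == 3.35
assert round(sum(english_v2) / len(english_v2), 2) == 12.47
```

For Chinese CER, score character sequences (`list(ref)`) instead of whitespace-split words.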
If you find MooER useful for your research, please 🌟 this repo and cite our work:

```bibtex
@article{liang2024mooer,
  title   = {MooER: an LLM-based Speech Recognition and Translation Model from Moore Threads},
  author  = {Zhenlin Liang and Junhao Xu and Yi Liu and Yichao Hu and Jian Li and Yajun Zheng and Meng Cai and Hua Wang},
  journal = {arXiv preprint arXiv:2408.05101},
  url     = {https://arxiv.org/abs/2408.05101},
  year    = {2024}
}
```