Update README.md
README.md
CHANGED
@@ -1,5 +1,7 @@
+<p align="center"><a href="">Technical Report</a>｜<a href="https://xqacmer.github.io/Ming-Unitok-Audio.github.io">Project Page</a> ｜🤗 <a href="https://huggingface.co/inclusionAI/MingTok-Audio">Hugging Face</a>｜ 🤖 <a href="https://modelscope.cn/models/inclusionAI/MingTok-Audio">ModelScope</a>
+
 ## Key Features
-- **Unified Representation:** A single semantic-acoustic unified representation for both understanding and generation tasks.
+- **Unified Representation:** A single semantic-acoustic unified continuous representation for both understanding and generation tasks.
 - 🎧 **High-Fidelity Reconstruction:** Achieve high-fidelity audio generation by modeling continuous features with a VAE, minimizing information loss and preserving intricate acoustic textures.
 - **Convolution-Free Efficiency:** Built on a pure causal transformer architecture, completely eliminating convolutional layers for superior efficiency and a simpler design.
 
@@ -124,7 +126,7 @@ torchaudio.save('./1089-134686-0000_reconstruct.wav', output_waveform.cpu()[0],
 <td align="center">0.91</td>
 </tr>
 <tr>
-<td align="left"><strong>
+<td align="left"><strong>MingTok-Audio(ours)</td>
 <td align="center">50</td>
 <td align="center"><b>4.21</b></td>
 <td align="center"><b>0.96</b></td>
@@ -189,7 +191,7 @@ torchaudio.save('./1089-134686-0000_reconstruct.wav', output_waveform.cpu()[0],
 <td>31.73</td>
 </tr>
 <tr>
-<td><strong>Ming-UniAudio(ours)</td>
+<td><strong>Ming-UniAudio-16A3B(ours)</td>
 <td>2.84</td>
 <td>1.62</td>
 <td><strong>9.80</strong></td>
@@ -251,7 +253,7 @@ torchaudio.save('./1089-134686-0000_reconstruct.wav', output_waveform.cpu()[0],
 <td align="center">0.51</td>
 </tr>
 <tr>
-<td align="left"><strong>Ming-UniAudio(ours)</td>
+<td align="left"><strong>Ming-UniAudio-16A3B(ours)</td>
 <td align="center"><b>0.95</b></td>
 <td align="center">0.70</td>
 <td align="center">1.85</td>