Update README.md
README.md
CHANGED
@@ -99,7 +99,7 @@ Compared to the first version, this v2 model includes the following key updates

 1. **RoPE Bug Fix**: Corrected a RoPE (Rotary Position Embedding) bug present in the original XCodec2 implementation (See [Issue #36](https://github.com/zhenye234/X-Codec-2.0/issues/36)).
 2. **Upsampler Parameters**: The upsampler settings were changed to `hop_length=98`, `upsample_factors=[3, 3]`, and `kernel_sizes=[9, 9]`.
-3. **Perceptual Loss Model**: The model used for calculating perceptual loss was switched from
+3. **Perceptual Loss Model**: The model used for calculating perceptual loss was switched from [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) to [imprt/kushinada-hubert-large](https://huggingface.co/imprt/kushinada-hubert-large).
 4. **Spectral Discriminator Tuning**: The STFT (Short-Time Fourier Transform) settings for the spectral discriminator were adjusted to be more suitable for 44.1kHz high-sampling-rate audio.

 ---
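
As a rough illustration of the upsampler settings in item 2, the arithmetic below works out the output frame rate they would imply. It is only a sketch under two assumptions that the README does not state: that the overall stride is `hop_length` multiplied by every entry of `upsample_factors`, and that the output sample rate is the 44.1 kHz mentioned in item 4. The variable names `samples_per_frame` and `frames_per_second` are illustrative and do not come from the model code.

```python
from math import prod

# Values quoted in the README diff (item 2).
hop_length = 98
upsample_factors = [3, 3]

# Assumption: output sample rate is the 44.1 kHz mentioned in item 4.
SAMPLE_RATE = 44_100  # Hz

# Assumption: each frame expands to hop_length * product(upsample_factors)
# audio samples. This relationship is not confirmed by the README.
samples_per_frame = hop_length * prod(upsample_factors)
frames_per_second = SAMPLE_RATE / samples_per_frame

print(samples_per_frame)   # 882
print(frames_per_second)   # 50.0
```

If these assumptions hold, the settings would correspond to 50 frames per second at 44.1 kHz; the actual model code should be consulted to confirm how the factors combine.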