Feature Extraction
Transformers
Safetensors
moss-audio-tokenizer
audio
audio-tokenizer
neural-codec
moss-tts-family
MOSS Audio Tokenizer
speech-tokenizer
trust-remote-code
custom_code
How to use OpenMOSS-Team/MOSS-Audio-Tokenizer with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline
    pipe = pipeline("feature-extraction", model="OpenMOSS-Team/MOSS-Audio-Tokenizer", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModel
    model = AutoModel.from_pretrained("OpenMOSS-Team/MOSS-Audio-Tokenizer", trust_remote_code=True, dtype="auto")
Update modeling_moss_audio_tokenizer.py (#3)
opened by fdugyt
Changed file: modeling_moss_audio_tokenizer.py
@@ -941,7 +941,7 @@ class MossAudioTokenizerPatchedPretransform(nn.Module):
         x = x.reshape(b, d, -1, h).permute(0, 1, 3, 2).reshape(b, d * h, -1)
         # We pad the input waveform to a multiple of `downsample_rate` before applying the encoder.
         # Use a ceil division to match that padding and avoid dropping the last (partially padded) frame.
-        output_lengths =
+        output_lengths = input_lengths // self.patch_size
         return x, output_lengths
 
     def decode(self, x, input_lengths):
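The comment in the hunk describes a ceil division, while the added line uses floor division (`//`). The sketch below shows how the two differ when a length is not a multiple of the patch size. It is only an illustration: the helper names (`floor_frames`, `ceil_frames`, `padded_length`) and the patch size of 240 are hypothetical and are not taken from the repository or the model config.

```python
def floor_frames(length: int, patch_size: int) -> int:
    """Number of complete patches; a trailing partial patch is not counted."""
    return length // patch_size


def ceil_frames(length: int, patch_size: int) -> int:
    """Number of patches once the input is right-padded to a multiple of patch_size."""
    return (length + patch_size - 1) // patch_size


def padded_length(length: int, patch_size: int) -> int:
    """Length after right-padding up to the next multiple of patch_size."""
    return ceil_frames(length, patch_size) * patch_size


if __name__ == "__main__":
    patch_size = 240  # hypothetical value, chosen only for illustration
    for n in (480, 481, 719):
        print(
            f"length={n} floor={floor_frames(n, patch_size)} "
            f"ceil={ceil_frames(n, patch_size)} padded={padded_length(n, patch_size)}"
        )
```

When the lengths passed in are already padded to a multiple of the patch size, floor and ceil division give the same result. The hunk does not show whether `input_lengths` holds padded or unpadded lengths at this point, so which variant matches the comment depends on the surrounding code.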