Commit · f50b310
1 Parent(s): 8b576d1
Update README.md
README.md CHANGED
|
@@ -1,12 +1,27 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
`transformers.models.moss_audio_tokenizer` module. It is intended to be uploaded to a Hugging Face Hub model repository
|
| 7 |
and loaded with `trust_remote_code=True` when needed.
|
| 8 |
|
| 9 |
-
##
|
| 10 |
|
| 11 |
```python
|
| 12 |
import torch
|
|
@@ -20,7 +35,7 @@ enc = model.encode(audio, return_dict=True)
|
|
| 20 |
dec = model.decode(enc.audio_codes, return_dict=True)
|
| 21 |
```
|
| 22 |
|
| 23 |
-
|
| 24 |
|
| 25 |
`MossAudioTokenizerModel.encode` and `MossAudioTokenizerModel.decode` support simple streaming via a `chunk_duration`
|
| 26 |
argument.
|
|
@@ -28,7 +43,7 @@ argument.
|
|
| 28 |
- `chunk_duration` is expressed in seconds.
|
| 29 |
- It must be <= `MossAudioTokenizerConfig.causal_transformer_context_duration`.
|
| 30 |
- `chunk_duration * MossAudioTokenizerConfig.sampling_rate` must be divisible by `MossAudioTokenizerConfig.downsample_rate`.
|
| 31 |
-
-
|
| 32 |
|
| 33 |
```python
|
| 34 |
import torch
|
|
@@ -45,12 +60,8 @@ dec = model.decode(enc.audio_codes, return_dict=True, chunk_duration=0.08)
|
|
| 45 |
|
| 46 |
## Repository layout
|
| 47 |
|
| 48 |
-
Remote-code modules:
|
| 49 |
- `configuration_moss_audio_tokenizer.py`
|
| 50 |
- `modeling_moss_audio_tokenizer.py`
|
| 51 |
- `__init__.py`
|
| 52 |
-
|
| 53 |
-
Hub model files:
|
| 54 |
- `config.json`
|
| 55 |
- model weights
|
| 56 |
-
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
library_name: transformers
|
| 4 |
+
tags:
|
| 5 |
+
- audio
|
| 6 |
+
- audio-tokenizer
|
| 7 |
+
- neural-codec
|
| 8 |
+
- moss-audio-tokenizer
|
| 9 |
+
- speech-tokenizer
|
| 10 |
+
- trust-remote-code
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
# MossAudioTokenizer
|
| 14 |
+
|
| 15 |
+
MossAudioTokenizer is a neural audio codec model for audio tokenization and synthesis. It can encode audio waveforms
|
| 16 |
+
into discrete tokens and decode tokens back into audio waveforms.
|
| 17 |
+
|
| 18 |
+
This repository contains a lightweight remote-code implementation that mirrors the current 🤗 Transformers
|
| 19 |
`transformers.models.moss_audio_tokenizer` module. It is intended to be uploaded to a Hugging Face Hub model repository
|
| 20 |
and loaded with `trust_remote_code=True` when needed.
|
| 21 |
|
| 22 |
+
## Usage
|
| 23 |
+
|
| 24 |
+
### Quickstart
|
| 25 |
|
| 26 |
```python
|
| 27 |
import torch
|
|
|
|
| 35 |
dec = model.decode(enc.audio_codes, return_dict=True)
|
| 36 |
```
|
| 37 |
|
| 38 |
+
### Streaming
|
| 39 |
|
| 40 |
`MossAudioTokenizerModel.encode` and `MossAudioTokenizerModel.decode` support simple streaming via a `chunk_duration`
|
| 41 |
argument.
|
|
|
|
| 43 |
- `chunk_duration` is expressed in seconds.
|
| 44 |
- It must be <= `MossAudioTokenizerConfig.causal_transformer_context_duration`.
|
| 45 |
- `chunk_duration * MossAudioTokenizerConfig.sampling_rate` must be divisible by `MossAudioTokenizerConfig.downsample_rate`.
|
| 46 |
+
- Streaming chunking only supports `batch_size=1`.
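
The constraints above can be checked before calling `encode` or `decode`. The following is a minimal sketch and not part of this repository: `validate_chunk_duration` is a hypothetical helper, and the sampling rate (24000), downsample rate (1920), and context duration (10.0 s) are placeholder values chosen for illustration, not values read from the actual `MossAudioTokenizerConfig`.

```python
def validate_chunk_duration(
    chunk_duration: float,
    sampling_rate: int,
    downsample_rate: int,
    context_duration: float,
) -> int:
    """Check the streaming constraints and return the number of frames per chunk.

    Hypothetical helper: the arguments stand in for the corresponding
    MossAudioTokenizerConfig fields (sampling_rate, downsample_rate,
    causal_transformer_context_duration).
    """
    if chunk_duration <= 0:
        raise ValueError("chunk_duration must be positive (seconds)")
    if chunk_duration > context_duration:
        raise ValueError("chunk_duration exceeds the causal transformer context duration")

    # Convert seconds to samples, tolerating floating-point rounding error.
    samples = chunk_duration * sampling_rate
    n_samples = round(samples)
    if abs(samples - n_samples) > 1e-6:
        raise ValueError("chunk_duration does not map to a whole number of samples")
    if n_samples % downsample_rate != 0:
        raise ValueError("chunk length in samples is not divisible by downsample_rate")

    return n_samples // downsample_rate


# Placeholder config values (assumptions, see above).
# 0.08 s * 24000 Hz = 1920 samples, which is exactly one frame at downsample_rate=1920.
frames_per_chunk = validate_chunk_duration(0.08, 24000, 1920, 10.0)
```

In practice the three config-derived arguments would come from the loaded model's config object, and the batch-size restriction (`batch_size=1`) still has to be respected separately.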
|
| 47 |
|
| 48 |
```python
|
| 49 |
import torch
|
|
|
|
| 60 |
|
| 61 |
## Repository layout
|
| 62 |
|
|
|
|
| 63 |
- `configuration_moss_audio_tokenizer.py`
|
| 64 |
- `modeling_moss_audio_tokenizer.py`
|
| 65 |
- `__init__.py`
|
| 66 |
- `config.json`
|
| 67 |
- model weights
|
|
|