Li-Ruixiao committed on
Commit f50b310 · 1 Parent(s): 8b576d1

Update README.md

Files changed (1)
  1. README.md +23 -12
README.md CHANGED
@@ -1,12 +1,27 @@
- # MossAudioTokenizer (remote code)
-
- MossAudioTokenizer is a neural audio codec model for audio tokenization and synthesis.
-
- This repository contains a lightweight “remote code” implementation that mirrors the current 🤗 Transformers
+ ---
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - audio
+ - audio-tokenizer
+ - neural-codec
+ - moss-audio-tokenizer
+ - speech-tokenizer
+ - trust-remote-code
+ ---
+
+ # MossAudioTokenizer
+
+ MossAudioTokenizer is a neural audio codec model for audio tokenization and synthesis. It can encode audio waveforms
+ into discrete tokens and decode tokens back into audio waveforms.
+
+ This repository contains a lightweight remote-code implementation that mirrors the current 🤗 Transformers
  `transformers.models.moss_audio_tokenizer` module. It is intended to be uploaded to a Hugging Face Hub model repository
  and loaded with `trust_remote_code=True` when needed.
  
- ## Quickstart
+ ## Usage
+
+ ### Quickstart
  
  ```python
  import torch
@@ -20,7 +35,7 @@ enc = model.encode(audio, return_dict=True)
  dec = model.decode(enc.audio_codes, return_dict=True)
  ```
  
- ## Streaming
+ ### Streaming
  
  `MossAudioTokenizerModel.encode` and `MossAudioTokenizerModel.decode` support simple streaming via a `chunk_duration`
  argument.
@@ -28,7 +43,7 @@ argument.
  - `chunk_duration` is expressed in seconds.
  - It must be <= `MossAudioTokenizerConfig.causal_transformer_context_duration`.
  - `chunk_duration * MossAudioTokenizerConfig.sampling_rate` must be divisible by `MossAudioTokenizerConfig.downsample_rate`.
- - Current limitation: streaming chunking only supports `batch_size=1`.
+ - Streaming chunking only supports `batch_size=1`.
  
  ```python
  import torch
@@ -45,12 +60,8 @@ dec = model.decode(enc.audio_codes, return_dict=True, chunk_duration=0.08)
  
  ## Repository layout
  
- Remote-code modules:
  - `configuration_moss_audio_tokenizer.py`
  - `modeling_moss_audio_tokenizer.py`
  - `__init__.py`
-
- Hub model files:
  - `config.json`
  - model weights
-
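
The streaming constraints listed in this README (a `chunk_duration` in seconds, bounded by `causal_transformer_context_duration`, with `chunk_duration * sampling_rate` divisible by `downsample_rate`, and `batch_size=1`) can be checked before calling `encode` or `decode`. The following is a minimal sketch in plain Python and is not part of the repository. The helper name `frames_per_chunk` and the numeric values in the usage line (24000 Hz, a downsample rate of 1920, a 10-second context) are assumptions chosen for illustration; real values would come from the model's config.

```python
import math


def frames_per_chunk(chunk_duration, sampling_rate, downsample_rate,
                     context_duration, batch_size=1):
    """Check the README's streaming constraints; return frames per chunk.

    Hypothetical helper: the parameter names mirror the config fields named
    in the README, but this function does not exist in the repository.
    """
    if batch_size != 1:
        raise ValueError("streaming chunking only supports batch_size=1")
    # Extra sanity check that the README does not state explicitly.
    if chunk_duration <= 0:
        raise ValueError("chunk_duration must be positive")
    if chunk_duration > context_duration:
        raise ValueError("chunk_duration exceeds the causal transformer context duration")

    # chunk_duration is a float in seconds, so compare against the nearest
    # integer sample count with a small tolerance instead of using == directly.
    samples = chunk_duration * sampling_rate
    whole_samples = round(samples)
    if not math.isclose(samples, whole_samples, rel_tol=0.0, abs_tol=1e-6):
        raise ValueError("chunk_duration does not map to a whole number of samples")
    if whole_samples % downsample_rate != 0:
        raise ValueError("chunk_duration * sampling_rate must be divisible by downsample_rate")

    return whole_samples // downsample_rate


# Assumed config values, for illustration only.
print(frames_per_chunk(0.08, sampling_rate=24000, downsample_rate=1920,
                       context_duration=10.0))
```

Rounding before the divisibility test is a deliberate choice: a value such as `0.08` is not exactly representable as a float, so a naive `%` on the raw product could reject a chunk size that is valid on paper.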