Kuangwei Chen committed
Commit 737e5b1 · 1 Parent(s): 1c2bf4d

Update readme

Files changed (1)
  1. README.md +70 -0

README.md CHANGED
@@ -109,6 +109,76 @@ codes_list = [
  batch_dec = model.batch_decode(codes_list, chunk_duration=0.08)
  ```
 
+ #### Continuous Batch Streaming Decode
+
+ For decoder-side continuous batching, prefer `batch_decode(..., streaming=True, ...)`.
+
+ - The first streaming call may pass `max_batch_size=...`. If it is omitted, the first batch size reserves the fixed-slot decoder budget for that public stream.
+ - Same-size calls continue the existing logical rows in order.
+ - If a later call is larger, the new rows are admitted by tail append.
+ - `finalize_indices` means "decode these rows one last time, then evict them". The indices are interpreted against the pre-call logical order.
+ - After a finalize call returns, the next streaming call may use the smaller survivor batch.
+ - `reset_stream=True` discards the hidden public streaming state and starts a fresh stream.
+
+ Milestone 1 boundaries:
+
+ - decode-only continuous batching
+ - one active streaming decode state per model instance
+ - fixed-slot decoder reservation from `max_batch_size`
+ - no encode-side continuous batching
+ - no physical compaction of surviving decode slots
+ - no multi-session concurrency on one model instance
+
+ ```python
+ import torch
+ from transformers import AutoModel
+
+ repo_id = "OpenMOSS-Team/MOSS-Audio-Tokenizer"
+ model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()
+ num_quantizers = model.config.quantizer_kwargs["num_quantizers"]
+
+ codes_a0 = torch.randint(0, 8, (num_quantizers, 2))
+ codes_b0 = torch.randint(0, 8, (num_quantizers, 3))
+ codes_a1 = torch.randint(0, 8, (num_quantizers, 2))
+ codes_b1 = torch.randint(0, 8, (num_quantizers, 2))
+ codes_c0 = torch.randint(0, 8, (num_quantizers, 1))
+ codes_a2 = torch.randint(0, 8, (num_quantizers, 1))
+ codes_b2 = torch.randint(0, 8, (num_quantizers, 2))
+ codes_c1 = torch.randint(0, 8, (num_quantizers, 2))
+ codes_b3 = torch.randint(0, 8, (num_quantizers, 1))
+ codes_c2 = torch.randint(0, 8, (num_quantizers, 1))
+
+ # First call reserves 3 fixed decoder slots for A and B.
+ out_ab0 = model.batch_decode(
+     [codes_a0, codes_b0],
+     streaming=True,
+     max_batch_size=3,
+     reset_stream=True,
+ )
+
+ # Same logical rows continue in order; C is a tail append.
+ out_abc1 = model.batch_decode(
+     [codes_a1, codes_b1, codes_c0],
+     streaming=True,
+ )
+
+ # Finalize A against the pre-call logical order. A still decodes in this call,
+ # then is evicted immediately afterward.
+ out_abc2 = model.batch_decode(
+     [codes_a2, codes_b2, codes_c1],
+     streaming=True,
+     finalize_indices=[0],
+ )
+
+ # The next call can shrink to the surviving logical rows only.
+ out_bc3 = model.batch_decode(
+     [codes_b3, codes_c2],
+     streaming=True,
+ )
+ ```
+
  ## Repository layout
 
  - `configuration_moss_audio_tokenizer.py`
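
The tail-append, finalize-then-evict, and survivor-shrink rules in the streaming-decode bullets can be exercised without downloading the model. The `StreamSlots` class below is a hypothetical pure-Python sketch of that logical-row bookkeeping, written for this note; it is not part of the tokenizer API.

```python
class StreamSlots:
    """Hypothetical sketch of fixed-slot decode bookkeeping: rows are
    admitted by tail append, finalized rows decode one last time and
    are then evicted, and survivors keep their relative order."""

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self.rows = []        # logical row ids, in stream order
        self._next_id = 0

    def step(self, batch_size, finalize_indices=()):
        # Admit new logical rows by tail append, up to the fixed budget.
        while len(self.rows) < batch_size:
            if len(self.rows) >= self.max_batch_size:
                raise ValueError("fixed-slot budget exceeded")
            self.rows.append(self._next_id)
            self._next_id += 1
        if len(self.rows) != batch_size:
            raise ValueError("batch may only grow or match the survivors")
        decoded = list(self.rows)  # rows decoded this call, pre-eviction
        # Evict finalized rows; indices refer to the pre-call order.
        evict = set(finalize_indices)
        self.rows = [r for i, r in enumerate(self.rows) if i not in evict]
        return decoded

slots = StreamSlots(max_batch_size=3)
print(slots.step(2))                        # A and B admitted
print(slots.step(3))                        # C tail-appended
print(slots.step(3, finalize_indices=[0]))  # A decodes once more, then evicted
print(slots.step(2))                        # survivors B and C continue
```

The call sequence mirrors the README example above: two rows, a tail append to three, a finalize of row 0, then a shrunken survivor batch.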