Kuangwei Chen committed
Commit · 737e5b1
Parent(s): 1c2bf4d
Update readme

README.md CHANGED
batch_dec = model.batch_decode(codes_list, chunk_duration=0.08)
```

#### Continuous Batch Streaming Decode

For decoder-side continuous batching, prefer `batch_decode(..., streaming=True, ...)`.

- The first streaming call may pass `max_batch_size=...`. If it is omitted, the first batch size reserves the fixed-slot decoder budget for that public stream.
- Same-size calls continue the existing logical rows in order.
- If a later call is larger, the new rows are admitted by tail append.
- `finalize_indices` means "decode these rows one last time, then evict them". The indices are interpreted against the pre-call logical order.
- After a finalize call returns, the next streaming call may use the smaller survivor batch.
- `reset_stream=True` discards the hidden public streaming state and starts a fresh stream.

Milestone 1 boundaries:

- decode-only continuous batching
- one active streaming decode state per model instance
- fixed-slot decoder reservation from `max_batch_size`
- no encode-side continuous batching
- no physical compaction of surviving decode slots
- no multi-session concurrency on one model instance

```python
import torch
from transformers import AutoModel

repo_id = "OpenMOSS-Team/MOSS-Audio-Tokenizer"
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()
num_quantizers = model.config.quantizer_kwargs["num_quantizers"]

codes_a0 = torch.randint(0, 8, (num_quantizers, 2))
codes_b0 = torch.randint(0, 8, (num_quantizers, 3))
codes_a1 = torch.randint(0, 8, (num_quantizers, 2))
codes_b1 = torch.randint(0, 8, (num_quantizers, 2))
codes_c0 = torch.randint(0, 8, (num_quantizers, 1))
codes_a2 = torch.randint(0, 8, (num_quantizers, 1))
codes_b2 = torch.randint(0, 8, (num_quantizers, 2))
codes_c1 = torch.randint(0, 8, (num_quantizers, 2))
codes_b3 = torch.randint(0, 8, (num_quantizers, 1))
codes_c2 = torch.randint(0, 8, (num_quantizers, 1))

# First call reserves 3 fixed decoder slots for A and B.
out_ab0 = model.batch_decode(
    [codes_a0, codes_b0],
    streaming=True,
    max_batch_size=3,
    reset_stream=True,
)

# Same logical rows continue in order; C is a tail append.
out_abc1 = model.batch_decode(
    [codes_a1, codes_b1, codes_c0],
    streaming=True,
)

# Finalize A against the pre-call logical order. A still decodes in this call,
# then is evicted immediately afterward.
out_abc2 = model.batch_decode(
    [codes_a2, codes_b2, codes_c1],
    streaming=True,
    finalize_indices=[0],
)

# The next call can shrink to the surviving logical rows only.
out_bc3 = model.batch_decode(
    [codes_b3, codes_c2],
    streaming=True,
)
```
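The slot bookkeeping behind these calls can be illustrated without loading the model. The sketch below is a hypothetical, pure-Python simulation of the logical-row lifecycle (fixed-slot reservation, tail append, finalize-then-evict); `StreamSlotTracker` and its methods are our own names for illustration, not part of the tokenizer API.

```python
class StreamSlotTracker:
    """Illustrative bookkeeping for decoder-side continuous batching.

    Not the tokenizer's implementation: it only tracks which logical
    rows occupy the fixed decoder slots across streaming calls.
    """

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size  # fixed-slot decoder budget
        self.rows = []                        # logical rows, in pre-call order
        self._next_id = 0

    def step(self, batch_size, finalize_indices=()):
        # Admit new rows by tail append, up to the reserved slot budget.
        while len(self.rows) < batch_size:
            if len(self.rows) >= self.max_batch_size:
                raise ValueError("batch exceeds reserved decoder slots")
            self.rows.append(f"row{self._next_id}")
            self._next_id += 1
        # Shrinking is only allowed via a prior finalize call.
        if batch_size != len(self.rows):
            raise ValueError("cannot shrink the batch without finalizing rows first")
        decoded = list(self.rows)  # every row, finalized ones included, decodes this call
        # Indices are interpreted against the pre-call logical order; the
        # finalized rows are evicted only after this call's decode.
        evict = set(finalize_indices)
        self.rows = [r for i, r in enumerate(self.rows) if i not in evict]
        return decoded


# Mirrors the call sequence in the example above.
tracker = StreamSlotTracker(max_batch_size=3)
tracker.step(2)                         # A, B reserve two of the three slots
tracker.step(3)                         # C admitted by tail append
tracker.step(3, finalize_indices=[0])   # A decodes one last time, then is evicted
tracker.step(2)                         # survivor batch: B, C
```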

## Repository layout

- `configuration_moss_audio_tokenizer.py`