Kuangwei Chen committed
Commit 737e5b1 · 1 Parent(s): 1c2bf4d

Update readme

Files changed (1)
  1. README.md +70 -0

README.md CHANGED
@@ -109,6 +109,76 @@ codes_list = [
  batch_dec = model.batch_decode(codes_list, chunk_duration=0.08)
  ```
 
+ #### Continuous Batch Streaming Decode
+
+ For decoder-side continuous batching, prefer `batch_decode(..., streaming=True, ...)`.
+
+ - The first streaming call may pass `max_batch_size=...`. If it is omitted, the first batch size reserves the fixed-slot decoder budget for that public stream.
+ - Same-size calls continue the existing logical rows in order.
+ - If a later call is larger, the new rows are admitted by tail append.
+ - `finalize_indices` means "decode these rows one last time, then evict them". The indices are interpreted against the pre-call logical order.
+ - After a finalize call returns, the next streaming call may use the smaller survivor batch.
+ - `reset_stream=True` discards the hidden public streaming state and starts a fresh stream.
+
+ Milestone 1 boundaries:
+
+ - decode-only continuous batching
+ - one active streaming decode state per model instance
+ - fixed-slot decoder reservation from `max_batch_size`
+ - no encode-side continuous batching
+ - no physical compaction of surviving decode slots
+ - no multi-session concurrency on one model instance
+
+ ```python
+ import torch
+ from transformers import AutoModel
+
+ repo_id = "OpenMOSS-Team/MOSS-Audio-Tokenizer"
+ model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()
+ num_quantizers = model.config.quantizer_kwargs["num_quantizers"]
+
+ codes_a0 = torch.randint(0, 8, (num_quantizers, 2))
+ codes_b0 = torch.randint(0, 8, (num_quantizers, 3))
+ codes_a1 = torch.randint(0, 8, (num_quantizers, 2))
+ codes_b1 = torch.randint(0, 8, (num_quantizers, 2))
+ codes_c0 = torch.randint(0, 8, (num_quantizers, 1))
+ codes_a2 = torch.randint(0, 8, (num_quantizers, 1))
+ codes_b2 = torch.randint(0, 8, (num_quantizers, 2))
+ codes_c1 = torch.randint(0, 8, (num_quantizers, 2))
+ codes_b3 = torch.randint(0, 8, (num_quantizers, 1))
+ codes_c2 = torch.randint(0, 8, (num_quantizers, 1))
+
+ # First call reserves 3 fixed decoder slots for A and B.
+ out_ab0 = model.batch_decode(
+     [codes_a0, codes_b0],
+     streaming=True,
+     max_batch_size=3,
+     reset_stream=True,
+ )
+
+ # Same logical rows continue in order; C is a tail append.
+ out_abc1 = model.batch_decode(
+     [codes_a1, codes_b1, codes_c0],
+     streaming=True,
+ )
+
+ # Finalize A against the pre-call logical order. A still decodes in this call,
+ # then is evicted immediately afterward.
+ out_abc2 = model.batch_decode(
+     [codes_a2, codes_b2, codes_c1],
+     streaming=True,
+     finalize_indices=[0],
+ )
+
+ # The next call can shrink to the surviving logical rows only.
+ out_bc3 = model.batch_decode(
+     [codes_b3, codes_c2],
+     streaming=True,
+ )
+ ```
+
  ## Repository layout
 
  - `configuration_moss_audio_tokenizer.py`
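
The tail-append, finalize-then-evict, and survivor-shrink rules in the streaming-decode bullets can be exercised without downloading the model. The `StreamSlots` class below is a hypothetical pure-Python sketch of that logical-row bookkeeping, written for this note; it is not part of the tokenizer API.

```python
class StreamSlots:
    """Hypothetical sketch of fixed-slot decode bookkeeping: rows are
    admitted by tail append, finalized rows decode one last time and
    are then evicted, and survivors keep their relative order."""

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self.rows = []        # logical row ids, in stream order
        self._next_id = 0

    def step(self, batch_size, finalize_indices=()):
        # Admit new logical rows by tail append, up to the fixed budget.
        while len(self.rows) < batch_size:
            if len(self.rows) >= self.max_batch_size:
                raise ValueError("fixed-slot budget exceeded")
            self.rows.append(self._next_id)
            self._next_id += 1
        if len(self.rows) != batch_size:
            raise ValueError("batch may only grow or match the survivors")
        decoded = list(self.rows)  # rows decoded this call, pre-eviction
        # Evict finalized rows; indices refer to the pre-call order.
        evict = set(finalize_indices)
        self.rows = [r for i, r in enumerate(self.rows) if i not in evict]
        return decoded

slots = StreamSlots(max_batch_size=3)
print(slots.step(2))                        # A and B admitted
print(slots.step(3))                        # C tail-appended
print(slots.step(3, finalize_indices=[0]))  # A decodes once more, then evicted
print(slots.step(2))                        # survivors B and C continue
```

The call sequence mirrors the README example above: two rows, a tail append to three, a finalize of row 0, then a shrunken survivor batch.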