```python
text = model.transcribe(audio, processor)
print(text)
```
## Long-Form Audio

The base Qwen3-ASR architecture supports long inputs, but the most stable long-form decoding in this project came from accumulated-audio continuation decoding rather than a single naive generate call. The `model.transcribe()` method already implements this strategy: it walks through the audio in `step_seconds` chunks, re-feeding the accumulated waveform together with the previously decoded text so the model keeps prior context. The `step_seconds`, `rollback_tokens`, and `max_new_tokens` parameters can be tuned for your use case.
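The loop inside `model.transcribe()` can be pictured with a minimal, model-agnostic sketch. Everything here other than the `step_seconds` and `rollback_tokens` names is an assumption for illustration: `decode_fn` is a hypothetical stand-in for the actual model call, and the exact rollback behavior in this repo may differ.

```python
import numpy as np

def continuation_transcribe(audio, sample_rate, decode_fn,
                            step_seconds=20.0, rollback_tokens=5):
    """Sketch of accumulated-audio continuation decoding.

    Walks the waveform in step_seconds chunks. At each step the *entire*
    audio seen so far is decoded again, conditioned on the transcript kept
    from the previous step. The last rollback_tokens tokens are dropped
    from the tail of the running transcript first, so words that were cut
    mid-chunk can be re-predicted with more acoustic context.
    """
    step = int(step_seconds * sample_rate)
    tokens = []  # running transcript as a token list
    for end in range(step, len(audio) + step, step):
        accumulated = audio[: min(end, len(audio))]
        # Keep all but the last few tokens as the continuation prefix.
        prefix = tokens[: max(0, len(tokens) - rollback_tokens)]
        # decode_fn(waveform, prefix_tokens) -> full token list so far.
        tokens = decode_fn(accumulated, prefix)
    return tokens
```

The key property is that each call sees the full accumulated waveform plus the trusted part of the previous transcript, so context is never lost at chunk boundaries; the trade-off is that later steps re-process earlier audio.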
The `processor.load_audio` and `model.transcribe` methods accept the following parameters:

```python
# Load and resample any audio file to a mono float32 waveform