```python
text = model.transcribe(audio, processor)
print(text)
```
## Long-Form Audio

The base Qwen3-ASR architecture supports long inputs, but the most stable long-form decoding in this project came from accumulated-audio continuation decoding rather than a single naive generate call. The `model.transcribe()` method already implements this strategy: it walks through the audio in `step_seconds` chunks, re-feeding the accumulated waveform together with the previously decoded text so the model keeps prior context. The `step_seconds`, `rollback_tokens`, and `max_new_tokens` parameters can be tuned for your use case.
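The loop inside `model.transcribe()` can be pictured with a minimal, model-agnostic sketch. Everything here other than the `step_seconds` and `rollback_tokens` names is an assumption for illustration: `decode_fn` is a hypothetical stand-in for the actual model call, and the exact rollback behavior in this repo may differ.

```python
import numpy as np

def continuation_transcribe(audio, sample_rate, decode_fn,
                            step_seconds=20.0, rollback_tokens=5):
    """Sketch of accumulated-audio continuation decoding.

    Walks the waveform in step_seconds chunks. At each step the *entire*
    audio seen so far is decoded again, conditioned on the transcript kept
    from the previous step. The last rollback_tokens tokens are dropped
    from the tail of the running transcript first, so words that were cut
    mid-chunk can be re-predicted with more acoustic context.
    """
    step = int(step_seconds * sample_rate)
    tokens = []  # running transcript as a token list
    for end in range(step, len(audio) + step, step):
        accumulated = audio[: min(end, len(audio))]
        # Keep all but the last few tokens as the continuation prefix.
        prefix = tokens[: max(0, len(tokens) - rollback_tokens)]
        # decode_fn(waveform, prefix_tokens) -> full token list so far.
        tokens = decode_fn(accumulated, prefix)
    return tokens
```

The key property is that each call sees the full accumulated waveform plus the trusted part of the previous transcript, so context is never lost at chunk boundaries; the trade-off is that later steps re-process earlier audio.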
The `processor.load_audio` and `model.transcribe` methods accept the following parameters:

```python
# Load and resample any audio file to a mono float32 waveform