streaming python code sample ?

#5
by drzraf - opened

Dear team,
This sounds like a very interesting and capable model for French.

But aside from the sample CLI and a fastrtc example on GitHub, do you happen to have a Python example of streaming usage? Code-wise, this often differs a bit from the regular file / known-block-size usage, and it would be a very valuable addition!

drzraf changed discussion title from streaming python sample code to streaming python sample code?
drzraf changed discussion title from streaming python sample code? to streaming python code sample ?
Owner

Hello!

We are working hard on new GitHub and Hugging Face pages with more examples (and more models). There will be a streaming Python code example as well.
I hope we will be ready in about a week.

Banafo changed discussion status to closed

I explored https://github.com/kroko-ai/kroko-onnx but couldn't find such an example.
Many examples aren't actually specific to Kroko ASR (I guess they were pulled from other repositories, which makes it a burden to extract the actually useful samples; python-api-examples comes from Xiaomi), and most of them take a full regular file as input.

For pseudo-streaming, the audio is cut into small batches, but I couldn't find any documentation about model training or batch sizing.
This one, for example: https://github.com/kroko-ai/kroko-onnx/blob/master/python-api-examples/streaming_server.py
sets a wait time of 5 ms and 3 batches (though I don't know how many samples a batch contains in this context), and I'm not sure these values are adequate for the Kroko models at all.

The documentation/examples (regarding streaming) should really be adapted to this specific model in order to be useful.

Owner

A new commit has been pushed.

You can now use the following commands to run the streaming server with HTTP and CLI support:

cd kroko-onnx
pip install .
python3 python-api-examples/streaming_server.py --model MODEL.data

If you use a Pro model, please provide the license key with the --key parameter as well.

This starts the server with an HTTP interface and support for the CLI client. To test it, you can use the example client:

python3 python-api-examples/online-websocket-client-decode-file.py sample.wav
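For intuition, the client-side pattern boils down to reading the audio and sending it in small fixed-size PCM chunks. This is only an illustrative sketch (standard-library only, with an in-memory WAV; the chunk size of 800 samples and the helper names are assumptions, not the shipped client's code):

```python
import io
import wave

def wav_to_chunks(wav_bytes, chunk_samples=800):
    """Split the PCM payload of a mono 16-bit WAV into fixed-size chunks,
    the way a streaming client might send them message by message.
    chunk_samples=800 is an illustrative value; everything here is a
    sketch, not the actual example client's implementation."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        width = wf.getsampwidth()            # bytes per sample (2 = 16-bit)
        pcm = wf.readframes(wf.getnframes())
    step = chunk_samples * width
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]

# Build one second of 16 kHz silence in memory so the sketch is runnable
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 16000)

# 16000 samples / 800 per chunk -> 20 chunks of 1600 bytes each
chunks = wav_to_chunks(buf.getvalue())
```

In a real client, each chunk would then be sent over the websocket connection in a loop, typically with a short pause between messages to simulate real-time capture.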

Notes:
- The default parameters should work fine out of the box.
- There is no enforced chunk-size limit, but note that the Python server example does no internal buffering, so sending very large chunks (e.g. 30 seconds of audio) may cause memory issues.
- The recommended chunk size is 800 samples per message for optimal performance.
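As a quick sanity check on what 800 samples per message means in wall-clock terms (assuming a 16 kHz input sample rate, which is common for streaming ASR; the actual rate for a given Kroko model may differ):

```python
def chunk_duration_ms(chunk_samples, sample_rate_hz):
    """Duration of one audio chunk in milliseconds."""
    return chunk_samples * 1000 / sample_rate_hz

# Assuming 16 kHz audio, 800 samples correspond to 50 ms per message,
# i.e. roughly 20 messages per second of audio.
duration = chunk_duration_ms(800, 16000)  # 50.0
```

So a well-behaved client sends a message every few tens of milliseconds, which also explains why 30-second chunks are far outside the intended usage.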

Give it a try and let us know how it works for you.

Banafo changed discussion status to open

Hi team,

First of all, congratulations on Kroko-ASR — the overall design and the real-time / streaming focus are very impressive. The community models you’ve released demonstrate very strong performance–latency trade-offs, and the engineering choices around deployment are particularly solid. Great work 👍

I’m currently evaluating Kroko-ASR for research and downstream adaptation, and I’d like to better understand the training side of the community models. I have a few technical questions:

  1. Pretraining checkpoints / weights

Are there plans to release (or is it already possible to access) the pretraining checkpoints for the community models?
Specifically:

Weights before final fine-tuning, pruning, or quantization

Intermediate checkpoints used during large-scale training

Access to these would be extremely helpful for:

Controlled ablation studies

Domain-specific adaptation

Comparing different fine-tuning strategies on top of the same pretrained backbone

  2. Training and data augmentation details

Could you share more details about the training pipeline, especially for the community models?

For example:

Pretraining data composition (datasets, total hours, language balance)

Whether self-supervised or hybrid objectives were used

Data augmentation strategies (e.g., speed perturbation, noise/reverb, SpecAugment variants)

Any training tricks or hyperparameter choices that had a significant impact on robustness or convergence

  3. Recommended adaptation / continued training workflow

For users who want to adapt Kroko-ASR to domain-specific or noisy real-world data, are there recommended best practices?

Suggested fine-tuning recipes

Compatible toolchains (e.g., k2 / icefall / Sherpa-ONNX workflows)

Whether continued pretraining vs. supervised fine-tuning is preferred

Thanks again for open-sourcing such a high-quality ASR system. Any guidance or pointers would be greatly appreciated, and would help the community adopt and extend Kroko-ASR more effectively.
