Official model for our INTERSPEECH 2026 paper "A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models" (arXiv:2507.13563). Part of the Balalaika Russian speech data-processing pipeline — code: https://github.com/lab260ru/balalaika. If you use this resource, please cite it.
It fixes extremely slow model inference on some devices and adds the ability to run inference directly on the GPU via CUDA.
!pip install git+https://github.com/NikiPshg/T-one-cuda-onnx.git
from tone import StreamingCTCPipeline, read_audio, read_example_audio
audio = read_example_audio() # or read_audio("your_audio.flac")
# device_id: index of the graphics card to use; if no graphics card is found, the CPU is used
pipeline = StreamingCTCPipeline.from_hugging_face(device_id=0)
print(pipeline.forward_offline(audio)) # offline recognition using onnx cuda
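The CPU fallback mentioned in the comment above can be illustrated with a small sketch. It assumes the pipeline runs on ONNX Runtime and selects execution providers in the usual way. The helper `choose_providers` is hypothetical and not part of the `tone` package, so the package's actual logic may differ.

```python
from typing import List, Sequence


def choose_providers(available: Sequence[str], device_id: int = 0) -> List:
    """Hypothetical helper: prefer CUDA when it is available, else use the CPU."""
    if "CUDAExecutionProvider" in available:
        # ONNX Runtime accepts (name, options) tuples in its provider list;
        # the CPU provider is kept as a second choice.
        return [
            ("CUDAExecutionProvider", {"device_id": device_id}),
            "CPUExecutionProvider",
        ]
    return ["CPUExecutionProvider"]


try:
    import onnxruntime as ort  # optional here; only used to query providers

    available = ort.get_available_providers()
except ImportError:
    # Assumption for this sketch: without onnxruntime, pretend only the CPU exists.
    available = ["CPUExecutionProvider"]

print(choose_providers(available, device_id=0))
```

Printing the result on your machine shows whether a CUDA-enabled ONNX Runtime build is visible; if only `CPUExecutionProvider` is listed, inference will run on the CPU.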
If you use this resource, please cite our INTERSPEECH 2026 paper:
@inproceedings{borodin2026balalaika,
title = {A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models},
author = {Borodin, Kirill and Vasiliev, Nikita and Kudryavtsev, Vasiliy and Maslov, Maxim and Gorodnichev, Mikhail and Rogov, Oleg and Mkrtchian, Grach},
booktitle = {Proc. INTERSPEECH 2026},
year = {2026},
note = {arXiv:2507.13563},
url = {https://arxiv.org/abs/2507.13563}
}