---
license: other
license_name: test
license_link: LICENSE
language:
- en
- fr
- de
- es
- pt
metrics:
- accuracy
- cer
pipeline_tag: automatic-speech-recognition
---

# Model Card for Model ID

> **(Update, August 2025: CC-BY models are coming soon.)**

## Overview

This is a family of low-latency streaming speech recognition models designed for edge devices.

**Goal**: Be faster or more accurate than similarly sized Whisper models and other comparable models.

- **Languages**: English, French, German (7 more languages coming).

## Demos

- [**Browser Demo (CPU)**](https://huggingface.co/spaces/Banafo/Kroko-Streaming-ASR-Wasm)
  *(Runs entirely in the browser using the CPU.)*
- [**Gradio / Python Demo**](https://huggingface.co/spaces/Banafo/Kroko-Streaming-ASR-Python)

## License

The license is still under consideration (likely Coqui). The model is intended to be **dual-licensed**:

- **Free for non-commercial use**.
- **Affordable license for commercial use**.

## Training

- Training uses a modified k2/Icefall pipeline.
- Inference can be performed with the standard Sherpa project.
- Padding the input audio with silence and normalizing its volume may improve results.

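
The silence-padding and volume-normalization hint above can be sketched in a few lines of Python. This is a minimal illustration, not the preprocessing used by the model authors: the function names, the 16 kHz sample rate, the 0.9 target peak, and the padding durations are all assumptions chosen for the example, and samples are assumed to be floats in the range [-1.0, 1.0].

```python
from typing import List, Sequence


def peak_normalize(samples: Sequence[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak (assumed value)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # All-silent or empty input: there is nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def pad_with_silence(
    samples: Sequence[float],
    sample_rate: int,
    lead_s: float = 0.3,   # assumed leading silence, in seconds
    tail_s: float = 0.5,   # assumed trailing silence, in seconds
) -> List[float]:
    """Add zero-valued samples before and after the audio."""
    lead = [0.0] * int(round(lead_s * sample_rate))
    tail = [0.0] * int(round(tail_s * sample_rate))
    return lead + list(samples) + tail


def preprocess(samples: Sequence[float], sample_rate: int = 16000) -> List[float]:
    """Normalize the volume first, then pad, so the padding stays exactly zero."""
    return pad_with_silence(peak_normalize(samples), sample_rate)
```

The output of `preprocess` can then be handed to whichever Sherpa front end is in use. Plain Python lists keep the sketch dependency-free; with real audio, a NumPy array would be the more typical container.
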
## Acknowledgements

Special thanks to the [Lhotse](https://github.com/lhotse-speech/lhotse), [Sherpa](https://github.com/k2-fsa/sherpa), [k2](https://github.com/k2-fsa/k2), and [Icefall](https://github.com/k2-fsa/icefall) teams for their support and tools.