---
license: apache-2.0
language:
- bn
base_model:
- sha1779/BengaliRegionalASR
pipeline_tag: automatic-speech-recognition
---

This is the CTranslate2 version of `sha1779/BengaliRegionalASR`, which runs inference faster than the base model.

## Requirements

```bash
# transformers (with torch) is only needed for the conversion step below
pip install ctranslate2 "transformers[torch]"
```

## Converting the base model to CTranslate2 format

```bash
# In a Jupyter/Colab notebook cell, prefix the command with "!"
ct2-transformers-converter --model sha1779/BengaliRegionalASR --output_dir sha1779/Faster_BengaliRegionalASR --copy_files tokenizer.json preprocessor_config.json --quantization float16
```

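The same conversion can also be run from Python. The following is a minimal sketch, not the author's own script: it assumes `ctranslate2` and `transformers` are installed, and `convert_model` is a hypothetical helper name introduced here for illustration. It relies on `ctranslate2.converters.TransformersConverter`, the class behind the `ct2-transformers-converter` command.

```python
def convert_model(
    model_name="sha1779/BengaliRegionalASR",
    output_dir="sha1779/Faster_BengaliRegionalASR",
    copy_files=("tokenizer.json", "preprocessor_config.json"),
    quantization="float16",
    force=False,
):
    """Convert a Transformers checkpoint to CTranslate2 format.

    Hypothetical helper: the defaults mirror the CLI command in this card.
    Returns the path of the output directory.
    """
    # Imported lazily so the function can be defined without ctranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_name, copy_files=list(copy_files))
    # force=False makes the conversion fail if output_dir already exists.
    return converter.convert(output_dir, quantization=quantization, force=force)
```

Calling `convert_model()` downloads the base model and writes the converted files to `output_dir`; pass `force=True` to overwrite an existing directory.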
## Run the model

```bash
pip install faster-whisper
```

```python
from faster_whisper import WhisperModel

# Hugging Face repo ID (or local path) of the converted CTranslate2 model
model_size = "sha1779/Faster_BengaliRegionalASR"

# Load the model on a CUDA GPU in half precision
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# language="bn" matches the Bengali language tag in this model card
segments, info = model.transcribe("audio.mp3", beam_size=5, language="bn", condition_on_previous_text=False)

# segments is a generator; transcription runs as it is iterated
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
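The example above assumes a CUDA GPU. As a hedged sketch for machines without one, the helpers below fall back to CPU with `int8` computation and collect the segments into a single string. The names `pick_runtime`, `join_segments`, and `transcribe_to_text` are hypothetical, and the `language="bn"` default is an assumption based on this card's language tag, not something the author documents.

```python
def pick_runtime(cuda_device_count):
    """Return (device, compute_type) given the number of visible CUDA devices."""
    if cuda_device_count > 0:
        return "cuda", "float16"
    # Assumption: int8 is a reasonable CPU default; other compute types also work.
    return "cpu", "int8"


def join_segments(segments):
    """Concatenate segment texts into one transcript string."""
    # Iterating the generator is what actually runs the transcription.
    return " ".join(segment.text.strip() for segment in segments)


def transcribe_to_text(audio_path,
                       model_name="sha1779/Faster_BengaliRegionalASR",
                       language="bn"):
    """Transcribe one audio file and return the full text."""
    # Lazy imports so the pure helpers above work without these packages.
    import ctranslate2
    from faster_whisper import WhisperModel

    device, compute_type = pick_runtime(ctranslate2.get_cuda_device_count())
    model = WhisperModel(model_name, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(
        audio_path,
        beam_size=5,
        language=language,
        condition_on_previous_text=False,
    )
    return join_segments(segments)
```

For example, `transcribe_to_text("audio.mp3")` returns the transcript as one string instead of printing timestamped segments.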