---
license: apache-2.0
language:
- en
- gu
pipeline_tag: automatic-speech-recognition
tags:
- whisper-event
metrics:
- wer
model-index:
- name: LLM_GUJARATI - Manan Raval
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: google/fleurs
      type: google/fleurs
      config: gu_in
      split: test
    metrics:
    - type: wer
      value: 12.33
      name: WER
---
## Usage

To transcribe a single audio file with this model, use the following code snippet:
```python
>>> import torch
>>> from transformers import pipeline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> transcribe = pipeline(
...     task="automatic-speech-recognition",
...     model="mananvh/LLM_GUJARATI",
...     chunk_length_s=30,
...     device=device,
... )
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(
...     language="gu", task="transcribe"
... )

>>> print("Transcription:", transcribe(audio)["text"])
```
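The WER metric reported above is the word-level edit distance divided by the reference length. As a self-contained illustration of how such a score is computed (this is not the exact evaluation script; the reported 12.33 was presumably produced with an evaluation library such as `evaluate` or `jiwer`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # one substitution over three words → 1/3
```

Multiplying the result by 100 gives the percentage form used in the table above.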
## Acknowledgement

This work was done at Virtual Height IT Services Pvt. Ltd.