	---
	license: apache-2.0
	datasets:
	- ai4bharat/IndicVoices-ST
	language:
	- hi
	metrics:
	- wer
	base_model:
	- nvidia/stt_en_fastconformer_hybrid_large_streaming_multi
	pipeline_tag: automatic-speech-recognition
	tags:
	- Nemo
	- Hindi
	- ASR
	- FastConformer
	---

	# Salesken-Streaming-FastConformer-Hindi-ASR

	This model is a fine-tuned version of the STT En FastConformer Hybrid Large Streaming Multi model from NVIDIA's NeMo framework (`nvidia/stt_en_fastconformer_hybrid_large_streaming_multi`), adapted for Hindi automatic speech recognition (ASR). It was fine-tuned on Hindi data from AI4Bharat (the metadata lists `ai4bharat/IndicVoices-ST`) to improve transcription accuracy for Hindi speech, including real-time streaming applications.

	## Model Details

	### Model Description

	- Model type: Hybrid ASR model (Transducer (RNN-T) + CTC)
	- Model Sample Rate: 16000 Hz
	- Language(s) (NLP): Hindi (`hi`)
	- License: Apache-2.0
	- Finetuned from model: `nvidia/stt_en_fastconformer_hybrid_large_streaming_multi`
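
Because the model expects 16000 Hz audio, it can be useful to check input files before transcription. The following is a minimal sketch using only Python's standard `wave` module. The helper name `check_wav_format` is hypothetical and not part of NeMo, and the mono-channel check is an assumption: the model card states only the sample rate.

```python
import wave

# Sample rate taken from the model details above.
EXPECTED_SAMPLE_RATE = 16000


def check_wav_format(path, expected_rate=EXPECTED_SAMPLE_RATE):
    """Return (ok, sample_rate, channels) for a PCM WAV file.

    Hypothetical helper, not part of NeMo. Requiring a single channel is
    an assumption; the model card only specifies the sample rate.
    """
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    ok = sample_rate == expected_rate and channels == 1
    return ok, sample_rate, channels
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.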



	## How to Get Started with the Model

	You can load and use the model with the following code:

	```python
	import nemo.collections.asr as nemo_asr
	asr_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(model_name="salesken/Hindi-FastConformer-Streaming-ASR")

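	# Optional: change the default latency. Default latency is 1040 ms.
	# The second value of att_context_size is the look-ahead in encoder frames.
	# Assuming 80 ms frames (consistent with [70,13] giving 1040 ms), the supported look-aheads are {0: 0 ms, 1: 80 ms, 6: 480 ms, 13: 1040 ms}.
	# Note: these are worst-case latencies; the average latency is about half of these values.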
	asr_model.encoder.set_default_att_context_size([70,13])

	# Optional: change the default decoder. Default decoder is Transducer (RNNT). Supported decoders: {ctc, rnnt}.
	asr_model.change_decoding_strategy(decoder_type='rnnt')

	asr_model.transcribe(['2086-149220-0033.wav'])