HindiSTT
A fine-tuned Whisper model for Hindi speech-to-text transcription, outputting Hinglish (Hindi written in Roman script).
Model Description
This model transcribes Hindi audio into romanized text (Hinglish), making it easier to read and process Hindi speech without requiring Devanagari script support.
Example Output:
- Audio: [Hindi speech saying "नमस्ते, आप कैसे हैं?"]
- Output:
namaste, aap kaise hain?
Key Features
- Hinglish Output: Transcribes spoken Hindi into romanized (Hinglish) text
- Whisper Architecture: Based on Whisper Large V3, compatible with the Hugging Face transformers library
- Noise Resistant: Handles noisy audio environments well
- Low Hallucination: Minimizes transcription hallucinations
Usage
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available; otherwise fall back to CPU and float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "Svetozar1993/HindiSTT"

# Load the fine-tuned Whisper weights.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
model.to(device)

# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

# Generation settings are kept from the original example: transcribe task, "en" language token.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
    generate_kwargs={"task": "transcribe", "language": "en"},
)

# Transcribe a local audio file and print the romanized text.
result = pipe("audio.wav")
print(result["text"])
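To transcribe several recordings, the pipeline call above can be wrapped in a small loop. The sketch below is only an illustration and not part of the model's API: transcribe_files is a hypothetical helper that accepts any callable returning a dict with a "text" key, which is the shape pipe(...) returns in the usage example. The demo uses a stand-in function (fake_pipe, also hypothetical) so that it runs without downloading the model, and it strips surrounding whitespace on the assumption that the raw text may carry a leading space.

```python
from typing import Callable, Dict, Iterable


def transcribe_files(
    paths: Iterable[str],
    transcribe: Callable[[str], dict],
) -> Dict[str, str]:
    """Run `transcribe` on each path and collect the stripped text per file."""
    results: Dict[str, str] = {}
    for path in paths:
        output = transcribe(path)
        # Assumption: the raw text may have leading/trailing whitespace.
        results[path] = output["text"].strip()
    return results


def fake_pipe(path: str) -> dict:
    """Stand-in for the real pipeline; returns a fixed Hinglish string."""
    return {"text": " namaste, aap kaise hain?"}


transcripts = transcribe_files(["a.wav", "b.wav"], fake_pipe)
for name, text in transcripts.items():
    print(f"{name}: {text}")
```

With the real pipeline loaded, passing pipe in place of fake_pipe should give one transcript per file, for example transcribe_files(["a.wav", "b.wav"], pipe).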
Flash Attention 2
For faster inference, install Flash Attention 2:

pip install flash-attn --no-build-isolation

Then pass attn_implementation="flash_attention_2" when loading the model:

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    attn_implementation="flash_attention_2",
)
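Flash Attention 2 needs the flash-attn package, a CUDA GPU, and half-precision weights, so a script that may also run on CPU can benefit from a fallback. The following is a minimal sketch under those assumptions: pick_attn_implementation is a hypothetical helper, and "sdpa" (PyTorch's scaled dot-product attention) is assumed to be supported for Whisper by the installed transformers version.

```python
import importlib.util


def pick_attn_implementation(cuda_available: bool) -> str:
    """Return "flash_attention_2" only when it is likely usable, else "sdpa"."""
    # find_spec returns None when the flash_attn package is not installed.
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    if cuda_available and has_flash_attn:
        return "flash_attention_2"
    return "sdpa"


# Without CUDA the helper always falls back to "sdpa".
print(pick_attn_implementation(cuda_available=False))
```

The result can then be passed when loading the model, for example attn_implementation=pick_attn_implementation(torch.cuda.is_available()).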
Model Details
- Base Model: Whisper Large V3
- Language: Hindi (Romanized/Hinglish output)
- Parameters: 1.5B
- License: Apache 2.0
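As a rough guide to hardware needs, the parameter count above can be turned into an estimate of the memory taken by the weights alone. This is a back-of-the-envelope sketch: weight_memory_gb is a hypothetical helper, it counts a gigabyte as 10**9 bytes, and it ignores activations, the decoding cache, and framework overhead, so real usage will be higher.

```python
# Bytes used per parameter for the two dtypes in the usage example.
BYTES_PER_PARAM = {"float16": 2, "float32": 4}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 1.5e9  # parameter count taken from the model details

for dtype in ("float16", "float32"):
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

Under these assumptions the weights take about 3 GB in float16 and about 6 GB in float32.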
Author
Svetozar1993