HindiSTT

A fine-tuned Whisper model for Hindi speech-to-text transcription, outputting Hinglish (Hindi written in Roman script).

Model Description

This model transcribes Hindi audio into romanized text (Hinglish), making it easier to read and process Hindi speech without requiring Devanagari script support.

Example Output:

  • Audio: [Hindi speech saying "नमस्ते, आप कैसे हैं?"]
  • Output: namaste, aap kaise hain?

Key Features

  1. Hinglish Output: Transcribes Hindi speech into romanized Hindi text (Hinglish)
  2. Whisper Architecture: Based on Whisper Large V3, compatible with transformers
  3. Noise Resistant: Handles noisy audio environments well
  4. Low Hallucination: Minimizes transcription hallucinations

Usage

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available; otherwise fall back to CPU with full precision.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "Svetozar1993/HindiSTT"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
    generate_kwargs={"task": "transcribe", "language": "en"}
)

result = pipe("audio.wav")
print(result["text"])
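
Long recordings can be handled by letting the pipeline split the audio into chunks. The following is a minimal sketch, assuming a pipeline object built as in the usage example; `transcribe_long` is a hypothetical helper name, and the 30-second chunk length is an assumed default, not a value stated by this model card.

```python
from typing import Any, Callable, Dict, List, Tuple


def transcribe_long(
    pipe: Callable[..., Dict[str, Any]],
    audio_path: str,
    chunk_length_s: float = 30.0,  # assumed default, tune for your audio
) -> Tuple[str, List[Dict[str, Any]]]:
    """Transcribe a long recording; return (full_text, timestamped_chunks)."""
    # chunk_length_s and return_timestamps are call-time arguments of the
    # transformers automatic-speech-recognition pipeline.
    result = pipe(
        audio_path,
        chunk_length_s=chunk_length_s,
        return_timestamps=True,
    )
    text = result["text"].strip()
    # With return_timestamps=True the pipeline adds a "chunks" list whose
    # items carry a "timestamp" (start, end) tuple and a "text" string.
    chunks = result.get("chunks", [])
    return text, chunks
```

Calling `transcribe_long(pipe, "audio.wav")` returns the full transcript together with per-segment timestamps, which can be useful for subtitles or for aligning text with audio.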

Flash Attention 2

For faster inference on a supported GPU, install Flash Attention 2 and select it when loading the model:

pip install flash-attn --no-build-isolation

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    attn_implementation="flash_attention_2"
)
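
Flash Attention 2 requires a CUDA GPU and the `flash-attn` package, so it can help to choose the attention backend at runtime. The following is a minimal sketch under that assumption; `pick_attn_implementation` is a hypothetical helper name, and falling back to `"sdpa"` (PyTorch's scaled dot-product attention) is a suggestion, not something this model card prescribes.

```python
import importlib.util


def pick_attn_implementation(cuda_available: bool) -> str:
    """Return "flash_attention_2" when it looks usable, otherwise "sdpa"."""
    # find_spec only checks that the package is importable; it does not
    # verify that the installed build matches the local GPU.
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    if cuda_available and has_flash_attn:
        return "flash_attention_2"
    return "sdpa"
```

The result can be passed as `attn_implementation=pick_attn_implementation(torch.cuda.is_available())` in the `from_pretrained` call. Note that Flash Attention 2 expects half-precision weights (`torch.float16` or `torch.bfloat16`).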

Model Details

  • Base Model: Whisper Large V3
  • Language: Hindi (Romanized/Hinglish output)
  • Parameters: 1.5B
  • License: Apache 2.0

Author

Svetozar1993
