# Surt Small v1 Kirtan – Gurbani Kirtan ASR
Fine-tuned from surt-small-v1 (the Sehaj Path model) on kirtan audio for Gurbani kirtan transcription and forced alignment.
## Model Details
| Parameter | Value |
|---|---|
| Base model | surindersinghssj/surt-small-v1-training (step 3400) |
| Language | Punjabi (Gurmukhi script) |
| Task | Transcribe |
| WER | 32.65% |
| CER | 24.62% |
| Training data | 260 kirtan samples from 11 artists |
## Training
- Dataset: `surindersinghssj/gurbani-asr-whisper-aligned` (260 train / 31 eval)
- Artists: 11 kirtan artists (Bhai Manpreet Singh Kanpuri, Bhai Anantvir Singh, etc.)
- Hardware: NVIDIA A40, single GPU
- Training time: ~21 minutes (500 steps)
- Effective batch size: 16 (batch 8 x gradient accumulation 2)
- Learning rate: 2e-5 (lower than in base training, since this run continues from an already fine-tuned model)
- Scheduler: Cosine decay with 50 warmup steps
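
The hyperparameters above map onto a `Seq2SeqTrainingArguments` configuration roughly as follows. This is a minimal sketch for illustration; the output path, `fp16` flag, and dataset loading are assumptions, not taken from the actual training script.

```python
from datasets import load_dataset
from transformers import Seq2SeqTrainingArguments

# Dataset from the card; split sizes are 260 train / 31 eval
dataset = load_dataset("surindersinghssj/gurbani-asr-whisper-aligned")

# Sketch of the training hyperparameters listed above
training_args = Seq2SeqTrainingArguments(
    output_dir="./surt-small-v1-kirtan",  # hypothetical output path
    per_device_train_batch_size=8,        # batch 8 ...
    gradient_accumulation_steps=2,        # ... x accumulation 2 = effective 16
    learning_rate=2e-5,
    lr_scheduler_type="cosine",           # cosine decay
    warmup_steps=50,
    max_steps=500,
    predict_with_generate=True,
    fp16=True,                            # assumed for the A40 GPU
)
```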
## WER Progression
| Step | Epoch | WER | CER |
|---|---|---|---|
| 0 (before fine-tuning) | – | 118.19% | 92.82% |
| 100 | 3.0 | 61.63% | 48.85% |
| 200 | 8.9 | 41.63% | 30.34% |
| 300 | 14.7 | 31.84% | 23.19% |
| 350 | 17.7 | 31.43% | 23.19% |
| 500 (final) | 29.4 | 32.65% | 24.62% |
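
The WER/CER numbers above can in principle be reproduced with the `evaluate` library. This is a generic sketch with placeholder strings, not the evaluation script actually used for this card.

```python
import evaluate

# Standard WER/CER metrics from the evaluate library
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Placeholder examples; the real evaluation uses the 31-sample eval split
predictions = ["model output transcription"]
references = ["ground truth transcription"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```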
## Related Models
| Model | Use case | WER |
|---|---|---|
| surt-small-v1 | Sehaj Path transcription | 14.88% |
| surt-small-v1-kirtan (this) | Kirtan transcription/alignment | 32.65% |
## Usage
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# Processor from the base Whisper model; weights from the fine-tuned checkpoint
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("surindersinghssj/surt-small-v1-kirtan")

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("kirtan_audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
**Note:** Use `WhisperProcessor.from_pretrained("openai/whisper-small")` for the processor.
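
For the forced-alignment use case, one option is the `automatic-speech-recognition` pipeline with word-level timestamps. This is an untested sketch: timestamp quality on sung kirtan audio (with repetition and long held notes) is not evaluated here, and the tokenizer/feature extractor again come from the base model.

```python
from transformers import pipeline

# Word-level timestamps via Whisper's timestamp decoding (sketch, untested here)
pipe = pipeline(
    "automatic-speech-recognition",
    model="surindersinghssj/surt-small-v1-kirtan",
    tokenizer="openai/whisper-small",          # processor files live in the base repo
    feature_extractor="openai/whisper-small",
    chunk_length_s=30,                         # chunk long kirtan recordings
)
result = pipe("kirtan_audio.wav", return_timestamps="word")
print(result["chunks"])  # [{"text": ..., "timestamp": (start, end)}, ...]
```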
## Limitations
- Trained on only 260 samples; more data would significantly improve performance
- Best WER was at step 350 (31.43%); slight overfitting set in after that
- Audio-text alignment in training data is imperfect (kirtan involves repetition and musical phrasing)
## License
Apache 2.0