Whisper Small Jiangyin

This model is a fine-tuned version of openai/whisper-small for automatic speech recognition of the Jiangyin dialect (江阴话), from WuSutra.com.
WuSutra.com is a dialect crowdsourcing website that implements the entire ML workflow: audio upload, model training, validation, and inference. You can upload your own audio and even trigger training yourself on WuSutra.com. If you have further questions, feel free to message me.

⚡ Looking for a smaller and faster option?
Please use the LoRA adapter version.
It provides the same Jiangyin dialect fine-tuning with a much smaller storage footprint.
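
As a rough illustration of how a LoRA adapter is typically attached to the base model, here is a minimal sketch using the peft library. The adapter's repository ID is not stated in this card, so `adapter_id` is left as a parameter. The helper name `load_lora_whisper` is hypothetical, and loading the processor from the base model is an assumption rather than a documented requirement of the adapter.

```python
def load_lora_whisper(adapter_id: str, base_id: str = "openai/whisper-small"):
    """Load the base Whisper model and attach a LoRA adapter to it.

    adapter_id: Hub repo ID or local path of the LoRA adapter
                (not given in this card; supply it yourself).
    base_id:    the base checkpoint the adapter was trained on.
    """
    # Imported lazily so that defining this helper does not require
    # the heavy dependencies to be installed.
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    base_model = WhisperForConditionalGeneration.from_pretrained(base_id)
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    # Assumption: the adapter reuses the base model's processor unchanged.
    processor = WhisperProcessor.from_pretrained(base_id)
    return model, processor
```

The returned `model` and `processor` can then be used in the same way as the full fine-tuned model in the Usage section.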

📊 Evaluation on 45 Jiangyin dialect phrases: Character Error Rate (CER)

Model | CER (%)
Baseline (whisper-small) | 0.46
Fine-tuned (Jiangyin Dialect) | 0.00
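
For readers unfamiliar with the metric, the sketch below shows one common way to compute CER: the character-level Levenshtein distance between reference and hypothesis, divided by the reference length. It is an illustration only; the function names are hypothetical, and the evaluation script actually used for the numbers above is not shown in this card and may normalize text differently.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# A perfect match gives a CER of 0.
print(cer("红烧肉", "红烧肉"))
# Illustrative mismatch (not a real model output): 2 of 3 characters differ.
print(cer("红烧肉", "红搜牛"))
```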

Model Details

  • Base model: openai/whisper-small
  • Language: Jiangyin dialect (江阴话) - a Wu Chinese dialect
  • Task: Automatic Speech Recognition (ASR)
  • Training data: Custom dataset of Jiangyin dialect recordings
  • Model size: 244M parameters

Usage

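import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load model and processor (repo ID as shown on the model page)
model = WhisperForConditionalGeneration.from_pretrained("jxue/whisper-small-jiangyin")
processor = WhisperProcessor.from_pretrained("jxue/whisper-small-jiangyin")

# Load a recording as a mono float array resampled to 16 kHz
# ("audio.wav" is a placeholder path; replace it with your own file)
audio_array, _ = librosa.load("audio.wav", sr=16000)

# Transcribe audio
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)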

Training Details

  • Training epochs: 15
  • Batch size: 1
  • Learning rate: 5e-6
  • Training samples: 168
  • Validation samples: 61
  • Training infrastructure: AWS SageMaker ml.p3.2xlarge
  • Training time: 15 minutes
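
To put the settings above in perspective, the following sketch works out how many optimizer steps they imply. It assumes no gradient accumulation and a single device, neither of which is stated in this card; the function name is hypothetical.

```python
import math


def total_optimizer_steps(train_samples: int, batch_size: int, epochs: int,
                          grad_accum_steps: int = 1) -> int:
    """Number of optimizer updates for a plain epoch-based training loop."""
    # grad_accum_steps=1 is an assumption; the card does not mention it.
    steps_per_epoch = math.ceil(train_samples / (batch_size * grad_accum_steps))
    return steps_per_epoch * epochs


# Values taken from the Training Details list above.
steps = total_optimizer_steps(train_samples=168, batch_size=1, epochs=15)
print(steps)  # 168 steps per epoch * 15 epochs = 2520

# Rough average time per step if the 15 minutes were spent entirely on training
# steps (it likely also includes model loading and validation).
print(15 * 60 / steps)
```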

Performance

  • Character Error Rate (CER): 0% on validation set
  • Validation examples: See validation_report.json in model files

Limitations

  • Optimized specifically for Jiangyin dialect

Significant improvement observed after fine-tuning on 119 dialect audio samples.

✅ Correct recognition examples

REF (reference) | Transliteration | HYP (prediction) | CER
吃什么 | 切刀样 | 吃什么 | 0.000
不知道 | 佛晓得 | 不知道 | 0.000
素菜 | 搜菜 | 素菜 | 0.000
红烧肉 | 红搜牛 | 红烧肉 | 0.000
谁啊?小偷 | 啥人啦?贼骨头 | 谁啊?小偷 | 0.000
谁啊?老公 | 啥人啦?老官 | 谁啊?老公 | 0.000
节约 | 做人家 | 节约 | 0.000
闪电 | 忽显 | 闪电 | 0.000
下雨 | 落雨 | 下雨 | 0.000
丢人 | 坍台 | 丢人 | 0.000
泥土 | 难泥 | 泥土 | 0.000
 | 灵个 | | 0.000
到处都是 | 一天世界 | 到处都是 | 0.000
最后 | 压末落落 | 最后 | 0.000
睡觉 | 困觉 | 睡觉 | 0.000
小偷 | 贼骨头 | 小偷 | 0.000
拿不定主意 | 疑三惑四 | 拿不定主意 | 0.000
轻浮 | 轻骨头 | 轻浮 | 0.000
明天 | 明朝 | 明天 | 0.000
后天 | 后朝 | 后天 | 0.000
前天 | 先夜子 | 前天 | 0.000
妻子 | 阿嬷 | 妻子 | 0.000
这样 | 实梗 | 这样 | 0.000
出去 | 出去 | 出去 | 0.000
明天见 | 明朝会 | 明天见 | 0.000
什么东西 | 啥个物事 | 什么东西 | 0.000
什么时候 | 啥辰光 | 什么时候 | 0.000
回来 | 嘎来 | 回来 | 0.000
老公 | 老官 | 老公 | 0.000
十分寒冷 | 毕结骨 | 十分寒冷 | 0.000
谁啊 | 啥人啦 | 谁啊 | 0.000
男孩 | 细七煞 | 男孩 | 0.000
傍晚 | 夜快头 | 傍晚 | 0.000
肩膀 | 肩胛 | 肩膀 | 0.000
男子 | 老小家 | 男子 | 0.000
女子 | 丫头家 | 女子 | 0.000
今天吃点什么? | 今朝吃点刀样啦? | 今天吃点什么? | 0.000
你这小子,是不是欠捧! | 你个细棺材,阿要吃生活! | 你这小子,是不是欠捧! | 0.000
今天吃什么?不知道 | 今朝切刀样?佛晓得 | 今天吃什么?不知道 | 0.000
今天吃什么?红烧肉 | 今朝切刀样?红搜牛 | 今天吃什么?红烧肉 | 0.000
什么时候出去?明天 | 啥辰光出去?明朝 | 什么时候出去?明天 | 0.000
什么时候出去?后天 | 啥辰光出去?后朝 | 什么时候出去?后天 | 0.000