Whisper Small Jiangyin

This model, from WuSutra.com, is a fine-tuned version of openai/whisper-small for automatic speech recognition of the Jiangyin dialect (江阴话).
WuSutra.com is a dialect crowdsourcing website that implements the entire ML workflow: audio upload, model training, validation, and inference. You can upload your own audio and even trigger training yourself on wusutra.com. If you have further questions, feel free to message me.

⚡ Looking for a smaller and faster option?
Please use the LoRA adapter version.
It provides the same Jiangyin dialect fine-tuning with a much smaller storage footprint.
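To give a rough sense of why an adapter is so much smaller, the sketch below estimates the number of LoRA parameters for whisper-small. The rank (8) and the choice of adapting only `q_proj` and `v_proj` are assumptions made for illustration; the card does not state the adapter's actual configuration, so the real numbers may differ.

```python
# Rough size estimate for a LoRA adapter on whisper-small.
# RANK and TARGETS_PER_ATTENTION are hypothetical; the card does not state them.

D_MODEL = 768               # hidden size of whisper-small
ENCODER_LAYERS = 12         # each has one self-attention block
DECODER_LAYERS = 12         # each has self-attention and cross-attention
RANK = 8                    # assumed LoRA rank
TARGETS_PER_ATTENTION = 2   # assumed: adapt q_proj and v_proj only
BYTES_PER_FP32 = 4
FULL_PARAMS = 244_000_000   # model size stated in this card


def lora_param_count(rank: int, d_in: int, d_out: int) -> int:
    """A LoRA pair is A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
adapted_matrices = attention_blocks * TARGETS_PER_ATTENTION
adapter_params = adapted_matrices * lora_param_count(RANK, D_MODEL, D_MODEL)

adapter_mb = adapter_params * BYTES_PER_FP32 / 1e6
full_mb = FULL_PARAMS * BYTES_PER_FP32 / 1e6

print(f"adapter: {adapter_params:,} params, about {adapter_mb:.1f} MB in FP32")
print(f"full model: {FULL_PARAMS:,} params, about {full_mb:.0f} MB in FP32")
```

Under these assumptions the adapter holds well under 1% of the full model's parameters. The base model must still be loaded at inference time, so the saving applies to what is stored and shared for the fine-tune itself.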

📊 Evaluation on 45 Jiangyin dialect phrases: Character Error Rate (CER)

Model	CER (%)
Baseline (whisper-small)	0.46
Fine-tuned (Jiangyin Dialect)	0.00
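For readers unfamiliar with the metric, here is a minimal sketch of how a corpus-level character error rate can be computed: the total character-level edit distance divided by the total number of reference characters. This is a generic implementation for illustration, not necessarily the evaluation script used for the numbers above; the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,          # delete a reference character
                curr[j - 1] + 1,      # insert a hypothesis character
                prev[j - 1] + cost,   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: total edits over total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


# Illustrative inputs only, not taken from the evaluation set.
refs = ["吃什么", "不知道"]
hyps = ["吃什么", "不晓得"]
print(f"CER: {cer(refs, hyps):.2%}")
```

Multiply the returned fraction by 100 to express it as a percentage.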

Model Details

Base model: openai/whisper-small
Language: Jiangyin dialect (江阴话) - a Wu Chinese dialect
Task: Automatic Speech Recognition (ASR)
Training data: Custom dataset of Jiangyin dialect recordings
Model size: 244M parameters

Usage

from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load model and processor
model = WhisperForConditionalGeneration.from_pretrained("jxue/whisper-small-jiangyin")
processor = WhisperProcessor.from_pretrained("jxue/whisper-small-jiangyin")

# Transcribe audio
# audio_array is assumed to be a 1-D mono float waveform sampled at 16 kHz
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
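The snippet above expects `audio_array` to already hold a mono waveform sampled at 16 kHz. One possible way to obtain it from a WAV file using only the Python standard library is sketched below. The helper name `load_wav_16k_mono` is hypothetical, and the sketch assumes the file is already 16-bit PCM, mono, 16 kHz; it raises an error rather than resampling.

```python
import array
import sys
import wave


def load_wav_16k_mono(path: str) -> list[float]:
    """Read a 16-bit PCM, mono, 16 kHz WAV file as floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()      # WAV sample data is little-endian

    # Scale integers to the floating-point range expected for raw speech.
    return [s / 32768.0 for s in samples]
```

With a recording in that format, `audio_array = load_wav_16k_mono("sample.wav")` (a placeholder file name) can be passed to the processor. Recordings at other sample rates or with multiple channels need to be resampled and downmixed first, for example with an audio library of your choice.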

Training Details

Training epochs: 15
Batch size: 1
Learning rate: 5e-6
Training samples: 168
Validation samples: 61
Training infrastructure: AWS SageMaker ml.p3.2xlarge
Training time: 15 minutes
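As a quick sanity check on these settings, the sketch below works out how many optimizer steps the run implies. The `TrainingSetup` class is a hypothetical container for the values listed above, and it assumes no gradient accumulation, which the card does not specify.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters as listed in this card (hypothetical container)."""
    epochs: int = 15
    batch_size: int = 1
    learning_rate: float = 5e-6
    train_samples: int = 168
    grad_accum_steps: int = 1  # assumption: not stated in the card

    def steps_per_epoch(self) -> int:
        batches = math.ceil(self.train_samples / self.batch_size)
        return math.ceil(batches / self.grad_accum_steps)

    def total_steps(self) -> int:
        return self.epochs * self.steps_per_epoch()


setup = TrainingSetup()
print(f"steps per epoch: {setup.steps_per_epoch()}")
print(f"total optimizer steps: {setup.total_steps()}")
```

With a batch size of 1, every sample is its own optimizer step, so the run takes 168 steps per epoch and 2520 steps in total. Spread over the reported training time, that is roughly a third of a second per step, ignoring evaluation and startup overhead.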

Performance

Character Error Rate (CER): 0% on validation set
Validation examples: See validation_report.json in model files

Limitations

Optimized specifically for Jiangyin dialect

Significant improvement observed after fine-tuning on 119 dialect audio samples.

✅ Correct recognition examples

REF (参考)	Transliteration (音译)	HYP (预测)
吃什么	切刀样	吃什么
不知道	佛晓得	不知道
素菜	搜菜	素菜
红烧肉	红搜牛	红烧肉
谁啊?小偷	啥人啦?贼骨头	谁啊?小偷
谁啊?老公	啥人啦?老官	谁啊?老公
节约	做人家	节约
闪电	忽显	闪电
下雨	落雨	下雨
丢人	坍台	丢人
泥土	难泥	泥土
好	灵个	好
到处都是	一天世界	到处都是
最后	压末落落	最后
睡觉	困觉	睡觉
小偷	贼骨头	小偷
拿不定主意	疑三惑四	拿不定主意
轻浮	轻骨头	轻浮
明天	明朝	明天
后天	后朝	后天
前天	先夜子	前天
妻子	阿嬷	妻子
这样	实梗	这样
出去	出去	出去
明天见	明朝会	明天见
什么东西	啥个物事	什么东西
什么时候	啥辰光	什么时候
回来	嘎来	回来
老公	老官	老公
十分寒冷	毕结骨	十分寒冷
谁啊	啥人啦	谁啊
男孩	细七煞	男孩
傍晚	夜快头	傍晚
肩膀	肩胛	肩膀
男子	老小家	男子
女子	丫头家	女子
今天吃点什么?	今朝吃点刀样啦?	今天吃点什么?
你这小子,是不是欠捧!	你个细棺材,阿要吃生活!	你这小子,是不是欠捧!
今天吃什么?不知道	今朝切刀样?佛晓得	今天吃什么?不知道
今天吃什么?红烧肉	今朝切刀样?红搜牛	今天吃什么?红烧肉
什么时候出去?明天	啥辰光出去?明朝	什么时候出去?明天
什么时候出去?后天	啥辰光出去?后朝	什么时候出去?后天
