Whisper Small Jiangyin
This is a fine-tuned version of openai/whisper-small for Jiangyin dialect (江阴话) automatic speech recognition from WuSutra.com.
WuSutra.com is a dialect crowdsourcing website that implements the entire ML workflow, including audio upload, model training, validation, and inference.
You can upload your own audio and even trigger training yourself on WuSutra.com. If you have further questions, feel free to message me.
⚡ Looking for a smaller and faster option?
Please use the LoRA adapter version.
It provides the same Jiangyin dialect fine-tuning with a much smaller storage footprint.
📊 Evaluation on 45 Jiangyin dialect phrases: Character Error Rate (CER)
| Model | CER (%) |
|---|---|
| Baseline (whisper-small) | 0.46 |
| Fine-tuned (Jiangyin Dialect) | 0.00 |
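For readers who want to reproduce this kind of score, the following is a minimal sketch of a character error rate computation. It assumes CER is the character-level Levenshtein distance divided by the reference length, and that the corpus score pools edits over all phrases. The card does not state how punctuation was handled or how per-phrase scores were aggregated, so the function names and the aggregation here are illustrative assumptions, not the author's evaluation script.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER for one phrase: edits divided by the number of reference characters."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def corpus_cer(pairs) -> float:
    """Pooled CER over (reference, hypothesis) pairs (assumed aggregation)."""
    total_edits = sum(edit_distance(r, h) for r, h in pairs)
    total_chars = sum(len(r) for r, _ in pairs)
    return total_edits / total_chars
```

With these definitions, an exact match such as `cer("吃什么", "吃什么")` gives 0.0, while a hypothesis that gets every character wrong gives 1.0.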
Model Details
- Base model: openai/whisper-small
- Language: Jiangyin dialect (江阴话) - a Wu Chinese dialect
- Task: Automatic Speech Recognition (ASR)
- Training data: Custom dataset of Jiangyin dialect recordings
- Model size: 244M parameters
Usage
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load model and processor
model = WhisperForConditionalGeneration.from_pretrained("jxue/whisper-small-jiangyin")
processor = WhisperProcessor.from_pretrained("jxue/whisper-small-jiangyin")

# Transcribe audio: audio_array must be a 1-D mono float waveform sampled at 16 kHz
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
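The snippet above leaves `audio_array` undefined. One possible way to obtain it, using only the Python standard library, is sketched below. It assumes the recording is a 16-bit PCM WAV file already sampled at 16 kHz; the helper name `load_wav_mono_16k` is hypothetical, and other formats or sampling rates would need a resampling step that is not shown here.

```python
import array
import sys
import wave


def load_wav_mono_16k(path: str) -> list[float]:
    """Read a 16-bit PCM, 16 kHz WAV file and return mono samples in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wav.getframerate() != 16000:
            raise ValueError("expected a 16 kHz sampling rate; resample first")
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Average the channels of each frame to obtain a mono signal.
    mono = [
        sum(samples[i:i + channels]) / channels
        for i in range(0, len(samples), channels)
    ]
    # Scale 16-bit integers to floats.
    return [s / 32768.0 for s in mono]
```

The result can then be passed as `audio_array`, for example `audio_array = load_wav_mono_16k("sample.wav")`, where `sample.wav` is a placeholder file name. If the processor rejects a plain list in your environment, wrapping it with `numpy.asarray` should work.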
Training Details
- Training epochs: 15
- Batch size: 1
- Learning rate: 5e-6
- Training samples: 168
- Validation samples: 61
- Training infrastructure: AWS SageMaker ml.p3.2xlarge
- Training time: 15 mins
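As a rough sanity check on these settings, the number of optimizer steps can be estimated from the figures above. This sketch assumes no gradient accumulation and a single device, neither of which the card states, and it treats the reported 15 minutes as pure training time, so the per-step figure is only an upper-bound estimate.

```python
import math


def optimizer_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps, assuming one update per batch (no accumulation)."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


steps = optimizer_steps(num_samples=168, batch_size=1, epochs=15)
seconds_per_step = (15 * 60) / steps

print(steps)                       # 2520
print(round(seconds_per_step, 2))  # 0.36
```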
Performance
- Character Error Rate (CER): 0% on validation set
- Validation examples: See validation_report.json in model files
Limitations
- Optimized specifically for Jiangyin dialect
Significant improvement observed after fine-tuning on 119 dialect audio samples.
✅ Correct recognition examples
| REF (参考) | Transliteration (音译) | HYP (预测) | CER |
|---|---|---|---|
| 吃什么 | 切刀样 | 吃什么 | 0.000 |
| 不知道 | 佛晓得 | 不知道 | 0.000 |
| 素菜 | 搜菜 | 素菜 | 0.000 |
| 红烧肉 | 红搜牛 | 红烧肉 | 0.000 |
| 谁啊?小偷 | 啥人啦?贼骨头 | 谁啊?小偷 | 0.000 |
| 谁啊?老公 | 啥人啦?老官 | 谁啊?老公 | 0.000 |
| 节约 | 做人家 | 节约 | 0.000 |
| 闪电 | 忽显 | 闪电 | 0.000 |
| 下雨 | 落雨 | 下雨 | 0.000 |
| 丢人 | 坍台 | 丢人 | 0.000 |
| 泥土 | 难泥 | 泥土 | 0.000 |
| 好 | 灵个 | 好 | 0.000 |
| 到处都是 | 一天世界 | 到处都是 | 0.000 |
| 最后 | 压末落落 | 最后 | 0.000 |
| 睡觉 | 困觉 | 睡觉 | 0.000 |
| 小偷 | 贼骨头 | 小偷 | 0.000 |
| 拿不定主意 | 疑三惑四 | 拿不定主意 | 0.000 |
| 轻浮 | 轻骨头 | 轻浮 | 0.000 |
| 明天 | 明朝 | 明天 | 0.000 |
| 后天 | 后朝 | 后天 | 0.000 |
| 前天 | 先夜子 | 前天 | 0.000 |
| 妻子 | 阿嬷 | 妻子 | 0.000 |
| 这样 | 实梗 | 这样 | 0.000 |
| 出去 | 出去 | 出去 | 0.000 |
| 明天见 | 明朝会 | 明天见 | 0.000 |
| 什么东西 | 啥个物事 | 什么东西 | 0.000 |
| 什么时候 | 啥辰光 | 什么时候 | 0.000 |
| 回来 | 嘎来 | 回来 | 0.000 |
| 老公 | 老官 | 老公 | 0.000 |
| 十分寒冷 | 毕结骨 | 十分寒冷 | 0.000 |
| 谁啊 | 啥人啦 | 谁啊 | 0.000 |
| 男孩 | 细七煞 | 男孩 | 0.000 |
| 傍晚 | 夜快头 | 傍晚 | 0.000 |
| 肩膀 | 肩胛 | 肩膀 | 0.000 |
| 男子 | 老小家 | 男子 | 0.000 |
| 女子 | 丫头家 | 女子 | 0.000 |
| 今天吃点什么? | 今朝吃点刀样啦? | 今天吃点什么? | 0.000 |
| 你这小子,是不是欠捧! | 你个细棺材,阿要吃生活! | 你这小子,是不是欠捧! | 0.000 |
| 今天吃什么?不知道 | 今朝切刀样?佛晓得 | 今天吃什么?不知道 | 0.000 |
| 今天吃什么?红烧肉 | 今朝切刀样?红搜牛 | 今天吃什么?红烧肉 | 0.000 |
| 什么时候出去?明天 | 啥辰光出去?明朝 | 什么时候出去?明天 | 0.000 |
| 什么时候出去?后天 | 啥辰光出去?后朝 | 什么时候出去?后天 | 0.000 |