FLEURS-Only Whisper Small No-Language Checkpoint
Summary
This repository contains a Whisper checkpoint for Chichewa/Nyanja
automatic speech recognition, fine-tuned from openai/whisper-small.
- Experiment type: fleurs-only
- Base model: openai/whisper-small
- Training condition: no_language
- Release artifact: full fine-tuned checkpoint selected from the best training checkpoint
Intended use
This checkpoint is intended for research and evaluation on Chichewa/Nyanja ASR. It is not a production-ready speech system and should be validated carefully before downstream use.
How to use
This repository contains a full fine-tuned checkpoint. It can be loaded directly with Transformers.
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
model_id = "ai4good-labyrinth/fleurs-only-whisper-tiny-no-language"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
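Once the model and processor are loaded, a single clip can be transcribed roughly as follows. This is a minimal sketch, not the repository's evaluation code: the helper names `build_generate_kwargs` and `transcribe` are hypothetical, the input is assumed to be a mono 16 kHz waveform of at most 30 seconds, and the checkpoint's saved generation config is assumed to still carry Whisper's task mappings. Leaving the language unset mirrors the no-language condition, although the exact fallback behaviour depends on the installed Transformers version.

```python
MODEL_ID = "ai4good-labyrinth/fleurs-only-whisper-tiny-no-language"


def build_generate_kwargs(language=None):
    """Return generation arguments; the language key is omitted when no hint is given."""
    kwargs = {"task": "transcribe"}
    if language is not None:
        kwargs["language"] = language
    return kwargs


def transcribe(audio, sampling_rate=16000, model_id=MODEL_ID):
    """Transcribe a 1-D float waveform (assumed mono, 16 kHz, up to 30 s)."""
    # Imported lazily so the helper above stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
    model.eval()

    # Convert the raw waveform into log-mel input features.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        predicted_ids = model.generate(inputs.input_features, **build_generate_kwargs())
    # Decode token ids to text, dropping Whisper's special tokens.
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

The function reloads the model on every call for simplicity; for batch evaluation it would be cheaper to load the model and processor once and reuse them.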
Training data
- Training source: TBD
- Evaluation source during training: TBD
- Train examples before duration filtering: TBD
- Train examples after duration filtering: TBD
- Dev examples before duration filtering: TBD
- Dev examples after duration filtering: TBD
- Duration filter used during training: min_duration_seconds=TBD, max_duration_seconds=TBD
Training procedure
- Fine-tuning script: experiments/whisper_finetune/finetune_whisper.py
- Base model: openai/whisper-small
- Task: transcribe
- Language hint during training/evaluation: none, corresponding to --language auto in standalone evaluation
- Mixed precision: TBD
- Gradient checkpointing: TBD
- Selected checkpoint step: 1500
- Selected checkpoint epoch: 4.52
Training-time dev selection
The best checkpoint was selected using trainer-side dev evaluation on the duration-filtered FLEURS dev split.
- Dev WER: 0.7683
- Dev CER: 0.2782
- Dev loss: 1.0449
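These values come from the training pipeline and can differ from standalone post-hoc evaluation because the decoding path is not identical. For comparison, the standalone filtered FLEURS dev WER reported in the evaluation summary is 0.8900, against 0.7683 here.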
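For readers unfamiliar with the metrics, WER and CER are both edit-distance ratios, computed over words and characters respectively. The sketch below is an illustration only: the function names are hypothetical, no text normalization is applied, and corpus-level aggregation (total edits divided by total reference units) is an assumption, since this card does not state which implementation or normalization the training pipeline uses.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate: unit='word' gives WER, unit='char' gives CER."""
    total_edits = 0
    total_units = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(ref_units, hyp_units)
        total_units += len(ref_units)
    # Assumes at least one non-empty reference; otherwise this divides by zero.
    return total_edits / total_units


wer = error_rate(["a b c d"], ["a x c"], unit="word")
cer = error_rate(["a b c d"], ["a x c"], unit="char")
```

Because insertions count as edits, WER can exceed 1.0, and differences in normalization (casing, punctuation) between pipelines can shift both metrics.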
Evaluation protocol
Standalone evaluation is recommended for the final release. Filtered and unfiltered results should be reported separately.
- Filtered evaluation: min_duration_seconds=0, max_duration_seconds=30
- Unfiltered evaluation: no duration constraint
- Decoding task: transcribe
- Language hint: auto
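The filtered setting above can be sketched as a simple duration check per example. This is an assumption-laden illustration rather than the repository's evaluation code: the `num_samples` and `sampling_rate` field names are hypothetical, and treating both bounds as inclusive is a guess, since the card does not say how boundary cases are handled.

```python
def filter_by_duration(examples, min_duration_seconds=0.0, max_duration_seconds=30.0):
    """Keep examples whose audio duration lies within the given bounds (inclusive)."""
    kept = []
    for example in examples:
        # Duration in seconds, derived from sample count and sampling rate.
        duration = example["num_samples"] / example["sampling_rate"]
        if min_duration_seconds <= duration <= max_duration_seconds:
            kept.append(example)
    return kept


examples = [
    {"id": "short", "num_samples": 16000 * 10, "sampling_rate": 16000},  # 10 s
    {"id": "long", "num_samples": 16000 * 31, "sampling_rate": 16000},   # 31 s
]
filtered = filter_by_duration(examples)   # filtered setting
unfiltered = examples                     # unfiltered setting: no constraint
```

The 30-second upper bound matches Whisper's fixed input window, so longer clips in the unfiltered setting would need truncation or chunked decoding.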
Evaluation summary
| Dataset | Split | Setting | Num examples | WER | CER | Notes |
|---|---|---|---|---|---|---|
| FLEURS | dev | filtered | 305 | 0.8900 | 0.3060 | Filtered to 30 seconds |
| FLEURS | dev | unfiltered | 311 | 0.8789 | 0.3023 | No duration filter |
| FLEURS | test | filtered | 745 | 0.8421 | 0.3248 | Filtered to 30 seconds |
| FLEURS | test | unfiltered | 761 | 0.8343 | 0.3198 | No duration filter |
| Zambezi | dev | filtered | 613 | 0.9802 | 0.3922 | Filtered to 30 seconds |
| Zambezi | dev | unfiltered | 622 | 0.9752 | 0.3882 | No duration filter |
| Zambezi | test | filtered | 427 | 0.8825 | 0.2860 | Filtered to 30 seconds |
| Zambezi | test | unfiltered | 428 | 0.8792 | 0.2842 | No duration filter |
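To see how much the duration filter changes each evaluation set, the example counts in the table can be compared directly. The snippet below is a small arithmetic sketch using only the numbers above; the `filter_impact` helper is a hypothetical name.

```python
# (filtered, unfiltered) example counts taken from the evaluation summary table.
counts = {
    ("FLEURS", "dev"): (305, 311),
    ("FLEURS", "test"): (745, 761),
    ("Zambezi", "dev"): (613, 622),
    ("Zambezi", "test"): (427, 428),
}


def filter_impact(filtered, unfiltered):
    """Return the number of dropped examples and their share of the unfiltered set."""
    dropped = unfiltered - filtered
    return dropped, dropped / unfiltered


for (dataset, split), (filtered, unfiltered) in counts.items():
    dropped, share = filter_impact(filtered, unfiltered)
    print(f"{dataset} {split}: dropped {dropped} of {unfiltered} ({share:.2%})")
```

The filter removes at most 16 utterances per split, so the filtered and unfiltered rows are computed over nearly the same examples.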
Files in this repository
- Model weights and config: repository root
- Processor/tokenizer files: repository root
- Evaluation JSON files: eval/...
Known limitations
- Whisper does not provide an official Nyanja/Chichewa language token.
- Users must also comply with the upstream dataset licenses and any upstream model license obligations.
- Standalone evaluation and trainer-side evaluation can differ slightly even on the same split and duration filter.
- Cross-dataset results should be interpreted carefully because transcription conventions may differ across corpora.
Citation
If you use this checkpoint, please cite:
- the Whisper paper
- the FLEURS dataset
- this repository
@misc{fleurs_only_whisper_tiny_no_language_2026,
title = {FLEURS-Only Whisper Small No-Language Checkpoint},
author = {AI4Good Labyrinth Team},
year = {2026},
howpublished = {\url{https://huggingface.co/ai4good-labyrinth/fleurs-only-whisper-tiny-no-language}},
note = {Whisper fine-tuning for Chichewa/Nyanja ASR}
}