ClemSummer/welsh-transcription-samples-7k
This is a whisper-tiny model based on techiaith/whisper-tiny-ft-cy-en, fine-tuned using the ACFT method for use on Android phones.
This model can be loaded into the FUTO Keyboard, and most likely into other similar keyboards (HeliBoard, FlorisBoard, AnySoftKeyboard, possibly even SwiftKey). More info on this can be found here.
To use this model with FUTO keyboard:
Trained and evaluated using welsh-transcription-samples, a subset of Mozilla's Common Voice CY dataset. The subset is smaller, which makes it more practical for low-budget training without a GPU. Training on the full Mozilla Common Voice corpus may provide better results.
# Training hyperparameters
LEARNING_RATE = 1e-6
NUM_EPOCHS = 8
# (Note: only recordings < 29.0s were used)
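
The duration cut-off in the note above can be sketched as a simple filter. This is a minimal illustration, not the script used for this model: the record layout (`num_samples`, `sampling_rate`) and the helper names are hypothetical, and the 16 kHz rate is an assumption based on the input rate Whisper expects.

```python
# Minimal sketch of a "shorter than 29.0 s" filter.
# Field names and helpers are hypothetical, for illustration only.
MAX_DURATION_S = 29.0  # threshold taken from the note above


def clip_duration_s(num_samples: int, sampling_rate: int) -> float:
    """Return the clip length in seconds."""
    return num_samples / sampling_rate


def keep_clip(num_samples: int, sampling_rate: int,
              max_duration_s: float = MAX_DURATION_S) -> bool:
    """Keep only clips strictly shorter than the threshold."""
    return clip_duration_s(num_samples, sampling_rate) < max_duration_s


# Toy records standing in for dataset rows (assumed 16 kHz audio).
clips = [
    {"id": "a", "num_samples": 16_000 * 5, "sampling_rate": 16_000},
    {"id": "b", "num_samples": 16_000 * 29, "sampling_rate": 16_000},
    {"id": "c", "num_samples": 16_000 * 31, "sampling_rate": 16_000},
]

kept = [c["id"] for c in clips
        if keep_clip(c["num_samples"], c["sampling_rate"])]
print(kept)
```

Whisper processes audio in 30-second windows, so a cut-off just under that length keeps each recording inside a single window.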
| Dataset | WER | CER |
|---|---|---|
| CommonVoice CY (ClemSummer, validation split) | 62.99 | 21.46 |
| techiaith/banc-trawsgrifiadau-bangor | TODO: no access | TODO: no access |
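
For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly computed from an edit distance over words and characters respectively. It assumes the table values are percentages and applies no text normalisation; the evaluation behind the table may have used a different tool or normalisation, and the example sentences are made up.

```python
# Minimal sketch of corpus-level WER and CER using Levenshtein distance.
# Not the evaluation script used for the table; for illustration only.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(refs, hyps):
    """Word error rate in percent, summed over all reference/hypothesis pairs."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return 100.0 * errors / total


def cer(refs, hyps):
    """Character error rate in percent, counting spaces as characters."""
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


# Made-up example pair: one word differs, by two characters.
refs = ["mae hi'n braf heddiw"]
hyps = ["mae hi braf heddiw"]
print(wer(refs, hyps), cer(refs, hyps))  # 25.0 10.0
```

A single dropped contraction counts as one full word error but only two character errors, which is one reason a CER can sit well below the WER for the same output.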
Thanks to techiaith and ClemSummer for their prior work. Diolch
Base model
openai/whisper-tiny