---
language:
- en
metrics:
- wer
- bleu
- google_bleu
tags:
- ASR
- Error Correction
- Crossmodal
---

### Model Description

Pre-Training Settings:

166k samples from Common Voice 13.0 were recognized by Whisper tiny.en.

1,000 random samples were selected as the test set; the remainder was used for training and validation with an 80%-20% split.
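
The held-out split described above can be sketched in plain Python. This is an illustrative sketch only; the function name, seed, and sample representation are assumptions, not the authors' actual data pipeline:

```python
import random

def make_splits(samples, test_size=1000, val_frac=0.2, seed=0):
    """Hold out a random test set, then split the remainder
    80%-20% into training and validation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    test = shuffled[:test_size]
    rest = shuffled[test_size:]
    n_val = int(len(rest) * val_frac)
    return rest[n_val:], rest[:n_val], test  # train, val, test

# With 166k samples: 1,000 test, 33,000 val, 132,000 train
train, val, test = make_splits(list(range(166_000)))
```
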
- Batch size: 256
- Initial learning rate: 1e-5
- Adam optimizer
- 30 epochs
- Cross-entropy loss
- Best checkpoint saved based on WER as the evaluation metric
- Decoding performed using beam search with a beam size of 5
- S2S backbone model adopted from "[Exploring data augmentation for code generation tasks](https://aclanthology.org/2023.findings-eacl.114/)"
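
Since checkpoint selection uses WER, a self-contained reference implementation may help clarify the metric: word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. This is a generic sketch, not the authors' evaluation code, and it assumes a non-empty reference:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)
```

In practice a library such as `jiwer` or the Hugging Face `evaluate` package computes the same quantity; the hand-rolled version above just makes the definition explicit.
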
Continue-Training Settings:

- 2 epochs on gold-gold pairs to prevent the over-correction problem, using "[TED talk data](https://cris.fbk.eu/bitstream/11582/104409/1/WIT3-EAMT2012.pdf)"
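
The gold-gold step pairs each reference transcript with itself, so the corrector learns to leave already-correct input untouched rather than over-correct it. A minimal sketch of building such pairs (the function name and pair format are assumptions, not the authors' pipeline):

```python
def make_gold_gold_pairs(gold_transcripts):
    """Pair each gold transcript with itself as (source, target) so the
    error-correction model learns an identity mapping on clean input."""
    return [(t, t) for t in gold_transcripts]

pairs = make_gold_gold_pairs(["hello world", "good morning"])
# every source equals its target, so the training signal is "copy correct input"
```
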