The WAVe paper is officially out in the journal Information Sciences.
You saw the PT and NL model releases earlier this year. This is the peer-reviewed paper behind them, with the full method, ablations, and downstream ASR evaluation.
Quick recap: WAVe is a 1B multimodal embedding model that filters synthetic speech at the word level, not the sentence level. On Portuguese ASR it cuts training steps by 34%, improves cross-domain generalization by 50%, and matches WER with 30% less synthetic data.
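For anyone wondering what "word level, not sentence level" means in practice, here is a minimal sketch of the idea. This is not the released WAVe code; the encoders, alignment input, and the 0.5 threshold are placeholder assumptions. The point is that gating on the worst-aligned word catches local TTS artifacts that a sentence-level average would smooth over.

```python
# Minimal sketch of word-level synthetic-speech filtering.
# NOT the released WAVe interface: `embed_audio`, `embed_text`, and the
# 0.5 threshold are illustrative placeholders.
import numpy as np

def word_level_filter(word_spans, embed_audio, embed_text, threshold=0.5):
    """Keep a synthetic utterance only if every word matches its audio.

    word_spans: list of (word, audio_span) pairs, e.g. from a forced
    aligner. A sentence-level score averages over words and can hide a
    single badly synthesized word; gating on the minimum does not.
    """
    if not word_spans:
        return False
    scores = []
    for word, span in word_spans:
        a = np.asarray(embed_audio(span), dtype=np.float32)
        t = np.asarray(embed_text(word), dtype=np.float32)
        # Cosine similarity between the audio-span and word embeddings.
        cos = float(a @ t / (np.linalg.norm(a) * np.linalg.norm(t)))
        scores.append(cos)
    return min(scores) >= threshold  # one bad word rejects the sample

# Toy usage with random "embeddings", just to show the call shape:
rng = np.random.default_rng(0)
fake = lambda _: rng.normal(size=8)
print(word_level_filter([("olá", None), ("mundo", None)], fake, fake))
```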
Resources
- Paper: https://www.sciencedirect.com/science/article/pii/S0020025526005220
- PT model: yuriyvnv/WAVe-1B-Multimodal-PT
- NL model: yuriyvnv/WAVe-1B-Multimodal-NL
- Collection: https://huggingface.co/collections/yuriyvnv/multi-modal-embeddings-for-synthetic-transcript-filtering
- Code: https://github.com/yuriyvnv/WAVe
If you train ASR on synthetic or back-translated data, give it a try; I'd like to see WAVe benchmarked on other languages.
@reach-vb @ylacombe @hf-audio @BramVanroy
#speech #asr #multimodal #syntheticdata #lowresource