The ch-w2v-conformer model uses the following datasets for pretraining:
- ISML datasets (6 languages, 70k hours): an internal dataset comprising 40k hours of Chinese together with Cantonese, Tibetan, Inner Mongolian, Kazakh, and Uighur data.
- Babel datasets (17 languages, 2k hours): Assamese, Bengali, Cantonese, Cebuano, Georgian, Haitian, Kazakh, Kurmanji, Lao, Pashto, Swahili, Tagalog, Tamil, Tok Pisin, Turkish, Vietnamese, Zulu.
After pretraining, we build an ASR system based on the CTC-attention structure. In very low-resource tasks, we find that stacking too many randomly initialized layers on top of the pretrained conformer encoder destroys its transfer performance, so we build only a single-layer transformer decoder for joint training, as sketched below.
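To make the structure concrete, here is a minimal PyTorch sketch of such a model. It is illustrative only: the encoder is passed in as an opaque module standing in for the pretrained ch-w2v-conformer encoder, all module and parameter names are our own, and the code assumes the encoder preserves the frame count (a real conformer subsamples, so frame lengths would need adjusting).

```python
import torch
import torch.nn as nn

class JointCtcAttentionASR(nn.Module):
    """Sketch of a CTC-attention model with a single-layer decoder
    stacked on a pretrained conformer encoder."""

    def __init__(self, encoder, enc_dim, vocab_size, ctc_weight=0.3,
                 sos=1, eos=2):
        super().__init__()
        self.encoder = encoder          # pretrained conformer encoder (opaque here)
        self.sos, self.eos = sos, eos
        self.ctc_head = nn.Linear(enc_dim, vocab_size)
        # Only ONE randomly initialized decoder layer on top of the
        # pretrained encoder, to avoid destroying its transfer performance.
        layer = nn.TransformerDecoderLayer(enc_dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.embed = nn.Embedding(vocab_size, enc_dim)
        self.out = nn.Linear(enc_dim, vocab_size)
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
        self.att_loss = nn.CrossEntropyLoss(ignore_index=-1)
        self.ctc_weight = ctc_weight    # joint-loss weight (illustrative value)

    def forward(self, feats, feat_lens, ys, ys_lens):
        """feats: (B, T, D) features; ys: (B, U) label ids padded with -1."""
        enc = self.encoder(feats)                              # (B, T, enc_dim)
        # CTC branch over the encoder frames.
        logp = self.ctc_head(enc).log_softmax(-1).transpose(0, 1)
        loss_ctc = self.ctc_loss(logp, ys.clamp(min=0), feat_lens, ys_lens)
        # Attention branch: teacher forcing with <sos> inputs, <eos> targets.
        b = ys.size(0)
        ys_in = torch.cat([ys.new_full((b, 1), self.sos), ys.clamp(min=0)], 1)
        ys_out = torch.cat([ys, ys.new_full((b, 1), -1)], 1)
        ys_out.scatter_(1, ys_lens.unsqueeze(1), self.eos)     # real <eos> slot
        mask = nn.Transformer.generate_square_subsequent_mask(
            ys_in.size(1)).to(feats.device)
        dec = self.decoder(self.embed(ys_in), enc, tgt_mask=mask)
        loss_att = self.att_loss(self.out(dec).transpose(1, 2), ys_out)
        return self.ctc_weight * loss_ctc + (1 - self.ctc_weight) * loss_att
```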
pretrained model link:
Constrained-plus Task Performance
- Languages: Cantonese, Mongolian, Kazakh
- config: conf/train_conformer_large_10h.yaml
- Feature info: MFCC features, with dither 1.0, without CMVN
- Training info: lr 0.001, batch size 10, 4 V100 GPUs, acc_grad 1, 80 epochs
- Decoding info: ctc_weight 0.5, average_num 35 (see the checkpoint-averaging sketch below)
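Since `average_num 35` refers to averaging model checkpoints before decoding, here is a minimal sketch of that step. WeNet ships its own averaging script; the function and paths below are hypothetical.

```python
import torch

def average_checkpoints(paths, out_path):
    """Average parameters across checkpoints into one model for decoding."""
    avg = None
    for p in paths:
        state = torch.load(p, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    for k in avg:
        avg[k] /= len(paths)               # element-wise mean of each tensor
    torch.save(avg, out_path)

# e.g. average the last 35 of 80 epochs (paths are illustrative).
average_checkpoints([f"exp/epoch_{i}.pt" for i in range(45, 80)], "exp/avg_35.pt")
```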
Dev set results, trained with only the 10-hour training set:
w2v-Conformer
| decoding_method | Cantonese (CER%) | Mongolian (WER%) |
|---|---|---|
| ctc_greedy_search | 31.46 | 53.64 |
| ctc_prefix_search | 31.47 | 53.50 |
| attention_rescoring | 31.45 | 52.96 |
Conformer (trained from scratch)
| decoding_method | Cantonese (CER%) | Mongolian (WER%) |
|---|---|---|
| ctc_greedy_search | 61.43 | 89.38 |
| ctc_prefix_search | 61.37 | 89.53 |
| attention_rescoring | 60.61 | 89.60 |