# WeNet Conformer — Mongolian (Монгол хэл)

WeNet U2++ Conformer model trained on google/fleurs-mn for Mongolian (Cyrillic) automatic speech recognition.

## Model architecture

  • Encoder: Conformer, 12 blocks × 256 dim, 4 heads
  • Decoder: Bi-transformer (U2++), 3 L→R + 3 R→L blocks
  • Tokenizer: char-level (38 Cyrillic tokens)
  • Loss: CTC + Attention hybrid (ctc_weight=0.3, reverse_weight=0.3)
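The hyperparameters above correspond roughly to a WeNet `train.yaml` like the sketch below. Field names follow WeNet's config conventions; the authoritative values are in the `train.yaml` shipped in this repo.

```yaml
# Sketch only — see train.yaml in this repo for the exact config.
encoder: conformer
encoder_conf:
  output_size: 256        # 256-dim model
  attention_heads: 4
  num_blocks: 12
decoder: bitransformer    # U2++ bidirectional decoder
decoder_conf:
  num_blocks: 3           # left-to-right blocks
  r_num_blocks: 3         # right-to-left blocks
model_conf:
  ctc_weight: 0.3
  reverse_weight: 0.3
```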

## Training data

  • Dataset: google/fleurs-mn
  • Train: 3,074 utterances · ~11.5 h
  • Test: 949 utterances · ~2.85 h
  • Audio: 16 kHz mono
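Audio that is not already 16 kHz mono must be converted before decoding. A minimal stdlib sanity check (the function name is ours, not part of WeNet):

```python
import wave

def check_wenet_audio(path):
    """Return (rate, channels, ok), where ok means 16 kHz mono as this model expects."""
    with wave.open(path, "rb") as w:
        rate, channels = w.getframerate(), w.getnchannels()
    return rate, channels, (rate == 16000 and channels == 1)
```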

## Training results

  • Epochs run: 100
  • Final train loss: N/A
  • Final epoch: 99 — cv_loss N/A, acc N/A
  • Best epoch: 21 — cv_loss 374.937, acc 0.253 (see Evaluation results)
  • TensorBoard: event files are included under `runs/` (viewable via this repo's TensorBoard tab).

## Files

| File | Description |
|------|-------------|
| `avg_10.pt` | Best model (average of the top-10 checkpoints by default) |
| `train.yaml` | Training config |
| `lang_char.txt` | Character vocabulary (38 tokens) |
| `global_cmvn` | Feature normalization stats |
| `train.log` | Full training log |
| `runs/` | TensorBoard events |
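`avg_10.pt` is produced by WeNet's checkpoint-averaging step (the averaging script shipped with upstream WeNet). The core operation is just an elementwise mean over the saved parameter tensors; a dependency-free sketch of the idea, with plain floats standing in for tensors:

```python
def average_checkpoints(state_dicts):
    """Elementwise mean of parameter values across checkpoints.

    Real WeNet averages torch tensors loaded from .pt files;
    plain floats stand in here to show the arithmetic.
    """
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n
            for k in state_dicts[0]}
```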

## Usage

Download the model files from this repo, then run decoding:

```bash
python wenet/bin/recognize.py \
  --config train.yaml \
  --checkpoint avg_10.pt \
  --dict lang_char.txt \
  --test_data your_data.list \
  --mode attention_rescoring \
  --beam_size 10 \
  --result_file result.txt
```
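`your_data.list` follows WeNet's raw-format list: one JSON object per line with `key`, `wav` (audio path), and `txt` fields. A small sketch that writes one (the paths and function name here are illustrative, not from this repo):

```python
import json

def write_data_list(utterances, path):
    """Write a WeNet raw-format data.list.

    utterances: iterable of (key, wav_path, transcript) tuples.
    """
    with open(path, "w", encoding="utf-8") as f:
        for key, wav, txt in utterances:
            line = json.dumps({"key": key, "wav": wav, "txt": txt},
                              ensure_ascii=False)
            f.write(line + "\n")
```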


## Limitations

- Trained on ~11.5 h of FLEURS Mongolian — small-scale; WER/CER will be relatively high on out-of-domain speech.
- Only Cyrillic script supported; Latin characters and digits are stripped.
- No language model rescoring applied.

## Evaluation results

All figures are self-reported on FLEURS Mongolian.

| Metric | Value |
|--------|-------|
| cv_loss (best epoch) | 374.937 |
| Attention accuracy (best epoch) | 0.253 |
| CER (3-example dev set) | 0.870 |
| WER (3-example dev set) | 1.000 |
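The CER/WER figures above are standard edit-distance error rates: total substitutions, insertions, and deletions divided by the reference length, over characters or whitespace-split words. A minimal self-contained sketch of that computation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (classic DP)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[m][n]

def error_rate(refs, hyps, unit="word"):
    """WER (unit='word') or CER (unit='char') over paired transcript lists."""
    errs = total = 0
    for r, h in zip(refs, hyps):
        rt = r.split() if unit == "word" else list(r)
        ht = h.split() if unit == "word" else list(h)
        errs += edit_distance(rt, ht)
        total += len(rt)
    return errs / total
```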