---
library_name: transformers
license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
tags:
  - generated_from_trainer
datasets:
  - audiofolder
metrics:
  - wer
model-index:
  - name: w2v2-lmk_augmented
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: audiofolder
          type: audiofolder
          config: default
          split: test
          args: default
        metrics:
          - name: Wer
            type: wer
            value: 0.4878048780487805
---

# w2v2-lmk_augmented

This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the audiofolder dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the metrics):

- Loss: 1.2406
- WER: 0.4878
- CER: 0.1858
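
A minimal, hedged inference sketch using the `transformers` library; the Hub repo id `aconeil/w2v2-lmk_augmented` and the audio file name are assumptions, not confirmed by the card:

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Assumed Hub repo id; substitute the actual path to this model.
model_id = "aconeil/w2v2-lmk_augmented"

processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# XLSR-53 checkpoints expect 16 kHz mono audio.
speech, _ = librosa.load("sample.wav", sr=16000)  # placeholder file name

inputs = processor(speech, sampling_rate=16000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding of the most likely token at each frame.
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
```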

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
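
The card does not document the data itself, but the `audiofolder` dataset type corresponds to the `datasets` library's AudioFolder loader. A minimal sketch of loading such a dataset; the directory path is a placeholder:

```python
from datasets import load_dataset, Audio

# Placeholder directory; an AudioFolder layout pairs audio files with a
# metadata.csv (or .jsonl) that holds the transcriptions.
dataset = load_dataset("audiofolder", data_dir="path/to/audio_data")

# XLSR-53 models expect 16 kHz audio, so resample on the fly.
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
print(dataset)
```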

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative `TrainingArguments` sketch follows the list):

- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- num_epochs: 100
- mixed_precision_training: Native AMP
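
A hedged reconstruction of how these settings map onto `transformers.TrainingArguments`; the `output_dir` and the logging/evaluation cadence (every 100 steps, inferred from the results table) are assumptions, not values stated on the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="w2v2-lmk_augmented",     # assumed output directory
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,       # effective train batch size 16
    num_train_epochs=100,
    lr_scheduler_type="linear",
    warmup_steps=300,
    optim="adamw_torch_fused",           # fused AdamW, betas/eps at defaults
    fp16=True,                           # "Native AMP" mixed precision
    seed=42,
    eval_strategy="steps",               # assumed; matches 100-step eval rows
    eval_steps=100,
    logging_steps=100,
)
```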

### Training results

| Training Loss | Epoch   | Step | Validation Loss | WER    | CER    |
|:-------------:|:-------:|:----:|:---------------:|:------:|:------:|
| 8.5498        | 2.7123  | 100  | 4.0528          | 1.0    | 1.0    |
| 3.1716        | 5.4110  | 200  | 2.9634          | 1.0    | 1.0    |
| 2.9756        | 8.1096  | 300  | 2.8924          | 1.0    | 1.0    |
| 2.8279        | 10.8219 | 400  | 2.5968          | 1.0    | 1.0    |
| 2.2866        | 13.5205 | 500  | 1.7827          | 0.9895 | 0.6283 |
| 1.619         | 16.2192 | 600  | 1.3242          | 0.9443 | 0.4021 |
| 1.2926        | 18.9315 | 700  | 1.1299          | 0.7875 | 0.2833 |
| 1.0181        | 21.6301 | 800  | 1.1390          | 0.6585 | 0.2513 |
| 0.8774        | 24.3288 | 900  | 1.0760          | 0.6132 | 0.2338 |
| 0.7471        | 27.0274 | 1000 | 0.9959          | 0.5889 | 0.2155 |
| 0.6542        | 29.7397 | 1100 | 1.0575          | 0.5575 | 0.2117 |
| 0.5632        | 32.4384 | 1200 | 1.0240          | 0.5784 | 0.2171 |
| 0.4834        | 35.1370 | 1300 | 1.0971          | 0.5505 | 0.1912 |
| 0.4716        | 37.8493 | 1400 | 1.1336          | 0.5749 | 0.2056 |
| 0.45          | 40.5479 | 1500 | 1.0703          | 0.5679 | 0.2079 |
| 0.394         | 43.2466 | 1600 | 1.1579          | 0.5645 | 0.2178 |
| 0.3588        | 45.9589 | 1700 | 1.0555          | 0.5296 | 0.1896 |
| 0.3217        | 48.6575 | 1800 | 1.2323          | 0.5575 | 0.2102 |
| 0.3245        | 51.3562 | 1900 | 1.1639          | 0.5401 | 0.2018 |
| 0.289         | 54.0548 | 2000 | 1.1304          | 0.5122 | 0.1927 |
| 0.28          | 56.7671 | 2100 | 1.2295          | 0.5296 | 0.2003 |
| 0.2521        | 59.4658 | 2200 | 1.1612          | 0.5226 | 0.1950 |
| 0.2624        | 62.1644 | 2300 | 1.1982          | 0.5157 | 0.2003 |
| 0.2402        | 64.8767 | 2400 | 1.2075          | 0.5296 | 0.1988 |
| 0.2258        | 67.5753 | 2500 | 1.2091          | 0.5366 | 0.2003 |
| 0.2232        | 70.2740 | 2600 | 1.1830          | 0.5296 | 0.1957 |
| 0.2181        | 72.9863 | 2700 | 1.2001          | 0.5157 | 0.1942 |
| 0.2214        | 75.6849 | 2800 | 1.1942          | 0.5052 | 0.1889 |
| 0.1752        | 78.3836 | 2900 | 1.1873          | 0.5087 | 0.1896 |
| 0.1891        | 81.0822 | 3000 | 1.2159          | 0.5192 | 0.1927 |
| 0.1733        | 83.7945 | 3100 | 1.2105          | 0.5017 | 0.1881 |
| 0.1982        | 86.4932 | 3200 | 1.2331          | 0.5087 | 0.1874 |
| 0.1681        | 89.1918 | 3300 | 1.1848          | 0.4808 | 0.1790 |
| 0.1631        | 91.9041 | 3400 | 1.2273          | 0.4878 | 0.1858 |
| 0.1579        | 94.6027 | 3500 | 1.2334          | 0.4948 | 0.1843 |
| 0.1795        | 97.3014 | 3600 | 1.2399          | 0.4878 | 0.1851 |
| 0.1592        | 100.0   | 3700 | 1.2406          | 0.4878 | 0.1858 |
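
For reference, WER and CER as reported above can be computed with the `evaluate` library; a minimal sketch in which the reference and prediction strings are placeholders, not data from this run:

```python
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Placeholder strings; in practice these come from the test-split transcripts
# and the model's decoded predictions.
references = ["example reference transcript"]
predictions = ["example predicted transcript"]

print("WER:", wer_metric.compute(references=references, predictions=predictions))
print("CER:", cer_metric.compute(references=references, predictions=predictions))
```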

### Framework versions

- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 3.0.0
- Tokenizers 0.22.1