DewiBrynJones committed 78e4da6 (verified) · 1 parent: 358bfc0

Update README.md

README.md CHANGED (+70 −99)
@@ -1,113 +1,84 @@
  ---
  license: apache-2.0
- base_model: DewiBrynJones/wav2vec2-xlsr-53-ft-btb-cv-cy
  tags:
  - automatic-speech-recognition
- - ./data-configs/cv.json
  - generated_from_trainer
  metrics:
  - wer
  model-index:
  - name: wav2vec2-btb-cv-ft-cv-cy
    results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # wav2vec2-btb-cv-ft-cv-cy

- This model is a fine-tuned version of [DewiBrynJones/wav2vec2-xlsr-53-ft-btb-cv-cy](https://huggingface.co/DewiBrynJones/wav2vec2-xlsr-53-ft-btb-cv-cy) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2516
- - Wer: 0.2403
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0003
- - train_batch_size: 4
- - eval_batch_size: 64
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 1000
- - training_steps: 10000
- - mixed_precision_training: Native AMP
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Wer |
- |:-------------:|:------:|:-----:|:---------------:|:------:|
- | No log | 0.1004 | 200 | 0.3807 | 0.2514 |
- | No log | 0.2008 | 400 | 0.2540 | 0.2643 |
- | 2.4874 | 0.3012 | 600 | 0.2642 | 0.3038 |
- | 2.4874 | 0.4016 | 800 | 0.3125 | 0.3905 |
- | 0.3991 | 0.5020 | 1000 | 0.3531 | 0.3939 |
- | 0.3991 | 0.6024 | 1200 | 0.3572 | 0.4039 |
- | 0.3991 | 0.7028 | 1400 | 0.3679 | 0.4053 |
- | 0.4512 | 0.8032 | 1600 | 0.3590 | 0.3877 |
- | 0.4512 | 0.9036 | 1800 | 0.3733 | 0.4007 |
- | 0.4333 | 1.0040 | 2000 | 0.3771 | 0.4243 |
- | 0.4333 | 1.1044 | 2200 | 0.3604 | 0.3867 |
- | 0.4333 | 1.2048 | 2400 | 0.3431 | 0.3814 |
- | 0.3468 | 1.3052 | 2600 | 0.3290 | 0.3779 |
- | 0.3468 | 1.4056 | 2800 | 0.3341 | 0.3647 |
- | 0.3503 | 1.5060 | 3000 | 0.3248 | 0.3615 |
- | 0.3503 | 1.6064 | 3200 | 0.3312 | 0.3551 |
- | 0.3503 | 1.7068 | 3400 | 0.3411 | 0.3836 |
- | 0.3418 | 1.8072 | 3600 | 0.3117 | 0.3375 |
- | 0.3418 | 1.9076 | 3800 | 0.3197 | 0.3432 |
- | 0.3181 | 2.0080 | 4000 | 0.3068 | 0.3340 |
- | 0.3181 | 2.1084 | 4200 | 0.3138 | 0.3358 |
- | 0.3181 | 2.2088 | 4400 | 0.3139 | 0.3334 |
- | 0.2423 | 2.3092 | 4600 | 0.3192 | 0.3285 |
- | 0.2423 | 2.4096 | 4800 | 0.2929 | 0.3168 |
- | 0.2327 | 2.5100 | 5000 | 0.2921 | 0.3103 |
- | 0.2327 | 2.6104 | 5200 | 0.2802 | 0.3037 |
- | 0.2327 | 2.7108 | 5400 | 0.2812 | 0.2962 |
- | 0.2374 | 2.8112 | 5600 | 0.2887 | 0.3042 |
- | 0.2374 | 2.9116 | 5800 | 0.2740 | 0.2927 |
- | 0.2136 | 3.0120 | 6000 | 0.2662 | 0.2830 |
- | 0.2136 | 3.1124 | 6200 | 0.2829 | 0.2890 |
- | 0.2136 | 3.2129 | 6400 | 0.2729 | 0.2869 |
- | 0.167 | 3.3133 | 6600 | 0.2777 | 0.2889 |
- | 0.167 | 3.4137 | 6800 | 0.2712 | 0.2810 |
- | 0.1614 | 3.5141 | 7000 | 0.2688 | 0.2709 |
- | 0.1614 | 3.6145 | 7200 | 0.2589 | 0.2663 |
- | 0.1614 | 3.7149 | 7400 | 0.2651 | 0.2670 |
- | 0.1529 | 3.8153 | 7600 | 0.2507 | 0.2637 |
- | 0.1529 | 3.9157 | 7800 | 0.2494 | 0.2568 |
- | 0.1496 | 4.0161 | 8000 | 0.2582 | 0.2580 |
- | 0.1496 | 4.1165 | 8200 | 0.2650 | 0.2575 |
- | 0.1496 | 4.2169 | 8400 | 0.2656 | 0.2560 |
- | 0.1128 | 4.3173 | 8600 | 0.2543 | 0.2512 |
- | 0.1128 | 4.4177 | 8800 | 0.2587 | 0.2499 |
- | 0.1109 | 4.5181 | 9000 | 0.2540 | 0.2460 |
- | 0.1109 | 4.6185 | 9200 | 0.2546 | 0.2425 |
- | 0.1109 | 4.7189 | 9400 | 0.2580 | 0.2420 |
- | 0.1028 | 4.8193 | 9600 | 0.2514 | 0.2404 |
- | 0.1028 | 4.9197 | 9800 | 0.2510 | 0.2403 |
- | 0.1069 | 5.0201 | 10000 | 0.2516 | 0.2403 |
-
- ### Framework versions
-
- - Transformers 4.44.0
- - Pytorch 2.4.0+cu121
- - Datasets 2.21.0
- - Tokenizers 0.19.1
  ---
  license: apache-2.0
+ base_model:
+ - techiaith/wav2vec2-xlsr-53-ft-btb-cv-cy
  tags:
  - automatic-speech-recognition
  - generated_from_trainer
  metrics:
  - wer
  model-index:
  - name: wav2vec2-btb-cv-ft-cv-cy
    results: []
+ datasets:
+ - techiaith/commonvoice_18_0_cy
+ language:
+ - cy
+ pipeline_tag: automatic-speech-recognition
  ---
  # wav2vec2-btb-cv-ft-cv-cy

+ This model is a version of [techiaith/wav2vec2-xlsr-53-ft-btb-cv-cy](https://huggingface.co/techiaith/wav2vec2-xlsr-53-ft-btb-cv-cy),
+ fine-tuned, with its encoder frozen, on the training set of [commonvoice_18_0_cy](https://huggingface.co/datasets/techiaith/commonvoice_18_0_cy).
+
+ It achieves the following results on the Welsh Common Voice version 18 standard test set:
+
+ - WER: 24.93
+ - CER: 6.55
+
+ However, when the accompanying KenLM language model is used, it achieves the following results on the same test set:
+
+ - WER: 15.30
+ - CER: 4.57
+
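For reference, WER and CER are word- and character-level edit-distance rates. The helper below is a minimal stdlib-only sketch of how such scores are computed; it is illustrative only and is not the evaluation code behind the figures above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution
    return d[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(wer("mae hi yn braf", "mae hi braf"))  # 0.25: one deleted word out of four
```

In practice a tested library such as `jiwer` or `evaluate` is preferable to hand-rolled metrics.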
+ ## Usage
+
+ ### wav2vec2 acoustic model only
+
+ ```python
+ import torch
+ import librosa
+
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+
+ processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-btb-cv-ft-cv-cy")
+ model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-btb-cv-ft-cv-cy")
+
+ # load the recording and resample it to the 16 kHz rate the model expects
+ audio_file = "speech.wav"  # replace with the path to your own recording
+ audio, rate = librosa.load(audio_file, sr=16000)
+
+ inputs = processor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)
+
+ with torch.no_grad():
+     logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
+
+ # greedy decoding: pick the most likely token at every frame
+ predicted_ids = torch.argmax(logits, dim=-1)
+
+ print("Prediction:", processor.batch_decode(predicted_ids))
+ ```
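`processor.batch_decode` turns the frame-level `predicted_ids` into text by applying the standard CTC collapse rule: merge consecutive repeated ids, then drop the blank token. A small illustrative sketch of that rule, using a made-up blank id of 0 (the real vocabulary and blank id live in the processor's tokenizer):

```python
def ctc_collapse(frame_ids, blank_id=0):
    """Collapse a per-frame CTC label sequence: merge repeats, drop blanks."""
    collapsed, previous = [], None
    for current in frame_ids:
        # a label is emitted only when it differs from the previous frame
        # and is not the blank token
        if current != previous and current != blank_id:
            collapsed.append(current)
        previous = current
    return collapsed

# the blank between the two 7s lets CTC emit a genuinely doubled label
print(ctc_collapse([0, 7, 7, 0, 7, 3, 3, 0]))  # [7, 7, 3]
```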
+
+ ### With the language model
+
+ ```python
+ import torch
+ import librosa
+
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
+
+ processor = Wav2Vec2ProcessorWithLM.from_pretrained("techiaith/wav2vec2-btb-cv-ft-cv-cy")
+ model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-btb-cv-ft-cv-cy")
+
+ # load the recording and resample it to the 16 kHz rate the model expects
+ audio_file = "speech.wav"  # replace with the path to your own recording
+ audio, rate = librosa.load(audio_file, sr=16000)
+
+ inputs = processor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)
+
+ with torch.no_grad():
+     logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
+
+ # CTC beam-search decoding with the bundled KenLM language model
+ print("Prediction:", processor.batch_decode(logits.numpy()).text[0])
+ ```
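The WER improvement from the KenLM model comes from the beam-search decoder weighing each hypothesis by both its acoustic score and its language-model score. The toy sketch below illustrates only that scoring idea, with invented log-probabilities and a hypothetical LM weight `alpha`; the actual decoder (pyctcdecode, used internally by `Wav2Vec2ProcessorWithLM`) searches over many partial hypotheses rather than scoring whole sentences:

```python
# invented acoustic and LM log-probabilities for two candidate transcriptions
candidates = {
    "mae hi braf":    {"acoustic": -4.0, "lm": -9.0},  # acoustically likelier
    "mae hi yn braf": {"acoustic": -4.3, "lm": -5.0},  # likelier under the LM
}
alpha = 0.5  # hypothetical LM weight

def fused_score(text):
    """Shallow-fusion score: acoustic log-prob plus weighted LM log-prob."""
    s = candidates[text]
    return s["acoustic"] + alpha * s["lm"]

best = max(candidates, key=fused_score)
print(best)  # the LM tips the choice to the grammatical "mae hi yn braf"
```

This is why the language model roughly halves the WER: it rescues word sequences that are common in Welsh text even when the acoustics alone slightly favour another string.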