---
library_name: transformers
license: apache-2.0
datasets:
- StofEzz/dataset_c_voice0.2
metrics:
- wer
base_model:
- openai/whisper-tiny
---

## Training Details

### Training Hyperparameters
```yaml
defaults:
  - _self_
dataset:
  name: "StofEzz/dataset_c_voice0.2"
  audio_sampling_rate: 16000
  num_proc_preprocessing: 4
  num_proc_dataset_map: 2
  train: 80
  test: 20

model:
  name: "openai/whisper-tiny"
  language: "french"
  task: "transcribe"

text_preprocessing:
  chars_to_ignore_regex: "[\\,\\?\\.\\!\\-\\;\\:\\ğ\\ź\\…\\ø\\ắ\\î\\´\\ŏ\\ę\\ź\\&\\'\\v\\ï\\ū\\ė\\ō\\ń\\ø\\…\\σ\\$\\ă\\ß\\ž\\ṯ\\ý\\ℵ\\đ\\ł\\ś\\ň\\ạ\\=\\_\\»\\ċ\\の\\\"\\ぬ\\ễ\\ż\\ć\\ů\\ʿ\\ș\\ı\\ñ\\(\\ò\\ř\\ä\\–\\ş\\«\\š\\ጠ\\°\\ℤ\\~\\\"\\ī\\ț\\č\\ả\\—\\)\\ā\\/\\½\"]"

training_args:
  _target_: transformers.Seq2SeqTrainingArguments
  output_dir: ./models
  per_device_train_batch_size: 16
  gradient_accumulation_steps: 1
  learning_rate: 1e-5
  warmup_steps: 500
  max_steps: 6250
  gradient_checkpointing: true
  fp16: true
  evaluation_strategy: "steps"
  per_device_eval_batch_size: 8
  predict_with_generate: true
  generation_max_length: 225
  save_steps: 2000
  eval_steps: 100
  logging_steps: 25
  load_best_model_at_end: true
  metric_for_best_model: "wer"
  greater_is_better: false
  push_to_hub: false
```
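The `chars_to_ignore_regex` above strips punctuation and stray characters from transcripts before tokenization. A minimal sketch of that normalization step, using a shortened subset of the full pattern (lowercasing and whitespace collapsing are assumed here, as in the common Whisper fine-tuning recipe; they are not stated in the config):

```python
import re

# Shortened subset of the chars_to_ignore_regex from the config above;
# the full pattern also lists many accented and non-Latin characters.
CHARS_TO_IGNORE = r"[\,\?\.\!\-\;\:\"\(\)\«\»]"

def normalize_transcript(text: str) -> str:
    """Strip ignored characters, collapse whitespace, and lowercase."""
    text = re.sub(CHARS_TO_IGNORE, "", text)
    return re.sub(r"\s+", " ", text).lower().strip()

print(normalize_transcript("Bonjour, le monde !"))  # -> bonjour le monde
```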

### Metrics

WER on the evaluation split: 0.46
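A WER of 0.46 means roughly 46 word-level edits (substitutions, insertions, deletions) per 100 reference words. A minimal sketch of the metric (the card's `wer` is presumably computed with a library such as `evaluate` or `jiwer`; this standalone version is for illustration only):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("le chat dort", "le chien dort"))  # -> 0.3333333333333333
```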
