---
library_name: transformers
license: apache-2.0
datasets:
- StofEzz/dataset_c_voice0.2
metrics:
- wer
base_model:
- openai/whisper-tiny
---
## Training Details
#### Training Hyperparameters
```yaml
defaults:
  - _self_

dataset:
  name: "StofEzz/dataset_c_voice0.2"
  audio_sampling_rate: 16000
  num_proc_preprocessing: 4
  num_proc_dataset_map: 2
  train: 80
  test: 20

model:
  name: "openai/whisper-tiny"
  language: "french"
  task: "transcribe"

text_preprocessing:
  chars_to_ignore_regex: "[\\,\\?\\.\\!\\-\\;\\:\\ğ\\ź\\…\\ø\\ắ\\î\\´\\ŏ\\ę\\ź\\&\\'\\v\\ï\\ū\\ė\\ō\\ń\\ø\\…\\σ\\$\\ă\\ß\\ž\\ṯ\\ý\\ℵ\\đ\\ł\\ś\\ň\\ạ\\=\\_\\»\\ċ\\の\\\"\\ぬ\\ễ\\ż\\ć\\ů\\ʿ\\ș\\ı\\ñ\\(\\ò\\ř\\ä\\–\\ş\\«\\š\\ጠ\\°\\ℤ\\~\\\"\\ī\\ț\\č\\ả\\—\\)\\ā\\/\\½\"]"

training_args:
  _target_: transformers.Seq2SeqTrainingArguments
  output_dir: ./models
  per_device_train_batch_size: 16
  gradient_accumulation_steps: 1
  learning_rate: 1e-5
  warmup_steps: 500
  max_steps: 6250
  gradient_checkpointing: true
  fp16: true
  evaluation_strategy: "steps"
  per_device_eval_batch_size: 8
  predict_with_generate: true
  generation_max_length: 225
  save_steps: 2000
  eval_steps: 100
  logging_steps: 25
  load_best_model_at_end: true
  metric_for_best_model: "wer"
  greater_is_better: false
  push_to_hub: false
```
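As a quick sanity check, the effective batch size and the total number of training examples processed follow directly from the hyperparameters above (a pure-Python sketch; the values are copied from the config, and a single training device is assumed):

```python
# Values taken from the training_args block above
per_device_train_batch_size = 16
gradient_accumulation_steps = 1
max_steps = 6250
warmup_steps = 500

# Effective batch size per optimizer step (single device assumed)
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 16

# Total training examples processed over the full run
total_examples_seen = effective_batch_size * max_steps  # 100,000

# Fraction of optimizer steps spent warming up the learning rate
warmup_fraction = warmup_steps / max_steps  # 0.08
```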
#### Metrics
WER (word error rate): 0.46; lower is better.
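WER is the word-level edit distance between the model's transcription and the reference, divided by the number of reference words. A minimal pure-Python sketch of the computation (evaluation pipelines typically use the `evaluate` or `jiwer` packages; this is only an illustration):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming table for edit distance over words
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = min(
                d[j] + 1,                            # deletion
                d[j - 1] + 1,                        # insertion
                prev + (ref[i - 1] != hyp[j - 1]),   # substitution or match
            )
            prev, d[j] = d[j], cur
    return d[len(hyp)] / max(len(ref), 1)
```

For example, `wer("le chat", "le chien")` yields `0.5` (one substitution over two reference words).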