marianbasti committed
Commit 5f5b1b7 · verified · 1 parent: 6e0631b

Update README.md

Files changed (1): README.md (+4 -4)
README.md CHANGED
@@ -32,7 +32,7 @@ In short:
  - Dataset used: Mozilla Common Voice 17.0 (streaming)
  - Sample rate: 24 kHz; Max audio length: 10 s (pad/trim)
  - Mixed precision: FP16
- - Best validation accuracy: 0.5016
+ - Best validation accuracy: 0.57
 
  Supported languages (labels):
  - en, es, fr, de, it, pt, ru, zh-CN, ja, ar
@@ -48,7 +48,7 @@ Data:
  - Source: Mozilla Common Voice 17.0 (streaming; per-language subset).
  - License: CC-0 (check dataset card for details).
  - Splits: Official validation/test splits used (use_official_splits: true). Parquet branch to handle the large sizes
- - Percent slice per split used during training: 25%.
+ - Percent slice per split used during training: 50%.
 
  Model architecture:
  - Backbone: SNAC encoder (pretrained).
@@ -59,13 +59,13 @@ Model architecture:
  - Linear(256 → 10)
  - Selective tuning:
  - Start frozen (backbone_tune_strategy: "frozen")
- - Unfreeze strategy at epoch 5: "last_n_blocks" with last_n_blocks: 1
+ - Unfreeze strategy at epoch 2: "last_n_blocks" with last_n_blocks: 1
  - Gradient checkpointing enabled for backbone.
 
  Training setup:
  - Batch size: 48
  - Epochs: up to 100 (early stopping patience: 15)
- - Streaming steps per epoch: 500
+ - Streaming steps per epoch: 2000
  - Optimizer: AdamW (betas: 0.9, 0.999; eps: 1e-8)
  - Learning rate: head 1e-4; backbone 2e-5 (after unfreeze)
  - Scheduler: cosine with warmup (num_warmup_steps: 2000)
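
For context on the data setup the README describes (Common Voice 17.0 streamed, resampled to 24 kHz, padded/trimmed to 10 s), here is a minimal sketch. The dataset ID and the pad/trim helper are illustrative assumptions, not code from this repo.

```python
# Minimal sketch: stream one Common Voice 17.0 language subset, resample to
# 24 kHz, and pad/trim each clip to exactly 10 s. Illustrative only.
import torch
import torch.nn.functional as F
from datasets import Audio, load_dataset

SAMPLE_RATE = 24_000
MAX_SAMPLES = SAMPLE_RATE * 10  # 10 s per clip

# Streaming avoids downloading the full dataset (assumed dataset ID).
ds = load_dataset("mozilla-foundation/common_voice_17_0", "en",
                  split="train", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=SAMPLE_RATE))

def pad_or_trim(wav: torch.Tensor) -> torch.Tensor:
    """Force every waveform to MAX_SAMPLES samples (truncate or zero-pad)."""
    if wav.shape[-1] >= MAX_SAMPLES:
        return wav[..., :MAX_SAMPLES]
    return F.pad(wav, (0, MAX_SAMPLES - wav.shape[-1]))

for example in ds.take(2):
    wav = torch.tensor(example["audio"]["array"], dtype=torch.float32)
    print(pad_or_trim(wav).shape)  # torch.Size([240000])
```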
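
The architecture and selective-tuning notes (pretrained SNAC encoder backbone, a head ending in Linear(256 → 10), backbone frozen at first, then the last N blocks unfrozen at the unfreeze epoch) could look roughly like the sketch below. The pooling step, the head layout above the final layer, and the `blocks` attribute name are assumptions for illustration, not the repo's actual module layout.

```python
# Rough sketch of a frozen-backbone classifier with "last_n_blocks" unfreezing.
# `backbone` stands in for the pretrained SNAC encoder; only the final
# Linear(256 -> 10) layer is taken from the README, the rest is assumed.
import torch
import torch.nn as nn

class LangIDClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int = 256, num_langs: int = 10):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, num_langs)  # Linear(256 -> 10)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(wav)           # assumed shape: (batch, time, feat_dim)
        return self.head(feats.mean(dim=1))  # mean-pool over time, then classify

def set_backbone_trainable(model: LangIDClassifier, strategy: str,
                           last_n_blocks: int = 1) -> None:
    """'frozen': keep the whole backbone fixed.
    'last_n_blocks': unfreeze only the final N encoder blocks
    (applied at the unfreeze epoch, epoch 2 in the current config)."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    if strategy == "last_n_blocks":
        blocks = list(model.backbone.blocks)  # assumed attribute name
        for block in blocks[-last_n_blocks:]:
            for p in block.parameters():
                p.requires_grad = True
```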
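
The training setup (AdamW with separate head/backbone learning rates, cosine schedule with 2000 warmup steps) maps onto optimizer parameter groups roughly as follows; `get_cosine_schedule_with_warmup` from `transformers` is one way to build such a schedule and not necessarily what this repo uses.

```python
# Sketch of the optimizer/scheduler values listed above. The stub model and the
# choice of transformers' scheduler helper are assumptions; the hyperparameters
# (betas, eps, LRs, warmup steps, steps per epoch) come from the README.
import torch
import torch.nn as nn
from transformers import get_cosine_schedule_with_warmup

class _StubModel(nn.Module):  # stand-in for the classifier sketched above
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 256)
        self.head = nn.Linear(256, 10)

model = _StubModel()
steps_per_epoch, max_epochs = 2000, 100  # streaming steps per epoch, epoch cap

optimizer = torch.optim.AdamW(
    [
        {"params": model.head.parameters(), "lr": 1e-4},      # head LR
        {"params": model.backbone.parameters(), "lr": 2e-5},  # backbone LR (after unfreeze)
    ],
    betas=(0.9, 0.999),
    eps=1e-8,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=2000,
    num_training_steps=steps_per_epoch * max_epochs,
)
```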