Update README.md

In short:
- Dataset used: Mozilla Common Voice 17.0 (streaming)
- Sample rate: 24 kHz; max audio length: 10 s (pad/trim)
- Mixed precision: FP16
- Best validation accuracy: 0.57
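The pad/trim step above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing code; the constant names are assumptions, but the numbers (24 kHz, 10 s) come from the README.

```python
import numpy as np

SAMPLE_RATE = 24_000               # 24 kHz
MAX_SECONDS = 10
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS  # 240,000 samples

def pad_or_trim(audio: np.ndarray) -> np.ndarray:
    """Force every clip to exactly 10 s: trim long clips, zero-pad short ones."""
    if audio.shape[-1] >= MAX_SAMPLES:
        return audio[..., :MAX_SAMPLES]
    pad = MAX_SAMPLES - audio.shape[-1]
    # Pad only the last (time) axis with zeros.
    return np.pad(audio, [(0, 0)] * (audio.ndim - 1) + [(0, pad)])
```

Fixing every clip to the same length lets the streamed examples be stacked into uniform batches without a dynamic-padding collator.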
Supported languages (labels):
- en, es, fr, de, it, pt, ru, zh-CN, ja, ar
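The ten labels above map onto the ten outputs of the classification head. A minimal label mapping might look like this; the ordering shown is an assumption (the model's actual class order is defined by its training config, not stated here).

```python
# Assumed label order; the real index assignment lives in the training config.
LANGS = ["en", "es", "fr", "de", "it", "pt", "ru", "zh-CN", "ja", "ar"]

LANG2ID = {lang: i for i, lang in enumerate(LANGS)}
ID2LANG = {i: lang for lang, i in LANG2ID.items()}
```

One entry per output unit of the `Linear(256 → 10)` head, so `argmax` over the logits indexes directly into `ID2LANG`.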
Data:
- Source: Mozilla Common Voice 17.0 (streaming; per-language subset).
- License: CC-0 (check the dataset card for details).
- Splits: official validation/test splits (use_official_splits: true); the dataset's Parquet branch is used to handle the large file sizes.
- Percent slice per split used during training: 50%.
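A streamed dataset has no known length, so an index-based slice like `train[:50%]` is not available; one way to approximate a 50% slice is a deterministic position-based filter. The helper below is a hypothetical stand-in for the project's slicing, not its actual code:

```python
from typing import Iterable, Iterator

def percent_slice(stream: Iterable, percent: int) -> Iterator:
    """Keep roughly `percent`% of a stream, deterministically by position.

    Keeps examples whose index falls below the cutoff within each
    window of 100, so repeated runs see the same subset.
    """
    for i, example in enumerate(stream):
        if i % 100 < percent:
            yield example

# Sketch of usage with Hugging Face datasets (requires auth; not run here):
#   from datasets import load_dataset
#   ds = load_dataset("mozilla-foundation/common_voice_17_0", "en",
#                     split="train", streaming=True)
#   half = percent_slice(ds, 50)
```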
Model architecture:
- Backbone: SNAC encoder (pretrained).
- Linear(256 → 10)
- Selective tuning:
  - Start frozen (backbone_tune_strategy: "frozen")
  - Unfreeze strategy at epoch 2: "last_n_blocks" with last_n_blocks: 1
- Gradient checkpointing enabled for the backbone.
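The selective-tuning strategy above can be sketched in PyTorch. The toy backbone and function names below are illustrative stand-ins (the real project wraps a pretrained SNAC encoder); only the frozen-then-unfreeze-last-N-blocks logic mirrors the config.

```python
import torch.nn as nn

def build_model(num_classes: int = 10) -> nn.ModuleDict:
    # Stand-in "backbone" of three blocks plus the linear head from the README.
    backbone = nn.Sequential(
        nn.Conv1d(1, 64, 7, padding=3),
        nn.Conv1d(64, 128, 7, padding=3),
        nn.Conv1d(128, 256, 7, padding=3),
    )
    head = nn.Linear(256, num_classes)
    return nn.ModuleDict({"backbone": backbone, "head": head})

def apply_tune_strategy(model: nn.ModuleDict,
                        strategy: str,
                        last_n_blocks: int = 1) -> None:
    """Freeze the whole backbone; optionally re-enable its last N blocks."""
    for p in model["backbone"].parameters():
        p.requires_grad = False          # "frozen": head-only training
    if strategy == "last_n_blocks":
        blocks = list(model["backbone"].children())
        for block in blocks[-last_n_blocks:]:
            for p in block.parameters():
                p.requires_grad = True   # fine-tune only the last N blocks

model = build_model()
apply_tune_strategy(model, "frozen")             # epochs 0-1: backbone frozen
apply_tune_strategy(model, "last_n_blocks", 1)   # from epoch 2: last block trains
```

The head's parameters are never touched, so it trains throughout; only the backbone's trainable set changes at the unfreeze epoch.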
Training setup:
- Batch size: 48
- Epochs: up to 100 (early stopping patience: 15)
- Streaming steps per epoch: 2000
- Optimizer: AdamW (betas: 0.9, 0.999; eps: 1e-8)
- Learning rate: head 1e-4; backbone 2e-5 (after unfreeze)
- Scheduler: cosine with warmup (num_warmup_steps: 2000)
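The warmup-plus-cosine schedule can be written as a per-step multiplier of the base learning rate. This is a generic sketch of the schedule shape, not the project's scheduler code; the total-step count assumes up to 100 epochs of 2000 streaming steps each.

```python
import math

def lr_scale(step: int,
             warmup_steps: int = 2000,
             total_steps: int = 200_000) -> float:
    """Linear warmup from 0 to 1, then cosine decay back to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
```

The multiplier applies to each parameter group's own base rate, so the head (1e-4) and backbone (2e-5) follow the same curve at different scales.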