Create README.md
# SSL-FT-PRON: Fine-tuned SSL Models for Automatic Pronunciation Assessment (APA)
**Author:** Haeyoung Lee (haeylee)
**Paper:** *Analysis of Various Self-Supervised Learning Models for Automatic Pronunciation Assessment (APSIPA ASC 2024)*
**Code:** https://github.com/hy310/ssl_finetuning
This repository on the Hub is a **collection of sub-checkpoints** for different SSL backbones (Wav2Vec2.0, HuBERT, WavLM) under three training strategies:
- **CTC**: ASR-style head with Connectionist Temporal Classification
- **Freeze**: feature extractor (CNN frontend) frozen during fine-tuning
- **General**: no CTC head; a small regression head predicts four APA scores *(Accuracy, Fluency, Prosody, Total)*
> Each variant lives in a **subdirectory**. Load it using the full path
> (e.g., `haeylee/ssl_ft_pron/wav2vec2/general/02_wav2vec2-large-960h`).
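
A minimal sketch of how such a full path might be turned into a loading call. In `transformers`, `from_pretrained` takes the repository id and the in-repo directory separately (the latter via the `subfolder` argument), so the path is split first. The helper names `split_hub_path` and `load_variant` are hypothetical, and the sketch assumes each subdirectory holds a standard `transformers` checkpoint layout. The CTC and regression heads may need the model classes from the author's GitHub code; here `AutoModel` is used only as a stand-in for loading the backbone.

```python
from typing import Tuple


def split_hub_path(full_path: str) -> Tuple[str, str]:
    """Split 'user/repo/sub/dir' into ('user/repo', 'sub/dir').

    Hypothetical helper: the Hub repo id is the first two path components,
    everything after that is the subdirectory inside the repo.
    """
    parts = full_path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"expected at least 'user/repo', got {full_path!r}")
    repo_id = "/".join(parts[:2])
    subfolder = "/".join(parts[2:])
    return repo_id, subfolder


def load_variant(full_path: str):
    """Load one sub-checkpoint (assumes a standard transformers layout).

    Assumption: AutoModel can read the checkpoint; task-specific heads may
    instead require the classes from the author's training code.
    """
    # Imported lazily so split_hub_path works without transformers installed.
    from transformers import AutoModel

    repo_id, subfolder = split_hub_path(full_path)
    return AutoModel.from_pretrained(repo_id, subfolder=subfolder)


if __name__ == "__main__":
    path = "haeylee/ssl_ft_pron/wav2vec2/general/02_wav2vec2-large-960h"
    print(split_hub_path(path))
    # -> ('haeylee/ssl_ft_pron', 'wav2vec2/general/02_wav2vec2-large-960h')
```

Calling `load_variant(path)` downloads the files for that subdirectory from the Hub, so it needs network access and the `transformers` package.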
---
@@ -0,0 +1,21 @@
+---
+base_model:
+- facebook/wav2vec2-large
+- facebook/wav2vec2-large-960h
+- facebook/wav2vec2-large-lv60
+- facebook/wav2vec2-large-xlsr-53
+- facebook/wav2vec2-xls-r-300m
+- facebook/hubert-large-ll60k
+- facebook/hubert-base-ls960
+- facebook/hubert-xlarge-ll60k
+- facebook/hubert-xlarge-ls960-ft
+- microsoft/wavlm-large
+- microsoft/wavlm-base-plus
+- microsoft/wavlm-base-plus-sv
+tags:
+- self-supervised-learning
+- pronunciation-assessment
+- speech
+metrics:
+- pearsonr
+---
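
The front matter lists `pearsonr` as the metric. The sketch below assumes this means the Pearson correlation between predicted and human-rated scores, computed separately for each of the four APA scores. The function names and the toy numbers are made up for illustration; they are not results from the paper or the repository.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def per_score_pcc(
    predicted: Dict[str, Sequence[float]],
    reference: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Compute one correlation per score name (e.g. accuracy, fluency, ...)."""
    return {name: pearson_r(predicted[name], reference[name]) for name in reference}


if __name__ == "__main__":
    # Toy values for illustration only (hypothetical, not real model output).
    reference = {"accuracy": [8.0, 6.0, 9.0, 4.0], "fluency": [7.0, 5.0, 9.0, 3.0]}
    predicted = {"accuracy": [7.5, 6.5, 8.5, 5.0], "fluency": [6.0, 5.5, 8.0, 4.0]}
    for name, value in per_score_pcc(predicted, reference).items():
        print(f"{name}: PCC = {value:.3f}")
```

For real evaluations, `scipy.stats.pearsonr` computes the same coefficient and also returns a p-value.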