@@ -30,12 +30,12 @@ metrics:
 # SSL-FT-PRON: Fine-tuned SSL Models for Automatic Pronunciation Assessment (APA)
-A collection of fine-tuned **Self-Supervised Learning (SSL)** speech checkpoints (Wav2Vec2.0, HuBERT, WavLM) for **Automatic Pronunciation Assessment (APA)**.
 Three strategies are provided per backbone:
 - **CTC**: ASR-style head trained with CTC
 - **Freeze**: CNN feature extractor frozen; rest is fine-tuned
-- **General**: no CTC head; a lightweight regression head predicts four APA scores (Accuracy, Fluency, Prosody, Total)
 > **Important:** This Hub repository is a *collection*. Each model lives in a **subdirectory**.
 > Load with the full sub-path, e.g. `haeylee/ssl_ft_pron/wav2vec2/general/02_wav2vec2-large-960h`.
@@ -46,9 +46,8 @@ Three strategies are provided per backbone:
 - **Developed by:** Haeyoung Lee (haeylee)
 - **Affiliation (paper):** Seoul National University, SNU Spoken Language Processing Lab
-- **Model type:** SSL speech encoders fine-tuned for APA (CTC / General / Freeze variants)
 - **Language(s):** English (evaluated on Speechocean762)
-- **License:** *TBD by author*
 - **Finetuned from:** See `base_model` list above
 ### Model Sources
@@ -57,27 +56,15 @@ Three strategies are provided per backbone:
 ---
-## Uses
-### Direct Use
 - Research and prototyping for **pronunciation scoring** and **feature analysis** on read English speech.
 - As encoders for downstream APA tasks, analytics, or visualization (e.g., PCA of hidden states).
-### Downstream Use
-- Integrate APA scores into CALL (Computer-Assisted Language Learning) or assessment tools.
-- Use CTC variants for ASR-aligned pipelines; use General/Freeze variants for score regression.
-### Out-of-Scope Use
-- Non-English targets without adaptation.
-- High-stakes assessment without proper validation, calibration, and fairness checks.
 ---
 ## Bias, Risks, and Limitations
 - Trained/evaluated on **Speechocean762** (read English speech by L2 speakers). May not generalize to spontaneous speech, other accents/languages, or noisy conditions.
 - APA involves subjective human judgments; ensure careful calibration and validation on your domain.
-- Consider privacy/consent when handling speech data.
 **Recommendation:** Validate on in-domain data and monitor subgroup performance.
@@ -92,4 +79,36 @@ from transformers import AutoModelForCTC, AutoProcessor
 ckpt = "haeylee/ssl_ft_pron/wav2vec2/ctc/01_wav2vec2-large"  # pick your subdir
 model = AutoModelForCTC.from_pretrained(ckpt)
 processor = AutoProcessor.from_pretrained(ckpt)

 # SSL-FT-PRON: Fine-tuned SSL Models for Automatic Pronunciation Assessment (APA)
+A collection of fine-tuned **Self-Supervised Learning (SSL)** speech models (Wav2Vec2.0, HuBERT, WavLM) for **Automatic Pronunciation Assessment (APA)**.
 Three strategies are provided per backbone:
 - **CTC**: ASR-style head trained with CTC
 - **Freeze**: CNN feature extractor frozen; rest is fine-tuned
+- **General**: no CTC head
 > **Important:** This Hub repository is a *collection*. Each model lives in a **subdirectory**.
 > Load with the full sub-path, e.g. `haeylee/ssl_ft_pron/wav2vec2/general/02_wav2vec2-large-960h`.
 - **Developed by:** Haeyoung Lee (haeylee)
 - **Affiliation (paper):** Seoul National University, SNU Spoken Language Processing Lab
+- **Model type:** SSL speech encoders fine-tuned for APA (CTC / General / Freeze)
 - **Language(s):** English (evaluated on Speechocean762)
 - **Finetuned from:** See `base_model` list above
 ### Model Sources
 ---
+## Uses
 - Research and prototyping for **pronunciation scoring** and **feature analysis** on read English speech.
 - As encoders for downstream APA tasks, analytics, or visualization (e.g., PCA of hidden states).
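The PCA use case above can be sketched with NumPy alone. This assumes you have already collected one vector per utterance (for example, pooled encoder states) into a `(num_utterances, hidden_size)` array. The SVD-based projection and the function name `pca_project` are illustrative choices, not the authors' analysis code.

```python
import numpy as np


def pca_project(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row vectors onto their top principal components (illustrative sketch).

    features: (num_utterances, hidden_size) array, one vector per utterance.
    """
    # Center each dimension so the SVD recovers directions of maximal variance.
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Example with random stand-in data (replace with real utterance embeddings).
rng = np.random.default_rng(0)
coords = pca_project(rng.normal(size=(20, 1024)), n_components=2)
print(coords.shape)  # (20, 2)
```

Running the SVD on centered data avoids forming the full `hidden_size x hidden_size` covariance matrix, which is convenient for the 1024-dimensional states of the large backbones.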
 ---
 ## Bias, Risks, and Limitations
 - Trained/evaluated on **Speechocean762** (read English speech by L2 speakers). May not generalize to spontaneous speech, other accents/languages, or noisy conditions.
 - APA involves subjective human judgments; ensure careful calibration and validation on your domain.
 **Recommendation:** Validate on in-domain data and monitor subgroup performance.
 ckpt = "haeylee/ssl_ft_pron/wav2vec2/ctc/01_wav2vec2-large"  # pick your subdir
 model = AutoModelForCTC.from_pretrained(ckpt)
 processor = AutoProcessor.from_pretrained(ckpt)
+```
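For the CTC checkpoints, `processor.batch_decode(torch.argmax(logits, dim=-1))` turns frame-level logits into text. The rule it applies is greedy CTC decoding: take the best label per frame, merge consecutive repeats, then drop the blank symbol. The dependency-free sketch below illustrates only that rule; the helper name is hypothetical, and `blank_id=0` is an assumption (Wav2Vec2-style CTC tokenizers use the padding token as the blank, so check `processor.tokenizer.pad_token_id` for these checkpoints).

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Greedy CTC collapse of per-frame argmax ids (illustrative sketch).

    blank_id=0 is an assumed default; use the tokenizer's pad/blank id in practice.
    """
    decoded: List[int] = []
    prev: Optional[int] = None
    for idx in frame_ids:
        # Emit a label only when it starts a new run and is not the blank.
        if idx != prev and idx != blank_id:
            decoded.append(idx)
        prev = idx
    return decoded


# The blank between the two runs of 5 keeps them as separate labels.
print(ctc_greedy_decode([0, 5, 5, 0, 5, 7, 7, 0]))  # [5, 5, 7]
```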
+### B) General / Freeze models (no CTC head)
+```python
+from transformers import AutoProcessor, Wav2Vec2Model, HubertModel, WavLMModel
+# Wav2Vec2 example (General)
+ckpt = "haeylee/ssl_ft_pron/wav2vec2/general/01_wav2vec2-large"
+model = Wav2Vec2Model.from_pretrained(ckpt)
+processor = AutoProcessor.from_pretrained(ckpt)
+# HuBERT example (Freeze)
+# ckpt = "haeylee/ssl_ft_pron/hubert/freeze/06_hubert-large-ll60k"
+# model = HubertModel.from_pretrained(ckpt)
+# processor = AutoProcessor.from_pretrained(ckpt)
+# WavLM example (General)
+# ckpt = "haeylee/ssl_ft_pron/wavlm/general/10_wavlm-large"
+# model = WavLMModel.from_pretrained(ckpt)
+# processor = AutoProcessor.from_pretrained(ckpt)
+```
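The General/Freeze checkpoints load as bare encoders, so their forward pass returns frame-level states (`last_hidden_state`, shape `(batch, frames, hidden_size)`) rather than APA scores. A common way to obtain one vector per utterance, e.g. as input to your own score regressor or to PCA, is masked mean pooling. The NumPy sketch below assumes you convert the states with `.numpy()` and supply a frame-level mask yourself (all ones for a single unpadded clip); it is not necessarily the pooling used in the paper.

```python
import numpy as np


def masked_mean_pool(hidden: np.ndarray, frame_mask: np.ndarray) -> np.ndarray:
    """Average encoder frames per utterance, skipping padded frames (illustrative sketch).

    hidden:     (batch, frames, hidden_size) encoder states
    frame_mask: (batch, frames), 1 for real frames and 0 for padding
    """
    mask = frame_mask[..., None].astype(hidden.dtype)  # (batch, frames, 1)
    summed = (hidden * mask).sum(axis=1)               # (batch, hidden_size)
    counts = np.clip(mask.sum(axis=1), 1.0, None)      # (batch, 1); avoids division by zero
    return summed / counts


# Hypothetical upstream step (requires torch + transformers and a loaded model/processor):
#   inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
#   with torch.no_grad():
#       hidden = model(**inputs).last_hidden_state.numpy()
#   frame_mask = np.ones(hidden.shape[:2])

states = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
print(masked_mean_pool(states, np.array([[1, 1, 0]])))  # [[2. 3.]]
```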
+### Summary
+- **CTC:** `AutoModelForCTC.from_pretrained(...)`
+- **General/Freeze:** `Wav2Vec2Model` / `HubertModel` / `WavLMModel` `.from_pretrained(...)`
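Hub repository ids take the form `namespace/repo_name`, so depending on your `transformers`/`huggingface_hub` versions, passing a nested path such as `haeylee/ssl_ft_pron/wav2vec2/general/01_wav2vec2-large` directly to `from_pretrained` may be rejected unless it points to a local clone. One workaround, sketched below under that assumption, is to download only the needed subdirectory with `huggingface_hub.snapshot_download` and load from the local folder. The helper names are hypothetical; model classes also accept a `subfolder=` argument as an alternative.

```python
import os
from typing import Tuple


def split_hub_subpath(path: str) -> Tuple[str, str]:
    """Split 'namespace/repo/sub/dir' into ('namespace/repo', 'sub/dir')."""
    parts = path.strip("/").split("/")
    if len(parts) < 3:
        raise ValueError(f"expected namespace/repo/subfolder, got {path!r}")
    return "/".join(parts[:2]), "/".join(parts[2:])


def download_subdir(path: str) -> str:
    """Fetch one checkpoint subdirectory from the Hub and return its local path."""
    # Imported here so split_hub_subpath works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id, subfolder = split_hub_subpath(path)
    # allow_patterns restricts the download to files under the chosen subdirectory.
    root = snapshot_download(repo_id=repo_id, allow_patterns=[f"{subfolder}/*"])
    return os.path.join(root, subfolder)


print(split_hub_subpath("haeylee/ssl_ft_pron/wav2vec2/general/01_wav2vec2-large"))
# ('haeylee/ssl_ft_pron', 'wav2vec2/general/01_wav2vec2-large')
```

Usage, assuming the subdirectory ships its processor files: `local_dir = download_subdir(ckpt)`, then `Wav2Vec2Model.from_pretrained(local_dir)` and `AutoProcessor.from_pretrained(local_dir)`.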
+## Training Details
+### Training Data
+- **Dataset:** [Speechocean762](https://openslr.org/101/)
+- **Preprocessing:** Use `preprocess_dataset.py` (in the repo) to convert raw audio/labels into Hugging Face `datasets` format.
+Expected processed layout: