modify

#50

by haeylee - opened Sep 24, 2025

base: refs/heads/main ← from: refs/pr/50

Files changed (12)

README.md +0 -192
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/all_results.json +0 -0
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/args.json +0 -0
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/eval_results.json +0 -0
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/finetuned_pytorch_model.bin +0 -0
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/model.safetensors +0 -0
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/model_weights.pt +0 -0
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/preprocessor_config.json +0 -0
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/train_results.json +0 -0
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/trainer_args.json +0 -0
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/trainer_state.json +0 -0
wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/training_args.bin +0 -0

README.md DELETED

@@ -1,192 +0,0 @@
----
-base_model:
-- facebook/wav2vec2-large
-- facebook/wav2vec2-large-960h
-- facebook/wav2vec2-large-lv60
-- facebook/wav2vec2-large-xlsr-53
-- facebook/wav2vec2-xls-r-300m
-- facebook/hubert-large-ll60k
-- facebook/hubert-base-ls960
-- facebook/hubert-xlarge-ll60k
-- facebook/hubert-xlarge-ls960-ft
-- microsoft/wavlm-large
-- microsoft/wavlm-base-plus
-- microsoft/wavlm-base-plus-sv
-tags:
-- self-supervised-learning
-- pronunciation-assessment
-- speech
-- wav2vec2
-- hubert
-- wavlm
-- ctc
-- regression
-- feature-extraction
-datasets:
-- openslr/speechocean762
-metrics:
-- pearsonr
----
-# SSL-FT-PRON: Fine-tuned SSL Models for Automatic Pronunciation Assessment (APA)
-A collection of fine-tuned **Self-Supervised Learning (SSL)** speech models (Wav2Vec2.0, HuBERT, WavLM) for **Automatic Pronunciation Assessment (APA)**.
-Three strategies are provided per backbone:
-- **CTC**: ASR-style head trained with CTC
-- **Freeze**: CNN feature extractor frozen; rest is fine-tuned
-- **General**: no CTC head; the whole encoder is fine-tuned together with a regression head
-> **Important:** This Hub repository is a *collection*. Each model lives in a **subdirectory**.
-> Load a model by passing the repository ID together with its subdirectory as `subfolder`, e.g. `from_pretrained("haeylee/ssl_ft_pron", subfolder="wav2vec2/general/02_wav2vec2-large-960h")`.
----
-## Model Details
-- **Developed by:** Haeyoung Lee (haeylee)
-- **Affiliation (paper):** Seoul National University, SNU Spoken Language Processing Lab
-- **Model type:** SSL speech encoders fine-tuned for APA (CTC / General / Freeze)
-- **Language(s):** English (evaluated on Speechocean762)
-- **Finetuned from:** See `base_model` list above
-### Model Sources
-- **Code:** https://github.com/hy310/ssl_finetuning
-- **Paper:** *Analysis of Various Self-Supervised Learning Models for Automatic Pronunciation Assessment (APSIPA ASC 2024)*
----
-## Uses
-- Research/prototyping for **pronunciation scoring** and **representation analysis** (e.g., PCA on hidden states).
-- Feature extraction for downstream APA tasks.
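For the representation analysis mentioned above, a minimal NumPy sketch of PCA on hidden states is shown below. It assumes the hidden states of one layer have already been extracted (for example by calling a `transformers` model with `output_hidden_states=True`) and stacked into an array of shape `(num_frames, hidden_dim)`. The function name `pca_project` is hypothetical and is not part of the linked repository.

```python
import numpy as np


def pca_project(hidden_states: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Project frame-level hidden states onto their top principal components.

    hidden_states: array of shape (num_frames, hidden_dim).
    Returns (projected coordinates, explained-variance ratio per component).
    """
    # Center each feature dimension so the SVD captures variance, not the mean.
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    variance = singular_values ** 2
    explained_ratio = variance[:n_components] / variance.sum()
    return projected, explained_ratio


# Random stand-in data; replace with real hidden states (e.g. 1024-dim for large models).
rng = np.random.default_rng(0)
fake_states = rng.normal(size=(100, 1024))
coords, ratio = pca_project(fake_states, n_components=2)
print(coords.shape, ratio)
```

The SVD of the centered matrix is used instead of an explicit covariance eigendecomposition because it is numerically more stable; `sklearn.decomposition.PCA` yields the same projection up to the sign of each component.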
----
-## Bias, Risks, and Limitations
-- Trained/evaluated on **Speechocean762** (read English by L2 speakers). Generalization to other languages/speaking styles is not guaranteed.
-- APA relies on subjective human scores; apply domain calibration and monitor subgroup performance.
-**Recommendation:** Validate on in-domain data; report uncertainty and subgroup metrics.
----
-## How to Get Started
-### Load a CTC model (with CTC head)
-~~~python
-from transformers import AutoModelForCTC, AutoProcessor
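-repo_id = "haeylee/ssl_ft_pron"
-subfolder = "wav2vec2/ctc/01_wav2vec2-large"  # model subdirectory inside the Hub repo
-model = AutoModelForCTC.from_pretrained(repo_id, subfolder=subfolder)
-processor = AutoProcessor.from_pretrained(repo_id, subfolder=subfolder)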
-~~~
-### Load a General / Freeze model (no CTC head)
-~~~python
-from transformers import AutoProcessor, Wav2Vec2Model, HubertModel, WavLMModel
-repo_id = "haeylee/ssl_ft_pron"
-# Wav2Vec2 (General)
-subfolder = "wav2vec2/general/01_wav2vec2-large"
-model = Wav2Vec2Model.from_pretrained(repo_id, subfolder=subfolder)
-processor = AutoProcessor.from_pretrained(repo_id, subfolder=subfolder)
-# HuBERT (Freeze)
-# subfolder = "hubert/freeze/06_hubert-large-ll60k"
-# model = HubertModel.from_pretrained(repo_id, subfolder=subfolder)
-# processor = AutoProcessor.from_pretrained(repo_id, subfolder=subfolder)
-# WavLM (General)
-# subfolder = "wavlm/general/10_wavlm-large"
-# model = WavLMModel.from_pretrained(repo_id, subfolder=subfolder)
-# processor = AutoProcessor.from_pretrained(repo_id, subfolder=subfolder)
-~~~
-**Summary:**
-- **CTC:** `AutoModelForCTC.from_pretrained(...)`
-- **General/Freeze:** `Wav2Vec2Model` / `HubertModel` / `WavLMModel` `.from_pretrained(...)`
----
-## Training Details
-### Training Data
-- **Dataset:** [Speechocean762](https://openslr.org/101/)
-- **Preprocessing:** We used `preprocess_dataset.py` (see the GitHub repo) to convert raw audio/labels into Hugging Face `datasets` format.
-**Expected processed layout:**
-~~~text
-/your/data/path/speechocean762/
-└── preprocess/
-    ├── speechocean_train_ds/
-    └── speechocean_test_ds/
-~~~
-### Training Procedure
-#### Preprocessing
-~~~bash
-# Adjust paths inside the script or via CLI args
-python preprocess_dataset.py \
-  --data_root /your/data/path/speechocean762 \
-  --out_dir  /your/data/path/speechocean762/preprocess
-~~~
-#### General (no CTC head)
-Loads the encoder with `Wav2Vec2Model`, `HubertModel`, or `WavLMModel` `.from_pretrained(...)` and trains a regression head to predict the four APA scores (Accuracy, Fluency, Prosody, Total).
-~~~bash
-python train/baseline.py \
-  --model_name facebook/hubert-xlarge-ls960-ft \
-  --batch_size 4 \
-  --learning_rate 1e-5 \
-  --num_train_epochs 30
-~~~
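The regression head itself lives in `train/baseline.py` and is not reproduced here. As an illustration only, one common design is to mean-pool the encoder's frame-level outputs over non-padded frames and map the pooled vector to the four scores with a linear layer trained under an MSE loss. The NumPy sketch below shows that forward pass and loss; the pooling choice, the MSE objective, and the function names are assumptions, not a description of the authors' code.

```python
import numpy as np

# Assumed column order for the four APA scores.
ASPECTS = ("accuracy", "fluency", "prosody", "total")


def masked_mean_pool(frames: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average frame vectors over valid (unpadded) frames.

    frames: (batch, time, hidden); mask: (batch, time) with 1 for real frames.
    """
    weights = mask[..., None].astype(frames.dtype)      # (batch, time, 1)
    summed = (frames * weights).sum(axis=1)             # (batch, hidden)
    counts = np.clip(weights.sum(axis=1), 1.0, None)    # avoid division by zero
    return summed / counts


def regression_head(pooled: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Linear map from pooled features to one score per APA aspect."""
    return pooled @ weight + bias                       # (batch, len(ASPECTS))


def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over all utterances and aspects."""
    return float(np.mean((pred - target) ** 2))


# Stand-in tensors: 2 utterances, 5 frames, hidden size 8.
rng = np.random.default_rng(0)
frames = rng.normal(size=(2, 5, 8))
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
weight = rng.normal(scale=0.1, size=(8, len(ASPECTS)))
bias = np.zeros(len(ASPECTS))
scores = regression_head(masked_mean_pool(frames, mask), weight, bias)
print(scores.shape)  # (2, 4)
```

Masking matters because utterances in a batch are padded to the same length; averaging over padded frames would bias the pooled vector toward zero for shorter utterances.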
-#### Freeze (feature extractor frozen)
-Same as **General**, but freezes the CNN feature extractor.
-~~~bash
-python train/freeze.py \
-  --model_name facebook/hubert-xlarge-ls960-ft \
-  --freeze_feature_extractor \
-  --batch_size 4 \
-  --learning_rate 1e-5 \
-  --num_train_epochs 30
-~~~
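How `train/freeze.py` implements `--freeze_feature_extractor` is not shown here. A minimal sketch of the same idea with `transformers` is to switch off gradients for the convolutional `feature_extractor` submodule, which `Wav2Vec2Model`, `HubertModel`, and `WavLMModel` all expose. The tiny randomly initialised config below is an assumption that only keeps the example offline and fast; it is not a setting from the paper, and the helper names are hypothetical.

```python
from transformers import Wav2Vec2Config, Wav2Vec2Model


def freeze_cnn_feature_extractor(model) -> None:
    """Turn off gradients for the convolutional feature encoder only.

    The Transformer layers (model.encoder) remain trainable.
    """
    for param in model.feature_extractor.parameters():
        param.requires_grad = False


def count_trainable(module) -> int:
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Tiny random config so the example runs without downloading weights;
# in practice, load a pretrained checkpoint with from_pretrained(...) instead.
config = Wav2Vec2Config(hidden_size=32, num_hidden_layers=2,
                        num_attention_heads=2, intermediate_size=64)
model = Wav2Vec2Model(config)
freeze_cnn_feature_extractor(model)
print("trainable CNN params:", count_trainable(model.feature_extractor))
print("trainable Transformer params:", count_trainable(model.encoder))
```

For `Wav2Vec2Model`, the built-in `freeze_feature_encoder()` method disables gradients for the same submodule; the explicit loop is shown because it reads the same for all three backbones.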
-#### CTC (ASR-style head)
-Uses `AutoModelForCTC.from_pretrained(...)` for CTC training.
-~~~bash
-python train/ctc.py \
-  --model_name facebook/wav2vec2-large \
-  --batch_size 4 \
-  --learning_rate 1e-5 \
-  --num_train_epochs 30
-~~~
-**Artifacts saved:** `model.safetensors`, `trainer_state.json`, `training_args.bin`, logs, and checkpoints (per run: `args.json`, `trainer_args.json`).
----
-## Evaluation
-### Testing Data, Factors & Metrics
-- **Test set:** Speechocean762 (held-out split prepared by `preprocess_dataset.py`)
-- **Factors:** Backbone (Wav2Vec2 / HuBERT / WavLM) × strategy (CTC / General / Freeze)
-- **Metric:** `pearsonr` (Pearson correlation coefficient, PCC) for Accuracy, Fluency, Prosody, and Total.
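Per-aspect PCC can be computed as in the NumPy sketch below. It assumes predictions and human scores are arrays of shape `(num_utterances, 4)` with columns in the order Accuracy, Fluency, Prosody, Total; the function name `per_aspect_pcc` is hypothetical. `scipy.stats.pearsonr` returns the same coefficient for each column, together with a p-value.

```python
import numpy as np

# Assumed column order, matching the metric list above.
ASPECTS = ("accuracy", "fluency", "prosody", "total")


def per_aspect_pcc(pred: np.ndarray, gold: np.ndarray) -> dict[str, float]:
    """Pearson correlation between predicted and human scores, one value per aspect."""
    results = {}
    for i, name in enumerate(ASPECTS):
        x = pred[:, i] - pred[:, i].mean()
        y = gold[:, i] - gold[:, i].mean()
        # PCC = cov(x, y) / (std_x * std_y); the 1/n normalisation factors cancel.
        results[name] = float((x @ y) / np.sqrt((x @ x) * (y @ y)))
    return results


# Toy scores for 4 utterances; a perfect linear relation gives PCC = 1 for every aspect.
gold = np.array([[8, 7, 6, 7], [5, 6, 5, 5], [9, 9, 8, 9], [3, 4, 4, 3]], dtype=float)
pred = gold * 0.5 + 1.0
print(per_aspect_pcc(pred, gold))
```

Because PCC is invariant to shifting and positive scaling, a model can score perfectly on this metric while its predictions sit on a different scale from the human labels, so calibration should be checked separately.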
----
-## Citation
-~~~bibtex
-@inproceedings{lee2024analysis,
-  title={Analysis of Various Self-Supervised Learning Models for Automatic Pronunciation Assessment},
-  author={Lee, Haeyoung and Kim, Sunhee and Chung, Minhwa},
-  booktitle={2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
-  pages={1--6},
-  year={2024},
-  organization={IEEE}
-}
-~~~
----
-## Authors & Contact
-- **Author:** Haeyoung Lee (haeylee)
-- **Email:** haeylee@snu.ac.kr
-- **Issues/Requests:** https://github.com/hy310/ssl_finetuning

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/all_results.json RENAMED

File without changes

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/args.json RENAMED

File without changes

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/eval_results.json RENAMED

File without changes

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/finetuned_pytorch_model.bin RENAMED

File without changes

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/model.safetensors RENAMED

File without changes

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/model_weights.pt RENAMED

File without changes

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/preprocessor_config.json RENAMED

File without changes

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/train_results.json RENAMED

File without changes

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/trainer_args.json RENAMED

File without changes

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/trainer_state.json RENAMED

File without changes

wav2vec2/freeze/{02_wav2vec2-large-960h → 02_wav2vec2-large-960-h}/training_args.bin RENAMED

File without changes