AImpower/StutteredSpeechASR

Automatic Speech Recognition

kexinf1 committed on Sep 29, 2025

Commit db8340d · verified · 1 Parent(s): 2f2d8cc

Update README.md

Files changed (1)

README.md +3 -3

README.md CHANGED

@@ -14,7 +14,7 @@ This model is a version of OpenAI's `whisper-large-v2` fine-tuned on the **AImpo
 This model is specifically adapted to provide more accurate and authentic transcriptions for Mandarin-speaking PWS.
 Standard Automatic Speech Recognition (ASR) models often exhibit "fluency bias," where they "smoothen" out or delete stuttered speech patterns like repetitions and interjections.
-This model was fine-tuned on **literal transcriptions** that intentionally preserve these disfluencies.
+This model was fine-tuned on literal transcriptions that intentionally preserve these disfluencies.
 The primary goal is to create a more inclusive ASR system that recognizes and respects the natural speech patterns of PWS, reducing deletion errors and improving overall accuracy.
@@ -29,7 +29,7 @@ This model is intended for transcribing conversational Mandarin Chinese speech f
 ### Limitations
-* **Language Specificity:** The model is trained exclusively on Mandarin Chinese and is not intended for other languages.
+* **Language Specificity:** The model is fine-tuned exclusively on Mandarin Chinese and is not intended for other languages.
 * **Data Specificity:** Performance is optimized for speech patterns present in the AImpower/MandarinStutteredSpeech dataset. It may not perform as well on other types of atypical speech or in environments with significant background noise.
 * **Variability:** Stuttering is highly variable. While the model shows significant improvements across severity levels, accuracy may still vary between individuals and contexts.
@@ -86,7 +86,7 @@ This dataset was created through a community-led, grassroots effort with Stammer
       * **Optimizer:** AdamW
       * **Batch Size:** 16
       * **Fine-tuning Method:** AdaLora
+  * **GPU:** Four NVIDIA A100 80G GPUs
 -----
 ## Evaluation Results
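
The README text in this diff refers to "deletion errors" caused by fluency bias. As a rough illustration only, the sketch below counts substitutions, deletions, and insertions between a literal reference transcript and an ASR hypothesis using a character-level Levenshtein alignment. The function name `count_edit_ops` and the example sentence are hypothetical and are not taken from the repository; the model card's actual evaluation code may differ.

```python
def count_edit_ops(reference: str, hypothesis: str) -> dict:
    """Align two strings character by character and count each error type.

    Illustrative helper (an assumption, not the repository's evaluation code).
    """
    ref, hyp = list(reference), list(hypothesis)
    n, m = len(ref), len(hyp)

    # cost[i][j] = minimum number of edits to turn ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference character
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every hypothesis character
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover one optimal alignment.
    ops = {"substitutions": 0, "deletions": 0, "insertions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] != hyp[j - 1]:
                ops["substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops["deletions"] += 1  # reference character missing from hypothesis
            i -= 1
        else:
            ops["insertions"] += 1  # extra character in hypothesis
            j -= 1

    # Character error rate relative to the reference length.
    ops["cer"] = sum(ops[k] for k in ("substitutions", "deletions", "insertions")) / max(n, 1)
    return ops


# Made-up example: the reference keeps a repetition, the hypothesis drops it.
result = count_edit_ops("我我我想去", "我想去")
print(result["deletions"], result["cer"])
```

In this made-up example the two repeated characters are missing from the hypothesis, so both errors are counted as deletions, which is the error type a model fine-tuned on literal transcriptions is meant to reduce.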