Dataset Overview

Introduction

We use Meta's FairSpeech dataset to conduct fairness audits of the speech recognition models submitted to our leaderboard. The dataset was designed specifically to help address fairness gaps in speech recognition across diverse demographic groups. It contains 26,471 utterances recorded by 593 participants across the United States. Participants self-reported their age, gender, ethnicity, geographic location, and whether they consider themselves native English speakers. The recordings span seven domains: music, capture, utilities, notification control, messaging, calling, and dictation. In response to domain-specific prompts, participants recorded voice commands such as searching for a song or making plans to meet friends.

For leaderboard evaluation, we use a stratified 10% sample of FairSpeech. We run inference with each submitted model on these test samples and compare its performance across demographic groups.
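The fairness audit compares a model's accuracy across demographic groups, but this page does not say which metric is reported. A common choice for ASR is word error rate (WER). The sketch below assumes per-group WER, computed by pooling word-level edit distances over all utterances in a group, and uses the difference between the worst and best group as a simple gap measure. The record fields (`reference`, `hypothesis`) and the group labels are hypothetical, not the leaderboard's actual schema.

```python
from collections import defaultdict


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words: substitutions, deletions, insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def wer_by_group(samples: list[dict], group_key: str) -> dict[str, float]:
    """Pooled WER per group: total word errors / total reference words.

    Assumes each sample is a dict with hypothetical fields 'reference',
    'hypothesis', and a demographic attribute named by group_key.
    """
    errors: dict[str, int] = defaultdict(int)
    words: dict[str, int] = defaultdict(int)
    for s in samples:
        ref = s["reference"].lower().split()
        hyp = s["hypothesis"].lower().split()
        group = s[group_key]
        errors[group] += word_edit_distance(ref, hyp)
        words[group] += len(ref)
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


def fairness_gap(rates: dict[str, float]) -> float:
    """Difference between the highest and lowest group WER."""
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Hypothetical transcripts; group labels "A" and "B" are placeholders.
    samples = [
        {"reference": "play some jazz", "hypothesis": "play some jazz", "age": "A"},
        {"reference": "call mom now", "hypothesis": "call tom now", "age": "B"},
    ]
    rates = wer_by_group(samples, "age")
    print(rates, fairness_gap(rates))
```

Pooling errors and word counts per group (rather than averaging per-utterance WER) keeps short utterances from dominating a group's score; either convention is defensible, and the choice should be stated alongside any reported gap.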

Our Testing Strategy: Stratified Sampling

For leaderboard evaluations, we use stratified sampling to select a representative 10% subset of the FairSpeech dataset. Stratified sampling matters in automatic speech recognition (ASR) fairness testing because a simple random sample can, by chance, under-represent small groups; sampling within each stratum keeps the test set's distribution close to that of the full dataset. This helps ensure that demographic factors, background noise conditions, and linguistic diversity are properly represented, so that we can evaluate model robustness and fairness across varied populations.
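The page does not specify which attributes define the strata or how many samples each stratum receives. The following is a minimal sketch of proportional stratified sampling, assuming strata are formed from combinations of self-reported attributes (for example gender and age group) and that each stratum contributes about 10% of its members. The field names and the `min_per_stratum` floor are hypothetical illustration choices, not the leaderboard's documented procedure.

```python
import random
from collections import defaultdict


def stratified_sample(
    records: list[dict],
    strata_keys: list[str],
    fraction: float = 0.10,
    min_per_stratum: int = 1,
    seed: int = 0,
) -> list[dict]:
    """Draw roughly `fraction` of each stratum without replacement.

    A stratum is the tuple of values of `strata_keys` (hypothetical
    attribute names). `min_per_stratum` keeps very small strata from
    rounding down to zero, at the cost of slightly over-sampling them.
    """
    rng = random.Random(seed)  # fixed seed so the test set is reproducible
    strata: dict[tuple, list[dict]] = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in strata_keys)].append(rec)

    sample: list[dict] = []
    for members in strata.values():  # insertion order is deterministic
        k = max(min_per_stratum, round(fraction * len(members)))
        k = min(k, len(members))  # never request more than the stratum holds
        sample.extend(rng.sample(members, k))
    return sample


if __name__ == "__main__":
    # Synthetic records: 60 "f" and 40 "m", alternating hypothetical age groups.
    records = [
        {"id": i, "gender": "f" if i < 60 else "m", "age": "18-22" if i % 2 == 0 else "23-30"}
        for i in range(100)
    ]
    subset = stratified_sample(records, ["gender", "age"])
    print(len(subset))
```

Because each stratum is sampled separately, the 60/40 gender split in the synthetic data is preserved in the 10-record subset, which a single unstratified draw would only match on average.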

Age Group by Gender Distribution

Socioeconomic Group by Gender Distribution

Ethnicity Distribution

Frequently Used Words