Separate training data by country

#117

by wponhf - opened Sep 26, 2022

edited Sep 26, 2022

Greetings, I sent a question to bigscience-contact@googlegroups.com but have not received a response. I apologize if this is the wrong forum for it. Are there any resources available to help me understand how to isolate or categorize the English-sourced training data according to its country of origin? Thanks.

yjernite

BigScience Workshop org Sep 26, 2022

edited Sep 26, 2022

You can find this information (when available) in the data card deck linked here, under Speaker Locations:
Data Cards per Source
