Which 46 Languages

#86

by Robertl - opened Aug 16, 2022

Discussion

Robertl

Aug 16, 2022

This comment has been hidden

ybelkada

BigScience Workshop org Aug 16, 2022

edited Aug 16, 2022

Hi @Robertl,
Please find the full list below! You can see it by clicking the "46 languages" widget above.

Thanks a lot!

ybelkada

BigScience Workshop org Aug 16, 2022

I agree it is not clear enough. I opened a PR here: https://github.com/huggingface/transformers/pull/18645 to add the full, detailed list of the languages the model was trained on.

ybelkada

BigScience Workshop org Aug 16, 2022

Actually, you can also find the full list here: https://huggingface.co/bigscience/bloom#languages

christopher changed discussion status to closed Aug 17, 2022

cerisara

Sep 5, 2022

Why wasn't Czech included in BLOOM's training data? Czech has large corpora and a very active NLP community, which has already published NLP models (e.g., a BERT version).

christopher

BigScience Workshop org Sep 5, 2022

@cerisara The training corpus was crowdsourced by workshop participants; the final list of languages took shape organically through community hackathons and volunteer efforts.

More info in this thread: https://twitter.com/YJernite/status/1505920454825066496
