A physical commonsense reasoning benchmark for 100+ languages, written in collaboration with 300+ researchers from 65 countries.
Catherine Arnett
catherinearnett
AI & ML interests
multilingual NLP, tokenization
Recent Activity
updated a dataset 21 days ago: catherinearnett/bilingual-tokenizer-training-data
published a dataset 22 days ago: catherinearnett/bilingual-tokenizer-training-data
liked a dataset about 1 month ago: commoncrawl/CommonLID