Catherine Arnett
catherinearnett
108 followers · 37 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
Updated a dataset 22 days ago: catherinearnett/bilingual-tokenizer-training-data
Published a dataset 22 days ago: catherinearnett/bilingual-tokenizer-training-data
Liked a dataset about 1 month ago: commoncrawl/CommonLID
catherinearnett's datasets (4)
catherinearnett/bilingual-tokenizer-training-data • Viewer • Updated 21 days ago • 30.7M • 255
catherinearnett/montok • Updated Sep 19, 2025 • 5.28k • 3
catherinearnett/morphscore • Viewer • Updated Jul 10, 2025 • 5.09M • 391 • 4
catherinearnett/monolingual-tokenizer-data • Viewer • Updated May 15, 2025 • 139M • 213 • 1