A multilingual dataset for NER covering 91 languages and 25 scripts
Jonas Golde
whoisjones
AI & ML interests
Data-efficient transfer learning
Recent Activity
upvoted a paper about 19 hours ago
Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
updated a dataset 29 days ago
whoisjones/finerweb_document_context
published a dataset 29 days ago
whoisjones/finerweb_document_context