TiME Collection The TiME collection gathers monolingual BERT-style encoders for 16 languages, each in three sizes (xs, s, m). Each model outputs embeddings distilled from XLM-R large. • 48 items • Updated Mar 2 • 2
Obfuscated ModernBERT Collection Models pretrained on various obfuscated versions of the same dataset to analyse how the obfuscation affects downstream performance. • 3 items • Updated about 16 hours ago
Obfuscated FineWeb Edu Collection A collection of obfuscated versions of a 20B-token sample of the FineWeb Edu dataset. • 4 items • Updated 3 days ago
Learning to Detect Language Model Training Data via Active Reconstruction Paper • 2602.19020 • Published Feb 22 • 2