Global Corpus
  lumees/turkish-corpus-100b • Viewer • Updated Nov 30, 2025 • 107M • 226 • 4
  lumees/multilingual-safety-classification-dataset • Viewer • Updated Oct 24, 2025 • 213k • 3.64k • 3
  lumees/bulgarian-corpus-33b • Viewer • Updated Nov 30, 2025 • 34.9M • 986 • 3
  lumees/dutch-corpus-200b • Viewer • Updated Dec 1, 2025 • 170M • 991 • 4
Turkish Retrieval Datasets
  lumees/ms-marco-tr-hard-negatives • Viewer • Updated Nov 27, 2025 • 786k • 38 • 2
  lumees/wikipedia-turkish-synthetic-query • Viewer • Updated Nov 28, 2025 • 19.8k • 31 • 3
Retrieval Models
  lumees/lumees-matryoshka-embedding-v1 • Sentence Similarity • 0.6B • Updated Nov 25, 2025 • 1 • 2
  lumees/lumees-matryoshka-vision-embedding-v1 • Feature Extraction • Updated Nov 26, 2025 • 1 • 3
  lumees/aethel-reranker-en-v1 • Text Ranking • 0.1B • Updated Nov 20, 2025 • 9 • 3
Code Retrieval Datasets
  lumees/codesearchnet-hard-negatives • Viewer • Updated Nov 28, 2025 • 955k • 43 • 3
Safety & Moderation Datasets
Comprehensive collection of high-quality multilingual datasets for NLP research and production.
  lumees/multilingual-safety-classification-dataset • Viewer • Updated Oct 24, 2025 • 213k • 3.64k • 3
  lumees/age-specific-text-simplification • Viewer • Updated Aug 13, 2025 • 17.2k • 15 • 2