Global Corpus
- lumees/turkish-corpus-100b • Updated 27 days ago • 107M rows • 1.38k downloads • 2 likes
- lumees/multilingual-safety-classification-dataset • Updated Oct 24 • 213k rows • 214 downloads • 2 likes
- lumees/bulgarian-corpus-33b • Updated 27 days ago • 34.9M rows • 567 downloads • 2 likes
- lumees/dutch-corpus-200b • Updated 26 days ago • 170M rows • 346 downloads • 3 likes
Turkish Retrieval Datasets
- lumees/ms-marco-tr-hard-negatives • Updated about 1 month ago • 786k rows • 61 downloads • 2 likes
- lumees/wikipedia-turkish-synthetic-query • Updated 30 days ago • 19.8k rows • 43 downloads • 2 likes
Retrieval Models
- lumees/lumees-matryoshka-embedding-v1 • Sentence Similarity • 0.6B params • Updated Nov 25 • 303 downloads • 2 likes
- lumees/lumees-matryoshka-vision-embedding-v1 • Feature Extraction • Updated Nov 26 • 11 downloads • 2 likes
- lumees/aethel-reranker-en-v1 • Text Ranking • 0.1B params • Updated Nov 20 • 313 downloads • 3 likes
Code Retrieval Datasets
- lumees/codesearchnet-hard-negatives • Updated 30 days ago • 955k rows • 88 downloads • 2 likes
Safety & Moderation Datasets
Comprehensive collection of high-quality multilingual datasets for NLP research and production.
- lumees/multilingual-safety-classification-dataset • Updated Oct 24 • 213k rows • 214 downloads • 2 likes
- lumees/age-specific-text-simplification • Updated Aug 13 • 17.2k rows • 51 downloads • 2 likes