DoctorMorDi's Collections
Interesting Datasets
Updated Dec 30, 2023
Open-source datasets for training LLMs and multimodal models
wikimedia/wikipedia • Viewer • Updated Jan 9, 2024 • 61.6M • 101k • 1.18k
OpenAssistant/oasst2 • Viewer • Updated Jan 11, 2024 • 135k • 11.9k • 288
AdaptLLM/finance-tasks • Viewer • Updated Nov 30, 2024 • 23.3k • 7.09k • 78
AdaptLLM/law-tasks • Viewer • Updated Dec 2, 2024 • 5.22k • 181 • 36
AdaptLLM/medicine-tasks • Viewer • Updated Dec 2, 2024 • 5.38k • 277 • 33
HuggingFaceH4/ultrachat_200k • Viewer • Updated Oct 16, 2024 • 515k • 42k • 683
HuggingFaceH4/no_robots • Viewer • Updated Apr 18, 2024 • 10k • 8.27k • 539
Lin-Chen/ShareGPT4V • Viewer • Updated Jun 6, 2024 • 1.35M • 2.01k • 306
ise-uiuc/Magicoder-OSS-Instruct-75K • Viewer • Updated Dec 4, 2023 • 75.2k • 6.36k • 162
LDJnr/Capybara • Viewer • Updated Jun 7, 2024 • 16k • 2.88k • 248
Open-Orca/OpenOrca • Viewer • Updated Feb 19, 2025 • 2.94M • 18.7k • 1.51k
arxiv-community/arxiv_dataset • Updated Jan 18, 2024 • 810 • 134
ai4privacy/pii-masking-200k • Viewer • Updated 5 days ago • 209k • 2.93k • 120
AI4Math/MathVista • Viewer • Updated Feb 11, 2024 • 6.14k • 18.4k • 210
derek-thomas/ScienceQA • Viewer • Updated Feb 25, 2023 • 21.2k • 14.4k • 217
teknium/openhermes • Viewer • Updated Sep 7, 2023 • 243k • 1.32k • 219
SciPhi/AgentSearch-V1 • Viewer • Updated Jan 14, 2024 • 70k • 1.46k • 92
theblackcat102/evol-codealpaca-v1 • Viewer • Updated Mar 10, 2024 • 111k • 3.54k • 179