KaleidoPH's Collections
Pretrain
Updated Mar 2
Pretraining Bleedingheart
AlekseyKorshuk/fiction-books • Viewer • Updated Jun 12, 2022 • 4.74k rows • 370 downloads • 9 likes
defunct-datasets/the_pile_books3 • Updated Jan 18, 2024 • 257 downloads • 152 likes
AlekseyKorshuk/drama-books • Viewer • Updated Jun 11, 2022 • 1.11k rows • 23 downloads • 3 likes
AlekseyKorshuk/thriller-books • Viewer • Updated Jun 10, 2022 • 366 rows • 11 downloads • 3 likes
AlekseyKorshuk/horror-scripts • Viewer • Updated Feb 10, 2022 • 11 rows • 127 downloads • 2 likes
AlekseyKorshuk/comedy-scripts • Viewer • Updated Feb 11, 2022 • 11 rows • 209 downloads • 4 likes
AlekseyKorshuk/books • Viewer • Updated Jun 25, 2022 • 741 rows • 51 downloads • 3 likes
AlekseyKorshuk/fantasy-books • Viewer • Updated Jun 10, 2022 • 3.51k rows • 18 downloads • 10 likes
AlekseyKorshuk/erotic-books • Viewer • Updated Jun 9, 2022 • 646 rows • 154 downloads • 27 likes
AlekseyKorshuk/romance-books • Viewer • Updated Jun 10, 2022 • 3.55k rows • 21 downloads • 12 likes
AlekseyKorshuk/fairy-tale-books • Viewer • Updated Jun 9, 2022 • 1.01k rows • 11 downloads • 8 likes
kmfoda/booksum • Viewer • Updated Nov 30, 2022 • 12.5k rows • 1.74k downloads • 78 likes
defunct-datasets/bookcorpusopen • Updated Nov 24, 2023 • 664 downloads • 39 likes
bookcorpus/bookcorpus • Updated May 3, 2024 • 14.8k downloads • 353 likes
legacy-datasets/wikipedia • Updated Mar 11, 2024 • 123k downloads • 627 likes
allenai/dolma • Updated Apr 17, 2024 • 4.5k downloads • 1.03k likes
EleutherAI/the_pile_deduplicated • Viewer • Updated Dec 2, 2022 • 134M rows • 22.6k downloads • 112 likes
tiiuae/falcon-refinedweb • Viewer • Updated Jun 20, 2023 • 968M rows • 22.7k downloads • 910 likes
bigcode/starcoderdata • Viewer • Updated May 16, 2023 • 207M rows • 31k downloads • 504 likes
SciPhi/textbooks-are-all-you-need-lite • Viewer • Updated Sep 30, 2023 • 682k rows • 1.2k downloads • 191 likes
nampdn-ai/tiny-textbooks • Viewer • Updated Jul 3, 2024 • 420k rows • 558 downloads • 173 likes
nampdn-ai/tiny-codes • Viewer • Updated Sep 30, 2023 • 1.63M rows • 1.93k downloads • 288 likes
nampdn-ai/tiny-orca-textbooks • Viewer • Updated Sep 28, 2023 • 147k rows • 41 downloads • 43 likes
roneneldan/TinyStories • Viewer • Updated Aug 12, 2024 • 2.14M rows • 97.1k downloads • 974 likes
arxiv-community/arxiv_dataset • Updated Jan 18, 2024 • 1.13k downloads • 135 likes
Salesforce/wikitext • Viewer • Updated Jan 4, 2024 • 3.71M rows • 1.32M downloads • 680 likes
Skylion007/openwebtext • Viewer • Updated Dec 26, 2025 • 8.01M rows • 70.2k downloads • 509 likes
codeparrot/github-code • Updated Oct 20, 2022 • 15.3k downloads • 360 likes
codeparrot/codecomplex • Viewer • Updated Oct 25, 2022 • 4.52k rows • 423 downloads • 32 likes
codeparrot/github-jupyter • Viewer • Updated Oct 25, 2022 • 165k rows • 1.49k downloads • 5 likes
codeparrot/self-instruct-starcoder • Viewer • Updated Oct 23, 2023 • 9.63k rows • 698 downloads • 63 likes
codeparrot/codeparrot-clean-train • Viewer • Updated Oct 10, 2022 • 5.11M rows • 1.7k downloads • 16 likes
dell-research-harvard/AmericanStories • Updated Mar 26, 2025 • 12.6k downloads • 167 likes