Bucket: HCAI-Lab/dolma3-6t-sample-5000-docs (Human-Centered AI Lab)
Path: HCAI-Lab/dolma3-6t-sample-5000-docs/worker_0016
11.1 GB, 56,043 files, updated about 2 months ago
| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000411.jsonl.zst | 225 kB | about 2 months ago | 40ffa0c8 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000412.jsonl.zst | 167 kB | about 2 months ago | 5c73b5bf |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000413.jsonl.zst | 285 kB | about 2 months ago | 3470e0d9 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000414.jsonl.zst | 196 kB | about 2 months ago | 7019ac68 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000415.jsonl.zst | 215 kB | about 2 months ago | b42d552d |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000416.jsonl.zst | 278 kB | about 2 months ago | c361271e |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000417.jsonl.zst | 195 kB | about 2 months ago | f0fbe867 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000418.jsonl.zst | 232 kB | about 2 months ago | 5ca2cf80 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000419.jsonl.zst | 193 kB | about 2 months ago | b0ad1ac0 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000420.jsonl.zst | 175 kB | about 2 months ago | cfc80747 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000421.jsonl.zst | 201 kB | about 2 months ago | f53d4713 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000422.jsonl.zst | 148 kB | about 2 months ago | 2cacd0d8 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000423.jsonl.zst | 136 kB | about 2 months ago | ac845d3f |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000424.jsonl.zst | 211 kB | about 2 months ago | c6de9b4d |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000425.jsonl.zst | 212 kB | about 2 months ago | 81a61347 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000426.jsonl.zst | 233 kB | about 2 months ago | 586846d3 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000427.jsonl.zst | 160 kB | about 2 months ago | a4fd5f3e |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000428.jsonl.zst | 151 kB | about 2 months ago | 43290ec8 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000429.jsonl.zst | 136 kB | about 2 months ago | e17e26b6 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000430.jsonl.zst | 201 kB | about 2 months ago | c61d54d2 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000431.jsonl.zst | 223 kB | about 2 months ago | c35fcdd0 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000432.jsonl.zst | 206 kB | about 2 months ago | e013598c |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000433.jsonl.zst | 199 kB | about 2 months ago | c8b434e5 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000434.jsonl.zst | 170 kB | about 2 months ago | 35623568 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000435.jsonl.zst | 175 kB | about 2 months ago | 66946fef |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000436.jsonl.zst | 210 kB | about 2 months ago | 7ba1324f |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000437.jsonl.zst | 303 kB | about 2 months ago | 457ffa0b |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000438.jsonl.zst | 180 kB | about 2 months ago | 7c52dbca |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000439.jsonl.zst | 119 kB | about 2 months ago | 274a62ec |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000440.jsonl.zst | 189 kB | about 2 months ago | 8eddada4 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000441.jsonl.zst | 216 kB | about 2 months ago | 4f55d255 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000442.jsonl.zst | 199 kB | about 2 months ago | 249a8b53 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000443.jsonl.zst | 169 kB | about 2 months ago | 1289d480 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000444.jsonl.zst | 236 kB | about 2 months ago | 48e60bd6 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000445.jsonl.zst | 170 kB | about 2 months ago | b810f05c |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000446.jsonl.zst | 246 kB | about 2 months ago | e4c12784 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000447.jsonl.zst | 258 kB | about 2 months ago | 72941bcc |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000448.jsonl.zst | 186 kB | about 2 months ago | 36f41398 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000449.jsonl.zst | 195 kB | about 2 months ago | 95d9e5a8 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000450.jsonl.zst | 177 kB | about 2 months ago | 45e1eb25 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000451.jsonl.zst | 198 kB | about 2 months ago | f4cc1ee4 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000452.jsonl.zst | 156 kB | about 2 months ago | f8a13bec |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000453.jsonl.zst | 183 kB | about 2 months ago | 9ea0bad7 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000454.jsonl.zst | 148 kB | about 2 months ago | 6b7b2e3e |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000455.jsonl.zst | 375 kB | about 2 months ago | 2666e653 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000456.jsonl.zst | 204 kB | about 2 months ago | d9d61c12 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000457.jsonl.zst | 344 kB | about 2 months ago | 38c2e55d |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000458.jsonl.zst | 212 kB | about 2 months ago | 2e431aac |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000459.jsonl.zst | 257 kB | about 2 months ago | 3782280a |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000460.jsonl.zst | 299 kB | about 2 months ago | 23827429 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000461.jsonl.zst | 214 kB | about 2 months ago | 52a201f8 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000462.jsonl.zst | 182 kB | about 2 months ago | f0438f1d |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000463.jsonl.zst | 137 kB | about 2 months ago | 4c04b6bf |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000464.jsonl.zst | 252 kB | about 2 months ago | 063331b0 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000465.jsonl.zst | 179 kB | about 2 months ago | 7d313152 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000466.jsonl.zst | 227 kB | about 2 months ago | 19626d5e |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000467.jsonl.zst | 144 kB | about 2 months ago | 3d701d30 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000468.jsonl.zst | 165 kB | about 2 months ago | 0822fd33 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000469.jsonl.zst | 217 kB | about 2 months ago | 3e3f7598 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000470.jsonl.zst | 168 kB | about 2 months ago | 51b2d075 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000471.jsonl.zst | 220 kB | about 2 months ago | fc0d11b3 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000472.jsonl.zst | 274 kB | about 2 months ago | 3beeea34 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000473.jsonl.zst | 147 kB | about 2 months ago | 60c78d78 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000474.jsonl.zst | 177 kB | about 2 months ago | 6cb4b17b |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000475.jsonl.zst | 168 kB | about 2 months ago | 3fb50e69 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000476.jsonl.zst | 155 kB | about 2 months ago | 92d287b6 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000477.jsonl.zst | 152 kB | about 2 months ago | 8adf1b21 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000478.jsonl.zst | 253 kB | about 2 months ago | 78f01e8e |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000479.jsonl.zst | 191 kB | about 2 months ago | 8bfa5f97 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000480.jsonl.zst | 279 kB | about 2 months ago | 8ef6444a |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000481.jsonl.zst | 222 kB | about 2 months ago | 6b5bb8cf |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000482.jsonl.zst | 199 kB | about 2 months ago | 6c0f06c8 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000483.jsonl.zst | 260 kB | about 2 months ago | 9dfa90ab |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000484.jsonl.zst | 163 kB | about 2 months ago | 26f9fe4d |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000485.jsonl.zst | 174 kB | about 2 months ago | 64f2db55 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000486.jsonl.zst | 271 kB | about 2 months ago | 33ea5c69 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000487.jsonl.zst | 125 kB | about 2 months ago | f275d32b |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000488.jsonl.zst | 226 kB | about 2 months ago | 963cc871 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000489.jsonl.zst | 255 kB | about 2 months ago | c826f978 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000490.jsonl.zst | 146 kB | about 2 months ago | 8955626a |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000491.jsonl.zst | 231 kB | about 2 months ago | 22559839 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000492.jsonl.zst | 162 kB | about 2 months ago | 933eb3d3 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000493.jsonl.zst | 234 kB | about 2 months ago | 97e7ad6b |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000494.jsonl.zst | 262 kB | about 2 months ago | 2b7fe9ef |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000495.jsonl.zst | 253 kB | about 2 months ago | 6c303ec6 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000496.jsonl.zst | 170 kB | about 2 months ago | 77c4f4c4 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000497.jsonl.zst | 196 kB | about 2 months ago | 41ef1fdd |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000498.jsonl.zst | 184 kB | about 2 months ago | 4b907dd8 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000499.jsonl.zst | 332 kB | about 2 months ago | 76ada8a2 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000500.jsonl.zst | 239 kB | about 2 months ago | 88ba17bc |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000501.jsonl.zst | 298 kB | about 2 months ago | a1946d32 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000502.jsonl.zst | 216 kB | about 2 months ago | a0c65d67 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000503.jsonl.zst | 177 kB | about 2 months ago | 12d4e948 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000504.jsonl.zst | 170 kB | about 2 months ago | b8bb7e3d |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000505.jsonl.zst | 209 kB | about 2 months ago | 98a2f91f |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000506.jsonl.zst | 314 kB | about 2 months ago | 37528205 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000507.jsonl.zst | 201 kB | about 2 months ago | fad2b7bc |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000508.jsonl.zst | 260 kB | about 2 months ago | cba67f67 |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000509.jsonl.zst | 204 kB | about 2 months ago | c0e7792e |
| soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0019__shard_00000510.jsonl.zst | 335 kB | about 2 months ago | 05339350 |
Total size: 11.1 GB
Files: 56,043
Last updated: Mar 24