Bucket: HCAI-Lab/dolma3-6t-sample-5000-docs (Human-Centered AI Lab)
Path: HCAI-Lab/dolma3-6t-sample-5000-docs/worker_0044
11.1 GB, 56,043 files, updated about 2 months ago
| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000095.jsonl.zst | 301 kB | about 2 months ago | 6b35348e |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000096.jsonl.zst | 327 kB | about 2 months ago | e81ef4f4 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000097.jsonl.zst | 107 kB | about 2 months ago | 76475ee9 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000098.jsonl.zst | 306 kB | about 2 months ago | e052f43b |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000099.jsonl.zst | 336 kB | about 2 months ago | e0efa5b8 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000100.jsonl.zst | 269 kB | about 2 months ago | e3e0a6a7 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000101.jsonl.zst | 310 kB | about 2 months ago | a1518e6a |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000102.jsonl.zst | 282 kB | about 2 months ago | 837f6ce4 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000103.jsonl.zst | 244 kB | about 2 months ago | 1ec52bc3 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000104.jsonl.zst | 276 kB | about 2 months ago | 73808709 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000105.jsonl.zst | 293 kB | about 2 months ago | c3d941d7 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000106.jsonl.zst | 275 kB | about 2 months ago | 4c141332 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000107.jsonl.zst | 241 kB | about 2 months ago | 9e6e6721 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000108.jsonl.zst | 124 kB | about 2 months ago | 5136b792 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000109.jsonl.zst | 220 kB | about 2 months ago | e7c255de |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000110.jsonl.zst | 263 kB | about 2 months ago | 07a25bbc |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000111.jsonl.zst | 247 kB | about 2 months ago | cfaed969 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000112.jsonl.zst | 228 kB | about 2 months ago | b89af701 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000113.jsonl.zst | 334 kB | about 2 months ago | 4bf36314 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000114.jsonl.zst | 228 kB | about 2 months ago | effb99e0 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000115.jsonl.zst | 270 kB | about 2 months ago | 9588c54f |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000116.jsonl.zst | 302 kB | about 2 months ago | cedda450 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000117.jsonl.zst | 292 kB | about 2 months ago | ec77c29b |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000118.jsonl.zst | 259 kB | about 2 months ago | e67af7a1 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000119.jsonl.zst | 171 kB | about 2 months ago | 0851eb45 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000120.jsonl.zst | 286 kB | about 2 months ago | 77da3876 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000121.jsonl.zst | 252 kB | about 2 months ago | 48cdf8b3 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000122.jsonl.zst | 458 kB | about 2 months ago | c5917710 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000123.jsonl.zst | 309 kB | about 2 months ago | d3e51975 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000124.jsonl.zst | 269 kB | about 2 months ago | 4f1859c7 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000125.jsonl.zst | 250 kB | about 2 months ago | 1b940733 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000126.jsonl.zst | 262 kB | about 2 months ago | f2225179 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000127.jsonl.zst | 276 kB | about 2 months ago | c44f8216 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000128.jsonl.zst | 286 kB | about 2 months ago | 12ae6777 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000129.jsonl.zst | 340 kB | about 2 months ago | 65966fa5 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000130.jsonl.zst | 243 kB | about 2 months ago | dcb1d955 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000131.jsonl.zst | 338 kB | about 2 months ago | 090f5109 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000132.jsonl.zst | 291 kB | about 2 months ago | f66eab5f |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000133.jsonl.zst | 257 kB | about 2 months ago | 62406bb4 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000134.jsonl.zst | 259 kB | about 2 months ago | f9de8929 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000135.jsonl.zst | 267 kB | about 2 months ago | 0e457815 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000136.jsonl.zst | 288 kB | about 2 months ago | dfcb6ecd |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000137.jsonl.zst | 261 kB | about 2 months ago | 544d7b54 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000138.jsonl.zst | 282 kB | about 2 months ago | 571a7c54 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000139.jsonl.zst | 317 kB | about 2 months ago | a5674da7 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000140.jsonl.zst | 316 kB | about 2 months ago | ec5befd2 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000141.jsonl.zst | 282 kB | about 2 months ago | fee7d9c0 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000142.jsonl.zst | 302 kB | about 2 months ago | 4b074ec6 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000143.jsonl.zst | 327 kB | about 2 months ago | 96d8e7fc |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000144.jsonl.zst | 269 kB | about 2 months ago | 7360adf2 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000145.jsonl.zst | 234 kB | about 2 months ago | 3ac1b3a3 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000146.jsonl.zst | 252 kB | about 2 months ago | 45e661fa |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000147.jsonl.zst | 206 kB | about 2 months ago | a68356d5 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000148.jsonl.zst | 368 kB | about 2 months ago | cb36551c |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000149.jsonl.zst | 373 kB | about 2 months ago | d6593e87 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000150.jsonl.zst | 279 kB | about 2 months ago | 035421dc |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000151.jsonl.zst | 306 kB | about 2 months ago | 6ac9d099 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000152.jsonl.zst | 245 kB | about 2 months ago | be106073 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000153.jsonl.zst | 269 kB | about 2 months ago | 0601c360 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000154.jsonl.zst | 235 kB | about 2 months ago | d775a253 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000155.jsonl.zst | 323 kB | about 2 months ago | e325dc91 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000156.jsonl.zst | 216 kB | about 2 months ago | cf8da487 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000157.jsonl.zst | 254 kB | about 2 months ago | ca0a87bb |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000158.jsonl.zst | 291 kB | about 2 months ago | ee99b3e6 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000159.jsonl.zst | 312 kB | about 2 months ago | e0e414d8 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000160.jsonl.zst | 280 kB | about 2 months ago | 064e4b57 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000161.jsonl.zst | 294 kB | about 2 months ago | 49123bf2 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000162.jsonl.zst | 217 kB | about 2 months ago | 313798a3 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000163.jsonl.zst | 361 kB | about 2 months ago | 4f83ace8 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000164.jsonl.zst | 279 kB | about 2 months ago | 67b23429 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000165.jsonl.zst | 255 kB | about 2 months ago | c63a8a4f |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000166.jsonl.zst | 287 kB | about 2 months ago | c1a8b9dd |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000167.jsonl.zst | 279 kB | about 2 months ago | 0827cb46 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000168.jsonl.zst | 274 kB | about 2 months ago | 4f4ede2c |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000169.jsonl.zst | 296 kB | about 2 months ago | 993fb7bf |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000170.jsonl.zst | 357 kB | about 2 months ago | be044844 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000171.jsonl.zst | 298 kB | about 2 months ago | f5f38bdc |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000172.jsonl.zst | 203 kB | about 2 months ago | 70b1643f |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000173.jsonl.zst | 213 kB | about 2 months ago | afbb7c77 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000174.jsonl.zst | 321 kB | about 2 months ago | 4872bb4d |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000175.jsonl.zst | 335 kB | about 2 months ago | a05f3627 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000176.jsonl.zst | 298 kB | about 2 months ago | da8b4544 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000177.jsonl.zst | 266 kB | about 2 months ago | 3f669e7b |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000178.jsonl.zst | 231 kB | about 2 months ago | 635cc6a4 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000179.jsonl.zst | 264 kB | about 2 months ago | cad97167 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000180.jsonl.zst | 266 kB | about 2 months ago | 5edc0dec |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000181.jsonl.zst | 252 kB | about 2 months ago | aa0387b5 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000182.jsonl.zst | 300 kB | about 2 months ago | 5d46493b |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000183.jsonl.zst | 240 kB | about 2 months ago | 0df67704 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000184.jsonl.zst | 319 kB | about 2 months ago | 4340cace |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000185.jsonl.zst | 307 kB | about 2 months ago | 440a419b |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000186.jsonl.zst | 287 kB | about 2 months ago | 2f7e83a9 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000187.jsonl.zst | 308 kB | about 2 months ago | ef57bbaf |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000188.jsonl.zst | 303 kB | about 2 months ago | 3f8722df |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000189.jsonl.zst | 229 kB | about 2 months ago | d99ba64c |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000190.jsonl.zst | 260 kB | about 2 months ago | b7fdcead |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000191.jsonl.zst | 229 kB | about 2 months ago | c532a243 |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000192.jsonl.zst | 261 kB | about 2 months ago | 2ba7d4ee |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000193.jsonl.zst | 293 kB | about 2 months ago | 2452784f |
| soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0019__shard_00000194.jsonl.zst | 363 kB | about 2 months ago | e1fdc6bc |
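The file names listed above appear to follow one pattern: fields joined by double underscores, ending in a zero-padded shard number and the `.jsonl.zst` extension. As a minimal sketch, assuming that pattern holds for the rest of the bucket (it is inferred only from the names shown here and is not documented on this page), a name can be split into its fields and shard index. The helper name `parse_shard_name` and the returned keys are hypothetical.

```python
import re

# Assumption: every name ends in "__shard_<digits>.jsonl.zst", as in the
# listing above. This pattern is inferred, not taken from any documentation.
SHARD_RE = re.compile(r"__shard_(\d+)\.jsonl\.zst$")


def parse_shard_name(name: str) -> dict:
    """Split a listed file name into its "__"-separated fields and shard index."""
    match = SHARD_RE.search(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    prefix = name[: match.start()]
    return {
        "fields": prefix.split("__"),   # hypothetical key: leading name fields
        "shard": int(match.group(1)),   # hypothetical key: numeric shard index
    }


example = (
    "soc127__phase1_pool_shared__common_crawl__part_003__data__"
    "common_crawl-industrial-0019__shard_00000095.jsonl.zst"
)
info = parse_shard_name(example)
print(info["fields"], info["shard"])
```

Sorting a directory of these names by the parsed `shard` value gives the same order as the listing, since the numbers are zero-padded to a fixed width.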
Total size: 11.1 GB
Files: 56,043
Last updated: Mar 24
Pre-warmed CDN: US, EU