Bucket: HCAI-Lab/dolma3-6t-sample-5000-docs (Human-Centered AI Lab)
Folder: worker_0043
11.1 GB, 56,043 files, updated about 2 months ago
Name | Size | Uploaded | Xet hash
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000003.jsonl.zst | 387 kB | about 2 months ago | 34d3ce3a
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000008.jsonl.zst | 360 kB | about 2 months ago | d2947fe9
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000013.jsonl.zst | 330 kB | about 2 months ago | 3f8f1b27
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000016.jsonl.zst | 374 kB | about 2 months ago | c36dfe15
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000023.jsonl.zst | 335 kB | about 2 months ago | 9dac45af
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000024.jsonl.zst | 299 kB | about 2 months ago | 61ac49f5
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000030.jsonl.zst | 415 kB | about 2 months ago | 97dc9bd8
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000034.jsonl.zst | 285 kB | about 2 months ago | 5779df72
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000035.jsonl.zst | 302 kB | about 2 months ago | 3d7c75b8
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000039.jsonl.zst | 275 kB | about 2 months ago | 25cb2be6
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000043.jsonl.zst | 405 kB | about 2 months ago | 2a0b348c
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000044.jsonl.zst | 379 kB | about 2 months ago | 7a9457b0
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000055.jsonl.zst | 423 kB | about 2 months ago | 0d34ee83
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000060.jsonl.zst | 356 kB | about 2 months ago | 6055e59f
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000064.jsonl.zst | 417 kB | about 2 months ago | f252c38b
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000076.jsonl.zst | 444 kB | about 2 months ago | e8e38d52
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000084.jsonl.zst | 281 kB | about 2 months ago | dcf3a7af
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000086.jsonl.zst | 348 kB | about 2 months ago | 7325b6b6
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000090.jsonl.zst | 301 kB | about 2 months ago | 30ee8fc9
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000092.jsonl.zst | 243 kB | about 2 months ago | 78f4a8a9
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000093.jsonl.zst | 339 kB | about 2 months ago | 23023d43
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000095.jsonl.zst | 350 kB | about 2 months ago | a00a92f1
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000099.jsonl.zst | 318 kB | about 2 months ago | a258b130
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000101.jsonl.zst | 366 kB | about 2 months ago | a4108862
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000103.jsonl.zst | 433 kB | about 2 months ago | 79e0d8a1
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000105.jsonl.zst | 385 kB | about 2 months ago | 5af11e65
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000109.jsonl.zst | 267 kB | about 2 months ago | 8348bdef
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000113.jsonl.zst | 345 kB | about 2 months ago | d5e5695e
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000115.jsonl.zst | 337 kB | about 2 months ago | 9d9e2f8c
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000116.jsonl.zst | 383 kB | about 2 months ago | 0d9af41e
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000117.jsonl.zst | 308 kB | about 2 months ago | 8bed44a9
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000121.jsonl.zst | 319 kB | about 2 months ago | 70980d63
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000130.jsonl.zst | 399 kB | about 2 months ago | dce0d933
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000132.jsonl.zst | 379 kB | about 2 months ago | e2050479
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000135.jsonl.zst | 336 kB | about 2 months ago | b036ad3a
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000139.jsonl.zst | 315 kB | about 2 months ago | 84a096ab
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000143.jsonl.zst | 285 kB | about 2 months ago | 72f11873
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000144.jsonl.zst | 309 kB | about 2 months ago | 490ed397
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000147.jsonl.zst | 170 kB | about 2 months ago | 9905fa77
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000150.jsonl.zst | 123 kB | about 2 months ago | 1a4ba418
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000152.jsonl.zst | 412 kB | about 2 months ago | 72a341c1
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000158.jsonl.zst | 486 kB | about 2 months ago | 793b7aac
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000161.jsonl.zst | 352 kB | about 2 months ago | 593698b4
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000167.jsonl.zst | 314 kB | about 2 months ago | 952bce8d
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000172.jsonl.zst | 448 kB | about 2 months ago | 820450e3
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000174.jsonl.zst | 358 kB | about 2 months ago | b6852e6f
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000175.jsonl.zst | 363 kB | about 2 months ago | 29723fd7
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000176.jsonl.zst | 216 kB | about 2 months ago | f19cc471
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000179.jsonl.zst | 176 kB | about 2 months ago | a48c34ea
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000180.jsonl.zst | 135 kB | about 2 months ago | 0b419b03
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000184.jsonl.zst | 352 kB | about 2 months ago | ff12becb
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000186.jsonl.zst | 446 kB | about 2 months ago | 708b027d
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000188.jsonl.zst | 395 kB | about 2 months ago | 4ac59571
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000191.jsonl.zst | 396 kB | about 2 months ago | 925be4c4
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000195.jsonl.zst | 481 kB | about 2 months ago | f95522c6
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000198.jsonl.zst | 322 kB | about 2 months ago | 35b4fed9
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000200.jsonl.zst | 320 kB | about 2 months ago | edbc5e1c
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000211.jsonl.zst | 341 kB | about 2 months ago | c8faeaba
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000212.jsonl.zst | 400 kB | about 2 months ago | 1616804b
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000213.jsonl.zst | 367 kB | about 2 months ago | 665de31e
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000217.jsonl.zst | 352 kB | about 2 months ago | cafb50a2
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000220.jsonl.zst | 283 kB | about 2 months ago | e3dfc26b
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000227.jsonl.zst | 283 kB | about 2 months ago | 53e7bd25
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000233.jsonl.zst | 369 kB | about 2 months ago | 0cef9e5f
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000238.jsonl.zst | 321 kB | about 2 months ago | d24ca05f
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000242.jsonl.zst | 340 kB | about 2 months ago | 7210329d
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000243.jsonl.zst | 483 kB | about 2 months ago | 339a5c3f
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000254.jsonl.zst | 376 kB | about 2 months ago | 912e3668
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000255.jsonl.zst | 374 kB | about 2 months ago | 0b5ef94e
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000261.jsonl.zst | 398 kB | about 2 months ago | d8462adf
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000262.jsonl.zst | 445 kB | about 2 months ago | 8343af43
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000264.jsonl.zst | 271 kB | about 2 months ago | c98cdc54
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000265.jsonl.zst | 317 kB | about 2 months ago | 0e704be2
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0017__shard_00000269.jsonl.zst | 322 kB | about 2 months ago | e5a567e0
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000000.jsonl.zst | 396 kB | about 2 months ago | 7867b8dc
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000001.jsonl.zst | 396 kB | about 2 months ago | 9eb75891
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000002.jsonl.zst | 306 kB | about 2 months ago | 12a3e782
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000003.jsonl.zst | 263 kB | about 2 months ago | 361a99c6
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000004.jsonl.zst | 291 kB | about 2 months ago | 33d37a57
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000005.jsonl.zst | 371 kB | about 2 months ago | 3205ad73
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000006.jsonl.zst | 311 kB | about 2 months ago | a11ec7e0
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000007.jsonl.zst | 376 kB | about 2 months ago | a2441351
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000008.jsonl.zst | 280 kB | about 2 months ago | 221d2513
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000009.jsonl.zst | 285 kB | about 2 months ago | b6a8639a
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000010.jsonl.zst | 400 kB | about 2 months ago | 0883eaa9
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000011.jsonl.zst | 335 kB | about 2 months ago | 61857f63
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000012.jsonl.zst | 286 kB | about 2 months ago | 8bb53034
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000013.jsonl.zst | 369 kB | about 2 months ago | 557893e3
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000014.jsonl.zst | 271 kB | about 2 months ago | dfefc358
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000015.jsonl.zst | 260 kB | about 2 months ago | c0287888
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000016.jsonl.zst | 362 kB | about 2 months ago | ed00eeb7
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000017.jsonl.zst | 423 kB | about 2 months ago | b409a547
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000018.jsonl.zst | 350 kB | about 2 months ago | 93db1656
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000019.jsonl.zst | 347 kB | about 2 months ago | 3baa7fc3
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000020.jsonl.zst | 346 kB | about 2 months ago | 0c4696cd
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000021.jsonl.zst | 440 kB | about 2 months ago | d730ae37
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000022.jsonl.zst | 363 kB | about 2 months ago | e7d18eae
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000023.jsonl.zst | 291 kB | about 2 months ago | 541f6a20
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000024.jsonl.zst | 466 kB | about 2 months ago | 3067df46
soc127__phase1_pool_shared__common_crawl__part_003__data__common_crawl-industrial-0018__shard_00000025.jsonl.zst | 489 kB | about 2 months ago | 4e5b5108
Last updated: Mar 24
Pre-warmed CDN: US, EU