Bucket: HCAI-Lab/dolma3-6t-sample-5000-docs (Human-Centered AI Lab)
Folder: worker_0072
11.1 GB
56,043 files
Updated about 2 months ago
Name | Size | Uploaded | Xet hash
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000292.jsonl.zst | 213 kB | about 2 months ago | c940f94c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000293.jsonl.zst | 271 kB | about 2 months ago | 4167adc9
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000294.jsonl.zst | 209 kB | about 2 months ago | 4a3d4e75
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000295.jsonl.zst | 185 kB | about 2 months ago | 4caae08a
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000296.jsonl.zst | 197 kB | about 2 months ago | 99178599
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000297.jsonl.zst | 231 kB | about 2 months ago | c940ce5c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000298.jsonl.zst | 189 kB | about 2 months ago | 0132c32a
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000299.jsonl.zst | 188 kB | about 2 months ago | d2888447
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000300.jsonl.zst | 266 kB | about 2 months ago | b4075c16
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000301.jsonl.zst | 243 kB | about 2 months ago | fd02cf4c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000302.jsonl.zst | 176 kB | about 2 months ago | a72611e0
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000303.jsonl.zst | 198 kB | about 2 months ago | aff9ef22
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000304.jsonl.zst | 234 kB | about 2 months ago | 1c8226a2
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000305.jsonl.zst | 296 kB | about 2 months ago | 6cc5a266
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000306.jsonl.zst | 306 kB | about 2 months ago | d82efd56
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000307.jsonl.zst | 145 kB | about 2 months ago | 49250705
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000308.jsonl.zst | 197 kB | about 2 months ago | 8573def1
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000309.jsonl.zst | 170 kB | about 2 months ago | 89a34fa6
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000310.jsonl.zst | 209 kB | about 2 months ago | 372ff077
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000311.jsonl.zst | 191 kB | about 2 months ago | f318fb7c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000312.jsonl.zst | 220 kB | about 2 months ago | 83c2d95b
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000313.jsonl.zst | 257 kB | about 2 months ago | b7a4e4ba
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000314.jsonl.zst | 203 kB | about 2 months ago | 49b5e4e5
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000315.jsonl.zst | 301 kB | about 2 months ago | 8fbb799b
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000316.jsonl.zst | 251 kB | about 2 months ago | a0586e0c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000317.jsonl.zst | 189 kB | about 2 months ago | 9cc151fa
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000318.jsonl.zst | 217 kB | about 2 months ago | 5beab6af
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000319.jsonl.zst | 154 kB | about 2 months ago | 91a3d531
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000320.jsonl.zst | 289 kB | about 2 months ago | d647552c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000321.jsonl.zst | 229 kB | about 2 months ago | 00a3a185
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000322.jsonl.zst | 345 kB | about 2 months ago | 9dff25e2
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000323.jsonl.zst | 191 kB | about 2 months ago | 117fad03
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000324.jsonl.zst | 200 kB | about 2 months ago | 45fb3a2f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000325.jsonl.zst | 185 kB | about 2 months ago | e221843f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000326.jsonl.zst | 233 kB | about 2 months ago | 62444048
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000327.jsonl.zst | 217 kB | about 2 months ago | 8e9549c7
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000328.jsonl.zst | 177 kB | about 2 months ago | 75a0adb3
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000329.jsonl.zst | 186 kB | about 2 months ago | 5eafcf34
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000330.jsonl.zst | 243 kB | about 2 months ago | abb3a57a
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000331.jsonl.zst | 172 kB | about 2 months ago | 5e493189
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000332.jsonl.zst | 181 kB | about 2 months ago | 44c9f786
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000333.jsonl.zst | 176 kB | about 2 months ago | 50f96b9a
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000334.jsonl.zst | 207 kB | about 2 months ago | d91699c0
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000335.jsonl.zst | 217 kB | about 2 months ago | 58c2aaf5
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000336.jsonl.zst | 170 kB | about 2 months ago | 427b7f28
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000337.jsonl.zst | 156 kB | about 2 months ago | 710d5f4d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000338.jsonl.zst | 214 kB | about 2 months ago | 08060e44
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000339.jsonl.zst | 219 kB | about 2 months ago | d67d8407
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000340.jsonl.zst | 270 kB | about 2 months ago | 0c6c24fa
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000341.jsonl.zst | 210 kB | about 2 months ago | 10fdad7e
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000342.jsonl.zst | 241 kB | about 2 months ago | 2180b905
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000343.jsonl.zst | 214 kB | about 2 months ago | 3370b6c1
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000344.jsonl.zst | 220 kB | about 2 months ago | 3a04c642
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000345.jsonl.zst | 147 kB | about 2 months ago | 7b567caa
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000346.jsonl.zst | 225 kB | about 2 months ago | be7047a1
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000347.jsonl.zst | 182 kB | about 2 months ago | 531c4276
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000348.jsonl.zst | 225 kB | about 2 months ago | 42c658ee
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000349.jsonl.zst | 205 kB | about 2 months ago | b1475501
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000350.jsonl.zst | 165 kB | about 2 months ago | 799e31f0
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000351.jsonl.zst | 226 kB | about 2 months ago | 8f605808
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000352.jsonl.zst | 183 kB | about 2 months ago | 4a0ab5c5
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000353.jsonl.zst | 221 kB | about 2 months ago | 47a40dc7
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000354.jsonl.zst | 249 kB | about 2 months ago | 73cf257b
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000355.jsonl.zst | 252 kB | about 2 months ago | 505875e9
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000356.jsonl.zst | 225 kB | about 2 months ago | f4cc9902
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000357.jsonl.zst | 183 kB | about 2 months ago | 212eecc0
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000358.jsonl.zst | 203 kB | about 2 months ago | 66be1464
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000359.jsonl.zst | 218 kB | about 2 months ago | 5750799b
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000360.jsonl.zst | 157 kB | about 2 months ago | 8698df24
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000361.jsonl.zst | 181 kB | about 2 months ago | 630c0e20
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000362.jsonl.zst | 254 kB | about 2 months ago | aa5c4525
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000363.jsonl.zst | 203 kB | about 2 months ago | 7152c1f9
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000364.jsonl.zst | 218 kB | about 2 months ago | a6f9cf97
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000365.jsonl.zst | 229 kB | about 2 months ago | f9f43540
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000366.jsonl.zst | 203 kB | about 2 months ago | 0ef3300b
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000367.jsonl.zst | 171 kB | about 2 months ago | be5b831f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000368.jsonl.zst | 258 kB | about 2 months ago | 92b947b9
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000369.jsonl.zst | 218 kB | about 2 months ago | 651cdab2
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000370.jsonl.zst | 217 kB | about 2 months ago | c4027798
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000371.jsonl.zst | 174 kB | about 2 months ago | 6913cf94
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000372.jsonl.zst | 195 kB | about 2 months ago | ac7e2c0d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000373.jsonl.zst | 150 kB | about 2 months ago | daf5d5e7
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000374.jsonl.zst | 83.1 kB | about 2 months ago | e2a9c16f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000375.jsonl.zst | 87.8 kB | about 2 months ago | 961a125f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000376.jsonl.zst | 95 kB | about 2 months ago | 4eb2aa32
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000377.jsonl.zst | 96.2 kB | about 2 months ago | 9f8a4aa6
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000378.jsonl.zst | 278 kB | about 2 months ago | f5c00dad
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000379.jsonl.zst | 208 kB | about 2 months ago | 91d6e622
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000380.jsonl.zst | 193 kB | about 2 months ago | dca8cb6a
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000381.jsonl.zst | 304 kB | about 2 months ago | 78349409
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000382.jsonl.zst | 209 kB | about 2 months ago | b216e789
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000383.jsonl.zst | 244 kB | about 2 months ago | c9c72031
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000384.jsonl.zst | 211 kB | about 2 months ago | 9fafe4c4
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000385.jsonl.zst | 185 kB | about 2 months ago | 9e0f59ba
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000386.jsonl.zst | 165 kB | about 2 months ago | 80c0bbeb
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000387.jsonl.zst | 266 kB | about 2 months ago | d1d109ef
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000388.jsonl.zst | 189 kB | about 2 months ago | d9fd5218
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000389.jsonl.zst | 269 kB | about 2 months ago | c7dcd42f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000390.jsonl.zst | 197 kB | about 2 months ago | 9b8afa76
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0018__shard_00000391.jsonl.zst | 198 kB | about 2 months ago | 86abcbd5
Total size: 11.1 GB
Files: 56,043
Last updated: Mar 24
Pre-warmed CDN: US, EU