Human-Centered AI Lab
HCAI-Lab/dolma3-6t-sample-5000-docs/worker_0069
11.1 GB
56,043 files
Updated about 2 months ago
Name
Size
Uploaded
Xet hash
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000148.jsonl.zst
207 kB
xet
about 2 months ago
41a57a79
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000167.jsonl.zst
204 kB
xet
about 2 months ago
73049243
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000207.jsonl.zst
200 kB
xet
about 2 months ago
d36695b5
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000221.jsonl.zst
244 kB
xet
about 2 months ago
4844d53a
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000254.jsonl.zst
246 kB
xet
about 2 months ago
a23f1e55
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000271.jsonl.zst
265 kB
xet
about 2 months ago
a020e226
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000292.jsonl.zst
215 kB
xet
about 2 months ago
66eeaf13
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000307.jsonl.zst
237 kB
xet
about 2 months ago
688d1f58
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000328.jsonl.zst
199 kB
xet
about 2 months ago
d8328c97
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000353.jsonl.zst
240 kB
xet
about 2 months ago
e1551578
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000354.jsonl.zst
264 kB
xet
about 2 months ago
f3665677
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000360.jsonl.zst
251 kB
xet
about 2 months ago
03130198
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0013__shard_00000374.jsonl.zst
187 kB
xet
about 2 months ago
d944cc38
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000002.jsonl.zst
224 kB
xet
about 2 months ago
97288671
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000003.jsonl.zst
291 kB
xet
about 2 months ago
a1456720
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000006.jsonl.zst
219 kB
xet
about 2 months ago
081c7c31
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000008.jsonl.zst
257 kB
xet
about 2 months ago
9f060961
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000014.jsonl.zst
131 kB
xet
about 2 months ago
45f8d716
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000020.jsonl.zst
202 kB
xet
about 2 months ago
a70ad8e2
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000024.jsonl.zst
235 kB
xet
about 2 months ago
ab2b7e0c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000030.jsonl.zst
270 kB
xet
about 2 months ago
7c9858dc
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000032.jsonl.zst
237 kB
xet
about 2 months ago
009f4588
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000049.jsonl.zst
248 kB
xet
about 2 months ago
c06d4fd3
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000065.jsonl.zst
232 kB
xet
about 2 months ago
e40ca99e
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000069.jsonl.zst
198 kB
xet
about 2 months ago
8b8d7131
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000070.jsonl.zst
240 kB
xet
about 2 months ago
530164a1
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000082.jsonl.zst
266 kB
xet
about 2 months ago
5f2a015e
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000085.jsonl.zst
257 kB
xet
about 2 months ago
7d0e6dcd
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000087.jsonl.zst
210 kB
xet
about 2 months ago
2a6d613d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000088.jsonl.zst
231 kB
xet
about 2 months ago
104be852
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000100.jsonl.zst
193 kB
xet
about 2 months ago
6ae436e7
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000103.jsonl.zst
224 kB
xet
about 2 months ago
ec8b8dee
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000110.jsonl.zst
188 kB
xet
about 2 months ago
e2436f98
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000111.jsonl.zst
203 kB
xet
about 2 months ago
fbd9bb15
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000113.jsonl.zst
293 kB
xet
about 2 months ago
a5df833f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000115.jsonl.zst
222 kB
xet
about 2 months ago
626e9344
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000118.jsonl.zst
246 kB
xet
about 2 months ago
9dff6def
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000121.jsonl.zst
189 kB
xet
about 2 months ago
198219f0
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000122.jsonl.zst
244 kB
xet
about 2 months ago
afae549d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000128.jsonl.zst
275 kB
xet
about 2 months ago
78fad168
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000139.jsonl.zst
211 kB
xet
about 2 months ago
a56c0d33
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000147.jsonl.zst
227 kB
xet
about 2 months ago
71b1b96c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000148.jsonl.zst
231 kB
xet
about 2 months ago
47d48f37
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000150.jsonl.zst
194 kB
xet
about 2 months ago
bb3ab748
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000158.jsonl.zst
225 kB
xet
about 2 months ago
f3fd07df
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000167.jsonl.zst
197 kB
xet
about 2 months ago
cd4e3cd7
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000172.jsonl.zst
206 kB
xet
about 2 months ago
34bff2a4
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000179.jsonl.zst
301 kB
xet
about 2 months ago
7cabc10d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000219.jsonl.zst
175 kB
xet
about 2 months ago
152663e0
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000224.jsonl.zst
262 kB
xet
about 2 months ago
cb9ddd26
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000225.jsonl.zst
289 kB
xet
about 2 months ago
b1548456
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000242.jsonl.zst
199 kB
xet
about 2 months ago
5dd702db
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000244.jsonl.zst
288 kB
xet
about 2 months ago
f43c6863
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000248.jsonl.zst
278 kB
xet
about 2 months ago
6a529f70
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000249.jsonl.zst
202 kB
xet
about 2 months ago
1b160af7
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000257.jsonl.zst
284 kB
xet
about 2 months ago
62f8911c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000270.jsonl.zst
245 kB
xet
about 2 months ago
6d104ec6
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000284.jsonl.zst
204 kB
xet
about 2 months ago
7936e54d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000292.jsonl.zst
246 kB
xet
about 2 months ago
7783111d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000303.jsonl.zst
252 kB
xet
about 2 months ago
af5403e5
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000314.jsonl.zst
228 kB
xet
about 2 months ago
3d167eee
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000317.jsonl.zst
247 kB
xet
about 2 months ago
c4293dd0
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000322.jsonl.zst
233 kB
xet
about 2 months ago
81924158
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000344.jsonl.zst
245 kB
xet
about 2 months ago
e81406d5
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000361.jsonl.zst
255 kB
xet
about 2 months ago
4e657674
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000366.jsonl.zst
209 kB
xet
about 2 months ago
24b62f0a
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000373.jsonl.zst
187 kB
xet
about 2 months ago
fac6f15d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000376.jsonl.zst
212 kB
xet
about 2 months ago
b806725f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000379.jsonl.zst
231 kB
xet
about 2 months ago
3e97df09
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000383.jsonl.zst
203 kB
xet
about 2 months ago
8db0f67f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000386.jsonl.zst
259 kB
xet
about 2 months ago
b513b984
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000387.jsonl.zst
188 kB
xet
about 2 months ago
13f84745
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000390.jsonl.zst
280 kB
xet
about 2 months ago
e0cb0280
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000391.jsonl.zst
274 kB
xet
about 2 months ago
cca86436
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000397.jsonl.zst
185 kB
xet
about 2 months ago
867f9170
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000405.jsonl.zst
206 kB
xet
about 2 months ago
9d60637c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000406.jsonl.zst
231 kB
xet
about 2 months ago
0e9b577f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000407.jsonl.zst
229 kB
xet
about 2 months ago
39e39fb9
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000408.jsonl.zst
200 kB
xet
about 2 months ago
1ebf8a89
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0014__shard_00000417.jsonl.zst
206 kB
xet
about 2 months ago
449c4bef
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000003.jsonl.zst
239 kB
xet
about 2 months ago
d2040ba4
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000005.jsonl.zst
204 kB
xet
about 2 months ago
17ac131e
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000006.jsonl.zst
207 kB
xet
about 2 months ago
6abf96b7
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000008.jsonl.zst
123 kB
xet
about 2 months ago
30b34d14
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000014.jsonl.zst
92.8 kB
xet
about 2 months ago
79adca96
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000015.jsonl.zst
131 kB
xet
about 2 months ago
0034b42c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000017.jsonl.zst
94.6 kB
xet
about 2 months ago
dbaf5c77
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000024.jsonl.zst
227 kB
xet
about 2 months ago
9b1cddf2
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000027.jsonl.zst
225 kB
xet
about 2 months ago
cd27c815
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000028.jsonl.zst
212 kB
xet
about 2 months ago
eec1b6d1
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000034.jsonl.zst
289 kB
xet
about 2 months ago
a6ce1895
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000038.jsonl.zst
208 kB
xet
about 2 months ago
cd66f70b
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000039.jsonl.zst
234 kB
xet
about 2 months ago
1650854d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000043.jsonl.zst
278 kB
xet
about 2 months ago
915cd78c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000044.jsonl.zst
242 kB
xet
about 2 months ago
880fb305
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000045.jsonl.zst
264 kB
xet
about 2 months ago
8ebc1ea2
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000050.jsonl.zst
253 kB
xet
about 2 months ago
6e675a0a
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000054.jsonl.zst
269 kB
xet
about 2 months ago
c1e532b2
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000067.jsonl.zst
283 kB
xet
about 2 months ago
6656f9fd
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0015__shard_00000069.jsonl.zst
246 kB
xet
about 2 months ago
6fd5e54a
Total size: 11.1 GB
Files: 56,043
Last updated: Mar 24
Pre-warmed CDN: US, EU