Bucket: HCAI-Lab/dolma3-6t-sample-5000-docs (Human-Centered AI Lab)
Folder: worker_0073
11.1 GB, 56,043 files, updated about 2 months ago
Name | Size | Uploaded | Xet hash
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000288.jsonl.zst | 241 kB | about 2 months ago | 8901da15
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000289.jsonl.zst | 272 kB | about 2 months ago | dd887704
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000290.jsonl.zst | 202 kB | about 2 months ago | eafa7a28
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000291.jsonl.zst | 205 kB | about 2 months ago | d0daa789
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000292.jsonl.zst | 216 kB | about 2 months ago | f22c43ff
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000293.jsonl.zst | 138 kB | about 2 months ago | e1fe13b0
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000294.jsonl.zst | 154 kB | about 2 months ago | 0ba93c9d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000295.jsonl.zst | 162 kB | about 2 months ago | a96e8683
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000296.jsonl.zst | 195 kB | about 2 months ago | 6a31882e
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000297.jsonl.zst | 220 kB | about 2 months ago | 91149845
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000298.jsonl.zst | 217 kB | about 2 months ago | 6b08e368
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000299.jsonl.zst | 228 kB | about 2 months ago | ea7dbb39
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000300.jsonl.zst | 196 kB | about 2 months ago | 759f2795
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000301.jsonl.zst | 251 kB | about 2 months ago | c14b8c11
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000302.jsonl.zst | 183 kB | about 2 months ago | b4bd2250
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000303.jsonl.zst | 207 kB | about 2 months ago | 47a18dfb
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000304.jsonl.zst | 174 kB | about 2 months ago | 7ee816fb
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000305.jsonl.zst | 184 kB | about 2 months ago | 63afa444
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000306.jsonl.zst | 203 kB | about 2 months ago | 75f84bb2
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000307.jsonl.zst | 205 kB | about 2 months ago | d0722231
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000308.jsonl.zst | 216 kB | about 2 months ago | 4213b9d5
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000309.jsonl.zst | 166 kB | about 2 months ago | f91bb029
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000310.jsonl.zst | 178 kB | about 2 months ago | 44591518
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000311.jsonl.zst | 186 kB | about 2 months ago | 5e518e54
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000312.jsonl.zst | 179 kB | about 2 months ago | 3fdd00f7
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000313.jsonl.zst | 255 kB | about 2 months ago | 40b5f19f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000314.jsonl.zst | 192 kB | about 2 months ago | 1de4f27d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000315.jsonl.zst | 290 kB | about 2 months ago | d0a7f99d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000316.jsonl.zst | 154 kB | about 2 months ago | 2e57c061
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000317.jsonl.zst | 180 kB | about 2 months ago | cde7c83f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000318.jsonl.zst | 186 kB | about 2 months ago | bfe49fc1
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000319.jsonl.zst | 174 kB | about 2 months ago | ac147404
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000320.jsonl.zst | 221 kB | about 2 months ago | a16d4f26
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000321.jsonl.zst | 230 kB | about 2 months ago | 85925bac
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000322.jsonl.zst | 269 kB | about 2 months ago | 9af9ecc4
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000323.jsonl.zst | 167 kB | about 2 months ago | 80386548
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000324.jsonl.zst | 224 kB | about 2 months ago | cb6144b4
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000325.jsonl.zst | 239 kB | about 2 months ago | 9c22332e
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000326.jsonl.zst | 187 kB | about 2 months ago | f7ffa149
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000327.jsonl.zst | 175 kB | about 2 months ago | a2eb323d
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000329.jsonl.zst | 51.2 kB | about 2 months ago | 4486a3b9
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000331.jsonl.zst | 127 kB | about 2 months ago | a5ad2d20
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000332.jsonl.zst | 194 kB | about 2 months ago | 9262d586
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000333.jsonl.zst | 172 kB | about 2 months ago | 8b517003
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000334.jsonl.zst | 197 kB | about 2 months ago | 8bf77f67
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000335.jsonl.zst | 204 kB | about 2 months ago | 981c9b6b
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000336.jsonl.zst | 196 kB | about 2 months ago | 64b03a95
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000337.jsonl.zst | 122 kB | about 2 months ago | 686b2fb5
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000338.jsonl.zst | 196 kB | about 2 months ago | bffa85b8
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000339.jsonl.zst | 189 kB | about 2 months ago | 215ffc55
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000340.jsonl.zst | 220 kB | about 2 months ago | 1a71c079
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000341.jsonl.zst | 233 kB | about 2 months ago | 66b6649c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000342.jsonl.zst | 231 kB | about 2 months ago | b9d5faec
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000343.jsonl.zst | 273 kB | about 2 months ago | b0dce0ec
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000344.jsonl.zst | 196 kB | about 2 months ago | de626897
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000345.jsonl.zst | 238 kB | about 2 months ago | 5d4577f3
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000346.jsonl.zst | 188 kB | about 2 months ago | 5bb3e7ab
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000347.jsonl.zst | 174 kB | about 2 months ago | 7301cbc9
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000348.jsonl.zst | 175 kB | about 2 months ago | 2a264ec0
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000349.jsonl.zst | 152 kB | about 2 months ago | a4e912b8
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000350.jsonl.zst | 244 kB | about 2 months ago | b411b9ee
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000351.jsonl.zst | 237 kB | about 2 months ago | 6ed37c9e
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000352.jsonl.zst | 212 kB | about 2 months ago | ce0d4f26
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000353.jsonl.zst | 309 kB | about 2 months ago | e475cb49
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000354.jsonl.zst | 145 kB | about 2 months ago | f569038f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000355.jsonl.zst | 141 kB | about 2 months ago | 3bb59ee9
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000356.jsonl.zst | 178 kB | about 2 months ago | 39b852ff
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000357.jsonl.zst | 279 kB | about 2 months ago | eccf4717
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000358.jsonl.zst | 153 kB | about 2 months ago | 8a924c37
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000359.jsonl.zst | 236 kB | about 2 months ago | 42843b03
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000360.jsonl.zst | 207 kB | about 2 months ago | 78c5668c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000361.jsonl.zst | 186 kB | about 2 months ago | 724e1992
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000362.jsonl.zst | 208 kB | about 2 months ago | 42930b63
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000363.jsonl.zst | 227 kB | about 2 months ago | ce651490
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000364.jsonl.zst | 125 kB | about 2 months ago | 42357f9e
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000365.jsonl.zst | 192 kB | about 2 months ago | 60fdb87f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000366.jsonl.zst | 214 kB | about 2 months ago | 8258bc91
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000367.jsonl.zst | 194 kB | about 2 months ago | d6050a70
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000368.jsonl.zst | 181 kB | about 2 months ago | 62fd5108
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000369.jsonl.zst | 150 kB | about 2 months ago | 399fb954
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000370.jsonl.zst | 190 kB | about 2 months ago | b98a2749
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000371.jsonl.zst | 181 kB | about 2 months ago | e8b01f44
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000372.jsonl.zst | 183 kB | about 2 months ago | d0f5ee10
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000373.jsonl.zst | 161 kB | about 2 months ago | 483c6ad4
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000374.jsonl.zst | 163 kB | about 2 months ago | 3b528f3a
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000375.jsonl.zst | 110 kB | about 2 months ago | 92cc74e2
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000376.jsonl.zst | 158 kB | about 2 months ago | 68e30bef
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000377.jsonl.zst | 162 kB | about 2 months ago | f72a7727
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software-0019__shard_00000378.jsonl.zst | 172 kB | about 2 months ago | 02e0d211
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000000.jsonl.zst | 159 kB | about 2 months ago | 1518a1ff
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000001.jsonl.zst | 108 kB | about 2 months ago | ff60e611
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000002.jsonl.zst | 140 kB | about 2 months ago | a21367bc
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000003.jsonl.zst | 201 kB | about 2 months ago | 70cadf24
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000004.jsonl.zst | 146 kB | about 2 months ago | 3a574593
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000005.jsonl.zst | 163 kB | about 2 months ago | b363114f
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000006.jsonl.zst | 139 kB | about 2 months ago | e31f2b3c
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000007.jsonl.zst | 173 kB | about 2 months ago | 894dd8b2
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000008.jsonl.zst | 157 kB | about 2 months ago | e0c28322
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000009.jsonl.zst | 162 kB | about 2 months ago | 3f8d883b
soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0012__shard_00000010.jsonl.zst | 172 kB | about 2 months ago | 3b87490a
Total size: 11.1 GB
Files: 56,043
Last updated: Mar 24
Pre-warmed CDN: US, EU