Bucket: HCAI-Lab/dolma3-6t-sample-5000-docs (Human-Centered AI Lab)
Path: HCAI-Lab/dolma3-6t-sample-5000-docs/worker_0075
11.1 GB, 56,043 files, updated about 2 months ago
| Name | Size | Uploaded | Xet hash |
| --- | --- | --- | --- |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000206.jsonl.zst | 157 kB | about 2 months ago | f8554efb |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000207.jsonl.zst | 125 kB | about 2 months ago | 4c71abc4 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000208.jsonl.zst | 181 kB | about 2 months ago | b65ec8b6 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000209.jsonl.zst | 196 kB | about 2 months ago | 9b7c335d |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000210.jsonl.zst | 189 kB | about 2 months ago | 8bfa67eb |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000211.jsonl.zst | 182 kB | about 2 months ago | aa57ea15 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000212.jsonl.zst | 141 kB | about 2 months ago | 8d536227 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000213.jsonl.zst | 186 kB | about 2 months ago | 6fb5cb53 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000214.jsonl.zst | 146 kB | about 2 months ago | 18acea11 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000215.jsonl.zst | 164 kB | about 2 months ago | 8e9f6845 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000216.jsonl.zst | 163 kB | about 2 months ago | 79106cb0 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000217.jsonl.zst | 181 kB | about 2 months ago | 9129d444 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000218.jsonl.zst | 190 kB | about 2 months ago | 4f937c6c |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000219.jsonl.zst | 156 kB | about 2 months ago | c65c9c24 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000220.jsonl.zst | 145 kB | about 2 months ago | d95f75e9 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000221.jsonl.zst | 215 kB | about 2 months ago | 65adfd6d |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000222.jsonl.zst | 129 kB | about 2 months ago | cae16aa3 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000223.jsonl.zst | 149 kB | about 2 months ago | edf2c39d |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000224.jsonl.zst | 174 kB | about 2 months ago | 5202252b |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000225.jsonl.zst | 150 kB | about 2 months ago | 2e1062d5 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000226.jsonl.zst | 165 kB | about 2 months ago | ab688140 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000227.jsonl.zst | 187 kB | about 2 months ago | bbebd491 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000228.jsonl.zst | 195 kB | about 2 months ago | bd7722fa |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000229.jsonl.zst | 130 kB | about 2 months ago | 30679f5c |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000230.jsonl.zst | 153 kB | about 2 months ago | ad1f8f56 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000231.jsonl.zst | 214 kB | about 2 months ago | abcecab9 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000232.jsonl.zst | 166 kB | about 2 months ago | 5bb174e3 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000233.jsonl.zst | 242 kB | about 2 months ago | a3f2bb3c |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000234.jsonl.zst | 165 kB | about 2 months ago | ecb6f158 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000235.jsonl.zst | 155 kB | about 2 months ago | ba273f42 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000236.jsonl.zst | 191 kB | about 2 months ago | fe5eed2c |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000237.jsonl.zst | 192 kB | about 2 months ago | 7f004a1c |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000238.jsonl.zst | 155 kB | about 2 months ago | 9b7c9fd3 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000239.jsonl.zst | 188 kB | about 2 months ago | 8fc2502f |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000240.jsonl.zst | 232 kB | about 2 months ago | feaa4cc8 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000241.jsonl.zst | 270 kB | about 2 months ago | ea8d9b8a |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000242.jsonl.zst | 140 kB | about 2 months ago | a0ebe6e0 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000243.jsonl.zst | 167 kB | about 2 months ago | 2a5bb217 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000244.jsonl.zst | 139 kB | about 2 months ago | 16c3df44 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000245.jsonl.zst | 159 kB | about 2 months ago | c41a2cff |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000246.jsonl.zst | 196 kB | about 2 months ago | e0c2f127 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000247.jsonl.zst | 151 kB | about 2 months ago | 56c75a91 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000248.jsonl.zst | 155 kB | about 2 months ago | 82778c98 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000249.jsonl.zst | 143 kB | about 2 months ago | 456af797 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000250.jsonl.zst | 182 kB | about 2 months ago | 30dd4ed1 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000251.jsonl.zst | 159 kB | about 2 months ago | 0c180f91 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000252.jsonl.zst | 158 kB | about 2 months ago | 1cfc43fe |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000253.jsonl.zst | 117 kB | about 2 months ago | 352fba4f |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000254.jsonl.zst | 164 kB | about 2 months ago | fd5eb8e6 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000255.jsonl.zst | 151 kB | about 2 months ago | f9f5e2d1 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000256.jsonl.zst | 233 kB | about 2 months ago | e643b1b5 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000257.jsonl.zst | 152 kB | about 2 months ago | 5b802675 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000258.jsonl.zst | 158 kB | about 2 months ago | 19d3ec8d |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000259.jsonl.zst | 155 kB | about 2 months ago | c42e7fa5 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000260.jsonl.zst | 212 kB | about 2 months ago | 5d60d07c |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000261.jsonl.zst | 122 kB | about 2 months ago | d35f2067 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000262.jsonl.zst | 197 kB | about 2 months ago | 03800d00 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000263.jsonl.zst | 200 kB | about 2 months ago | 77ecda57 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000264.jsonl.zst | 119 kB | about 2 months ago | dc035b05 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000265.jsonl.zst | 170 kB | about 2 months ago | 1ad510a4 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000266.jsonl.zst | 150 kB | about 2 months ago | 33a81910 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000267.jsonl.zst | 206 kB | about 2 months ago | 6dd2d80e |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000268.jsonl.zst | 149 kB | about 2 months ago | f935c410 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000269.jsonl.zst | 220 kB | about 2 months ago | d6baf697 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000270.jsonl.zst | 214 kB | about 2 months ago | b2eb0a35 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000271.jsonl.zst | 165 kB | about 2 months ago | 38fa59fe |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000272.jsonl.zst | 180 kB | about 2 months ago | 6e380f9f |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000273.jsonl.zst | 240 kB | about 2 months ago | 0d55b87d |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000274.jsonl.zst | 194 kB | about 2 months ago | 27aee6f2 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000275.jsonl.zst | 128 kB | about 2 months ago | 07cf07a6 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000276.jsonl.zst | 138 kB | about 2 months ago | 04c6fc72 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000277.jsonl.zst | 170 kB | about 2 months ago | 7615919e |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000278.jsonl.zst | 129 kB | about 2 months ago | 0c4f896c |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000279.jsonl.zst | 143 kB | about 2 months ago | 5ba0e2e3 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000280.jsonl.zst | 208 kB | about 2 months ago | 721aea0c |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000281.jsonl.zst | 175 kB | about 2 months ago | 07da6378 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000282.jsonl.zst | 166 kB | about 2 months ago | 7fae66d2 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000283.jsonl.zst | 181 kB | about 2 months ago | bfdc1325 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000284.jsonl.zst | 128 kB | about 2 months ago | 7aa4e1a6 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000285.jsonl.zst | 161 kB | about 2 months ago | 4bc7bde4 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000286.jsonl.zst | 193 kB | about 2 months ago | 5c3f4cb9 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000287.jsonl.zst | 160 kB | about 2 months ago | 34d14c2d |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000289.jsonl.zst | 57.6 kB | about 2 months ago | 7e1b26fb |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000291.jsonl.zst | 47 kB | about 2 months ago | 17dbfabd |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000292.jsonl.zst | 198 kB | about 2 months ago | ed7bcf2e |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000293.jsonl.zst | 128 kB | about 2 months ago | 955a7256 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000294.jsonl.zst | 156 kB | about 2 months ago | 2b27e5f1 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000295.jsonl.zst | 191 kB | about 2 months ago | 34159e80 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000296.jsonl.zst | 144 kB | about 2 months ago | 793da5ba |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000297.jsonl.zst | 133 kB | about 2 months ago | 52203089 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000298.jsonl.zst | 145 kB | about 2 months ago | c244f8e5 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000299.jsonl.zst | 176 kB | about 2 months ago | f560feaf |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000300.jsonl.zst | 164 kB | about 2 months ago | c2219a91 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000301.jsonl.zst | 196 kB | about 2 months ago | bea30b95 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000302.jsonl.zst | 129 kB | about 2 months ago | 155cd602 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000303.jsonl.zst | 183 kB | about 2 months ago | 982f96d2 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000304.jsonl.zst | 136 kB | about 2 months ago | fdded34a |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000305.jsonl.zst | 172 kB | about 2 months ago | be3d0203 |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000306.jsonl.zst | 153 kB | about 2 months ago | b4e048ea |
| soc127__phase1_pool_shared__common_crawl__part_006__data__common_crawl-software_development-0014__shard_00000307.jsonl.zst | 136 kB | about 2 months ago | bd55a32c |
Total size: 11.1 GB
Files: 56,043
Last updated: Mar 24
Pre-warmed CDN: US, EU