Buckets:
| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| atlas-zmumu-hepmc | 2 items | ||
| cms-derived-csv | 6 items | ||
| cms-nanoaod-root | 2 items | ||
| README.md | 2.05 kB | | a29e56e7 |
CERN Open Data on Hugging Face Storage Buckets
A small demonstration of distributing CERN Open Data (CC0) via Hugging Face Storage Buckets — fast, mutable, Xet-backed object storage with a pre-warmable global CDN. Independent demo, not an official CERN distribution.
Layout
| Folder | Contents |
|---|---|
| atlas-zmumu-hepmc/ | ATLAS Z-boson Monte Carlo simulation (HEPMC text, ~519 MB) |
| cms-derived-csv/ | CMS 2011 derived analysis CSVs — Higgs (→ γγ, → 4ℓ), Z → μμ, J/ψ, dimuon |
| cms-nanoaod-root/ | CMS NanoAOD — a binary ROOT file (Charmonium / J/ψ, ~0.97 GB) |
The folders cover a spread of formats (text simulation, derived CSV tables, and a raw binary ROOT file), and each has its own README.md, which also renders on the Hub. Together they show how a real multi-dataset distribution could be organised and documented inside a bucket.
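Once a local copy of the bucket exists, the layout can be checked with a short shell sketch. The helper name `summarize_layout` is hypothetical (it is not part of the `hf` CLI), and `./cern-open-data` is assumed to be the local directory holding the copy. The sketch prints a file count per top-level folder, which can be compared with the item counts in the listing.

```shell
# Print "<file count> <folder>" for each top-level folder of a local copy.
# summarize_layout is a hypothetical helper, not part of the hf CLI.
summarize_layout() {
  root="$1"
  for dir in "$root"/*/; do
    # Skip the unexpanded glob when the root has no sub-folders.
    [ -d "$dir" ] || continue
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    printf '%s %s\n' "$count" "$(basename "$dir")"
  done
}

# Assumed local path; adjust to wherever the bucket was copied.
if [ -d ./cern-open-data ]; then
  summarize_layout ./cern-open-data
fi
```

Files at the top level of the copy (such as the root README.md) are not counted, since only sub-folders are visited.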
Access
# A single file
hf buckets cp hf://buckets/davanstrien/cern-opendata-demo/cms-derived-csv/Zmumu.csv ./Zmumu.csv
# Or the whole bucket
hf buckets sync hf://buckets/davanstrien/cern-opendata-demo ./cern-open-data
A plain hf buckets cp pulls at ~32 MB/s on a home connection; data can also be pre-warmed to a CDN region (GCP / AWS, US / EU) so it sits next to your compute.
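As a rough guide to what the ~32 MB/s figure means in practice, the arithmetic can be sketched in shell. The helper name `estimate_seconds` is hypothetical (not part of the `hf` CLI), the sizes are the approximate ones quoted in this README taken in decimal units (1 GB = 1000 MB), and real throughput will vary with the connection and CDN state.

```shell
# Estimate transfer time in whole seconds from a size (MB) and a throughput (MB/s).
# estimate_seconds is a hypothetical helper; both inputs are approximate,
# so the result is only a ballpark figure.
estimate_seconds() {
  awk -v size_mb="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", size_mb / rate }'
}

estimate_seconds 519 32    # atlas-zmumu-hepmc/ (~519 MB)   -> prints 16
estimate_seconds 970 32    # cms-nanoaod-root/ (~0.97 GB)   -> prints 30
estimate_seconds 1500 32   # whole bucket (1.5 GB)          -> prints 47
```

At that rate the whole bucket would take a little under a minute, so pre-warming is likely to matter more for larger datasets or for many parallel workers than for a single download of this demo.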
Why buckets for open scientific data
- Raw, as-is — keep original formats (ROOT, HEPMC, CSV, …), no reshaping into Parquet/LFS.
- Fast + global — Xet-backed dedup, CDN pre-warming near your cluster.
- Documented — README.md files render directly on the bucket pages.
See the Storage Buckets blog post and docs.
Data © the respective CERN experiments, released under CC0 via the CERN Open Data Portal.
- Total size: 1.5 GB
- Files: 11
- Last updated: Jun 4
- Pre-warmed CDN: US EU US EU