# CERN Open Data on Hugging Face Storage Buckets

A small **demonstration** of distributing [CERN Open Data](https://opendata.cern.ch) (CC0) via [Hugging Face Storage Buckets](https://huggingface.co/docs/hub/storage-buckets): fast, mutable, Xet-backed object storage with a pre-warmable global CDN. *Independent demo, not an official CERN distribution.*

## Layout

| Folder | Contents |
|---|---|
| [`atlas-zmumu-hepmc/`](./atlas-zmumu-hepmc) | ATLAS Z-boson Monte Carlo simulation (HepMC text, ~519 MB) |
| [`cms-derived-csv/`](./cms-derived-csv) | CMS 2011 derived analysis CSVs: Higgs (→ γγ, → 4ℓ), Z → μμ, J/ψ, dimuon |
| [`cms-nanoaod-root/`](./cms-nanoaod-root) | CMS NanoAOD: a **binary ROOT** file (Charmonium / J/ψ, ~0.97 GB) |

The folders cover a spread of formats (text simulation, derived CSV tables, and a raw binary ROOT file), and each folder has its own `README.md`, which also renders on the Hub. Together they show how a real multi-dataset distribution could be organised and documented inside a bucket.
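
Since every folder is meant to carry its own `README.md`, a local copy of the bucket can be checked for that convention. The following is only a sketch: `check_readmes` is a hypothetical helper (not part of the `hf` CLI), and `./cern-open-data` is assumed to be the local directory the bucket was synced to, as in the Access section.

```bash
#!/usr/bin/env bash
# check_readmes is a hypothetical helper: it prints every top-level folder
# under the given directory that has no README.md and returns non-zero
# if at least one such folder was found.
check_readmes() {
  local root="$1" dir missing=0
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue          # the glob matched nothing
    if [ ! -f "${dir}README.md" ]; then
      echo "missing README.md: ${dir%/}"
      missing=1
    fi
  done
  return "$missing"
}

# Assumption: the bucket has already been synced to ./cern-open-data.
if [ -d ./cern-open-data ]; then
  check_readmes ./cern-open-data && echo "all folders documented"
fi
```

The check only looks one level deep, which matches the three-folder layout in the table above; a nested layout would need a recursive variant.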
## Access

```bash
# A single file
hf buckets cp hf://buckets/davanstrien/cern-opendata-demo/cms-derived-csv/Zmumu.csv ./Zmumu.csv

# Or the whole bucket
hf buckets sync hf://buckets/davanstrien/cern-opendata-demo ./cern-open-data
```

A plain `hf buckets cp` pulls at roughly 32 MB/s on a home connection. Data can also be [pre-warmed to a CDN region](https://huggingface.co/docs/hub/storage-buckets#pre-warming-and-cdn) (GCP or AWS, US or EU) so that it sits next to your compute.
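
As a back-of-the-envelope check, dividing a file's size by the ~32 MB/s figure gives an approximate transfer time. The sketch below assumes that rate holds for the whole transfer and treats ~0.97 GB as 970 MB; `estimate_seconds` is a hypothetical helper, not part of the `hf` CLI.

```bash
#!/usr/bin/env bash
# estimate_seconds is a hypothetical helper: size in MB divided by
# throughput in MB/s, printed with one decimal place.
estimate_seconds() {
  local size_mb="$1" rate_mb_s="$2"
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v s="$size_mb" -v r="$rate_mb_s" 'BEGIN { printf "%.1f\n", s / r }'
}

estimate_seconds 519 32   # atlas-zmumu-hepmc, ~519 MB  -> 16.2
estimate_seconds 970 32   # cms-nanoaod-root, ~0.97 GB taken as 970 MB -> 30.3
```

These are rough figures only: real throughput depends on the connection and on whether the data has been pre-warmed to a nearby region.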
## Why buckets for open scientific data

- **Raw, as-is**: keep original formats (ROOT, HepMC, CSV, …), with no reshaping into Parquet/LFS.
- **Fast + global**: Xet-backed dedup, plus CDN pre-warming near your cluster.
- **Documented**: `README.md` files render directly on the bucket pages.

See the [Storage Buckets blog post](https://huggingface.co/blog/storage-buckets) and [docs](https://huggingface.co/docs/hub/storage-buckets).

---

Data © the respective CERN experiments, released under CC0 via the [CERN Open Data Portal](https://opendata.cern.ch).