Buckets: davanstrien/cern-opendata-demo

1.5 GB

11 files

Updated about 2 months ago

Name	Size	Uploaded	Xet hash
atlas-zmumu-hepmc		about 2 months ago	2 items
cms-derived-csv		about 2 months ago	6 items
cms-nanoaod-root		about 2 months ago	2 items
README.md	2.05 kB xet	about 2 months ago	a29e56e7

README.md

CERN Open Data on Hugging Face Storage Buckets

A small demonstration of distributing CERN Open Data (CC0) via Hugging Face Storage Buckets — fast, mutable, Xet-backed object storage with a pre-warmable global CDN. Independent demo, not an official CERN distribution.

Layout

| Folder | Contents |
| --- | --- |
| `atlas-zmumu-hepmc/` | ATLAS Z-boson Monte-Carlo simulation (HEPMC text, ~519 MB) |
| `cms-derived-csv/` | CMS 2011 derived analysis CSVs: Higgs (→ γγ, → 4ℓ), Z → μμ, J/ψ, dimuon |
| `cms-nanoaod-root/` | CMS NanoAOD: a binary ROOT file (Charmonium / J/ψ, ~0.97 GB) |

The bucket holds a spread of formats (text simulation, derived CSV tables, and a raw binary ROOT file), and each folder has its own README.md, which also renders on the Hub. It shows how a real multi-dataset distribution could be organised and documented inside a bucket.

Access

```bash
# A single file
hf buckets cp hf://buckets/davanstrien/cern-opendata-demo/cms-derived-csv/Zmumu.csv ./Zmumu.csv

# Or the whole bucket
hf buckets sync hf://buckets/davanstrien/cern-opendata-demo ./cern-open-data
```
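The two commands above can also be driven from a script. The following is a minimal sketch, not an official client: it only assembles the same `hf buckets cp` and `hf buckets sync` argument lists and hands them to the CLI. The helper names (`bucket_uri`, `cp_command`, `sync_command`, `run`) are hypothetical and exist only for this illustration, and it assumes the `hf` CLI is installed and on `PATH`.

```python
from __future__ import annotations

import shutil
import subprocess

# Bucket named in this README.
BUCKET = "davanstrien/cern-opendata-demo"


def bucket_uri(path: str = "", bucket: str = BUCKET) -> str:
    """Return the hf:// URI for a path inside the bucket, or for the bucket root."""
    base = f"hf://buckets/{bucket}"
    path = path.strip("/")
    return f"{base}/{path}" if path else base


def cp_command(remote_path: str, local_path: str) -> list[str]:
    """Argument list matching the single-file `hf buckets cp` command above."""
    return ["hf", "buckets", "cp", bucket_uri(remote_path), local_path]


def sync_command(local_dir: str) -> list[str]:
    """Argument list matching the whole-bucket `hf buckets sync` command above."""
    return ["hf", "buckets", "sync", bucket_uri(), local_dir]


def run(command: list[str]) -> None:
    """Run a command, failing early if the CLI is missing (hypothetical helper)."""
    if shutil.which(command[0]) is None:
        raise RuntimeError(f"`{command[0]}` was not found on PATH")
    subprocess.run(command, check=True)


# Show the command that would be executed; call run(...) to actually download.
print(" ".join(cp_command("cms-derived-csv/Zmumu.csv", "./Zmumu.csv")))
```

Nothing is downloaded until `run(...)` is called with one of the argument lists, so the sketch can be inspected safely on a machine without the CLI.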

A plain `hf buckets cp` pulls at ~32 MB/s on a home connection; data can also be pre-warmed to a CDN region (GCP / AWS, US / EU) so it sits next to your compute.
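To put the ~32 MB/s figure in context, here is a rough back-of-the-envelope estimate of transfer times for the sizes quoted in this README. It is a sketch under simple assumptions: a constant rate, 1 GB taken as 1000 MB, and no per-file overhead. The function name `transfer_seconds` is hypothetical.

```python
# Sizes quoted in this README, in MB (assuming 1 GB = 1000 MB).
SIZES_MB = {
    "atlas-zmumu-hepmc/": 519,   # "~519 MB"
    "cms-nanoaod-root/": 970,    # "~0.97 GB"
    "whole bucket": 1500,        # "1.5 GB" total
}

# Throughput quoted for a plain `hf buckets cp` on a home connection.
RATE_MB_PER_S = 32.0


def transfer_seconds(size_mb: float, rate_mb_per_s: float = RATE_MB_PER_S) -> float:
    """Estimated transfer time in seconds at a constant rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_per_s


for name, size_mb in SIZES_MB.items():
    print(f"{name:<20} ~{transfer_seconds(size_mb):.0f} s")
```

At that rate the estimates come out at roughly 16 s for the HEPMC folder, 30 s for the ROOT file, and 47 s for the whole bucket; real transfers will vary with the connection and with CDN pre-warming.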

Why buckets for open scientific data

- Raw, as-is: keep original formats (ROOT, HEPMC, CSV, …), no reshaping into Parquet/LFS.
- Fast + global: Xet-backed dedup, CDN pre-warming near your cluster.
- Documented: README.md files render directly on the bucket pages.

See the Storage Buckets blog post and docs.

Last updated: Jun 4

Pre-warmed CDN: US EU US EU
