Useful but difficult to access
Dear FAIR Chemistry Team,
This is feedback on the distribution format of the ODAC25 dataset.
While the dataset is a valuable resource, the current packaging (monolithic 500GB+ tarballs) makes the data effectively inaccessible for researchers without HPC-scale storage. Specifically:
"Deep" vs. "Wide" Ordering: The archive appears to place the massive mof_plus_adsorbate trajectory data before the gcmc or mof summary partitions. This prevents streaming or partial downloading of the high-value/low-volume summary data needed for screening models.
Lack of Metadata Separation: Users must download terabytes of raw relaxation frames just to access the final adsorption energies and structure files.
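To make the ordering point concrete, here is a minimal sketch using Python's standard tarfile module in forward-only stream mode, which behaves like reading an archive during a download. The directory prefixes (gcmc/, mof/, mof_plus_adsorbate/) and the file names are my assumptions for illustration; I have not verified the actual internal layout of the ODAC25 archives, and the payloads are toy stand-ins.

```python
import io
import tarfile


def build_demo_archive(member_order):
    """Build a small in-memory tar whose members appear in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in member_order:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def stream_partitions(fileobj, wanted_prefixes):
    """Read a tar sequentially and keep only members under the wanted prefixes.

    Returns the kept members plus the number of payload bytes that had to be
    read past before the first wanted member appeared.
    """
    wanted = tuple(wanted_prefixes)
    kept = {}
    skipped_before_first_hit = 0
    # "r|*" is tarfile's forward-only stream mode: no seeking backwards.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            if member.name.startswith(wanted):
                kept[member.name] = tar.extractfile(member).read()
            elif not kept:
                skipped_before_first_hit += member.size
    return kept, skipped_before_first_hit


# Toy stand-ins (hypothetical names): one large trajectory blob, two small summaries.
trajectory = ("mof_plus_adsorbate/traj_0.bin", b"x" * 100_000)
summaries = [
    ("gcmc/energies.csv", b"id,e\n0,-0.5\n"),
    ("mof/structure_0.cif", b"data_demo\n"),
]

deep_first = build_demo_archive([trajectory] + summaries)
summary_first = build_demo_archive(summaries + [trajectory])

kept_deep, skipped_deep = stream_partitions(deep_first, ["gcmc/", "mof/"])
kept_wide, skipped_wide = stream_partitions(summary_first, ["gcmc/", "mof/"])
print(skipped_deep, skipped_wide)
```

In this toy setup both orderings yield the same two summary files, but the trajectory-first archive forces the reader through 100,000 payload bytes before the first summary file appears, while the summary-first archive requires none. With summaries placed first, a client could also stop reading as soon as the last wanted partition has passed.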
To align better with the FAIR principles (specifically Accessibility), I strongly suggest releasing a standalone "Light" version (e.g., odac25_summary.tar.gz) containing only the gcmc (single-point) and mof (bare structure) partitions. This would likely reduce the download size from ~800GB to under 10GB, making the data accessible to the broader materials science community.
Thank you.