5 Add Metadata to the Dataset Card

Overview

At the top of the `README.md` file, include metadata about the dataset in YAML format:
---
language: …
license: …
size_categories: …
pretty_name: '...'
tags: …
dataset_summary: …
dataset_description: …
acknowledgements: …
repo: …
citation_bibtex: …
citation_apa: …
---

For the full spec, see the Dataset Card specification.
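The header can also be generated from a script rather than typed by hand. The following is a minimal sketch using only the Python standard library. The function names `render_front_matter` and `add_front_matter` are hypothetical, and the field values in the usage note are placeholders. Strings are written with `json.dumps`, which relies on YAML accepting JSON-style double-quoted strings. The sketch handles only flat fields and lists of strings, not nested structures.

```python
import json


def render_front_matter(fields: dict) -> str:
    """Render flat metadata fields as a YAML block delimited by '---' lines."""
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, list):
            # Lists (e.g. tags) become one '- item' line per entry.
            lines.append(f"{key}:")
            lines.extend(f"- {json.dumps(item, ensure_ascii=False)}" for item in value)
        else:
            # JSON-style double-quoted strings are also valid YAML scalars.
            lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def add_front_matter(readme_body: str, fields: dict) -> str:
    """Put the rendered metadata block in front of the README body."""
    return render_front_matter(fields) + "\n" + readme_body
```

For example, `add_front_matter("# Example Dataset\n", {"pretty_name": "Example Dataset", "tags": ["biology"]})` returns the card text with the metadata block first, ready to be written to `README.md`.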

To allow the dataset to be loaded automatically through the datasets Python library, additional information is needed in the header of the README.md. It should reflect how the repository is structured:
configs:
dataset_info:

While it is possible to create these by hand, it is highly recommended to let them be generated automatically at upload time: load the dataset locally with datasets.load_dataset(...), then push it to the Hub with the loaded dataset's push_to_hub(...) method.

Example of uploading data using push_to_hub()
See below for more details about how to use push_to_hub(...) for different common formats
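As a rough illustration of the load-then-push workflow, the sketch below wraps both calls in one helper for a single local CSV file. The helper name `push_csv_to_hub` and the repository ID in the usage note are hypothetical. It assumes the `datasets` library is installed and that you are already authenticated with the Hub (for example via `huggingface-cli login`).

```python
from pathlib import Path


def push_csv_to_hub(csv_path: str, repo_id: str, private: bool = False) -> None:
    """Load a local CSV file with `datasets` and push it to the Hugging Face Hub."""
    if not Path(csv_path).is_file():
        raise FileNotFoundError(csv_path)

    # Imported here so the path check above works even without the library installed.
    from datasets import load_dataset

    # The "csv" builder places all rows in a single "train" split by default.
    dataset = load_dataset("csv", data_files=csv_path)

    # Uploads the data and updates the `configs:` and `dataset_info:`
    # entries in the README.md header of the target repository.
    dataset.push_to_hub(repo_id, private=private)
```

A call might look like `push_csv_to_hub("data/measurements.csv", "example-org/example-dataset")`, where both the path and the repository ID are placeholders.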

Metadata fields

License

If the dataset is licensed under an existing standard license, then use it.
If the license is unclear, then the authors need to be contacted for clarification.
To license it under the Rosetta License:
- Add the following to the dataset card:
  
  license: other
  
  license_name: rosetta-license-1.0
  
  license_link: LICENSE.md
- Upload the Rosetta LICENSE.md to the Dataset
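Both Rosetta License steps can be checked before uploading. The sketch below uses only the standard library, and the names `read_front_matter` and `check_rosetta_license` are hypothetical. The parser is deliberately naive: it reads only top-level `key: value` lines between the first two `---` lines and is not a full YAML parser.

```python
def read_front_matter(readme_text: str) -> dict:
    """Collect top-level 'key: value' pairs from the leading '---' block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Skip indented lines, list items, comments, and lines without a key.
        if line[:1] in (" ", "\t", "-", "#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("'\"")
    return fields


def check_rosetta_license(readme_text: str, repo_files: set) -> list:
    """Return a list of problems; an empty list means both steps are done."""
    expected = {
        "license": "other",
        "license_name": "rosetta-license-1.0",
        "license_link": "LICENSE.md",
    }
    fields = read_front_matter(readme_text)
    problems = []
    for key, want in expected.items():
        got = fields.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, found {got!r}")
    if "LICENSE.md" not in repo_files:
        problems.append("LICENSE.md is missing from the repository")
    return problems
```

Here `repo_files` is the set of file names you intend to upload, for example `{"README.md", "LICENSE.md"}`.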

Citation

If the dataset has a DOI (e.g. associated with a published paper), use doi2bib.org to generate the BibTeX entry.
DOI → APA converter:

repo

URL of the GitHub repository, Figshare record, or other home of the data or project

citation_bibtex

Citation in BibTeX format
You can use https://www.doi2bib.org/ to generate it from a DOI

citation_apa

Citation in APA format
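As an alternative to the web converters, both citation fields can be fetched in a script through DOI content negotiation, which is a different route from doi2bib.org. This works for DOIs registered with agencies that support it, such as Crossref and DataCite. The sketch below uses only the standard library. The function names are hypothetical, and the DOI in the usage note is a placeholder, not a real reference.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Media types understood by DOI content negotiation.
FORMATS = {
    "bibtex": "application/x-bibtex",
    "apa": "text/x-bibliography; style=apa",
}


def build_citation_request(doi: str, fmt: str) -> Request:
    """Build a request asking doi.org for the citation in the given format."""
    url = "https://doi.org/" + quote(doi)
    return Request(url, headers={"Accept": FORMATS[fmt]})


def fetch_citation(doi: str, fmt: str, timeout: float = 10.0) -> str:
    """Fetch the formatted citation as text; requires network access."""
    with urlopen(build_citation_request(doi, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_citation("10.1234/example", "bibtex")` would return text for `citation_bibtex`, and the same call with `"apa"` would return text for `citation_apa`. Check the result by hand before pasting it into the dataset card.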