| |
|
|
| |
|
|
| At the top of the `README.md` file, include metadata about the dataset in YAML format: |
| \--- |
| language: … |
| license: … |
| size\_categories: … |
| pretty\_name: '...' |
| tags: … |
| dataset\_summary: … |
| dataset\_description: … |
| acknowledgements: … |
| repo: … |
| citation\_bibtex: … |
| citation\_apa: … |
| \--- |
|
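The header fields listed above can also be generated from a script instead of typed by hand. The following is a minimal sketch using only the Python standard library. The helper name `render_front_matter` and all field values are hypothetical placeholders, and the serializer only handles plain strings and lists of strings; a full YAML library would be more robust for anything beyond that.

```python
def render_front_matter(metadata: dict) -> str:
    """Render a flat metadata dict as a YAML front-matter block.

    Only strings and lists of strings are handled. Values are emitted
    unquoted, so they must not contain YAML special characters.
    """
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Lists become a key followed by one "- item" line per entry.
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# Hypothetical placeholder values, not taken from a real dataset.
example_metadata = {
    "language": ["en"],
    "license": "mit",
    "pretty_name": "Example Dataset",
    "tags": ["biology", "chemistry"],
}

header = render_front_matter(example_metadata)
print(header)
```

The resulting string is meant to be written at the very top of `README.md`, followed by the body of the dataset card.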
|
| For the full spec, see the Dataset Card documentation and specification: |
|
|
| * [Dataset Card Documentation](https://huggingface.co/docs/hub/en/datasets-cards) |
| * [Dataset Card Specification](https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1) |
| * [Dataset Card Template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md) |
|
|
| To allow the dataset to be loaded automatically through the `datasets` Python library, additional information needs to be in the header of the README.md. It should reflect how the [repository is structured](https://huggingface.co/docs/datasets/en/repository_structure): |
| configs: |
| dataset\_info: |
|
|
| While it is possible to write these by hand, it is highly recommended to let them be generated automatically on upload: load the dataset locally with [datasets.load\_dataset(...)](https://huggingface.co/docs/datasets/en/loading), then push it to the hub with [Dataset.push\_to\_hub(...)](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes
|
|
| * [Example of uploading data using push\_to\_hub()](https://huggingface.co/datasets/RosettaCommons/MegaScale/blob/main/src/03.1_upload_data.py) |
| * See below for more details about how to use push\_to\_hub(...) for different common formats |
|
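The load-then-push workflow described above might look like the following minimal sketch. It assumes the `datasets` package is installed and that you are already authenticated with the Hub. The function name `upload_csv`, the CSV input format, and any file path or repository id you pass in are hypothetical choices made for illustration.

```python
def upload_csv(csv_path: str, repo_id: str, config_name: str = "default") -> None:
    """Load a local CSV file and push it to the Hugging Face Hub.

    Assumes `pip install datasets` and a prior Hub login. Pushing this way
    lets the library write the `configs` and `dataset_info` header entries.
    """
    # Imported lazily so the sketch can be read and imported without the dependency.
    from datasets import load_dataset

    # With a single file and no split specified, this returns a DatasetDict
    # containing one "train" split.
    dataset = load_dataset("csv", data_files=csv_path)

    # Uploads the data files and updates the README.md metadata header.
    dataset.push_to_hub(repo_id, config_name=config_name)
```

A call such as `upload_csv("data/example.csv", "my-org/my-dataset")` (both arguments are placeholders) would upload the file as the default configuration.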
|
| |
|
|
| |
|
|
| * If the dataset is licensed under an existing standard license, then use it |
| * If it is unclear, then the authors need to be contacted for clarification |
| * To license it under the Rosetta License: |
| * Add the following to the dataset card: |
|
|
| license: other |
| license\_name: rosetta-license-1.0 |
| license\_link: LICENSE.md |
|
|
| * Upload the Rosetta [LICENSE.md](https://github.com/RosettaCommons/rosetta/blob/main/LICENSE.md) to the Dataset |
|
|
| |
|
|
| * If the dataset has a DOI (e.g. associated with a published paper), use [doi2bib.org](http://doi2bib.org) to generate the BibTeX citation |
| * For the APA citation, use the [DOI → APA converter](https://paperpile.com/t/doi-to-apa-converter/) |
|
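As an alternative to the web converters above, citations can often be fetched directly from the DOI resolver through HTTP content negotiation. This is a swapped-in technique, not the converters' own API, and not every DOI registrar supports every format. The sketch below uses only the standard library; the function names are hypothetical.

```python
import urllib.request

# Media types used for DOI content negotiation.
BIBTEX = "application/x-bibtex"
APA = "text/x-bibliography; style=apa"


def build_citation_request(doi: str, accept: str) -> urllib.request.Request:
    """Build a request asking doi.org for the citation in the given format."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": accept},
    )


def fetch_citation(doi: str, accept: str) -> str:
    """Send the request and return the citation text (requires network access)."""
    request = build_citation_request(doi, accept)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_citation(doi, BIBTEX)` or `fetch_citation(doi, APA)` with a real DOI should return text suitable for the `citation_bibtex` and `citation_apa` fields, but check the result before pasting it in.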
|
| |
|
|
| * Standard tags used when searching for Hugging Face datasets |
| * Typically: |
|
| \- biology |
| \- chemistry |
|
|
| |
|
|
| * URL of the GitHub, figshare, or other repository for the data or project |
|
|
| |
|
|
| * Citation in BibTeX format |
| * You can use https://www.doi2bib.org/ to generate it from a DOI |
|
|
| |
|
|
| * Citation in APA format |
|
|