## **5 Add Metadata to the Dataset Card**
### **Overview**
At the top of the README.md file, include metadata about the dataset in YAML format:
\---
language: …
license: …
size\_categories: …
pretty\_name: '...'
tags: …
dataset\_summary: …
dataset\_description: …
acknowledgements: …
repo: …
citation\_bibtex: …
citation\_apa: …
\---
For the full spec, see the Dataset Card specification:
* [Dataset Card Documentation](https://huggingface.co/docs/hub/en/datasets-cards)
* [Dataset Card Specification](https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1)
* [Dataset Card Template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md)
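The header described above can also be generated from a script. The following is a minimal sketch, not part of the Hugging Face tooling: the helper names `render_front_matter` and `add_front_matter` are hypothetical, it handles only flat string and list-of-string fields, and it relies on JSON string literals being valid YAML double-quoted scalars.

```python
import json
from typing import Mapping, Sequence, Union

# Assumption: only flat fields (a string, or a list of strings) are needed.
MetadataValue = Union[str, Sequence[str]]


def render_front_matter(metadata: Mapping[str, MetadataValue]) -> str:
    """Render a flat mapping as a YAML header delimited by '---' lines."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, str):
            # json.dumps adds double quotes and escapes special characters.
            lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
        else:
            lines.append(f"{key}:")
            lines.extend(
                f"- {json.dumps(item, ensure_ascii=False)}" for item in value
            )
    lines.append("---")
    return "\n".join(lines) + "\n"


def add_front_matter(
    readme_text: str, metadata: Mapping[str, MetadataValue]
) -> str:
    """Prepend the header, refusing to overwrite an existing one."""
    if readme_text.startswith("---"):
        raise ValueError("README already starts with a metadata block")
    return render_front_matter(metadata) + "\n" + readme_text


# Placeholder values for illustration only; use the real dataset's details.
example_metadata = {
    "license": "mit",
    "pretty_name": "Example Dataset",
    "tags": ["biology", "chemistry"],
}
```

Calling `add_front_matter(readme_text, example_metadata)` returns the README text with the header prepended. For nested fields a real YAML library would be a better fit than this hand-rolled renderer.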
To allow the dataset to be loaded automatically through the datasets Python library, additional information is needed in the header of the README.md. It should reflect how the [repository is structured](https://huggingface.co/docs/datasets/en/repository_structure):
configs:
dataset\_info:
While it is possible to write these by hand, it is highly recommended to have them generated automatically at upload time: load the dataset locally with [datasets.load\_dataset(...)](https://huggingface.co/docs/datasets/en/loading), then push it to the Hub with [DatasetDict.push\_to\_hub(...)](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.DatasetDict.push_to_hub)
* [Example of uploading data using push\_to\_hub()](https://huggingface.co/datasets/RosettaCommons/MegaScale/blob/main/src/03.1_upload_data.py)
* See below for more details about how to use push\_to\_hub(...) for different common formats
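As a rough illustration of the load-then-push workflow, here is a minimal sketch for CSV files. It assumes the `datasets` library is installed and that you are already authenticated with the Hub (for example via `huggingface-cli login`). The function name, file names, and repository id are hypothetical placeholders.

```python
from typing import Dict


def upload_local_dataset(
    data_files: Dict[str, str], repo_id: str, private: bool = True
) -> None:
    """Load local CSV splits and push them to the Hub in one step."""
    # Imported lazily so that defining this helper does not require
    # the library to be installed.
    from datasets import load_dataset

    # With a dict of split name -> file path, load_dataset returns a
    # DatasetDict containing one Dataset per split.
    dataset = load_dataset("csv", data_files=data_files)

    # push_to_hub uploads the data and updates the README.md header
    # of the target repository.
    dataset.push_to_hub(repo_id, private=private)


# Hypothetical usage (requires network access and write permission):
# upload_local_dataset(
#     {"train": "train.csv", "test": "test.csv"},
#     "your-org/your-dataset",
# )
```

The sketch defaults to `private=True` so that a first upload can be inspected before the dataset is made public. That default is a choice made for this example, not a requirement of the library.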
### **Metadata fields**
#### License
* If the dataset is already licensed under an existing standard license, use that license
* If the license is unclear, contact the authors for clarification
* To license the dataset under the Rosetta License:
* Add the following to the dataset card metadata:
license: other
license\_name: rosetta-license-1.0
license\_link: LICENSE.md
* Upload the Rosetta [LICENSE.md](https://github.com/RosettaCommons/rosetta/blob/main/LICENSE.md) to the dataset repository
#### Citation
* If the dataset has a DOI (e.g. associated with a published paper), use [doi2bib.org](http://doi2bib.org) to generate a BibTeX citation
* For an APA citation, use the [DOI → APA converter](https://paperpile.com/t/doi-to-apa-converter/)
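As a scripted alternative to the web tools listed above (not something this guide prescribes), citations can often be requested directly from doi.org using DOI content negotiation. The sketch below assumes the DOI's registration agency supports the requested formats, as Crossref and DataCite do. The helper names and the example DOI are placeholders.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Media types used for DOI content negotiation.
ACCEPT_HEADERS = {
    "bibtex": "application/x-bibtex",
    "apa": "text/x-bibliography; style=apa",
}


def build_citation_request(doi: str, style: str) -> Request:
    """Build the HTTP request for a citation in the given style."""
    if style not in ACCEPT_HEADERS:
        raise ValueError(f"unsupported style: {style!r}")
    # quote() keeps '/' intact and escapes other unsafe characters.
    url = f"https://doi.org/{quote(doi)}"
    return Request(url, headers={"Accept": ACCEPT_HEADERS[style]})


def fetch_citation(doi: str, style: str, timeout: float = 10.0) -> str:
    """Fetch the formatted citation text (requires network access)."""
    request = build_citation_request(doi, style)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Hypothetical usage with a placeholder DOI:
# print(fetch_citation("10.1000/xyz123", "bibtex"))
```

The returned text can then be pasted into the citation\_bibtex and citation\_apa fields after checking it against the publication.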
#### tags
* Standard tags for searching for HuggingFace datasets
* typically:
\- biology
\- chemistry
#### repo
* URL of the GitHub repository, figshare record, or other location hosting the data or project
#### citation\_bibtex
* Citation in BibTeX format
* You can use https://www.doi2bib.org/
#### citation\_apa
* Citation in APA format