## **5 Add Metadata to the Dataset Card**

### **Overview**

At the top of the `README.md` file, include metadata about the dataset in YAML format:  
\---  
language: …  
license: …  
size\_categories: …  
pretty\_name: '...'  
tags: …  
dataset\_summary: …  
dataset\_description: …  
acknowledgements: …  
repo: …  
citation\_bibtex: …  
citation\_apa: …  
\---
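
As a quick sanity check, a card can be scanned for the fields listed above before it is uploaded. The following is a minimal sketch using only the Python standard library; the function names `front_matter_keys` and `missing_fields` are hypothetical, and the scan only looks at top-level keys rather than parsing YAML in full.

```python
import re

# Fields listed in the header above.
EXPECTED_FIELDS = [
    "language", "license", "size_categories", "pretty_name", "tags",
    "dataset_summary", "dataset_description", "acknowledgements",
    "repo", "citation_bibtex", "citation_apa",
]


def front_matter_keys(readme_text):
    """Return the top-level keys of the YAML block at the top of a README."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys start in column 0; indented lines and list items are values.
        match = re.match(r"^([A-Za-z_][A-Za-z0-9_]*):", line)
        if match:
            keys.append(match.group(1))
    return []  # no closing '---' was found


def missing_fields(readme_text):
    """List the expected fields that the README header does not define."""
    found = set(front_matter_keys(readme_text))
    return [field for field in EXPECTED_FIELDS if field not in found]
```

Calling `missing_fields(open("README.md").read())` returns the names of any fields still to be filled in.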

For the full spec, see the Dataset Card references:

* [Dataset Card Documentation](https://huggingface.co/docs/hub/en/datasets-cards)  
* [Dataset Card Specification](https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1)  
* [Dataset Card Template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md)

To allow the dataset to be loaded automatically through the `datasets` Python library, the header of the `README.md` needs additional information that reflects how the [repository is structured](https://huggingface.co/docs/datasets/en/repository_structure):  
configs:  
dataset\_info:

While it is possible to write these fields by hand, it is highly recommended to let them be generated automatically: load the dataset locally with [datasets.load\_dataset(...)](https://huggingface.co/docs/datasets/en/loading), then push it to the hub with [DatasetDict.push\_to\_hub(...)](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.DatasetDict.push_to_hub).

* [Example of uploading data using push\_to\_hub()](https://huggingface.co/datasets/RosettaCommons/MegaScale/blob/main/src/03.1_upload_data.py)  
* See below for more details about how to use push\_to\_hub(...) for different common formats
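
The load-then-push workflow described above can be sketched roughly as follows. This assumes the data is in local CSV files and that you have already authenticated with the Hub; the function names, file paths, and repository id are placeholders, not part of any existing script.

```python
def split_files(train_path, test_path=None):
    """Map split names to local files, in the form load_dataset accepts."""
    files = {"train": train_path}
    if test_path is not None:
        files["test"] = test_path
    return files


def upload_csv_dataset(repo_id, train_path, test_path=None):
    """Load local CSV files as a DatasetDict and push them to the Hub."""
    # Imported lazily so split_files works without the library installed.
    from datasets import load_dataset

    dataset = load_dataset("csv", data_files=split_files(train_path, test_path))
    # Requires prior authentication, e.g. via `huggingface-cli login`.
    dataset.push_to_hub(repo_id)
    return dataset
```

For example, `upload_csv_dataset("my-org/my-dataset", "train.csv", "test.csv")` would upload both splits; with recent versions of `datasets`, the push is expected to fill in the `configs` and `dataset_info` header entries, though it is worth checking the resulting card.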

### **Metadata fields**

#### License

* If the dataset is already released under an existing standard license, use that license  
* If the license is unclear, contact the authors for clarification  
* To license it under the Rosetta License:  
  * Add the following to the dataset card:

    license: other

    license\_name: rosetta-license-1.0

    license\_link: LICENSE.md

  * Upload the Rosetta [LICENSE.md](https://github.com/RosettaCommons/rosetta/blob/main/LICENSE.md) to the dataset repository
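
One possible way to script the Rosetta License steps above is sketched below, assuming the `huggingface_hub` package is installed, you are logged in, and a copy of `LICENSE.md` is available locally. The function names and the repository id are hypothetical.

```python
def rosetta_license_fields():
    """Header fields for the Rosetta License, as listed above."""
    return {
        "license": "other",
        "license_name": "rosetta-license-1.0",
        "license_link": "LICENSE.md",
    }


def upload_license(repo_id, local_path="LICENSE.md"):
    """Upload a local copy of the license file to a dataset repository."""
    # Imported lazily so rosetta_license_fields works without huggingface_hub.
    from huggingface_hub import HfApi

    HfApi().upload_file(
        path_or_fileobj=local_path,
        # Keep the uploaded name in sync with the license_link field.
        path_in_repo=rosetta_license_fields()["license_link"],
        repo_id=repo_id,
        repo_type="dataset",
    )
```

For example, `upload_license("my-org/my-dataset")` would place the file at the path that `license_link` points to.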

#### Citation

* If the dataset has a DOI (e.g. from an associated published paper), use [doi2bib.org](http://doi2bib.org) to generate a BibTeX citation  
* For an APA citation, use the [DOI → APA converter](https://paperpile.com/t/doi-to-apa-converter/)

#### tags

* Standard tags used to search for datasets on Hugging Face  
* Typically:

  \- biology

  \- chemistry

#### repo

* URL of the GitHub repository, Figshare record, or other location for the data or project

#### citation\_bibtex

* Citation in BibTeX format  
* You can generate it with https://www.doi2bib.org/
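
BibTeX entries span several lines, so one option is to store the citation as a YAML block scalar. The helper below is a small sketch of that formatting using only the standard library; the name `yaml_block_scalar` and the sample entry are illustrative, not taken from any real dataset card.

```python
def yaml_block_scalar(key, text, indent=2):
    """Format multi-line text as a YAML literal block scalar under `key`."""
    pad = " " * indent
    # Every non-empty line is indented so that YAML treats it as part of the value.
    body = "\n".join(pad + line if line else "" for line in text.strip().splitlines())
    # "|-" keeps the line breaks and drops the trailing newline.
    return f"{key}: |-\n{body}"


# Placeholder entry, for illustration only.
example_bibtex = "@article{key,\n  title={T}\n}"
print(yaml_block_scalar("citation_bibtex", example_bibtex))
```

The printed text can be pasted into the header between the `---` lines.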

#### citation\_apa

* Citation in APA format