Spaces:

RosettaCommons
/

MolecularDatasetCurationGuide

maom committed on Feb 3

Commit

cf03a4c

verified ·

1 Parent(s): cd44827

Create 04_create_dataset_card

sections/04_create_dataset_card +6 -0

sections/04_create_dataset_card ADDED

	@@ -0,0 +1,6 @@

+## **4 Add an informative README.md**
+The `README.md` is a markdown file that is displayed when someone visits the front page of the dataset. It should give appropriate context for the dataset and guidance on how to use it. As a template, consider having the sections below, where the parts in angle brackets should be filled in. See the [MIP](https://huggingface.co/datasets/RosettaCommons/MIP/blob/main/README.md) dataset as an example.
+
+    # <DATASET TITLE>
+    <short descriptive abstract of the dataset>
+
+    ## Quickstart Usage
+
+    ### Install HuggingFace Datasets package
+    Each subset can be loaded into python using the HuggingFace [datasets](https://huggingface.co/docs/datasets/index) library.
+    First, from the command line install the `datasets` library
+
+        $ pip install datasets
+
+    Optionally set the cache directory, e.g.
+
+        $ HF_HOME=${HOME}/.cache/huggingface/
+        $ export HF_HOME
+
+    then, from within python load the datasets library
+
+        >>> import datasets
+
+    ### Load model datasets
+    To load one of the <DATASET ID> model datasets, use `datasets.load_dataset(...)`:
+
+        >>> dataset_tag = "<DATASET TAG>"
+        >>> dataset = datasets.load_dataset(
+        ...     path = "<HF PATH TO DATASET>",
+        ...     name = f"{dataset_tag}",
+        ...     data_dir = f"{dataset_tag}")['train']
+
+    and the dataset is loaded as a `datasets.arrow_dataset.Dataset`
+
+        >>> dataset
+        <RESULT OF LOADING DATASET MODEL>
+
+    which is a column-oriented format that can be accessed directly or converted into a `pandas.DataFrame` or `parquet` format, e.g.
+
+        >>> dataset.data.column('<COLUMN NAME IN DATASET>')
+        >>> dataset.to_pandas()
+        >>> dataset.to_parquet("dataset.parquet")
+
+    ### <BRIEF EXAMPLE OF HOW TO USE DIFFERENT PARTS OF THE DATASET>
+
+    ## Dataset Details
+
+    ### Dataset Description
+    <DETAILED DESCRIPTION OF DATASET>
+
+    - **Acknowledgements:**
+    <ACKNOWLEDGEMENTS>
+    - **License:** <LICENSE>
+
+    ### Dataset Sources
+    - **Repository:** <URL FOR SOURCE OF DATA>
+    - **Paper:** <APA CITATION REFERENCE FOR SOURCE DATA>
+    - **Zenodo Repository:** <ZENODO LINK IF RELEVANT>
+
+    ## Uses
+    <DESCRIPTION OF INTENDED USE OF DATASET>
+
+    ### Out-of-Scope Use
+    <DESCRIPTION OF OUT OF SCOPE USES OF DATASET>
+
+    ### Source Data
+    <DESCRIPTION OF SOURCE DATA>
+
+    ## Citation
+    <BIBTEX REFERENCE FOR DATASET>
+
+    ## Dataset Card Authors
+    <NAME/INFO OF DATASET AUTHORS>
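
Because the template relies on placeholders being replaced by hand, a quick check before publishing may help. The following is a minimal sketch, not part of the guide itself: the function name `check_readme`, the list of required headings, and the placeholder pattern are all assumptions chosen for illustration. It only detects upper-case placeholders such as `<LICENSE>`, so the lower-case abstract placeholder would need a separate check.

```python
import re

# Hypothetical choice: top-level headings taken from the template above.
# Adjust this list if your dataset card uses different sections.
REQUIRED_HEADINGS = [
    "## Quickstart Usage",
    "## Dataset Details",
    "## Uses",
    "## Citation",
    "## Dataset Card Authors",
]

# Matches upper-case template placeholders such as <DATASET TITLE> or
# <NAME/INFO OF DATASET AUTHORS>. Lower-case text in angle brackets
# (for example HTML tags) is deliberately not matched.
PLACEHOLDER = re.compile(r"<[A-Z][A-Z /-]*>")


def check_readme(text):
    """Return (missing_headings, unfilled_placeholders) for a README string."""
    lines = {line.strip() for line in text.splitlines()}
    missing = [h for h in REQUIRED_HEADINGS if h not in lines]
    unfilled = sorted(set(PLACEHOLDER.findall(text)))
    return missing, unfilled


# A deliberately incomplete draft used only to demonstrate the check.
draft = """# Example Dataset
## Quickstart Usage
## Dataset Details
- **License:** <LICENSE>
## Uses
## Citation
<BIBTEX REFERENCE FOR DATASET>
"""

missing, unfilled = check_readme(draft)
print("missing headings:", missing)
print("unfilled placeholders:", unfilled)
```

For a real card, the same function could be run on the contents of `README.md` read from disk; an empty result for both lists suggests that the template has been filled in completely.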