
4 Add an informative README.md

The README.md is a Markdown file that is displayed when a user visits the front page of the dataset. It should give appropriate context for the dataset and guidance on how to use it. As a template, consider including the sections below, where the parts in angle brackets should be filled in. See the MIP dataset as an example.

    # <DATASET TITLE>
    <short descriptive abstract of the dataset>

    ## Quickstart Usage

    ### Install HuggingFace Datasets package
    Each subset can be loaded into python using the HuggingFace [datasets](https://huggingface.co/docs/datasets/index) library.
    First, from the command line install the `datasets` library

        $ pip install datasets

    Optionally set the cache directory, e.g.

        $ HF_HOME=${HOME}/.cache/huggingface/
        $ export HF_HOME

    then, from within python load the datasets library

        >>> import datasets

    ### Load model datasets
    To load one of the <DATASET ID> model datasets, use `datasets.load_dataset(...)`:

        >>> dataset_tag = "<DATASET TAG>"
        >>> dataset = datasets.load_dataset(
          path = "<HF PATH TO DATASET>",
          name = f"{dataset_tag}",
          data_dir = f"{dataset_tag}")['train']

    and the dataset is loaded as a `datasets.arrow_dataset.Dataset`

        >>> dataset
        <RESULT OF LOADING DATASET MODEL>

    which is a column-oriented format that can be accessed directly, converted into a `pandas.DataFrame`, or written to `parquet` format, e.g.

        >>> dataset.data.column('<COLUMN NAME IN DATASET>')
        >>> dataset.to_pandas()
        >>> dataset.to_parquet("dataset.parquet")

    ### <BRIEF EXAMPLE OF HOW TO USE DIFFERENT PARTS OF THE DATASET>

    ## Dataset Details

    ### Dataset Description
    <DETAILED DESCRIPTION OF DATASET>

    - **Acknowledgements:** <ACKNOWLEDGEMENTS>
    - **License:** <LICENSE>

    ### Dataset Sources
    - **Repository:** <URL FOR SOURCE OF DATA>
    - **Paper:** <APA CITATION REFERENCE FOR SOURCE DATA>
    - **Zenodo Repository:** <ZENODO LINK IF RELEVANT>

    ## Uses
    <DESCRIPTION OF INTENDED USE OF DATASET>

    ### Out-of-Scope Use
    <DESCRIPTION OF OUT OF SCOPE USES OF DATASET>

    ### Source Data
    <DESCRIPTION OF SOURCE DATA>

    ## Citation
    <BIBTEX REFERENCE FOR DATASET>

    ## Dataset Card Authors
    <NAME/INFO OF DATASET AUTHORS>
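Because the template relies on angle-bracket placeholders, it is easy to publish a README.md with one still unfilled. The following is a minimal sketch of a check that could be run before uploading. It is not part of this guide's tooling: the function name `find_unfilled_placeholders` and the example text are hypothetical, and it assumes placeholders are written in uppercase as in the template, so the lowercase abstract placeholder would not be detected.

```python
import re

# Assumption: template placeholders are angle brackets around uppercase text,
# e.g. <DATASET TITLE> or <NAME/INFO OF DATASET AUTHORS>. Lowercase HTML tags
# and <https://...> autolinks are deliberately not matched.
PLACEHOLDER = re.compile(r"<[A-Z][A-Z0-9 /_-]*>")


def find_unfilled_placeholders(readme_text):
    """Return (line_number, placeholder) pairs for placeholders left in the text."""
    found = []
    for line_number, line in enumerate(readme_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((line_number, match.group(0)))
    return found


# Hypothetical, partially filled-in README used only for illustration.
example = """# My Dataset
A short abstract.

## Dataset Details
- **License:** <LICENSE>
- **Repository:** https://example.org/data
## Citation
<BIBTEX REFERENCE FOR DATASET>
"""

for line_number, placeholder in find_unfilled_placeholders(example):
    print(f"line {line_number}: {placeholder} still needs to be filled in")
```

To check a real file, read it first, for example with `pathlib.Path("README.md").read_text()`, and pass the text to the function. An empty result means no uppercase placeholders remain.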