| # Questions and Answers | |
| #### My dataset is a metadata curation of multiple datasets from different places. How should I license it? | |
| If you are building on others' work, it is important to respect their licenses. Each source dataset will fall into one of three buckets: | |
| * The data cannot be used, e.g. because of a proprietary license restriction | |
| * The data can be used, with or without some restrictions. For example, if the source dataset is licensed under the Creative Commons license [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/), any redistributed or adapted version must credit the source and be shared under the same ("ShareAlike") license. Alternatively, the authors may require agreeing to a specific usage license; for example, the Rocklin lab requires registering use of the MegaScale dataset so they can demonstrate impact to maintain grant support. | |
| * The license of the source data is not clear. In this case, it is best to reach out to the original authors and either request that they adopt a license (open source or otherwise) or get explicit permission to re-share the data. | |
| #### My dataset is made up of a bunch of small tabular datasets. Should I make them each a sub-dataset or different "splits"? |