| <meta charset="utf-8" /><meta http-equiv="content-security-policy" content=""><meta name="hf:doc:metadata" content="{&quot;local&quot;:&quot;the-dataset-object&quot;,&quot;sections&quot;:[{&quot;local&quot;:&quot;metadata&quot;,&quot;title&quot;:&quot;Metadata&quot;},{&quot;local&quot;:&quot;features-and-columns&quot;,&quot;title&quot;:&quot;Features and columns&quot;},{&quot;local&quot;:&quot;rows-slices-batches-and-columns&quot;,&quot;title&quot;:&quot;Rows, slices, batches, and columns&quot;}],&quot;title&quot;:&quot;The Dataset object&quot;}" data-svelte="svelte-1phssyn"> | |
| <link rel="modulepreload" href="/docs/datasets/v2.2.2/en/_app/assets/pages/__layout.svelte-efc77dbd.css"> | |
| <link rel="modulepreload" href="/docs/datasets/v2.2.2/en/_app/start-0f8c1da7.js"> | |
| <link rel="modulepreload" href="/docs/datasets/v2.2.2/en/_app/chunks/vendor-8138ceec.js"> | |
| <link rel="modulepreload" href="/docs/datasets/v2.2.2/en/_app/chunks/paths-4b3c6e7e.js"> | |
| <link rel="modulepreload" href="/docs/datasets/v2.2.2/en/_app/pages/__layout.svelte-efb8e839.js"> | |
| <link rel="modulepreload" href="/docs/datasets/v2.2.2/en/_app/pages/access.mdx-f196cd48.js"> | |
| <link rel="modulepreload" href="/docs/datasets/v2.2.2/en/_app/chunks/IconCopyLink-2dd3a6ac.js"> | |
| <link rel="modulepreload" href="/docs/datasets/v2.2.2/en/_app/chunks/CodeBlock-fc89709f.js"> | |
| <h1 class="relative group"><a id="the-dataset-object" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#the-dataset-object"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> | |
| <span>The Dataset object | |
| </span></h1> | |
| <p>In the previous tutorial, you learned how to load a dataset. This section introduces the <a href="/docs/datasets/v2.2.2/en/package_reference/main_classes#datasets.Dataset">Dataset</a> object: the metadata it stores and the basics of querying it to return rows and columns.</p> | |
| <p>A <a href="/docs/datasets/v2.2.2/en/package_reference/main_classes#datasets.Dataset">Dataset</a> object is returned when you load an instance of a dataset. This object behaves like a normal Python container.</p> | |
| <div class="code-block relative"> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset | |
| <span class="hljs-meta">>>> </span>dataset = load_dataset(<span class="hljs-string">'glue'</span>, <span class="hljs-string">'mrpc'</span>, split=<span class="hljs-string">'train'</span>)<!-- HTML_TAG_END --></pre></div> | |
| <h2 class="relative group"><a id="metadata" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#metadata"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> | |
| <span>Metadata | |
| </span></h2> | |
| <p>The <a href="/docs/datasets/v2.2.2/en/package_reference/main_classes#datasets.Dataset">Dataset</a> object contains a lot of useful information about your dataset. For example, access <a href="/docs/datasets/v2.2.2/en/package_reference/main_classes#datasets.DatasetInfo">DatasetInfo</a> to return a short description of the dataset, the authors, and even the dataset size. This gives you a quick snapshot of the dataset's most important attributes.</p> | |
| <div class="code-block relative"> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset.info | |
| DatasetInfo( | |
| description=<span class="hljs-string">'GLUE, the General Language Understanding Evaluation benchmark\n(https://gluebenchmark.com/) is a collection of resources for training,\nevaluating, and analyzing natural language understanding systems.\n\n'</span>, | |
| citation=<span class="hljs-string">'@inproceedings{dolan2005automatically,\n title={Automatically constructing a corpus of sentential paraphrases},\n author={Dolan, William B and Brockett, Chris},\n booktitle={Proceedings of the Third International Workshop on Paraphrasing (IWP2005)},\n year={2005}\n}\n@inproceedings{wang2019glue,\n title={{GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},\n author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},\n note={In the Proceedings of ICLR.},\n year={2019}\n}\n'</span>, homepage=<span class="hljs-string">'https://www.microsoft.com/en-us/download/details.aspx?id=52398'</span>, | |
| license=<span class="hljs-string">''</span>, | |
| features={<span class="hljs-string">'sentence1'</span>: Value(dtype=<span class="hljs-string">'string'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), <span class="hljs-string">'sentence2'</span>: Value(dtype=<span class="hljs-string">'string'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), <span class="hljs-string">'label'</span>: ClassLabel(num_classes=<span class="hljs-number">2</span>, names=[<span class="hljs-string">'not_equivalent'</span>, <span class="hljs-string">'equivalent'</span>], names_file=<span class="hljs-literal">None</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), <span class="hljs-string">'idx'</span>: Value(dtype=<span class="hljs-string">'int32'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>)}, post_processed=<span class="hljs-literal">None</span>, supervised_keys=<span class="hljs-literal">None</span>, builder_name=<span class="hljs-string">'glue'</span>, config_name=<span class="hljs-string">'mrpc'</span>, version=<span class="hljs-number">1.0</span><span class="hljs-number">.0</span>, splits={<span class="hljs-string">'train'</span>: SplitInfo(name=<span class="hljs-string">'train'</span>, num_bytes=<span class="hljs-number">943851</span>, num_examples=<span class="hljs-number">3668</span>, dataset_name=<span class="hljs-string">'glue'</span>), <span class="hljs-string">'validation'</span>: SplitInfo(name=<span class="hljs-string">'validation'</span>, num_bytes=<span class="hljs-number">105887</span>, num_examples=<span class="hljs-number">408</span>, dataset_name=<span class="hljs-string">'glue'</span>), <span class="hljs-string">'test'</span>: SplitInfo(name=<span class="hljs-string">'test'</span>, num_bytes=<span class="hljs-number">442418</span>, num_examples=<span class="hljs-number">1725</span>, dataset_name=<span class="hljs-string">'glue'</span>)}, | |
| download_checksums={<span class="hljs-string">'https://dl.fbaipublicfiles.com/glue/data/mrpc_dev_ids.tsv'</span>: {<span class="hljs-string">'num_bytes'</span>: <span class="hljs-number">6222</span>, <span class="hljs-string">'checksum'</span>: <span class="hljs-string">'971d7767d81b997fd9060ade0ec23c4fc31cbb226a55d1bd4a1bac474eb81dc7'</span>}, <span class="hljs-string">'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'</span>: {<span class="hljs-string">'num_bytes'</span>: <span class="hljs-number">1047044</span>, <span class="hljs-string">'checksum'</span>: <span class="hljs-string">'60a9b09084528f0673eedee2b69cb941920f0b8cd0eeccefc464a98768457f89'</span>}, <span class="hljs-string">'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'</span>: {<span class="hljs-string">'num_bytes'</span>: <span class="hljs-number">441275</span>, <span class="hljs-string">'checksum'</span>: <span class="hljs-string">'a04e271090879aaba6423d65b94950c089298587d9c084bf9cd7439bd785f784'</span>}}, | |
| download_size=<span class="hljs-number">1494541</span>, | |
| post_processing_size=<span class="hljs-literal">None</span>, | |
| dataset_size=<span class="hljs-number">1492156</span>, | |
| size_in_bytes=<span class="hljs-number">2986697</span> | |
| )<!-- HTML_TAG_END --></pre></div> | |
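<p>The size fields in this output are related by simple arithmetic. The following is a minimal sketch that checks those relationships using numbers copied by hand from the output above; it does not load the dataset, and the variable names are chosen here for illustration only.</p>

```python
# Values copied from the DatasetInfo output above (glue/mrpc).
split_num_bytes = {"train": 943851, "validation": 105887, "test": 442418}
download_size = 1494541
dataset_size = 1492156
size_in_bytes = 2986697

# In this output, the per-split byte counts add up to dataset_size.
splits_total = sum(split_num_bytes.values())

# In this output, download_size plus dataset_size gives size_in_bytes.
combined = download_size + dataset_size

# Express the total in megabytes (10**6 bytes) for readability.
total_mb = size_in_bytes / 1e6
print(splits_total == dataset_size, combined == size_in_bytes, total_mb)
```

<p>Both comparisons hold for the numbers shown here; other datasets may report some of these fields as <code>None</code>, so treat this as a reading aid rather than a guarantee.</p>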
| <p>You can request specific attributes of the dataset, like <code>description</code>, <code>citation</code>, and <code>homepage</code>, by accessing them directly on the dataset. Take a look at <a href="/docs/datasets/v2.2.2/en/package_reference/main_classes#datasets.DatasetInfo">DatasetInfo</a> for a complete list of attributes you can return.</p> | |
| <div class="code-block relative"> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset.split | |
| NamedSplit(<span class="hljs-string">'train'</span>) | |
| <span class="hljs-meta">>>> </span>dataset.description | |
| <span class="hljs-string">'GLUE, the General Language Understanding Evaluation benchmark\n(https://gluebenchmark.com/) is a collection of resources for training,\nevaluating, and analyzing natural language understanding systems.\n\n'</span> | |
| <span class="hljs-meta">>>> </span>dataset.citation | |
| <span class="hljs-string">'@inproceedings{dolan2005automatically,\n title={Automatically constructing a corpus of sentential paraphrases},\n author={Dolan, William B and Brockett, Chris},\n booktitle={Proceedings of the Third International Workshop on Paraphrasing (IWP2005)},\n year={2005}\n}\n@inproceedings{wang2019glue,\n title={{GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},\n author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},\n note={In the Proceedings of ICLR.},\n year={2019}\n}\n\nNote that each GLUE dataset has its own citation. Please see the source to see\nthe correct citation for each contained dataset.'</span> | |
| <span class="hljs-meta">>>> </span>dataset.homepage | |
| <span class="hljs-string">'https://www.microsoft.com/en-us/download/details.aspx?id=52398'</span><!-- HTML_TAG_END --></pre></div> | |
| <h2 class="relative group"><a id="features-and-columns" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#features-and-columns"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> | |
| <span>Features and columns | |
| </span></h2> | |
| <p>A dataset is a table of rows and typed columns. Querying a dataset returns a Python dictionary where the keys correspond to column names, and the values correspond to column values:</p> | |
| <div class="code-block relative"> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] | |
| {<span class="hljs-string">'idx'</span>: <span class="hljs-number">0</span>, | |
| <span class="hljs-string">'label'</span>: <span class="hljs-number">1</span>, | |
| <span class="hljs-string">'sentence1'</span>: <span class="hljs-string">'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .'</span>, | |
| <span class="hljs-string">'sentence2'</span>: <span class="hljs-string">'Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .'</span>}<!-- HTML_TAG_END --></pre></div> | |
| <p>Return the number of rows and columns with the following standard attributes:</p> | |
| <div class="code-block relative"> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset.shape | |
| (<span class="hljs-number">3668</span>, <span class="hljs-number">4</span>) | |
| <span class="hljs-meta">>>> </span>dataset.num_columns | |
| <span class="hljs-number">4</span> | |
| <span class="hljs-meta">>>> </span>dataset.num_rows | |
| <span class="hljs-number">3668</span> | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">len</span>(dataset) | |
| <span class="hljs-number">3668</span><!-- HTML_TAG_END --></pre></div> | |
| <p>List the column names with the <a href="/docs/datasets/v2.2.2/en/package_reference/main_classes#datasets.Dataset.column_names">Dataset.column_names</a> attribute:</p> | |
| <div class="code-block relative"> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset.column_names | |
| [<span class="hljs-string">'idx'</span>, <span class="hljs-string">'label'</span>, <span class="hljs-string">'sentence1'</span>, <span class="hljs-string">'sentence2'</span>]<!-- HTML_TAG_END --></pre></div> | |
| <p>Get detailed information about the columns with <a href="/docs/datasets/v2.2.2/en/package_reference/main_classes#datasets.Features">Features</a>:</p> | |
| <div class="code-block relative"> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset.features | |
| {<span class="hljs-string">'idx'</span>: Value(dtype=<span class="hljs-string">'int32'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'label'</span>: ClassLabel(num_classes=<span class="hljs-number">2</span>, names=[<span class="hljs-string">'not_equivalent'</span>, <span class="hljs-string">'equivalent'</span>], names_file=<span class="hljs-literal">None</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'sentence1'</span>: Value(dtype=<span class="hljs-string">'string'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'sentence2'</span>: Value(dtype=<span class="hljs-string">'string'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| }<!-- HTML_TAG_END --></pre></div> | |
| <p>Return more specific information about a feature such as <a href="/docs/datasets/v2.2.2/en/package_reference/main_classes#datasets.ClassLabel">ClassLabel</a> by accessing its <code>num_classes</code> and <code>names</code> attributes:</p> | |
| <div class="code-block relative"> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset.features[<span class="hljs-string">'label'</span>].num_classes | |
| <span class="hljs-number">2</span> | |
| <span class="hljs-meta">>>> </span>dataset.features[<span class="hljs-string">'label'</span>].names | |
| [<span class="hljs-string">'not_equivalent'</span>, <span class="hljs-string">'equivalent'</span>]<!-- HTML_TAG_END --></pre></div> | |
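<p>The position of each entry in <code>names</code> is its integer class id, so an integer label can be decoded by indexing into the list. The sketch below uses plain Python lists with values copied from the outputs on this page instead of a loaded dataset; <code>decoded</code> and <code>name_to_id</code> are illustrative names. The library's <code>ClassLabel</code> also offers <code>int2str</code> and <code>str2int</code> methods for the same conversion.</p>

```python
# Values copied from the outputs on this page; no dataset is loaded here.
names = ["not_equivalent", "equivalent"]  # dataset.features['label'].names
labels = [1, 0, 1]                        # 'label' values of the first three rows

# Integer id -> class name: index into the names list.
decoded = [names[i] for i in labels]

# Class name -> integer id: invert the list into a lookup table.
name_to_id = {name: i for i, name in enumerate(names)}

print(decoded)
print(name_to_id["equivalent"])
```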
| <h2 class="relative group"><a id="rows-slices-batches-and-columns" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#rows-slices-batches-and-columns"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> | |
| <span>Rows, slices, batches, and columns | |
| </span></h2> | |
| <p>Get several rows of your dataset at a time with slice notation or a list of indices:</p> | |
| <div class="code-block relative"> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[:<span class="hljs-number">3</span>] | |
| {<span class="hljs-string">'idx'</span>: [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>], | |
| <span class="hljs-string">'label'</span>: [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>], | |
| <span class="hljs-string">'sentence1'</span>: [<span class="hljs-string">'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .'</span>, <span class="hljs-string">"Yucaipa owned Dominick 's before selling the chain to Safeway in 1998 for $ 2.5 billion ."</span>, <span class="hljs-string">'They had published an advertisement on the Internet on June 10 , offering the cargo for sale , he added .'</span>], | |
| <span class="hljs-string">'sentence2'</span>: [<span class="hljs-string">'Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .'</span>, <span class="hljs-string">"Yucaipa bought Dominick 's in 1995 for $ 693 million and sold it to Safeway for $ 1.8 billion in 1998 ."</span>, <span class="hljs-string">"On June 10 , the ship 's owners had published an advertisement on the Internet , offering the explosives for sale ."</span>] | |
| } | |
| <span class="hljs-meta">>>> </span>dataset[[<span class="hljs-number">1</span>, <span class="hljs-number">3</span>, <span class="hljs-number">5</span>]] | |
| {<span class="hljs-string">'idx'</span>: [<span class="hljs-number">1</span>, <span class="hljs-number">3</span>, <span class="hljs-number">5</span>], | |
| <span class="hljs-string">'label'</span>: [<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>], | |
| <span class="hljs-string">'sentence1'</span>: [<span class="hljs-string">"Yucaipa owned Dominick 's before selling the chain to Safeway in 1998 for $ 2.5 billion ."</span>, <span class="hljs-string">'Around 0335 GMT , Tab shares were up 19 cents , or 4.4 % , at A $ 4.56 , having earlier set a record high of A $ 4.57 .'</span>, <span class="hljs-string">'Revenue in the first quarter of the year dropped 15 percent from the same period a year earlier .'</span>], | |
| <span class="hljs-string">'sentence2'</span>: [<span class="hljs-string">"Yucaipa bought Dominick 's in 1995 for $ 693 million and sold it to Safeway for $ 1.8 billion in 1998 ."</span>, <span class="hljs-string">'Tab shares jumped 20 cents , or 4.6 % , to set a record closing high at A $ 4.57 .'</span>, <span class="hljs-string">"With the scandal hanging over Stewart 's company , revenue the first quarter of the year dropped 15 percent from the same period a year earlier ."</span>] | |
| }<!-- HTML_TAG_END --></pre></div> | |
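<p>A slice or a list of indices returns one dictionary whose values are lists, one list per column. If per-row dictionaries are more convenient, the batch can be transposed in plain Python. This is a minimal sketch; <code>batch_to_rows</code> is a hypothetical helper written for this example, not part of the library, and the sample batch keeps only two columns copied from the output above.</p>

```python
def batch_to_rows(batch):
    """Turn a dict of equal-length lists into a list of per-row dicts."""
    columns = list(batch)
    # zip(*values) walks the columns in lockstep, yielding one tuple per row.
    return [dict(zip(columns, values)) for values in zip(*batch.values())]


# Two columns copied from the dataset[:3] output above.
batch = {"idx": [0, 1, 2], "label": [1, 0, 1]}
rows = batch_to_rows(batch)
print(rows[0])
```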
| <p>Querying by column name returns that column's values. For example, to return only the first three values of the <code>sentence1</code> column:</p> | |
| <div class="code-block relative"> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-string">'sentence1'</span>][:<span class="hljs-number">3</span>] | |
| [<span class="hljs-string">'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .'</span>, <span class="hljs-string">"Yucaipa owned Dominick 's before selling the chain to Safeway in 1998 for $ 2.5 billion ."</span>, <span class="hljs-string">'They had published an advertisement on the Internet on June 10 , offering the cargo for sale , he added .'</span>]<!-- HTML_TAG_END --></pre></div> | |
| <p>Depending on how a <a href="/docs/datasets/v2.2.2/en/package_reference/main_classes#datasets.Dataset">Dataset</a> object is queried, the format returned will be different:</p> | |
| <ul><li>A single row like <code>dataset[0]</code> returns a Python dictionary of values.</li> | |
| <li>A batch like <code>dataset[5:10]</code> returns a Python dictionary of lists of values.</li> | |
| <li>A column like <code>dataset['sentence1']</code> returns a Python list of values.</li></ul> | |
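<p>The three return shapes listed above can be mimicked with a small column-oriented container. The class below is a toy stand-in written for this example, under the assumption that columns are stored as plain lists; it is not how the library is implemented, and <code>ToyTable</code> is a hypothetical name.</p>

```python
class ToyTable:
    """Hypothetical stand-in that mimics the query shapes; not the library's code."""

    def __init__(self, columns):
        self.columns = columns  # dict: column name -> list of values

    def __getitem__(self, key):
        if isinstance(key, str):    # column name -> list of values
            return self.columns[key]
        if isinstance(key, int):    # single row -> dict of values
            return {name: col[key] for name, col in self.columns.items()}
        if isinstance(key, slice):  # slice -> dict of lists
            return {name: col[key] for name, col in self.columns.items()}
        # Otherwise treat the key as a list of indices -> dict of lists.
        return {name: [col[i] for i in key] for name, col in self.columns.items()}


# Two columns with values copied from the outputs on this page.
table = ToyTable({"idx": [0, 1, 2, 3], "label": [1, 0, 1, 0]})
print(table[0])        # single row
print(table[:2])       # batch
print(table["label"])  # column
```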
| <script type="module" data-hydrate="ky63oi"> | |
| import { start } from "/docs/datasets/v2.2.2/en/_app/start-0f8c1da7.js"; | |
| start({ | |
| target: document.querySelector('[data-hydrate="ky63oi"]').parentNode, | |
| paths: {"base":"/docs/datasets/v2.2.2/en","assets":"/docs/datasets/v2.2.2/en"}, | |
| session: {}, | |
| route: false, | |
| spa: false, | |
| trailing_slash: "never", | |
| hydrate: { | |
| status: 200, | |
| error: null, | |
| nodes: [ | |
| import("/docs/datasets/v2.2.2/en/_app/pages/__layout.svelte-efb8e839.js"), | |
| import("/docs/datasets/v2.2.2/en/_app/pages/access.mdx-f196cd48.js") | |
| ], | |
| params: {} | |
| } | |
| }); | |
| </script> | |