Buckets:

hf-doc-build
/

doc-dev

Files

xet

hf-doc-build/doc-dev / course /pr_1114 /en /chapter5 /1.md

rtrm

about 2 months ago

preview code

download

raw

1.24 kB

Introduction[[introduction]]

In Chapter 3 you got your first taste of the 🤗 Datasets library and saw that there were three main steps when it came to fine-tuning a model:

Load a dataset from the Hugging Face Hub.
Preprocess the data with Dataset.map().
Load and compute metrics.

But this is just scratching the surface of what 🤗 Datasets can do! In this chapter, we will take a deep dive into the library. Along the way, we'll find answers to the following questions:

What do you do when your dataset is not on the Hub?
How can you slice and dice a dataset? (And what if you really need to use Pandas?)
What do you do when your dataset is huge and will melt your laptop's RAM?
What the heck are "memory mapping" and Apache Arrow?
How can you create your own dataset and push it to the Hub?

The techniques you learn here will prepare you for the advanced tokenization and fine-tuning tasks in Chapter 6 and Chapter 7 -- so grab a coffee and let's get started!

Xet Storage Details

Size:: 1.24 kB
Xet hash:: 1a22dd651965a63fc76a4c3f4b3f254f22d3caf908639a72146c1291830f7640

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.