# Tokenizers, check![[tokenizers-check]]

<CourseFloatingBanner
chapter={6}
classNames="absolute z-10 right-0 top-0"
/>

Great job finishing this chapter!

After this deep dive into tokenizers, you should:

- Be able to train a new tokenizer using an old one as a template
- Understand how to use offsets to map each token back to its span in the original text
- Know the differences between the BPE, WordPiece, and Unigram algorithms
- Be able to mix and match the building blocks provided by the 🤗 Tokenizers library to build your own tokenizer
- Be able to use that tokenizer inside the 🤗 Transformers library

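As a quick refresher on offsets: in 🤗 Transformers, fast tokenizers return them when called with `return_offsets_mapping=True`. The sketch below recreates the idea in plain Python so it runs without any library. It assumes a hypothetical regex pre-tokenizer (`TOKEN_PATTERN`, runs of word characters or single punctuation marks); that pattern and the helper names `tokenize_with_offsets`, `token_to_text`, and `char_to_token` are illustrative only, not the library's API.

```python
import re

# Hypothetical pre-tokenizer for illustration: a run of word characters or a
# single punctuation mark; whitespace is skipped. Not the 🤗 Tokenizers one.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize_with_offsets(text):
    """Split `text` into tokens and record each token's (start, end) span."""
    return [(match.group(), match.span()) for match in TOKEN_PATTERN.finditer(text)]


def token_to_text(text, offsets, token_index):
    """Recover the slice of the original text covered by one token."""
    start, end = offsets[token_index]
    return text[start:end]


def char_to_token(offsets, char_index):
    """Return the index of the token covering `char_index`, or None (e.g. whitespace)."""
    for token_index, (start, end) in enumerate(offsets):
        if start <= char_index < end:
            return token_index
    return None


text = "Hello, world! Tokenizers rock."
pairs = tokenize_with_offsets(text)
tokens = [token for token, _ in pairs]
offsets = [span for _, span in pairs]

print(tokens)                           # ['Hello', ',', 'world', '!', 'Tokenizers', 'rock', '.']
print(token_to_text(text, offsets, 4))  # Tokenizers
print(char_to_token(offsets, 16))       # 4 (the "k" inside "Tokenizers")
print(char_to_token(offsets, 6))        # None (the space after the comma)
```

A real subword tokenizer usually produces several tokens per word, but each one still carries its own span, so the same lookup logic applies.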

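One way to recall how BPE and WordPiece differ during training: both start from single characters and merge adjacent pairs, but BPE merges the most frequent pair, while WordPiece scores a pair as its frequency divided by the product of its two parts' frequencies. The toy sketch below assumes a small made-up corpus (`WORD_FREQS`), and all function names are hypothetical. It simplifies heavily: no pre-tokenization, no `##` prefix that WordPiece uses for non-initial pieces, and ties broken by first-seen order. Unigram is not shown, since it works in the opposite direction, starting from a large vocabulary and pruning it.

```python
from collections import Counter

# Toy corpus mapping each word to a made-up frequency.
WORD_FREQS = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}


def count_pairs_and_symbols(splits, word_freqs):
    """Count adjacent symbol pairs and single symbols, weighted by word frequency."""
    pair_counts, symbol_counts = Counter(), Counter()
    for word, freq in word_freqs.items():
        symbols = splits[word]
        for symbol in symbols:
            symbol_counts[symbol] += freq
        for pair in zip(symbols, symbols[1:]):
            pair_counts[pair] += freq
    return pair_counts, symbol_counts


def bpe_choice(pair_counts, symbol_counts):
    """BPE: merge the most frequent pair (symbol_counts is unused)."""
    return max(pair_counts, key=lambda pair: pair_counts[pair])


def wordpiece_choice(pair_counts, symbol_counts):
    """WordPiece-style: normalize the pair count by the counts of its two parts."""
    return max(
        pair_counts,
        key=lambda pair: pair_counts[pair] / (symbol_counts[pair[0]] * symbol_counts[pair[1]]),
    )


def merge_pair(splits, pair):
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    first, second = pair
    new_splits = {}
    for word, symbols in splits.items():
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == first and symbols[i + 1] == second:
                merged.append(first + second)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        new_splits[word] = merged
    return new_splits


def learn_merges(word_freqs, num_merges, choose_pair):
    """Start from characters and greedily apply up to `num_merges` merges."""
    splits = {word: list(word) for word in word_freqs}
    merges = []
    for _ in range(num_merges):
        pair_counts, symbol_counts = count_pairs_and_symbols(splits, word_freqs)
        if not pair_counts:
            break
        pair = choose_pair(pair_counts, symbol_counts)
        merges.append(pair)
        splits = merge_pair(splits, pair)
    return merges


print(learn_merges(WORD_FREQS, 3, bpe_choice))        # [('u', 'g'), ('u', 'n'), ('h', 'ug')]
print(learn_merges(WORD_FREQS, 1, wordpiece_choice))  # [('g', 's')]
```

The two trainers differ only in `choose_pair`, which isolates the contrast: BPE picks the frequent pair "ug", while the WordPiece score prefers "gs" because "s" is rare on its own, so the pair accounts for all of its occurrences.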