Buckets:

rtrm's picture
|
download
raw
1.68 kB

Installation

๐Ÿค— Tokenizers is tested on Python 3.5+.

You should install ๐Ÿค— Tokenizers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide. Create a virtual environment with the version of Python you're going to use and activate it.

Installation with pip

๐Ÿค— Tokenizers can be installed using pip as follows:

pip install tokenizers

Installation from sources

To use this method, you need to have the Rust language installed. You can follow the official guide for more information.

If you are using a unix based OS, the installation should be as simple as running:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Or you can easily update it with the following command:

rustup update

Once rust is installed, we can start retrieving the sources for ๐Ÿค— Tokenizers:

git clone https://github.com/huggingface/tokenizers

Then we go into the python bindings folder:

cd tokenizers/bindings/python

At this point you should have your virtual environment already activated. In order to compile ๐Ÿค— Tokenizers, you need to:

pip install -e .

Crates.io

๐Ÿค— Tokenizers is available on crates.io.

You just need to add it to your Cargo.toml:

cargo add tokenizers

Installation with npm

You can simply install ๐Ÿค— Tokenizers with npm using:

npm install tokenizers

Xet Storage Details

Size:
1.68 kB
ยท
Xet hash:
60159c9c66820e26dfb98c7d6c68a96c37225e1e24eff28375879a540dd92303

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.