vukrosic/essential-web-16k-tokenizer
License: mit
Branch: main
Size: 1.59 MB
Contributors: 1
History: 6 commits
Latest commit: 4ca13e5 (verified) by vukrosic, 7 months ago: Add essential_web_500k_tokens.pkl - Tokenized version (pickle format) of 500K chars from Essential-Web
File | Size | Last commit message | Updated
.gitattributes | 1.52 kB | initial commit | 7 months ago
README.md | 4.6 kB | Add tokenizer usage documentation | 7 months ago
bpe_tokenizer_16k_n1000000.pkl | 194 kB (xet) | Upload bpe_tokenizer_16k_n1000000.pkl | 7 months ago
essential_web_500k_text.txt | 504 kB | Add essential_web_500k_text.txt - 500K characters of raw text from Essential-Web dataset | 7 months ago
essential_web_500k_tokens.pkl | 350 kB (xet) | Add essential_web_500k_tokens.pkl - Tokenized version (pickle format) of 500K chars from Essential-Web | 7 months ago
essential_web_500k_tokens.txt | 538 kB | Add essential_web_500k_tokens.txt - Tokenized version (one token per line) of 500K chars from Essential-Web | 7 months ago
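
The listing describes essential_web_500k_tokens.txt as storing one token per line. A minimal sketch of reading a file in that layout follows. It assumes the file is UTF-8 text and that no token spans multiple lines; whether each line holds a token ID or a token string is not stated here, so the values are kept as raw strings. The helper name read_tokens is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List


def read_tokens(path: str) -> List[str]:
    """Read a one-token-per-line text file into a list of strings.

    Assumption: UTF-8 encoding, one token per line, tokens kept as raw
    strings because the listing does not say whether they are IDs or text.
    """
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() removes the line terminators and does not produce a
    # spurious empty entry for a trailing newline at the end of the file.
    return text.splitlines()
```

If the lines turn out to be integer IDs, they can be converted afterwards with `[int(t) for t in read_tokens(path)]`.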