ttj/nanochat-cache
nanochat-cache/report/tokenizer-training.md
Tokenizer training
timestamp: 2025-11-03 05:56:12
max_chars: 2,000,000,000
doc_cap: 10,000
vocab_size: 65,536
train_time: 57.1515
num_special_tokens: 9
token_bytes_min: 1
token_bytes_max: 32
token_bytes_mean: 6.9197
token_bytes_std: 2.8748
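
The `token_bytes_*` fields read like summary statistics over the byte length of each token in the trained vocabulary. The report does not say how they were computed, so the following is only a minimal sketch under that assumption. The helper name `token_byte_stats` and the toy vocabulary are hypothetical, and the choice of population standard deviation (rather than sample) is also an assumption.

```python
import statistics


def token_byte_stats(token_bytes):
    """Summarize the byte lengths of tokens given as a list of bytes objects."""
    lengths = [len(t) for t in token_bytes]
    return {
        "token_bytes_min": min(lengths),
        "token_bytes_max": max(lengths),
        "token_bytes_mean": statistics.fmean(lengths),
        # Population standard deviation; the report may use a different estimator.
        "token_bytes_std": statistics.pstdev(lengths),
    }


# Hypothetical toy vocabulary, not taken from the actual tokenizer.
vocab = [b"a", b"the", b" token", "é".encode("utf-8")]
stats = token_byte_stats(vocab)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Under this reading, a `token_bytes_mean` of about 6.9 would mean the average vocabulary entry spans roughly seven bytes of UTF-8 text, with the longest entry covering 32 bytes.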