Tokenizer training
timestamp: 2025-10-18 21:16:38
max_chars: 2,000,000,000
doc_cap: 10,000
vocab_size: 65,536
train_time: 58.2860
num_special_tokens: 9
token_bytes_min: 1
token_bytes_max: 32
token_bytes_mean: 6.9197
token_bytes_std: 2.8748
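
The four `token_bytes_*` figures summarize how many bytes each vocabulary entry spans. The report does not say how they were computed, so the following is only a minimal sketch under stated assumptions: the vocabulary is available as a list of `bytes` objects, empty entries are skipped, and the standard deviation is the sample (n - 1) form. The function name `token_byte_stats` and the toy vocabulary are hypothetical and are not taken from the training script.

```python
import statistics


def token_byte_stats(tokens: list[bytes]) -> dict[str, float]:
    """Summarize token lengths in bytes (hypothetical helper, not the original code)."""
    # Assumption: zero-length entries are ignored when computing the statistics.
    lengths = [len(t) for t in tokens if len(t) > 0]
    return {
        "token_bytes_min": min(lengths),
        "token_bytes_max": max(lengths),
        "token_bytes_mean": statistics.fmean(lengths),
        # Assumption: sample standard deviation; the report does not state which form it uses.
        "token_bytes_std": statistics.stdev(lengths),
    }


# Toy vocabulary for illustration only; the real vocabulary has 65,536 entries.
toy_vocab = [b"a", b"the", b" token", b" tokenizer"]
stats = token_byte_stats(toy_vocab)

for key, value in stats.items():
    print(f"{key}: {value}")
```

Run on the real vocabulary, a helper like this would be expected to produce figures of the same kind as those listed above, though the exact values depend on which tokens are included and on the choice of standard deviation.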