almaghrabima/deeplatent-tokenizer-parity
Tags: tokenizer, bpe, myte, sarf, parity-aware, deeplatent, bilingual, arabic-english, morfessor
License: apache-2.0
Branch: main
Path: deeplatent-tokenizer-parity/training
Total size: 23.5 GB
1 contributor
History: 5 commits
Latest commit: a879a93 (verified) by almaghrabima, 5 days ago: Upload parity-aware MYTE tokenizer artifacts
Files:
merges.txt, 1.29 MB, Safe, "Upload parity-aware MYTE tokenizer artifacts", 5 days ago
phase1_stats.json, 217 Bytes, Safe, "Upload parity-aware MYTE tokenizer artifacts", 5 days ago
train.ar, 13.7 GB, xet, "Upload parity-aware MYTE tokenizer artifacts", 5 days ago
train.en, 9.74 GB, xet, "Upload parity-aware MYTE tokenizer artifacts", 5 days ago