jmcinern/qwen_tokenizer_ga


jmcinern committed on Jun 24, 2025

Commit c14d9a3 · verified · 1 Parent(s): fe9eee8

Update README.md

Files changed (1)

README.md +1 -1


@@ -9,7 +9,7 @@
   year         = {2025}}
-## Qwen tokenizer trained on Irish language data
+## Monolingual Qwen tokenizer trained on Irish language data
 - Provides a ~50% reduction in number of tokens. (399 → 200 in test set).
 - Significantly improves identifying words as tokens.
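
The ~50% figure in the README follows from the two token counts it reports (399 and 200). A minimal sketch of that arithmetic is below; the helper name `percent_reduction` is hypothetical and not part of the repository, and the counts are simply the ones quoted in the README, not re-measured here.

```python
def percent_reduction(baseline_tokens: int, new_tokens: int) -> float:
    """Return the percentage drop in token count relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - new_tokens) / baseline_tokens


# Token counts quoted in the README for its test set (assumed, not re-measured).
baseline = 399  # original Qwen tokenizer
irish = 200     # tokenizer trained on Irish language data

print(f"{percent_reduction(baseline, irish):.1f}% fewer tokens")
```

To reproduce the counts themselves, one would tokenize the same test text with both tokenizers and compare the lengths of the resulting token ID lists.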