Arabic
arabic
tokenizer
morphology
nlp
dialect
fr3on committed on
Commit 60eb243 · verified · 1 Parent(s): 562053c

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -7,6 +7,8 @@ tags:
  license: apache-2.0
  language:
  - ar
+ datasets:
+ - dataflare/arabic-dialect-corpus
  ---
 
  # DF-Arc: Morphology-Aware Arabic Tokenizer
@@ -32,4 +34,4 @@ print(tokens)
 
  ## Citation
  If you use DF-Arc, please cite our paper:
- *The Arabic Token Tax: Quantifying Tokenization Inefficiency in Large Language Models* (Dataflare Lab, 2026).
+ *The Arabic Token Tax: Quantifying Tokenization Inefficiency in Large Language Models* (Dataflare Lab, 2026).