Commit 950a8d3 by felixb85 · 1 parent: 798342a

Update README.md

Files changed (1)
  1. README.md +20 -0
README.md CHANGED
@@ -1,3 +1,23 @@
 ---
 license: mit
+datasets:
+- lc_quad
 ---
+
+This repo contains a custom tokenizer for SPARQL. Here is an example.
+
+```
+Query: SELECT ?answer WHERE { wd:Q825946 wdt:P371 ?X . ?X wdt:P2048 ?answer}
+```
+
+Result from default T5 tokenizer:
+```
+['▁', 'SEL', 'ECT', '▁', '?', 'ans', 'wer', '▁W', 'HER', 'E', '▁', '{', '▁', 'w', 'd', ':', 'Q', '82', '59', '46', '▁',
+'w', 'd', 't', ':', 'P', '37', '1', '▁', '?', 'X', '▁', '.', '▁', '?', 'X', '▁', 'w', 'd', 't', ':', 'P', '20', '48',
+'▁', '?', 'ans', 'wer', '}']
+```
+
+Result from this tokenizer:
+```
+['▁SELECT', '▁?answer', '▁WHERE', '▁{', '▁wd:Q8', '259', '46', '▁wdt:P371', '▁?X', '▁.', '▁?X', '▁wdt:P2048', '▁?answer', '}']
+```
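The README does not say how the SPARQL tokenizer was built. To illustrate why a vocabulary containing whole SPARQL units (keywords, variables, prefixed IDs) yields far fewer tokens than the default T5 tokenizer, here is a minimal Python sketch of greedy longest-match segmentation with SentencePiece-style "▁" word markers. It rests on assumptions: `tokenize` and `TOY_VOCAB` are hypothetical names, the vocabulary is a toy set copied from the tokens listed above, and greedy longest match (as in WordPiece) is not necessarily the algorithm this repo's tokenizer uses.

```python
# Hypothetical sketch, not this repo's implementation: greedy longest-match
# segmentation with SentencePiece-style "▁" word-boundary markers.

WORD_BOUNDARY = "\u2581"  # "▁", marks the start of a whitespace-separated word

# Assumed toy vocabulary, copied from the custom-tokenizer output in the README.
TOY_VOCAB = {
    WORD_BOUNDARY + piece
    for piece in ["SELECT", "WHERE", "?answer", "?X", "{", ".",
                  "wd:Q8", "wdt:P371", "wdt:P2048"]
} | {"259", "46", "}"}  # pieces that continue a word carry no marker


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split on whitespace, prefix each word with the boundary marker, then
    repeatedly take the longest vocabulary entry that prefixes the remainder.
    Positions with no vocabulary match fall back to a single character."""
    tokens: list[str] = []
    max_len = max(len(piece) for piece in vocab)
    for word in text.split():
        rest = WORD_BOUNDARY + word
        while rest:
            # Try the longest candidate first and shrink until one matches.
            for end in range(min(len(rest), max_len), 0, -1):
                if rest[:end] in vocab:
                    break
            else:
                end = 1  # nothing matched: emit one character
            tokens.append(rest[:end])
            rest = rest[end:]
    return tokens


if __name__ == "__main__":
    query = ("SELECT ?answer WHERE { wd:Q825946 wdt:P371 ?X . "
             "?X wdt:P2048 ?answer}")
    print(tokenize(query, TOY_VOCAB))
```

With this toy vocabulary the sketch reproduces the 14-token list above (the default T5 output shown has 49 tokens). Anything missing from the vocabulary falls back to single characters, so `tokenize("SELECT ?y", TOY_VOCAB)` gives `['▁SELECT', '▁', '?', 'y']`, much like the fragmented T5 output.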