Update README.md
README.md
CHANGED
@@ -1,3 +1,23 @@
 ---
 license: mit
+datasets:
+- lc_quad
 ---
+
+This repo contains a custom tokenizer for SPARQL. Here is an example.
+
+```
+Query: SELECT ?answer WHERE { wd:Q825946 wdt:P371 ?X . ?X wdt:P2048 ?answer}
+```
+
+Result from default T5 tokenizer:
+```
+['▁', 'SEL', 'ECT', '▁', '?', 'ans', 'wer', '▁W', 'HER', 'E', '▁', '{', '▁', 'w', 'd', ':', 'Q', '82', '59', '46', '▁',
+'w', 'd', 't', ':', 'P', '37', '1', '▁', '?', 'X', '▁', '.', '▁', '?', 'X', '▁', 'w', 'd', 't', ':', 'P', '20', '48',
+'▁', '?', 'ans', 'wer', '}']
+```
+
+Result from this tokenizer:
+```
+['▁SELECT', '▁?answer', '▁WHERE', '▁{', '▁wd:Q8', '259', '46', '▁wdt:P371', '▁?X', '▁.', '▁?X', '▁wdt:P2048', '▁?answer', '}']
+```
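
As a rough sanity check on the example in this change, the sketch below compares the two token lists from the README. It does not load either tokenizer. The lists are copied verbatim from the README text, and the names `t5_tokens`, `custom_tokens`, and `detokenize` are illustrative assumptions, not part of the repo.

```python
# Token lists copied verbatim from the README example; no tokenizer is loaded here.
t5_tokens = [
    '▁', 'SEL', 'ECT', '▁', '?', 'ans', 'wer', '▁W', 'HER', 'E', '▁', '{',
    '▁', 'w', 'd', ':', 'Q', '82', '59', '46', '▁',
    'w', 'd', 't', ':', 'P', '37', '1', '▁', '?', 'X', '▁', '.', '▁', '?',
    'X', '▁', 'w', 'd', 't', ':', 'P', '20', '48',
    '▁', '?', 'ans', 'wer', '}',
]
custom_tokens = [
    '▁SELECT', '▁?answer', '▁WHERE', '▁{', '▁wd:Q8', '259', '46',
    '▁wdt:P371', '▁?X', '▁.', '▁?X', '▁wdt:P2048', '▁?answer', '}',
]

query = "SELECT ?answer WHERE { wd:Q825946 wdt:P371 ?X . ?X wdt:P2048 ?answer}"


def detokenize(tokens):
    """Join SentencePiece-style pieces; '▁' marks a preceding space."""
    return "".join(tokens).replace("▁", " ").strip()


# Both segmentations should spell out the same query string.
assert detokenize(t5_tokens) == query
assert detokenize(custom_tokens) == query

print(len(t5_tokens), len(custom_tokens))  # 49 14
```

For this one query, the listed output of the default T5 tokenizer has 49 pieces and the output of this tokenizer has 14, with keywords, variables, and most prefixed names kept as single pieces. The entity ID `wd:Q825946` is still split into three pieces.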