Set strip_accents to true in tokenizer_config.json
Files changed:
- README.md +2 -0
- tokenizer_config.json +1 -1

README.md (changed)
@@ -3,6 +3,8 @@ language: de
 license: mit
 ---
 
+This is a fork of [dbmdz/bert-base-german-uncased](https://huggingface.co/dbmdz/bert-base-german-uncased) with `strip_accents` being set to `true` in the tokenizer.
+
 # 🤗 + 📚 dbmdz German BERT models
 
 In this repository the MDZ Digital Library team (dbmdz) at the Bavarian State
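
The README describes this fork as the original model with `strip_accents` set to `true` in the tokenizer. As a rough illustration of what accent stripping does to German input, the sketch below follows the approach of the original BERT basic tokenizer: decompose the text with Unicode NFD, then drop nonspacing combining marks (Unicode category `Mn`). The `normalize` helper and its `strip` parameter are hypothetical, and the real tokenizer applies further steps (punctuation splitting, WordPiece) that are not shown here.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD).

    NFD splits a precomposed letter such as "ü" into "u" plus the
    combining diaeresis U+0308, which has Unicode category "Mn".
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize(text: str, do_lower_case: bool = True, strip: bool = True) -> str:
    # Hypothetical helper: lowercase first (as do_lower_case would),
    # then optionally strip accents.
    if do_lower_case:
        text = text.lower()
    if strip:
        text = strip_accents(text)
    return text


if __name__ == "__main__":
    print(normalize("Müller über Straße"))  # → muller uber straße
```

Note that `ß` has no canonical decomposition, so it passes through accent stripping unchanged, while umlauts lose their diaeresis.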

tokenizer_config.json (changed)
@@ -1 +1 @@
-{"do_lower_case": true, "max_len": 512, "init_inputs": []}
+{"do_lower_case": true, "max_len": 512, "init_inputs": [], "strip_accents": true}
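
The `tokenizer_config.json` change adds a single key and leaves the others untouched. The sketch below parses the old and new JSON from the diff, lists the keys that differ, and resolves the effective accent-stripping setting. `resolve_strip_accents` is a hypothetical helper; its fallback rule (an unset `strip_accents` follows `do_lower_case`) is an assumption modeled on how the Hugging Face transformers documentation describes its BERT tokenizers, so check the behaviour of the library version you actually use.

```python
import json

# The two versions of tokenizer_config.json from the diff above.
OLD = '{"do_lower_case": true, "max_len": 512, "init_inputs": []}'
NEW = '{"do_lower_case": true, "max_len": 512, "init_inputs": [], "strip_accents": true}'


def resolve_strip_accents(config: dict) -> bool:
    # Assumption: a missing or null "strip_accents" falls back to "do_lower_case".
    value = config.get("strip_accents")
    if value is None:
        return bool(config.get("do_lower_case", False))
    return bool(value)


old_cfg = json.loads(OLD)
new_cfg = json.loads(NEW)

# Keys that were added or whose value changed between the two versions.
_missing = object()
changed = {k: v for k, v in new_cfg.items() if old_cfg.get(k, _missing) != v}

print(changed)  # → {'strip_accents': True}
print(resolve_strip_accents(new_cfg))  # → True
```

With the key set explicitly, the effective setting no longer depends on any fallback rule in the loading code.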