Set strip_accents to true in tokenizer_config.json

Files changed (2)

README.md CHANGED

@@ -3,6 +3,8 @@ language: de
 license: mit
 ---
+This is a fork of [dbmdz/bert-base-german-uncased](https://huggingface.co/dbmdz/bert-base-german-uncased) with `strip_accents` being set to `true` in the tokenizer.
 # 🤗 + 📚 dbmdz German BERT models
 In this repository the MDZ Digital Library team (dbmdz) at the Bavarian State

tokenizer_config.json CHANGED

@@ -1 +1 @@
-{"do_lower_case": true, "max_len": 512, "init_inputs": []}
+{"do_lower_case": true, "max_len": 512, "init_inputs": [], "strip_accents": true}
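
To illustrate what the new flag is expected to do to German input, here is a minimal sketch of accent stripping as it is commonly implemented: decompose the text with Unicode NFD and drop the combining marks. The helper name `strip_accents_sketch` is hypothetical and uses only the Python standard library; it approximates the behaviour and is not the tokenizer's own code.

```python
import unicodedata


def strip_accents_sketch(text: str) -> str:
    """Hypothetical helper: remove combining marks after NFD decomposition."""
    # NFD splits a character such as "ü" into "u" plus U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents we want to drop.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Lowercasing first mirrors "do_lower_case": true in the config above.
for word in ["Über", "Mädchen", "Straße"]:
    print(word, "->", strip_accents_sketch(word.lower()))
```

Under this sketch umlauts lose their diaeresis ("über" becomes "uber"), while "ß" is left unchanged because it has no decomposition into a base letter plus a combining mark.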