Dataset: legacy-datasets/wikipedia
How to use hiroshi-matsuda-rit/bert-base-japanese-basic-char-v2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("fill-mask", model="hiroshi-matsuda-rit/bert-base-japanese-basic-char-v2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("hiroshi-matsuda-rit/bert-base-japanese-basic-char-v2")
model = AutoModelForMaskedLM.from_pretrained("hiroshi-matsuda-rit/bert-base-japanese-basic-char-v2")

This pretrained model is almost identical to cl-tohoku/bert-base-japanese-char-v2, but it does not require fugashi or unidic_lite.
The only difference is the word_tokenizer_type property in tokenizer_config.json, which is set to basic instead of mecab.
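Because the change is a single key, it can be illustrated as a small JSON edit. The sketch below assumes you have a local copy of a tokenizer_config.json to start from. The helper name `switch_to_basic` and any paths you pass it are hypothetical; it is not part of Transformers and is not necessarily how this model was produced. Every key other than word_tokenizer_type is written back unchanged.

```python
import json
from pathlib import Path


def switch_to_basic(src: Path, dst: Path) -> dict:
    """Hypothetical helper: copy a tokenizer_config.json from src to dst,
    changing only word_tokenizer_type to "basic".

    All other keys are preserved as-is. Returns the patched config dict.
    """
    config = json.loads(src.read_text(encoding="utf-8"))
    # "mecab" word tokenization depends on fugashi and unidic_lite;
    # "basic" does not need either package.
    config["word_tokenizer_type"] = "basic"
    dst.write_text(json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8")
    return config
```

For example, `switch_to_basic(Path("tokenizer_config.json"), Path("tokenizer_config.basic.json"))` writes the patched copy next to the original without modifying it.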