---
license: apache-2.0
language: ja
---

# Model description
We pretrained a RoBERTa-based Japanese masked language model on paper abstracts from the academic database CiNii Articles.
The model and its training are described in the paper: [A Japanese Masked Language Model for Academic Domain](https://aclanthology.org/2022.sdp-1.16/).
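A minimal usage sketch with the 🤗 Transformers `fill-mask` pipeline. The repository ID below is an assumption for illustration, and the snippet assumes the repository exposes standard tokenizer and model files; substitute the actual ID of this model.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_id = "EhimeNLP/AcademicRoBERTa"  # assumed repository ID; replace with this card's actual ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Predict the masked token in a Japanese academic-style sentence.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
text = f"本研究では新しい手法を{tokenizer.mask_token}する。"
for candidate in fill_mask(text):
    print(candidate["token_str"], candidate["score"])
```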
# Vocabulary
The vocabulary consists of 32,000 tokens, including subwords induced by the unigram language model of SentencePiece.
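For reference, a sketch of how such a unigram vocabulary can be induced with the SentencePiece Python API. The corpus file name and the training settings below are assumptions for illustration, not the exact configuration used for this model.

```python
import sentencepiece as spm

# Induce a 32,000-token unigram vocabulary from a raw-text corpus,
# one abstract (or sentence) per line. The file path is hypothetical.
spm.SentencePieceTrainer.train(
    input="cinii_abstracts.txt",   # hypothetical corpus file
    model_prefix="academic_ja",    # writes academic_ja.model / academic_ja.vocab
    vocab_size=32000,
    model_type="unigram",          # unigram language model, as used for this vocabulary
    character_coverage=0.9995,     # a common setting for Japanese text (assumed here)
)

# Load the trained model and segment a sentence into subwords.
sp = spm.SentencePieceProcessor(model_file="academic_ja.model")
print(sp.encode("本研究では新しい手法を提案する。", out_type=str))
```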