# ELECTRA discriminator base

- pretrained on a large Korean corpus (30GB)
- 113M parameters (follows the google/electra-base-discriminator config)
- vocabulary size of 35,000
- trained for 1,000,000 steps
- built on the [lassl](https://github.com/lassl/lassl) framework
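As an ELECTRA discriminator, the model is pretrained with the replaced-token-detection objective: a small generator corrupts some input tokens, and the discriminator predicts, for each token, whether it was replaced. A toy sketch of how the per-token labels are formed (illustrative only; the sentences and tokens here are made up, not from the training data):

```python
# Toy example of ELECTRA's replaced-token-detection labels.
original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]  # generator replaced "cooked"

# Discriminator target: 1 where the corrupted token differs from the original,
# 0 where it was left unchanged.
labels = [int(o != c) for o, c in zip(original, corrupted)]
print(labels)  # [0, 0, 1, 0, 0]
```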
```
pretrain-data
┣ korean_corpus.txt
┣ kowiki_latest.txt
┣ modu_dialogue_v1.2.txt
┣ modu_news_v1.1.txt
┣ modu_news_v2.0.txt
┣ modu_np_2021_v1.0.txt
┣ modu_np_v1.1.txt
┣ modu_spoken_v1.2.txt
┗ modu_written_v1.0.txt
```