# Custom Japanese BERT (4-layer)

This model is a tiny 4-layer Japanese BERT, optimized for speed.
## Model Background
- Architecture: BERT (4 layers, 256 hidden size, 4 heads, 1024 FFN)
- Distillation: Distilled from a fine-tuned version of tohoku-nlp/bert-base-japanese-char-v2.
- Initialization: The student model was randomly initialized.
- Tokenizer: Japanese character-level tokenizer, shared with the teacher.