Custom Japanese BERT (4-layer)

This is a tiny 4-layer Japanese BERT model, optimized for fast inference.

Model Background

  • Architecture: BERT (4 layers, 256 hidden size, 4 heads, 1024 FFN); see the config sketch after this list.
  • Distillation: Distilled from a fine-tuned version of tohoku-nlp/bert-base-japanese-char-v2.
  • Initialization: The student model was randomly initialized.
  • Tokenizer: Japanese character-level tokenizer, shared with the teacher.
  • Size: 4.84M parameters, stored as F32 Safetensors.