Japanese Subject Insertion Model
A BERT-based token-classification model built on tohoku-nlp/bert-base-japanese, trained to predict, for a Japanese sentence with no explicit subject, the position where the subject would appear.
Model Uses
This model was trained as part of a larger project to predict implicit subjects in Japanese text. You can find the whole project here: https://github.com/Romi212/Japanese-Subject-Predictor-System
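Since this is a standard token-classification checkpoint, it can be loaded with the Hugging Face transformers library. The snippet below is a minimal inference sketch, assuming a labeling scheme in which the positive class marks the token position where a subject should be inserted; the actual label names and ids are defined in the model config and may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "Romi121/subject-insertion-model"
# The Japanese tokenizer inherited from tohoku-nlp/bert-base-japanese
# requires the fugashi and unidic-lite packages to be installed.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Example sentence with the subject omitted (illustrative input only).
sentence = "昨日、映画を見ました。"

inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

# Pick the most likely label for each token; the positive label is assumed
# to mark "insert the subject before/at this token".
predictions = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label in zip(tokens, predictions):
    print(token, model.config.id2label[label])
```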
Training Details
Training Data
The model was trained on the UD_Japanese-GSDLUW treebank: https://github.com/UniversalDependencies/UD_Japanese-GSDLUW
The dataset was filtered to sentences containing an explicit subject. The subject was then removed from each sentence and its position recorded, so the model could be trained to predict where the subject should be inserted.
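The sketch below illustrates this preprocessing step. It is an assumption-laden simplification, not the project's actual pipeline: it uses the third-party conllu package, identifies subjects via the nsubj dependency relation, and ignores case particles and multi-word subjects.

```python
from conllu import parse_incr

def make_examples(conllu_path):
    """Build (tokens, insertion_index) pairs from a UD .conllu file.

    Keeps only sentences with exactly one nominal subject (nsubj),
    removes that subject token, and records the index where it stood,
    so a token-classification model can learn the insertion point.
    Simplified sketch; the project's real preprocessing may also handle
    case particles and multi-word subjects.
    """
    examples = []
    with open(conllu_path, "r", encoding="utf-8") as f:
        for sentence in parse_incr(f):
            subject_positions = [
                i for i, tok in enumerate(sentence) if tok["deprel"] == "nsubj"
            ]
            if len(subject_positions) != 1:
                continue  # skip sentences without exactly one explicit subject
            idx = subject_positions[0]
            tokens = [tok["form"] for i, tok in enumerate(sentence) if i != idx]
            # idx now points at the token that follows the gap; the model is
            # trained to tag that position as "insert the subject here".
            examples.append((tokens, idx))
    return examples

examples = make_examples("ja_gsdluw-ud-train.conllu")
print(examples[0])
```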