BERT-based Domain Classification for Japanese Complaint Texts

A BERT-based model for classifying the domain of Japanese complaint texts.


Model Details

  • Architecture: BERT for Sequence Classification
  • Language: Japanese
  • Task: Multi-class domain classification
  • Framework: Hugging Face Transformers

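As a sequence-classification model, BERT maps each input text to one logit per domain class; the predicted domain is the argmax over the softmax probabilities. A minimal sketch of that final step in plain Python (the label set below is hypothetical, since the card does not list the actual classes):

```python
import math

# Hypothetical domain labels; the card does not list the actual classes.
LABELS = ["billing", "delivery", "product-quality", "customer-support"]

def softmax(logits):
    # Numerically stable softmax over the classification-head logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_domain(logits):
    # The predicted domain is the argmax over the class probabilities.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

For example, with logits [0.1, 2.0, -1.0, 0.3] the second label wins, since its logit dominates after the softmax.
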
Training Data

Training corpus: the BERT-basedDomainClassification_ComplaintTexts_ja dataset.

Dataset split:

  • Train: 90%
  • Validation: 5%
  • Test: 5%
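
The 90/5/5 split above can be reproduced with a shuffled index partition. A minimal sketch (the helper name and seed are illustrative; the card does not state how the split was drawn):

```python
import random

def split_indices(n_examples, seed=42):
    # Shuffled 90/5/5 partition matching the split described above.
    # The seed is illustrative; the card does not specify one.
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    n_train = int(n_examples * 0.90)
    n_val = int(n_examples * 0.05)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test
```

Slicing a single shuffled index list guarantees the three subsets are disjoint and together cover the whole dataset.
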

Evaluation

Test Accuracy: 73.0%
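
Test accuracy here is simply the fraction of complaint texts whose predicted domain matches the gold label. A minimal sketch of the metric:

```python
def accuracy(predictions, gold_labels):
    # Fraction of exact matches between predicted and gold domain labels.
    assert len(predictions) == len(gold_labels)
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)
```

For instance, 73 correct predictions out of 100 test examples yields 0.73, the figure reported above.
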


Performance Discussion

The model was trained primarily on formal written text (a Wikimedia-derived corpus), while evaluation was conducted on complaint-style texts.

The resulting domain gap between formal and conversational Japanese likely contributed to the reduced accuracy.


Intended Use

  • Educational purposes
  • Research prototyping
  • Domain classification experiments

Limitations

  • No domain adaptation applied
  • Performance sensitive to genre distribution

Author

Independent implementation by Shota Tokunaga.

Model Specifications

  • Format: Safetensors
  • Model size: 69.7M params
  • Tensor type: F32
Dataset used to train SHSK0118/BERT-basedDomainClassification_ComplaintTexts_ja: the BERT-basedDomainClassification_ComplaintTexts_ja dataset listed above.