datasets:
  - SHSK0118/BERT-basedDomainClassification_ComplaintTexts_ja
language:
  - ja

BERT-based Domain Classification for Japanese Complaint Texts

A BERT-based model that classifies Japanese complaint texts by domain.


Model Details

  • Architecture: BERT for Sequence Classification
  • Language: Japanese
  • Task: Multi-class domain classification
  • Framework: Hugging Face Transformers
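
The details above suggest the model can be loaded through the standard Transformers sequence-classification classes. The following is a minimal sketch under that assumption, not the author's own inference code. The `model_id` argument is a placeholder for the model repository ID, which this card does not state, and `max_length=512` is an assumed limit typical of BERT checkpoints. Depending on the base checkpoint, the Japanese tokenizer may need extra packages installed.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(text, model_id):
    """Classify one Japanese text with a sequence-classification checkpoint.

    `model_id` is a placeholder: pass the actual Hub repository ID or a
    local path. Heavy imports stay local so the helpers above can be used
    without torch or transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # max_length=512 is an assumption, not a value taken from this card.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return pick_label(logits, model.config.id2label)
```

Calling `classify("...", model_id)` returns the predicted domain label together with its softmax probability. The label names come from the checkpoint's own `id2label` mapping.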

Training Data

Training corpus: the SHSK0118/BERT-basedDomainClassification_ComplaintTexts_ja dataset.

Dataset split:

  • Train: 90%
  • Validation: 5%
  • Test: 5%
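
One way to produce a 90/5/5 split like the one listed above is a seeded shuffle followed by slicing. This is a sketch only: the card does not say how the split was made, so the shuffle, the seed value, and the lack of stratification by label are all assumptions.

```python
import random


def split_dataset(examples, train_frac=0.90, val_frac=0.05, seed=42):
    """Shuffle and split examples into train/validation/test lists.

    The test set receives whatever remains after the train and validation
    slices. The seed and the unstratified shuffle are illustrative choices.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)

    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

With small or imbalanced classes, a stratified split may be preferable so that each domain appears in all three subsets.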

Evaluation

Test Accuracy: 73.0%
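
For reference, accuracy here is taken to mean the fraction of test examples whose predicted domain matches the gold label. The sketch below shows that computation under this assumption. The function name and the toy labels are hypothetical and are not the data behind the reported figure.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy example with made-up labels: 3 of 4 predictions are correct.
print(accuracy(["a", "b", "c", "a"], ["a", "b", "a", "a"]))  # 0.75
```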


Performance Discussion

The model was trained primarily on formal written text (a Wikimedia-derived corpus), whereas evaluation was conducted on complaint-style texts.

The domain gap between formal and conversational language likely contributed to the reduced accuracy.


Intended Use

  • Educational purposes
  • Research prototyping
  • Domain classification experiments

Limitations

  • No domain adaptation applied
  • Performance sensitive to genre distribution
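
To check how sensitive performance is to genre, accuracy can be broken down per genre instead of being reported as a single number. The sketch below assumes each evaluation record carries a genre tag. That tag, the record layout, and the function name are hypothetical, since the card does not describe the dataset's fields.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy separately for each group.

    `records` is an iterable of (group, prediction, gold_label) tuples,
    where `group` is a hypothetical genre tag such as "formal" or
    "conversational".
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, pred, gold in records:
        totals[group] += 1
        hits[group] += int(pred == gold)
    return {group: hits[group] / totals[group] for group in totals}
```

A large gap between groups would indicate that the overall accuracy depends heavily on the genre mix of the evaluation set.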

Author

Independent implementation by Shota Tokunaga.