---
datasets:
- SHSK0118/BERT-basedDomainClassification_ComplaintTexts_ja
language:
- ja
---
<h1>BERT-based Domain Classification for Japanese Complaint Texts</h1>
<p>
A BERT-based Japanese text classification model trained for
domain classification of complaint texts.
</p>
<hr>
<h2>Model Details</h2>
<ul>
<li>Architecture: BERT for Sequence Classification</li>
<li>Language: Japanese</li>
<li>Task: Multi-class domain classification</li>
<li>Framework: Hugging Face Transformers</li>
</ul>
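<p>
The card does not include a usage snippet, so the following is a minimal
inference sketch with the Hugging Face Transformers API. The
<code>&lt;model-repo-id&gt;</code> placeholder and the reliance on the
checkpoint's <code>id2label</code> mapping are assumptions, not details
confirmed by this card.
</p>

```python
def classify(text, repo_id="<model-repo-id>"):
    """Classify one Japanese complaint text into its domain label.

    Sketch only: repo_id is a placeholder for this model's Hub repo, and
    label names are assumed to live in the checkpoint's id2label config.
    """
    # Heavy dependencies are imported lazily so the helper can be defined
    # even where transformers/torch are not installed.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    label_id = int(logits.argmax(dim=-1))
    return model.config.id2label[label_id]
```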
<hr>
<h2>Training Data</h2>
<p>
Training corpus:
</p>
<p>
<a href="https://huggingface.co/datasets/SHSK0118/BERT-basedDomainClassification_ComplaintTexts_ja">
BERT-basedDomainClassification_ComplaintTexts_ja Dataset
</a>
</p>
<p>
Dataset split:
</p>
<ul>
<li>Train: 90%</li>
<li>Validation: 5%</li>
<li>Test: 5%</li>
</ul>
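<p>
The exact splitting procedure is not documented here; the sketch below shows
one way to reproduce a 90/5/5 split over indices with a fixed seed (the seed
and shuffling strategy are assumptions, not the card's stated method).
</p>

```python
import random

def split_indices(n, seed=42):
    """Shuffle n example indices and cut them 90/5/5 into train/val/test.

    Sketch only: the seed and shuffle-then-slice approach are assumptions
    about how the documented 90/5/5 split could be reproduced.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * 0.90)
    n_val = int(n * 0.05)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test
```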
<hr>
<h2>Evaluation</h2>
<p>
Test Accuracy: <strong>73.0%</strong>
</p>
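<p>
The card reports a single accuracy number; assuming standard exact-match
accuracy over the test split, the metric reduces to:
</p>

```python
def accuracy(predictions, references):
    """Fraction of predicted domain labels that exactly match the gold labels."""
    assert len(predictions) == len(references)
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```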
<hr>
<h2>Performance Discussion</h2>
<p>
The model was trained primarily on formal written text (a Wikimedia-derived corpus),
while evaluation was conducted on complaint-style texts.
</p>
<p>
The domain gap between formal and conversational language likely
contributed to reduced performance.
</p>
<hr>
<h2>Intended Use</h2>
<ul>
<li>Educational purposes</li>
<li>Research prototyping</li>
<li>Domain classification experiments</li>
</ul>
<hr>
<h2>Limitations</h2>
<ul>
<li>No domain adaptation applied</li>
<li>Performance sensitive to genre distribution</li>
</ul>
<hr>
<h2>Author</h2>
<p>
Independent implementation by Shota Tokunaga.
</p>