SHSK0118 committed 1aeae8a (verified) · 1 parent: 3523032

Create README.md

Files changed (1): README.md (added, +96 -0)
 
---
datasets:
- SHSK0118/BERT-basedDomainClassification_ComplaintTexts_ja
language:
- ja
---
<h1>BERT-based Domain Classification for Japanese Complaint Texts</h1>

<p>
A BERT-based Japanese text classification model trained for
domain classification of complaint texts.
</p>

<hr>

<h2>Model Details</h2>

<ul>
<li>Architecture: BERT for Sequence Classification</li>
<li>Language: Japanese</li>
<li>Task: Multi-class domain classification</li>
<li>Framework: Hugging Face Transformers</li>
</ul>
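<p>
The architecture listed above can be sketched with a tiny, randomly initialized <code>BertForSequenceClassification</code> model from Hugging Face Transformers: a BERT encoder whose pooled <code>[CLS]</code> vector feeds a dropout layer and a linear layer with one output per domain class. This is an illustration under assumptions, not this model's actual configuration: the layer sizes and the number of classes (<code>NUM_LABELS = 5</code>) are hypothetical placeholders, and the sketch requires <code>torch</code> and <code>transformers</code> to be installed.
</p>

```python
# Sketch of the "BERT for Sequence Classification" architecture.
# Assumption: all sizes below, including NUM_LABELS, are hypothetical and do NOT
# reflect this model's real configuration or label set. Weights are random.
import torch
from transformers import BertConfig, BertForSequenceClassification

NUM_LABELS = 5  # hypothetical number of domain classes

torch.manual_seed(0)  # reproducible random weights and inputs

config = BertConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=NUM_LABELS,
)
# BERT encoder + classification head (dropout + linear layer on the pooled [CLS] vector)
model = BertForSequenceClassification(config)
model.eval()

# Two sequences of 16 random token ids, standing in for tokenized Japanese text.
input_ids = torch.randint(0, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits

probs = torch.softmax(logits, dim=-1)  # per-domain probabilities for each sequence
predicted = probs.argmax(dim=-1)       # index of the most likely domain
print(logits.shape)  # torch.Size([2, 5])
```

<p>
To run the trained model itself, the usual route would be <code>AutoTokenizer.from_pretrained(...)</code> and <code>AutoModelForSequenceClassification.from_pretrained(...)</code> with this model's Hub repository ID, which is not stated in this card.
</p>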

<hr>

<h2>Training Data</h2>

<p>
Training corpus:
</p>

<p>
<a href="https://huggingface.co/datasets/SHSK0118/BERT-basedDomainClassification_ComplaintTexts_ja">
BERT-basedDomainClassification_ComplaintTexts_ja Dataset
</a>
</p>

<p>
Dataset split:
</p>

<ul>
<li>Train: 90%</li>
<li>Validation: 5%</li>
<li>Test: 5%</li>
</ul>
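<p>
A 90% / 5% / 5% split like the one above can be reproduced with a seeded shuffle. This pure-Python sketch assumes a random shuffle with a fixed seed (<code>seed=42</code>) and uses a hypothetical helper name, <code>split_90_5_5</code>; the card does not say how the split was actually drawn.
</p>

```python
# Sketch: deterministic 90/5/5 train/validation/test split.
# Assumption: the actual split procedure and seed are not documented in this card.
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_90_5_5(examples: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy of `examples` and cut it into 90% / 5% / 5% partitions."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # same seed -> same split on every run
    n = len(items)
    n_train = n * 90 // 100  # integer arithmetic avoids float rounding surprises
    n_val = n * 5 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to test, so nothing is dropped
    return train, val, test


train, val, test = split_90_5_5(range(1000))
print(len(train), len(val), len(test))  # 900 50 50
```

<p>
With the Hugging Face <code>datasets</code> library, the same proportions could be obtained by calling <code>Dataset.train_test_split(test_size=0.1, seed=...)</code> and then splitting the held-out 10% in half.
</p>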

<hr>

<h2>Evaluation</h2>

<p>
Test Accuracy: <strong>73.0%</strong>
</p>
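<p>
Test accuracy is the fraction of held-out examples whose predicted domain (the argmax of the classifier's logits) matches the gold label. The sketch below shows that computation on made-up toy logits; the helper name <code>accuracy</code> and all values are illustrative, not outputs of this model.
</p>

```python
# Sketch: accuracy = (# examples where argmax(logits) == gold label) / (# examples).
# The logits and labels below are toy values, not this model's outputs.
from typing import Sequence


def accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    if len(logits) != len(labels) or not labels:
        raise ValueError("need a non-empty, equal number of predictions and labels")
    # Predicted class = index of the largest logit in each row.
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


toy_logits = [[2.0, 0.1, -1.0], [0.3, 1.5, 0.2], [0.0, 0.2, 0.1], [1.0, 0.9, 3.0]]
toy_labels = [0, 1, 2, 2]
print(accuracy(toy_logits, toy_labels))  # 0.75
```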

<hr>

<h2>Performance Discussion</h2>

<p>
The model was trained primarily on formal written text (a Wikimedia-derived corpus),
while evaluation was conducted on complaint-style texts.
</p>

<p>
The domain gap between formal and conversational language likely
contributed to the lower test accuracy.
</p>

<hr>

<h2>Intended Use</h2>

<ul>
<li>Educational purposes</li>
<li>Research prototyping</li>
<li>Domain classification experiments</li>
</ul>

<hr>

<h2>Limitations</h2>

<ul>
<li>No domain adaptation was applied</li>
<li>Performance is sensitive to the genre distribution of the input text (e.g., formal vs. conversational)</li>
</ul>
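<p>
One way to quantify the genre sensitivity noted above is to report accuracy separately for each genre in the evaluation set. This sketch assumes each test example carries a genre tag; the tags (<code>"formal"</code>, <code>"complaint"</code>), the helper <code>accuracy_by_genre</code>, and the records are hypothetical toy data.
</p>

```python
# Sketch: per-genre accuracy breakdown.
# Assumption: each evaluation record is tagged with a genre; tags and data are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_genre(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """records: (genre, predicted_label, gold_label) triples."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for genre, pred, gold in records:
        totals[genre] += 1
        correct[genre] += int(pred == gold)
    return {g: correct[g] / totals[g] for g in totals}


toy = [
    ("formal", 0, 0), ("formal", 1, 1), ("formal", 2, 2), ("formal", 1, 0),
    ("complaint", 0, 1), ("complaint", 2, 2), ("complaint", 1, 0), ("complaint", 1, 1),
]
print(accuracy_by_genre(toy))  # {'formal': 0.75, 'complaint': 0.5}
```

<p>
Because overall accuracy is a weighted average of these per-genre scores, shifting the genre mix of the test set can change the headline number even when the model itself is unchanged.
</p>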

<hr>

<h2>Author</h2>

<p>
Independent implementation by Shota Tokunaga.
</p>