gaaahee committed on
Commit 4bfd654 · verified · 1 Parent(s): dcdbffb

Update model: Test Acc 73.93%, F1 0.7395, max_length=512

Files changed (3)
  1. README.md +99 -23
  2. config.json +21 -12
  3. pytorch_model.pt +1 -1
README.md CHANGED
@@ -2,49 +2,125 @@
  language: ko
  license: mit
  tags:
- - kobert
- - stance-detection
- - korean
- - text-classification
  metrics:
- - accuracy
- - f1
  ---

- # KoBERT Stance Classifier v2

- Korean political news stance classification model

  ## Performance

  | Metric | Score |
  |--------|-------|
- | Test Accuracy | 73.37% |
- | Test F1 (macro) | 0.7336 |

  ## Labels

- - 0: 옹호 (Support)
- - 1: 중립 (Neutral)
- - 2: 비판 (Oppose)

  ## Usage

  ```python
  import torch
- from transformers import AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained('monologg/kobert', trust_remote_code=True)

- # Load model
- checkpoint = torch.load('pytorch_model.pt', map_location='cpu')
- # model.load_state_dict(checkpoint['model_state_dict'])
  ```

- ## Training Config

- - Base Model: monologg/kobert
- - Max Length: 256
- - Batch Size: 64
- - Learning Rate: 2e-05
- - Focal Loss: True

  language: ko
  license: mit
  tags:
+ - pytorch
+ - bert
+ - kobert
+ - text-classification
+ - stance-detection
+ - korean
+ - news
+ - political
+ datasets:
+ - custom
  metrics:
+ - accuracy
+ - f1
+ model-index:
+ - name: stance-classifier-v2
+   results:
+   - task:
+       type: text-classification
+       name: Stance Classification
+     metrics:
+     - type: accuracy
+       value: 73.93
+       name: Test Accuracy
+     - type: f1
+       value: 0.7395
+       name: Test F1
  ---

+ # Korean Political News Stance Classifier v2

+ A KoBERT-based stance (position) classification model for Korean political news.
+
+ ## Model Description
+
+ - **Base Model**: monologg/kobert
+ - **Task**: 3-class stance classification (support/neutral/oppose)
+ - **Language**: Korean
+ - **Training Data**: ~12,000 labeled political news articles

  ## Performance

  | Metric | Score |
  |--------|-------|
+ | Test Accuracy | 73.93% |
+ | Test F1 (macro) | 0.7395 |

  ## Labels

+ | Label ID | Korean | English | Description |
+ |----------|--------|---------|-------------|
+ | 0 | 옹호 | support | Favorable toward the government/ruling party |
+ | 1 | 중립 | neutral | Objective reporting of facts |
+ | 2 | 비판 | oppose | Critical of the government/ruling party |

  ## Usage

  ```python
  import torch
+ from transformers import BertModel, AutoTokenizer
+ from huggingface_hub import hf_hub_download
+ import torch.nn as nn
+
+ # Model definition
+ class StanceClassifier(nn.Module):
+     def __init__(self, bert_model, num_classes=3, dropout_rate=0.3):
+         super().__init__()
+         self.bert = bert_model
+         self.dropout = nn.Dropout(dropout_rate)
+         self.classifier = nn.Linear(768, num_classes)
+
+     def forward(self, input_ids, attention_mask, token_type_ids=None):
+         outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
+         pooled_output = outputs.pooler_output
+         pooled_output = self.dropout(pooled_output)
+         return self.classifier(pooled_output)
+
+ # Load model
+ model_path = hf_hub_download(repo_id="gaaahee/stance-classifier-v2", filename="pytorch_model.pt")
+ checkpoint = torch.load(model_path, map_location='cpu')
+
+ bert_model = BertModel.from_pretrained('monologg/kobert')
+ model = StanceClassifier(bert_model)
+ model.load_state_dict(checkpoint['model_state_dict'])
+ model.eval()

+ # Load tokenizer
  tokenizer = AutoTokenizer.from_pretrained('monologg/kobert', trust_remote_code=True)

+ # Prediction
+ text = "정부의 새 정책이 경제 성장에 크게 기여할 것으로 기대된다"  # "The government's new policy is expected to contribute greatly to economic growth."
+ encoding = tokenizer(text, truncation=True, max_length=512, padding='max_length', return_tensors='pt')
+
+ with torch.no_grad():
+     logits = model(encoding['input_ids'], encoding['attention_mask'])
+     probs = torch.softmax(logits, dim=1)
+     pred = torch.argmax(probs, dim=1).item()
+
+ labels = ['옹호', '중립', '비판']  # support, neutral, oppose
+ print(f"Prediction: {labels[pred]} ({probs[0][pred].item()*100:.1f}%)")
  ```

+ ## Training Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Base Model | monologg/kobert |
+ | Max Length | 512 |
+ | Batch Size | 64 |
+ | Learning Rate | 2e-05 |
+ | Dropout | 0.3 |
+ | Loss Function | Focal Loss (gamma=2.0) |
+ | Early Stopping | patience=3 |

+ ## Citation
+
+ ```bibtex
+ @misc{korean-stance-classifier-v2,
+   title={Korean Political News Stance Classifier v2},
+   year={2024},
+   publisher={HuggingFace}
+ }
+ ```
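The Performance table reports test accuracy and macro-averaged F1. To make "macro" concrete, here is a minimal pure-Python sketch that computes both from label IDs (0 = support, 1 = neutral, 2 = oppose). The function names and toy predictions are illustrative assumptions. This commit does not show the training script's actual metric code, although the result should agree with scikit-learn's `f1_score(..., average='macro')` whenever every class appears in the labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the gold label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred, num_classes=3):
    """Unweighted mean of per-class F1 over label IDs 0..num_classes-1."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # A class with no true positives gets F1 = 0 (avoids division by zero).
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Every class counts equally, regardless of how many articles it has.
    return sum(scores) / num_classes


# Toy labels for illustration only (0 = support, 1 = neutral, 2 = oppose).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Because each class contributes one third of the macro score, a model that does well only on the most frequent stance is penalized more than under plain accuracy.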
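The Training Details table lists Focal Loss with gamma=2.0. Focal loss multiplies each example's cross-entropy, -log p_t, by (1 - p_t)^gamma, where p_t is the predicted probability of the true class. Confidently correct examples therefore contribute little to the loss. Below is a minimal pure-Python sketch of the multi-class form. It assumes no per-class alpha weighting and mean reduction over a batch, since neither is stated in this commit. The helper names and logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one example: -log p_t."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Multi-class focal loss for one example: -(1 - p_t)^gamma * log p_t.

    Assumption: no per-class alpha weights (not specified in this commit).
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def batch_focal_loss(batch_logits, targets, gamma=2.0):
    """Mean focal loss over a batch (mean reduction is an assumption)."""
    losses = [focal_loss(lg, t, gamma) for lg, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical 3-class logits (support, neutral, oppose), true class 0 in both cases.
easy = [4.0, 0.0, 0.0]  # confidently correct
hard = [0.0, 1.0, 0.5]  # a wrong class is favored
print(cross_entropy(easy, 0), focal_loss(easy, 0))
print(cross_entropy(hard, 0), focal_loss(hard, 0))
```

With gamma=0 the focal term is 1 and the loss reduces to ordinary cross-entropy. With gamma=2 the easy example keeps only about 0.1% of its cross-entropy, while the hard one keeps roughly two thirds.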
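The table also lists Early Stopping with patience=3 (with `epochs: 10` in the config as the upper bound). Training halts once the monitored validation score has failed to improve for three consecutive evaluations. The monitored metric is not stated. The old config.json's `best_val_f1` field suggests validation F1, so this sketch assumes a higher-is-better score. The class name, the `min_delta` parameter, and the score sequence are hypothetical.

```python
class EarlyStopping:
    """Signal a stop after `patience` evaluations without improvement (higher is better)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, score):
        """Record one validation score; return True if training should stop."""
        if self.best is None or score > self.best + self.min_delta:
            self.best = score      # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1   # no improvement this evaluation
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=3)
val_f1_per_epoch = [0.70, 0.74, 0.75, 0.74, 0.75, 0.73, 0.76]  # made-up scores
stopped_at = None
for epoch, score in enumerate(val_f1_per_epoch, start=1):
    if stopper.step(score):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

A score that merely ties the best (0.75 at epoch 5) counts as no improvement here. Training stops at epoch 6, before the later 0.76 is ever seen. This trade-off is inherent to patience-based stopping.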
config.json CHANGED
@@ -1,22 +1,31 @@
  {
-   "model_type": "bert",
    "base_model": "monologg/kobert",
    "num_labels": 3,
-   "labels": {
-     "0": "옹호",
-     "1": "중립",
-     "2": "비판"
    },
-   "max_length": 256,
-   "dropout": 0.3,
-   "metrics": {
-     "test_accuracy": 0.7336561743341404,
-     "test_f1_macro": 0.7335882497408175,
-     "best_val_f1": 0.7595240086537992
    },
    "training_config": {
      "model_name": "monologg/kobert",
-     "max_length": 256,
      "dropout": 0.3,
      "batch_size": 64,
      "epochs": 10,

  {
+   "model_type": "kobert-stance-classifier",
    "base_model": "monologg/kobert",
+   "tokenizer": "monologg/kobert",
    "num_labels": 3,
+   "label2id": {
+     "support": 0,
+     "neutral": 1,
+     "oppose": 2
    },
+   "id2label": {
+     "0": "support",
+     "1": "neutral",
+     "2": "oppose"
    },
+   "label_names_kr": [
+     "옹호",
+     "중립",
+     "비판"
+   ],
+   "max_length": 512,
+   "dropout": 0.3,
+   "hidden_size": 768,
+   "test_accuracy": 0.7393058918482648,
+   "test_f1": 0.7394790179274192,
    "training_config": {
      "model_name": "monologg/kobert",
+     "max_length": 512,
      "dropout": 0.3,
      "batch_size": 64,
      "epochs": 10,
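The new config.json replaces the old Korean-only `labels` map with `label2id`/`id2label` and adds a separate `label_names_kr` list. JSON object keys are always strings, so `id2label` must be converted back to integer keys before it can be indexed with an argmax result. Below is a minimal standard-library sketch that assumes a trimmed excerpt of the config and raw logits in label-ID order. Both helper functions are hypothetical and not part of the repository.

```python
import json
import math

# Trimmed excerpt of the new config.json in this commit (other keys omitted).
CONFIG_JSON = """
{
  "num_labels": 3,
  "label2id": {"support": 0, "neutral": 1, "oppose": 2},
  "id2label": {"0": "support", "1": "neutral", "2": "oppose"},
  "label_names_kr": ["옹호", "중립", "비판"]
}
"""


def load_label_maps(config_text):
    """Parse the config and return (config dict, id2label with int keys)."""
    cfg = json.loads(config_text)
    # JSON object keys are always strings, so "0" must become 0.
    id2label = {int(k): v for k, v in cfg["id2label"].items()}
    # Sanity checks: label2id must invert id2label, and all sizes must agree.
    if any(cfg["label2id"][name] != idx for idx, name in id2label.items()):
        raise ValueError("label2id and id2label are inconsistent")
    if not (len(id2label) == cfg["num_labels"] == len(cfg["label_names_kr"])):
        raise ValueError("num_labels does not match the label maps")
    return cfg, id2label


def decode(logits, cfg, id2label):
    """Map logits (in label-ID order) to (English label, Korean label, probability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred = max(range(len(probs)), key=lambda i: probs[i])
    return id2label[pred], cfg["label_names_kr"][pred], probs[pred]


cfg, id2label = load_label_maps(CONFIG_JSON)
print(decode([0.2, -0.5, 1.7], cfg, id2label))  # hypothetical logits
```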
pytorch_model.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1deb5151cb360cc5c01534c987bbfdd69a74336d619aae17668b999cb35525e7
  size 368845427

  version https://git-lfs.github.com/spec/v1
+ oid sha256:41c57deab19126e146c4e2a51bde628e65e31cf2952159bb8bb375bdead00fa2
  size 368845427
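`pytorch_model.pt` is stored as a Git LFS pointer, not as the weights themselves. Only the `oid` line changed in this commit, while `size` stayed at 368845427 bytes. That is consistent with retrained weights serialized for the same architecture. Below is a minimal standard-library sketch of reading such a pointer and checking a downloaded file against it. The function names and the local file path mentioned in the final comment are hypothetical.

```python
import hashlib

# The new pointer file from this commit.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:41c57deab19126e146c4e2a51bde628e65e31cf2952159bb8bb375bdead00fa2
size 368845427
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


def file_matches_pointer(path, pointer, chunk_size=1 << 20):
    """Stream a local file and compare its size and SHA-256 digest with the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError("only sha256 pointers are handled in this sketch")
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["oid"][:12], pointer["size"])
# On a downloaded copy: file_matches_pointer("pytorch_model.pt", pointer)
```

Reading in 1 MiB chunks keeps memory flat even for a checkpoint of roughly 369 MB.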