combe4259 committed
Commit 6d3cce7 · verified · 1 Parent(s): f2a3fa8

Create readme.md

Files changed (1)
  1. readme.md +111 -0
readme.md ADDED
@@ -0,0 +1,111 @@
+ ---
+ language: ko
+ license: apache-2.0
+ base_model: klue/bert-base
+ tags:
+ - klue-bert
+ - text-classification
+ - pytorch
+ - ko
+ - financial-domain
+ - text-difficulty
+ datasets:
+ - custom
+ metrics:
+ - f1
+ - accuracy
+ - mae
+ ---
+ [colab](https://colab.research.google.com/drive/112GWo0LrRls5B_uF6ghjZXzY6PxXzrnV?usp=sharing)
+
+ # Financial Document Difficulty Classification Model
+
+ This model is a **text classification model** fine-tuned from `klue/bert-base` that rates the difficulty of Korean financial sentences on a **10-level scale (1-10)**.
+
+ It is designed to detect in real time when a "difficult sentence" appears, so that it can act as the trigger for an "easy sentence conversion AI".
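
The trigger role described above could be wired up roughly as follows. This is a minimal sketch, not the author's implementation: the `should_simplify` helper and the threshold of 7 are hypothetical choices made for illustration. Only the mapping from class indices 0-9 to levels 1-10 comes from the README.

```python
def difficulty_from_logits(logits):
    """Map one row of 10 class logits (indices 0-9) to a 1-10 difficulty level."""
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return best_index + 1


def should_simplify(logits, threshold=7):
    """Hypothetical trigger: fire when the predicted level reaches the threshold."""
    return difficulty_from_logits(logits) >= threshold


# Hand-written example logits (not real model output): class index 6 wins, i.e. level 7.
example_logits = [0.1, 0.0, 0.2, 0.1, 0.3, 0.5, 2.4, 1.1, 0.2, -1.0]
print(difficulty_from_logits(example_logits))  # 7
print(should_simplify(example_logits))         # True
```

In practice the threshold would be tuned to how aggressively the downstream simplification step should run.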
+
+
+ ---
+
+ ## How to Use
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load the model from the Hugging Face Hub or from a saved local path
+ MODEL_PATH = "combe4259/difficulty_klue"
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+ model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
+ model.eval()
+
+ # Text to classify
+ # ("Return structure of credit derivative-linked securities under CDS spread movements")
+ text = "신용파생결합증권의 CDS 스프레드 변동에 따른 수익구조"
+
+ inputs = tokenizer(
+     text,
+     return_tensors="pt",
+     truncation=True,
+     max_length=512,
+     padding=True
+ )
+
+ # Predict
+ with torch.no_grad():
+     outputs = model(**inputs)
+     logits = outputs.logits
+
+ # The model predicts class indices 0-9, so add 1 to map to the 1-10 scale
+ prediction = torch.argmax(logits, dim=-1).item()
+ difficulty = prediction + 1
+
+ print(f"Text: {text}")
+ print(f"Predicted difficulty: {difficulty}")
+ # Output: Predicted difficulty: 7
+ ```
+
+ ## Training Data
+
+ - A self-built JSON dataset of 2,880 financial sentences/paragraphs
+ - Data split: Train (2,016) / Validation (432) / Test (432)
+ - Class imbalance: samples concentrate on difficulty 7 (28.2%) and 8 (18.6%); difficulty 10 has a single sample (0.0%)
+ - Preprocessing: `klue/bert-base` tokenizer, with padding and truncation to `max_length=512`
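
The split sizes above correspond to a 70/15/15 partition of the 2,880 samples. The sketch below reproduces those sizes with a seeded shuffle. The ratios are inferred from the counts, and the function name, the seed, and the lack of stratification are assumptions, since the README does not say how the split was made.

```python
import random


def split_dataset(samples, train_ratio=0.70, val_ratio=0.15, seed=42):
    """Shuffle and split samples into train/validation/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# Placeholder records standing in for the 2,880 JSON entries.
records = [{"id": i} for i in range(2880)]
train, val, test = split_dataset(records)
print(len(train), len(val), len(test))  # 2016 432 432
```

With a class as rare as difficulty 10 (a single sample), a plain shuffle cannot guarantee that every class appears in every split, so a stratified split may be preferable where class counts allow it.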
+
+ ---
+
+ ## Training Procedure
+
+ - Base Model: `klue/bert-base` (`num_labels=10`)
+ - Optimizer: AdamW
+ - Loss Function: Weighted CrossEntropyLoss (class weights applied)
+   - Example: difficulty 10 with 1 sample → weight 10.0
+   - Difficulty 7 with 568 samples → weight 0.35
+ - Epochs: 10
+ - Batch Size: 16
+ - Learning Rate: 2e-5 (with 500 warmup steps)
+ - Best Model: `metric_for_best_model='f1'` (the checkpoint with the highest F1 score is saved)
+ - Early Stopping: patience=3 (training stops early if the F1 score fails to improve for 3 consecutive evaluations)
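
The two example weights above are consistent with inverse-frequency ("balanced") weighting, `total / (num_classes * class_count)`, capped at 10.0: 2016 / (10 × 568) ≈ 0.35, while 2016 / (10 × 1) = 201.6 would be clipped to 10.0. Both the formula and the cap are inferred from the numbers rather than stated in the README, as is the assumption that the counts refer to the 2,016 training samples, so treat this as a sketch.

```python
def capped_balanced_weights(class_counts, max_weight=10.0):
    """Inverse-frequency class weights, capped at max_weight.

    Assumed formula: total / (num_classes * count).
    Classes with zero samples receive the cap.
    """
    total = sum(class_counts)
    num_classes = len(class_counts)
    weights = []
    for count in class_counts:
        raw = total / (num_classes * count) if count > 0 else max_weight
        weights.append(min(raw, max_weight))
    return weights


# Only the counts for level 7 (568) and level 10 (1) come from the README.
# The other eight counts are made-up fillers that bring the total to 2,016.
counts = [100, 150, 200, 200, 150, 200, 568, 375, 72, 1]
weights = capped_balanced_weights(counts)
print(round(weights[6], 2), weights[9])  # 0.35 10.0
```

A list like this could then be handed to the loss as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))`.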
+
+ ---
+
+ ## Evaluation Results
+
+ Final performance on the test set (432 samples).
+ The model performed consistently on both **exact prediction (F1)** and **useful prediction (MAE, Within 1 Acc)**.
+
+ | Metric | Score | Description |
+ |-----------------------|-------|------|
+ | F1 Score (Weighted) | 0.607 | (Key metric) Overall precision/recall of the model |
+ | Accuracy | 0.604 | Probability of predicting the exact level out of 10 |
+ | MAE (Mean Absolute Error) | 0.560 | (Important) Predictions deviate from the true level by 0.56 levels on average |
+ | Within 1 Acc | 0.926 | (Important) Accuracy within a ±1 error margin (92.6%) |
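
Unlike plain accuracy, MAE and Within 1 Acc treat the difficulty levels as ordered. Below is a minimal sketch of how these metrics could be computed from true and predicted levels. The function name and the toy labels are illustrative only and are not the author's evaluation code or data.

```python
def ordinal_metrics(y_true, y_pred):
    """Return accuracy, MAE, and within-1 accuracy for integer difficulty levels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return {
        "accuracy": sum(e == 0 for e in errors) / n,
        "mae": sum(errors) / n,
        "within_1_acc": sum(e <= 1 for e in errors) / n,
    }


# Toy labels for illustration only.
y_true = [7, 8, 2, 5, 7]
y_pred = [7, 7, 2, 8, 6]
print(ordinal_metrics(y_true, y_pred))
# {'accuracy': 0.4, 'mae': 1.0, 'within_1_acc': 0.8}
```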
+
+ ---
+
+ ## Sample Predictions
+
+ | Input Text | Predicted Difficulty (1-10) |
+ |-------------|-------------------|
+ | "은행에 돈을 맡겨요" ("I deposit money at the bank") | 1 |
+ | "예금자보호법에 따라 5천만원까지 보호됩니다" ("Protected up to 50 million won under the Depositor Protection Act") | 2 |
+ | "신용파생결합증권의 CDS 스프레드 변동에 따른 수익구조" ("Return structure of credit derivative-linked securities under CDS spread movements") | 7 |