Update README.md
README.md

license: apache-2.0
---

# SMS Spam Classifier

This model is a **multilingual BERT-based model** fine-tuned for SMS spam detection. It classifies SMS messages as either **ham (not spam)** or **spam**, and was trained from the **`bert-base-multilingual-cased`** checkpoint using the Hugging Face Transformers library.

---

## Model Details

- **Base Model**: `bert-base-multilingual-cased`
- **Task**: Sequence Classification
- **Supported Languages**: Multilingual
- **Number of Labels**: 2 (`ham`, `spam`)
- **Dataset**: Cleaned SMS spam dataset

---

## Dataset Information

The dataset used for training and evaluation consists of SMS messages labeled `ham` (not spam) or `spam`. After preprocessing, it was split as follows:

- **Training data**: 80%
- **Validation data**: 20%
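
The 80/20 split above can be sketched as a simple shuffle-and-slice; the labeled pairs below are hypothetical stand-ins for the actual SMS data, which the card does not name:

```python
import random

# Hypothetical stand-in for the labeled SMS data: (text, label) pairs.
data = [(f"message {i}", "spam" if i % 5 == 0 else "ham") for i in range(100)]

random.seed(42)  # fixed seed so the split is reproducible
random.shuffle(data)

split = int(len(data) * 0.8)  # 80% train / 20% validation
train_data, val_data = data[:split], data[split:]
print(len(train_data), len(val_data))  # 80 20
```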

---

## Training Configuration

- **Learning Rate**: 2e-5
- **Batch Size**: 8 (per device)
- **Epochs**: 1
- **Evaluation Strategy**: per epoch
- **Tokenizer**: `bert-base-multilingual-cased`

The model was fine-tuned efficiently with Hugging Face's `Trainer` API.
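
A minimal configuration sketch of that setup, assuming tokenized `train_ds`/`val_ds` datasets (hypothetical names) and the hyperparameters listed above; note that `eval_strategy` is spelled `evaluation_strategy` in older transformers releases:

```python
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)

args = TrainingArguments(
    output_dir="sms-spam-classifier",
    learning_rate=2e-5,              # matches the list above
    per_device_train_batch_size=8,
    num_train_epochs=1,
    eval_strategy="epoch",           # evaluate once per epoch
)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
print(trainer.evaluate())
```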

---

## Evaluation Results

The model achieved the following performance on the validation data:

- **Evaluation Loss**: `<add your result>`
- **Accuracy**: `<add your result>`
- **F1 Score**: `<add your result>`

(Note: replace the `<add your result>` placeholders with the metrics returned by `trainer.evaluate()`.)
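
If you recompute the metrics yourself rather than reading them from `trainer.evaluate()`, accuracy and binary F1 (taking `spam` as the positive class) reduce to a few counts. A pure-Python sketch with made-up labels:

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    # Binary F1: harmonic mean of precision and recall for one class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Made-up labels for illustration: 1 = spam, 0 = ham
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(f1_score(y_true, y_pred), 3))  # 0.667
```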

---

## Usage

The model can be used directly with the Hugging Face Transformers library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("blockenters/sms-spam-classifier")
model = AutoModelForSequenceClassification.from_pretrained("blockenters/sms-spam-classifier")

# Sample input
text = "Congratulations! You have won a free trip to Bali. Reply WIN to claim."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=-1)

# Decode the prediction
label_map = {0: "ham", 1: "spam"}
print(f"Prediction: {label_map[predictions.item()]}")
```
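
The `argmax` above discards confidence; applying a softmax to the logits recovers class probabilities. A self-contained sketch (the logit values are made up, not real model output):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# e.g. logits = outputs.logits[0].tolist() from the snippet above
probs = softmax([-1.2, 2.3])  # [P(ham), P(spam)]
print(f"spam probability: {probs[1]:.3f}")
```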