---
license: mit
base_model: klue/bert-base
tags:
- bert
- lora
- korean
- text-classification
- sentiment-analysis
language:
- ko
datasets:
- nsmc
---

# NSMC Sentiment Analysis (LoRA Fine-tuned)

This is a sentiment analysis model fine-tuned with LoRA on the NSMC (Naver Sentiment Movie Corpus) dataset.

## Model Information

- **Base model:** klue/bert-base
- **Fine-tuning method:** LoRA (Low-Rank Adaptation)
- **Trainable parameters:** about 0.3% (~300K)
- **Dataset:** NSMC (Naver movie reviews)
- **Task:** binary classification (positive/negative)

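The trainable-parameter figure can be sanity-checked with a little arithmetic. The sketch below assumes the standard BERT-base shape for klue/bert-base (12 encoder layers, hidden size 768), the LoRA settings given under the training details (rank 8 on the query and value projections), and a total of roughly 110M parameters for the base model. That total is an approximation used for illustration, not a measured value.

```python
# Back-of-the-envelope count of LoRA adapter weights (assumptions noted inline).
HIDDEN_SIZE = 768                 # BERT-base hidden size
NUM_LAYERS = 12                   # BERT-base encoder layers
RANK = 8                          # LoRA rank r from the training details
TARGETS_PER_LAYER = 2             # query and value projections
APPROX_BASE_PARAMS = 110_000_000  # assumption: rough size of klue/bert-base

# Each adapted 768x768 projection gains two matrices: A (r x in) and B (out x r).
params_per_module = RANK * (HIDDEN_SIZE + HIDDEN_SIZE)
lora_params = params_per_module * TARGETS_PER_LAYER * NUM_LAYERS

print(f"LoRA parameters: {lora_params:,}")
print(f"Share of base model: {lora_params / APPROX_BASE_PARAMS:.2%}")
```

This gives 294,912 adapter weights, which is in line with the ~300K figure listed above. Any trainable classification head is not included in the count.
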
## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    "klue/bert-base",
    num_labels=2
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "JINIIII/nsmc-sentiment-lora")
tokenizer = AutoTokenizer.from_pretrained("JINIIII/nsmc-sentiment-lora")
model.eval()  # disable dropout for inference

# Inference
text = "이 영화 정말 재미있어요!"  # "This movie is really fun!"

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():  # no gradients needed at inference time
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)
pred = torch.argmax(probs, dim=-1).item()

# Class 1 is positive, class 0 is negative
label = "positive" if pred == 1 else "negative"
confidence = probs[0][pred].item()

print(f"Result: {label} (confidence: {confidence:.2%})")
```

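The post-processing step in the usage example (softmax, argmax, label lookup) can be illustrated without downloading the model. The following is a minimal sketch in plain Python; the `decode` and `softmax` helpers and the sample logits are hypothetical, and the label order (index 0 negative, index 1 positive) is taken from the usage code.

```python
import math

# Assumption: index 0 = negative, index 1 = positive, as in the usage example.
LABELS = ("negative", "positive")

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (label, confidence) for one row of logits."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[pred], probs[pred]

# Made-up logits for illustration, not real model output.
label, confidence = decode([-1.0, 2.0])
print(f"{label} ({confidence:.2%})")
```

With real model output, each row of `outputs.logits` converted to a list would take the place of the made-up logits.
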
## Training Details

- **LoRA Rank (r):** 8
- **LoRA Alpha:** 16
- **Target Modules:** query, value
- **Dropout:** 0.1
- **Epochs:** 5
- **Learning rate:** 5e-4

## Performance

- **Training data:** 10,000 examples
- **Evaluation data:** 2,000 examples
- **Evaluation accuracy:** ~85-90%

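Because the evaluation set has only 2,000 examples, the reported accuracy carries some sampling noise. The sketch below estimates that noise under two assumptions that are not part of the original evaluation: the examples are independent draws, and the normal approximation to the binomial is adequate. The `accuracy_margin` helper is hypothetical.

```python
import math

EVAL_SIZE = 2_000  # evaluation examples reported above

def accuracy_margin(accuracy, n, z=1.96):
    """Half-width of a normal-approximation 95% interval for an accuracy."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n)

# Check both ends of the reported accuracy range.
for acc in (0.85, 0.90):
    margin = accuracy_margin(acc, EVAL_SIZE)
    print(f"accuracy {acc:.0%}: +/- {margin:.1%}")
```

Under these assumptions the margin is between one and two percentage points, so sampling noise alone would account for only part of the 85-90% spread.
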
## Use Cases

- Sentiment analysis of movie reviews
- Product review analysis
- Social media (SNS) sentiment monitoring
- Automatic classification of customer feedback

## Limitations

- Specialized for the movie review domain
- Performs best on short texts
- Accuracy is higher on strongly expressed sentiment

## License

MIT License

## Author

JINIIII

## Citation

```bibtex
@misc{nsmc-sentiment-lora,
  author = {JINIIII},
  title = {NSMC Sentiment Analysis with LoRA},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/JINIIII/nsmc-sentiment-lora}
}
```

**Note**: This model was created for educational purposes.