---
library_name: transformers
license: apache-2.0
base_model: monologg/koelectra-base-v3-discriminator
tags:
- intent-classification
- korean
- koelectra
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: koelectra_intent_model
  results: []
---

# koelectra_intent_model

This model is a fine-tuned version of [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator) on the custom-intent-dataset dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9360
- Accuracy: 0.9885
- F1: 0.9884

## Model description

# 📋 Model Card

## Model Information

### Basic Information

- **Model name**: Intent Classifier KoELECTRA Fine-tuned
- **Model ID**: `kakao1513/koelectra_intent_model`
- **Base model**: `monologg/koelectra-base-v3-discriminator`
- **Task**: Text Classification
- **Language**: Korean

### Model Overview

This model fine-tunes KoELECTRA on a Korean intent-classification dataset to classify user intent. It classifies user-action intents in web applications, such as shopping-mall navigation, sign-up, and login, into 35 classes.

---

## Training Data

### Dataset Statistics

| Item | Value |
|------|-------|
| **Total examples** | 7,084 |
| **Training data** | 5,698 (80%) |
| **Test data** | 1,386 (20%) |
| **Intent classes** | 35 |

### Main Intent Classes (Examples)

| Intent | Description | Samples |
|--------|-------------|---------|
| `unknown` | Unrelated / small talk | 748 |
| `go_mall` | Go to the shopping mall | 220 |
| `go_coupang` | Go to Coupang | 220 |
| `click_login` | Log in | 220 |
| `click_signup` | Click sign-up | 220 |
| ... | 30 other intents | - |
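The mapping between the 35 class ids and intent names ships in the model config (`id2label` / `label2id` in `config.json`). A minimal pure-Python sketch of how such a mapping is structured; the ids and ordering below are illustrative assumptions, not the model's actual mapping:

```python
# Illustrative label maps for a 35-class intent model.
# The real mapping ships in the model's config.json (id2label / label2id);
# the ids and ordering here are assumptions for the sketch.
intents = ["unknown", "go_mall", "go_coupang", "click_login", "click_signup"]
# ... plus 30 more intents in the real model, elided here.

id2label = {i: label for i, label in enumerate(intents)}
label2id = {label: i for i, label in id2label.items()}

# Round-trip: an id maps to a label and back to the same id.
assert label2id[id2label[3]] == 3
print(id2label[1])  # go_mall
```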
| ๊ทธ ์™ธ 30๊ฐœ ์˜๋„ | - | --- ## ํ›ˆ๋ จ ์„ค์ • ### ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ```python ํ•™์Šต๋ฅ : 2e-5 ๋ฐฐ์น˜ ํฌ๊ธฐ: 32 ์—ํฌํฌ: 5 ์ตœ๋Œ€ ์‹œํ€€์Šค ๊ธธ์ด: 64 ๊ฐ€์ค‘์น˜ ๊ฐ์†Œ: 0.01 ๋ผ๋ฒจ ์Šค๋ฌด๋”ฉ: 0.1 ์˜ตํ‹ฐ๋งˆ์ด์ €: AdamW ``` ### ํ›ˆ๋ จ ๊ฒฐ๊ณผ | Epoch | Validation Loss | Accuracy | F1 Score | |-------|-----------------|----------|----------| | 1 | 2.651761 | 71.63% | 0.6689 | | 2 | 1.768677 | 92.35% | 0.9065 | | 3 | 1.241083 | 97.99% | 0.9797 | | 4 | 0.999594 | 98.91% | 0.9890 | | 5 | 0.936003 | 98.85% | 0.9884 | **์ตœ์ข… ์„ฑ๋Šฅ (ํ…Œ์ŠคํŠธ ์…‹)** - **์ •ํ™•๋„ (Accuracy)**: 98.85% - **F1 ์ ์ˆ˜ (Weighted)**: 0.9884 --- ## ์‚ฌ์šฉ ๋ฐฉ๋ฒ• ### ์„ค์น˜ ```bash pip install transformers torch ``` ### ๊ธฐ๋ณธ ์‚ฌ์šฉ๋ฒ• ```python from transformers import pipeline # ๋ชจ๋ธ ๋กœ๋“œ classifier = pipeline("text-classification", model="smj1513/intent-classifier-koElectra-finetuned") # ์˜ˆ์ธก ์‹คํ–‰ text = "์‡ผํ•‘๋ชฐ ์‚ฌ์ดํŠธ๋กœ ์ด๋™ ํ• ๊นŒ ๋ง๊นŒ ํ• ๊ฒŒ" result = classifier(text)[0] print(f"์˜๋„: {result['label']}") print(f"ํ™•์‹ ๋„: {result['score']:.4f}") ``` ### ์ƒ์„ธ ์‚ฌ์šฉ๋ฒ• ```python from transformers import AutoTokenizer, AutoModelForSequenceClassification import torch # ๋ชจ๋ธ ๋ฐ ํ† ํฌ๋‚˜์ด์ € ๋กœ๋“œ model_name = "smj1513/intent-classifier-koElectra-finetuned" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name) # ํ…์ŠคํŠธ ์ „์ฒ˜๋ฆฌ text = "๋กœ๊ทธ์ธ ํŽ˜์ด์ง€๋กœ ๊ฐ€์ค˜" inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64) # ์˜ˆ์ธก with torch.no_grad(): outputs = model(**inputs) logits = outputs.logits predicted_class_id = logits.argmax().item() confidence = torch.softmax(logits, dim=-1)[0][predicted_class_id].item() print(f"์˜ˆ์ธก ํด๋ž˜์Šค: {model.config.id2label[predicted_class_id]}") print(f"์‹ ๋ขฐ๋„: {confidence:.4f}") ``` --- ## ์„ฑ๋Šฅ ๋ถ„์„ ### ๊ฐ•์  โœ… **๋†’์€ ์ •ํ™•๋„**: 98.85%์˜ ํ…Œ์ŠคํŠธ ์ •ํ™•๋„๋กœ ๋งค์šฐ ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ โœ… 
**๊ท ํ˜•์žกํžŒ F1 ์ ์ˆ˜**: 0.9884์˜ F1 ์ ์ˆ˜๋กœ ์ •๋ฐ€๋„์™€ ์žฌํ˜„์œจ์˜ ๊ท ํ˜• ์œ ์ง€ โœ… **๋น ๋ฅธ ์ถ”๋ก **: GPU์—์„œ ์•ฝ 94๊ฐœ/์ดˆ์˜ ์ฒ˜๋ฆฌ ์†๋„ โœ… **ํ•œ๊ตญ์–ด ํŠนํ™”**: KoELECTRA๋ฅผ ์‚ฌ์šฉํ•œ ํšจ์œจ์ ์ธ ํ•œ๊ตญ์–ด ์ฒ˜๋ฆฌ ### ์ฃผ์˜์‚ฌํ•ญ โš ๏ธ **๋„๋ฉ”์ธ ํŠนํ™”**: ์‡ผํ•‘๋ชฐ/ํšŒ์›๊ด€๋ฆฌ ๋„๋ฉ”์ธ์— ์ตœ์ ํ™”๋˜์–ด ์žˆ์Œ โš ๏ธ **ํ† ํฐ ๊ธธ์ด ์ œํ•œ**: ์ตœ๋Œ€ 64 ํ† ํฐ์œผ๋กœ ์ œํ•œ (๊ธด ๋ฌธ์žฅ์€ ํ™œ์šฉ ์ œํ•œ์ ) โš ๏ธ **๋ฏธ์ง€ ์˜๋„**: `unknown` ํด๋ž˜์Šค๋กœ ๋ถ„๋ฅ˜๋˜๋Š” ์ผ์ƒ ์žก๋‹ด์ด ํฌํ•จ๋จ --- ## ๊ธฐ์ˆ  ์‚ฌํ•ญ ### ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜ - **๋ชจ๋ธ ํฌ๊ธฐ**: ELECTRA Base - **ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜**: ~110M - **์ถœ๋ ฅ ๋ ˆ์ด์–ด**: ์„ ํ˜• ๋ถ„๋ฅ˜ ํ—ค๋“œ (35๊ฐœ ํด๋ž˜์Šค) ### ์ž…์ถœ๋ ฅ ๋ช…์„ธ - **์ž…๋ ฅ**: ์ตœ๋Œ€ 64 ํ† ํฐ ๊ธธ์ด์˜ ํ•œ๊ตญ์–ด ํ…์ŠคํŠธ - **์ถœ๋ ฅ**: 35๊ฐœ ์˜๋„ ํด๋ž˜์Šค ์ค‘ ํ™•๋ฅ ์ด ๊ฐ€์žฅ ๋†’์€ ํด๋ž˜์Šค ๋ฐ ์‹ ๋ขฐ๋„ --- ## ์ œํ•œ์‚ฌํ•ญ ๋ฐ ๊ถŒ์žฅ์‚ฌํ•ญ ### ์ ์šฉ ๊ฐ€๋Šฅ ๋„๋ฉ”์ธ - โœ… ์‡ผํ•‘๋ชฐ/์ „์ž์ƒ๊ฑฐ๋ž˜ ์‹œ์Šคํ…œ - โœ… ํšŒ์›๊ฐ€์ž…/๋กœ๊ทธ์ธ ์˜๋„ ๋ถ„๋ฅ˜ - โœ… ์›น/๋ชจ๋ฐ”์ผ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ์‚ฌ์šฉ์ž ๋ช…๋ น ### ๋ถ€์ ์ ˆํ•œ ์‚ฌ์šฉ ์‚ฌ๋ก€ - โŒ ์˜๋ฃŒ, ๋ฒ•๋ฅ  ๋“ฑ ๊ณ ์œ„ํ—˜ ๋„๋ฉ”์ธ - โŒ ์‹ค์‹œ๊ฐ„ ์Œ์„ฑ ์ธ์‹ (์ด ๋ชจ๋ธ์€ ํ…์ŠคํŠธ ๊ธฐ๋ฐ˜) - โŒ ๋‹ค๋ฅธ ์–ธ์–ด ๋˜๋Š” ๋„๋ฉ”์ธ์˜ ์˜๋„ ๋ถ„๋ฅ˜ ### ์„ฑ๋Šฅ ๊ฐœ์„  ํŒ 1. **๋งฅ๋ฝ ์ถ”๊ฐ€**: ๊ธด ๋ฌธ์žฅ์€ ์š”์•ฝํ•˜์—ฌ 64ํ† ํฐ ์ด๋‚ด๋กœ ์œ ์ง€ 2. **ํ›„์ฒ˜๋ฆฌ**: ์‹ ๋ขฐ๋„๊ฐ€ ๋‚ฎ์€ ๊ฒฝ์šฐ(< 0.7) ์‚ฌ๋žŒ์˜ ๊ฒ€ํ†  ๊ถŒ์žฅ 3. 
**์žฌํ›ˆ๋ จ**: ์ƒˆ๋กœ์šด ์˜๋„ ํด๋ž˜์Šค ์ถ”๊ฐ€ ์‹œ ๋ชจ๋ธ ์žฌํ›ˆ๋ จ --- ## ๋ผ์ด์„ ์Šค ๋ฐ ์ถœ์ฒ˜ - **๊ธฐ๋ณธ ๋ชจ๋ธ ๋ผ์ด์„ ์Šค**: MIT (KoELECTRA) - **๋ชจ๋ธ ๊ณต๊ฐœ**: Hugging Face Model Hub - **์‚ฌ์šฉ ๋ผ์ด์„ ์Šค**: MIT --- ## ์ธ์šฉ ์ •๋ณด ์ด ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ธ์šฉํ•ด์ฃผ์„ธ์š”: ```bibtex @misc{intent-classifier-koelectra, author = {Your Name}, title = {Intent Classifier KoELECTRA Fine-tuned}, year = {2026}, publisher = {Hugging Face}, url = {https://huggingface.co/smj1513/intent-classifier-koElectra-finetuned} } ``` --- ## ์—ฐ๋ฝ์ฒ˜ ๋ฐ ์ง€์› ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜๊ฑฐ๋‚˜ ํ”ผ๋“œ๋ฐฑ์ด ์žˆ์œผ์‹œ๋ฉด Hugging Face ๋ชจ๋ธ ํŽ˜์ด์ง€์—์„œ Issues๋ฅผ ์ œ์ถœํ•ด์ฃผ์„ธ์š”. **๋งˆ์ง€๋ง‰ ์—…๋ฐ์ดํŠธ**: 2026๋…„ 2์›” 11์ผ ### Framework versions - Transformers 5.1.0 - Pytorch 2.9.1+cu128 - Datasets 4.5.0 - Tokenizers 0.22.2