blockenters committed · verified
Commit 82707f6 · 1 Parent(s): a9e8324

Update README.md

Files changed (1):
  1. README.md +33 -33

README.md CHANGED
@@ -14,73 +14,73 @@ metrics:
  license: apache-2.0
  ---
 
- # SMS Spam Classifier
 
- This is a fine-tuned **BERT-based multilingual model** designed for SMS spam detection. The model can classify SMS messages as either **ham (non-spam)** or **spam**. It was trained using the **`bert-base-multilingual-cased`** model from Hugging Face Transformers library.
 
  ---
 
- ## Model Details
 
- - **Base Model**: `bert-base-multilingual-cased`
- - **Task**: Sequence Classification
- - **Languages Supported**: Multilingual
- - **Number of Labels**: 2 (`ham`, `spam`)
- - **Dataset**: A cleaned SMS spam dataset.
 
  ---
 
- ## Dataset
 
- The dataset used for training and evaluation contains SMS messages labeled as `ham` (non-spam) or `spam`. The dataset was preprocessed for tokenization and split into training and evaluation subsets:
- - **Training Set**: 80%
- - **Evaluation Set**: 20%
 
  ---
 
- ## Training Configuration
 
- - **Learning Rate**: 2e-5
- - **Batch Size**: 8 (per device)
- - **Epochs**: 1
- - **Evaluation Strategy**: Per epoch
- - **Tokenizer**: `bert-base-multilingual-cased`
 
- The model was trained using the Hugging Face `Trainer` API for efficient fine-tuning.
 
  ---
 
- ## Evaluation Results
 
- The model achieved the following performance metrics during evaluation:
 
- - **Evaluation Loss**: `<add_results_here>`
- - **Accuracy**: `<add_results_here>`
- - **F1 Score**: `<add_results_here>`
 
- (Note: Replace `<add_results_here>` with actual values from the `trainer.evaluate()` results.)
 
  ---
 
- ## How to Use
 
- You can use this model directly with the Hugging Face Transformers library:
 
  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
 
- # Load the model and tokenizer
  tokenizer = AutoTokenizer.from_pretrained("blockenters/sms-spam-classifier")
  model = AutoModelForSequenceClassification.from_pretrained("blockenters/sms-spam-classifier")
 
- # Sample input
- text = "Congratulations! You've won a free ticket to Bali. Reply WIN to claim."
 
- # Tokenize and predict
  inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
  outputs = model(**inputs)
  predictions = outputs.logits.argmax(dim=-1)
 
- # Decode prediction
  label_map = {0: "ham", 1: "spam"}
- print(f"Prediction: {label_map[predictions.item()]}")
 
  license: apache-2.0
  ---
 
+ # SMS Spam Classifier
 
+ This model is a **BERT-based multilingual model** fine-tuned for SMS spam detection. It can classify SMS messages as **ham (non-spam)** or **spam**. It was trained on the **`bert-base-multilingual-cased`** model from the Hugging Face Transformers library.
 
  ---
 
+ ## Model Details
 
+ - **Base Model**: `bert-base-multilingual-cased`
+ - **Task**: Sequence Classification
+ - **Supported Languages**: Multilingual
+ - **Number of Labels**: 2 (`ham`, `spam`)
+ - **Dataset**: A cleaned SMS spam dataset
 
  ---
 
+ ## Dataset
 
+ The dataset used for training and evaluation contains SMS messages labeled `ham` (non-spam) or `spam`. After preprocessing, the data was split as follows:
+ - **Training Set**: 80%
+ - **Validation Set**: 20%
 
  ---
 
+ ## Training Configuration
 
+ - **Learning Rate**: 2e-5
+ - **Batch Size**: 8 (per device)
+ - **Epochs**: 1
+ - **Evaluation Strategy**: Per epoch
+ - **Tokenizer**: `bert-base-multilingual-cased`
 
+ The model was fine-tuned efficiently using the Hugging Face `Trainer` API.
 
  ---
 
+ ## Evaluation Results
 
+ The model showed the following performance on the validation data:
 
+ - **Evaluation Loss**: `<add_results_here>`
+ - **Accuracy**: `<add_results_here>`
+ - **F1 Score**: `<add_results_here>`
 
+ (Note: Fill in the `<add_results_here>` placeholders with the results of `trainer.evaluate()`.)
 
  ---
 
+ ## How to Use
 
+ You can use this model directly with the Hugging Face Transformers library:
 
  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
 
+ # Load the model and tokenizer
  tokenizer = AutoTokenizer.from_pretrained("blockenters/sms-spam-classifier")
  model = AutoModelForSequenceClassification.from_pretrained("blockenters/sms-spam-classifier")
 
+ # Sample input (a Korean SMS)
+ text = "์ถ•ํ•˜ํ•ฉ๋‹ˆ๋‹ค! ๋ฌด๋ฃŒ ๋ฐœ๋ฆฌ ์—ฌํ–‰ ํ‹ฐ์ผ“์„ ๋ฐ›์œผ์…จ์Šต๋‹ˆ๋‹ค. WIN์ด๋ผ๊ณ  ํšŒ์‹ ํ•˜์„ธ์š”."
 
+ # Tokenize and predict
  inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
  outputs = model(**inputs)
  predictions = outputs.logits.argmax(dim=-1)
 
+ # Decode the prediction
  label_map = {0: "ham", 1: "spam"}
+ print(f"Prediction: {label_map[predictions.item()]}")
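The usage example in the README prints only the argmax label. If a confidence score is also wanted, the raw logits can be converted to probabilities with a softmax. A minimal, dependency-free sketch on hypothetical logits (the values below are illustrative, not actual model output):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical (ham, spam) logits, as outputs.logits[0].tolist() might return
logits = [-1.2, 2.3]
probs = softmax(logits)

label_map = {0: "ham", 1: "spam"}
pred = max(range(len(probs)), key=probs.__getitem__)
print(f"Prediction: {label_map[pred]} (confidence {probs[pred]:.3f})")
```

Because the argmax of the probabilities equals the argmax of the logits, this only adds information (the confidence) without changing the predicted label.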
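The README's 80/20 train/evaluation split can be sketched with the standard library alone; `train_eval_split` and the toy data below are illustrative stand-ins, not the actual preprocessing code:

```python
import random

def train_eval_split(examples, eval_fraction=0.2, seed=42):
    # Shuffle a copy, then take the first eval_fraction as the eval set.
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]

# Toy labeled SMS data standing in for the real dataset
data = [(f"message {i}", "spam" if i % 5 == 0 else "ham") for i in range(100)]
train_set, eval_set = train_eval_split(data)
print(len(train_set), len(eval_set))  # 80 20
```

Fixing the seed keeps the split reproducible across runs, which matters when the evaluation numbers above are meant to be comparable.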