---
language: vi
tags:
- spam-detection
- vietnamese
- phobert
license: apache-2.0
datasets:
- visolex/ViSpamReviews
metrics:
- accuracy
- f1
model-index:
- name: phobert-spam-classification
  results:
  - task:
      type: text-classification
      name: Spam Detection (Multi-Class)
    dataset:
      name: ViSpamReviews
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: <INSERT_ACCURACY>
    - name: F1 Score
      type: f1
      value: <INSERT_F1_SCORE>
base_model:
- vinai/phobert-base
pipeline_tag: text-classification
---

# PhoBERT-Spam-MultiClass

Fine-tuned from [`vinai/phobert-base`](https://huggingface.co/vinai/phobert-base) on **ViSpamReviews** (multi-class).

* **Task**: 4-way classification
* **Dataset**: [ViSpamReviews](https://huggingface.co/datasets/visolex/ViSpamReviews)
* **Hyperparameters**
  * Batch size: 32
  * LR: 3e-5
  * Epochs: 100
  * Max seq len: 256

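The training script is not part of this card, so the following is only a minimal sketch of how the hyperparameters above could be collected in one place. `FineTuneConfig` and its helper methods are hypothetical names, and the step arithmetic assumes one optimizer step per batch with no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters listed in this card."""

    base_model: str = "vinai/phobert-base"
    num_labels: int = 4          # 4-way classification
    batch_size: int = 32
    learning_rate: float = 3e-5
    epochs: int = 100
    max_seq_len: int = 256

    def steps_per_epoch(self, num_examples: int) -> int:
        # Assumes one optimizer step per batch; a final partial batch still counts.
        return math.ceil(num_examples / self.batch_size)

    def total_steps(self, num_examples: int) -> int:
        return self.steps_per_epoch(num_examples) * self.epochs


config = FineTuneConfig()
# 1,000 is a made-up example count, not the size of ViSpamReviews.
print(config.steps_per_epoch(1_000))  # 32
print(config.total_steps(1_000))      # 3200
```

Knowing the total step count is mainly useful for sizing a learning-rate schedule or estimating training time before launching a run.
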
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("visolex/phobert-spam-classification")
model = AutoModelForSequenceClassification.from_pretrained("visolex/phobert-spam-classification")

# Vietnamese for "Only brand PR, not a real review."
text = "Chỉ PR thương hiệu chứ không review thật."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Run without gradient tracking and take the highest-scoring class.
with torch.no_grad():
    pred = model(**inputs).logits.argmax(dim=-1).item()

label_map = {0: "NO-SPAM", 1: "SPAM-1", 2: "SPAM-2", 3: "SPAM-3"}
print(label_map[pred])
```

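The usage snippet reports only the argmax label. If a confidence score is also wanted, one option is to apply a softmax to the four logits and abstain below a cutoff. The sketch below does this in plain Python so it runs without downloading the model; `predict_with_confidence` and the `threshold` value are hypothetical, and the logits are made up for illustration.

```python
import math

LABEL_MAP = {0: "NO-SPAM", 1: "SPAM-1", 2: "SPAM-2", 3: "SPAM-3"}


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits, threshold=0.5):
    """Return (label, probability); label is None when below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = LABEL_MAP[best] if probs[best] >= threshold else None
    return label, probs[best]


# Made-up logits for illustration, not real model output.
label, confidence = predict_with_confidence([0.2, 2.5, 0.1, -1.0])
print(label, round(confidence, 3))
```

With the real model, the input list could be obtained from `model(**inputs).logits[0].tolist()`. The 0.5 cutoff is an arbitrary placeholder and would need tuning on held-out data.
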