---
license: mit
language:
- fa
metrics:
- f1
- accuracy
base_model:
- HooshvareLab/bert-fa-base-uncased
pipeline_tag: text-classification
---

# Fine-tuned BERT for Persian Comment Discrepancy Classification

This project fine-tunes a BERT model to classify Persian comments into two categories: complaints about a product discrepancy (`True`) and all other comments (`False`). The model is trained on the [Basalam Comments](https://www.kaggle.com/datasets/alirezaazizkhani/labeled-persian-comments) dataset.

## 🛠 Training Details

- **Base Model**: `HooshvareLab/bert-fa-base-uncased`
- **Fine-Tuning Dataset**: Basalam comments
- **Notebook**: [Kaggle notebook](https://www.kaggle.com/code/alirezaazizkhani/finetune-bert-for-discrepancy)
- **Evaluation Metrics**:
  - **Accuracy**: 95.89%
  - **F1 Score**: 95.62%
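
For readers who want to check similar numbers on their own labeled data, the two metrics can be sketched in plain Python. This is a minimal illustration, not the evaluation code from the notebook: the function name `accuracy_and_f1` is hypothetical, and it assumes a binary F1 computed on the discrepancy class (label `1`). The card does not state which F1 averaging was used.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for two equal-length sequences of binary labels.

    Assumption: F1 is the binary F1 of the `positive` class; the model card
    does not say whether binary, macro, or weighted averaging was used.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Toy labels for illustration only (not taken from the dataset).
accuracy, f1 = accuracy_and_f1([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
print(f"accuracy={accuracy:.4f}, f1={f1:.4f}")
```

In practice, `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score` compute the same quantities and also support macro and weighted averaging.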

## 📥 How to Use

You can load and use the fine-tuned model as follows:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

MODEL_NAME = "alireza-2003/bert-fa-discrepancy-detection"

# Load the model and tokenizer once instead of on every call.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


def classify_comment(text):
    """Classify a single Persian comment."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick the highest-scoring class; index 1 is the discrepancy class.
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return "Discrepancy Complaint" if prediction == 1 else "Not a Complaint"


# English: "I ordered two, one blue and one red, but both were red."
comment = "دو تا سفارش داده بودم یدونه ابی و یدونه قرمز ولی هردوتاش قرمز بود"
print(classify_comment(comment))
```
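
When classifying many comments, it may be more convenient to run them in one batch and to report a confidence alongside each label. The following is a rough sketch under a few assumptions: `softmax`, `label_with_confidence`, and `classify_batch` are hypothetical helper names, and the label mapping simply mirrors the strings returned by `classify_comment` above. The `torch` and `transformers` imports are deferred into `classify_batch` so the pure-Python helpers can be used on their own.

```python
import math

MODEL_NAME = "alireza-2003/bert-fa-discrepancy-detection"
# Assumed mapping, mirroring the strings returned by classify_comment above.
LABELS = {0: "Not a Complaint", 1: "Discrepancy Complaint"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Map one row of logits to (label, probability of that label)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def classify_batch(texts):
    """Classify a list of comments in a single forward pass."""
    # Imported here so the helpers above work without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

    # padding=True pads every comment to the longest one in the batch.
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return [label_with_confidence(row) for row in logits.tolist()]
```

Calling `classify_batch(comments)` with a list of strings returns one `(label, confidence)` tuple per comment, in input order. For very large lists, splitting the input into smaller chunks keeps memory use bounded.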

---

📝 **Author**: Alireza

📅 **Last Updated**: 2/16/2025

🔗 **Dataset**: [Kaggle Dataset](https://www.kaggle.com/datasets/alirezaazizkhani/labeled-persian-comments)