---
license: cc-by-nc-nd-4.0
datasets:
- alt-gnome/telegram-spam
language:
- ru
metrics:
- accuracy
- f1
- recall
- precision
base_model:
- deepvk/RuModernBERT-base
pipeline_tag: text-classification
tags:
- spam
- detection
- classification
- russian
library_name: transformers
---
# russian_spam_detector

The **russian_spam_detector** model performs binary classification of texts into two categories:
- **LABEL_1**: spam message
- **LABEL_0**: normal message (not spam)

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
model_name = "corall88/russian_spam_detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Wrap both in a text-classification pipeline
detector = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Sample input in Russian: "Congratulations! You have won 1000000 rubles, follow the link - ..."
message = "Поздравляем! Вы выиграли 1000000 рублей, пройдите по ссылке - ..."
predict = detector(message)
# The result is a list with one dict holding "label" (LABEL_0 or LABEL_1) and "score"
print(predict)
```

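The pipeline returns the raw `LABEL_0` / `LABEL_1` identifiers listed above. A minimal sketch of turning them into readable results follows; the helper names `describe` and `classify` and the `LABEL_NAMES` mapping are illustrative assumptions, not part of the model or of `transformers`.

```python
# Readable names for the two labels described in this model card.
LABEL_NAMES = {"LABEL_0": "not spam", "LABEL_1": "spam"}


def describe(prediction):
    """Convert one pipeline result, e.g. {"label": "LABEL_1", "score": 0.98}."""
    label = prediction["label"]
    return {
        "is_spam": label == "LABEL_1",
        "name": LABEL_NAMES[label],
        "score": prediction["score"],
    }


def classify(detector, messages):
    """Run the detector on a list of messages and pair each with its result."""
    predictions = detector(messages)  # one dict per input message
    return [(msg, describe(pred)) for msg, pred in zip(messages, predictions)]
```

With the `detector` from the usage snippet, `classify(detector, [message])` returns a list of `(message, result)` pairs, where `result["is_spam"]` is `True` for `LABEL_1`.
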
## 📊 Dataset
The **[alt-gnome/telegram-spam](https://huggingface.co/datasets/alt-gnome/telegram-spam)** dataset of spam messages was used to fine-tune the model.

## 🧠 Architecture
The model is based on **[RuModernBERT-base](https://huggingface.co/deepvk/RuModernBERT-base)** and fine-tuned for binary classification.

## ⚙️ Training parameters
- **Epochs**: 4
- **Batch size**: 16
- **Optimizer**: AdamW
- **Learning rate**: 2e-5
- **Loss**: CrossEntropyLoss
- **Max sequence length**: 256

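As a rough sketch, the hyperparameters above could be passed to the `transformers` `Trainer` API as follows. This is not the author's training script: the use of `TrainingArguments`, the `output_dir` path, the single-device reading of the batch size, and the helper names are all assumptions.

```python
# Hyperparameters copied from the list above.
HPARAMS = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 16,  # assumes training on a single device
    "learning_rate": 2e-5,
    "optim": "adamw_torch",  # AdamW as implemented in PyTorch
}
MAX_LENGTH = 256  # applied when tokenizing, not inside TrainingArguments


def tokenize_batch(tokenizer, texts):
    """Tokenize raw strings, truncating each to the maximum sequence length."""
    return tokenizer(texts, truncation=True, max_length=MAX_LENGTH)


def build_training_arguments(output_dir="spam_detector_checkpoints"):
    """Create TrainingArguments; output_dir is a hypothetical path."""
    from transformers import TrainingArguments  # imported lazily

    # No loss is configured here: the Trainer uses the loss returned by the
    # model, which for two classes with integer labels is typically
    # cross-entropy.
    return TrainingArguments(output_dir=output_dir, **HPARAMS)
```

Keeping the sequence length next to the tokenizer call reflects where each setting takes effect: truncation happens during preprocessing, while the remaining values configure the optimization loop.
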
## 📈 Results
| Metric    | Value |
|-----------|-------|
| Accuracy  | 0.99  |
| F1-score  | 0.99  |
| Precision | 0.99  |
| Recall    | 0.99  |

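For reference, the four reported metrics can be computed from predictions as sketched below. The function name `binary_metrics`, the choice of `LABEL_1` (spam) as the positive class, and the toy labels are assumptions for illustration; they are not the evaluation data behind the table.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = spam (LABEL_1), 0 = not spam (LABEL_0).
toy_true = [1, 1, 0, 0, 1]
toy_pred = [1, 0, 0, 0, 1]
toy_scores = binary_metrics(toy_true, toy_pred)
print(toy_scores)
```

In this toy case one spam message is missed, so recall drops below precision; on imbalanced spam data, such a gap is easy to overlook when only accuracy is reported.
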
## Citation
```
@misc{russian_spam_detector,
  title={russian_spam_detector: modern model for spam detection},
  author={corall88},
  url={https://huggingface.co/corall88/russian_spam_detector},
  publisher={Hugging Face},
  year={2025},
}
```