---
language: ru
library_name: transformers
pipeline_tag: text-classification
tags:
- toxicity
- safetensors
base_model:
- DeepPavlov/rubert-base-cased-conversational
---

A model for toxicity classification in Russian texts.
Fine-tuned from the [DeepPavlov/rubert-base-cased-conversational](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational) model.

It's a binary classifier designed to detect toxicity in text.

* **Label 0 (NEUTRAL):** Neutral text
* **Label 1 (TOXIC):** Toxic text / Insults / Threats

**Dataset**

This model was trained on two datasets:

* [Toxic Russian Comments](https://www.kaggle.com/datasets/alexandersemiletov/toxic-russian-comments)
* [Russian Language Toxic Comments](https://www.kaggle.com/datasets/blackmoon/russian-language-toxic-comments)

**Usage**

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="fasherr/toxicity_rubert")

text_1 = "Ты сегодня прекрасно выглядишь!"  # "You look great today!"
text_2 = "Ты очень плохой человек"  # "You are a very bad person"

print(classifier(text_1))
# [{'label': 'NEUTRAL', 'score': 0.99...}]
print(classifier(text_2))
# [{'label': 'TOXIC', 'score': 1}]
```
**Eval results**

| | Accuracy | Precision | Recall | F1-Score | AUC-ROC | Support |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Overall (Macro)** | 97.93% | 96.37% | 96.86% | 96.61% | 0.9962 | 26271 |
| **Neutral** | 97.93% | 98.88% | 98.57% | 98.72% | 0.9962 | 21347 |
| **Toxic** | 97.93% | 93.87% | 95.15% | 94.50% | 0.9962 | 4924 |
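For reference, the macro-averaged figures above are unweighted means of the per-class precision, recall, and F1. A minimal pure-Python sketch of that computation (the toy label lists are illustrative, not drawn from the evaluation set):

```python
def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Return (macro precision, macro recall, macro F1) for a binary task."""
    per_class = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class.append((prec, rec, f1))
    # Macro average: unweighted mean over classes, regardless of support
    n = len(labels)
    return tuple(sum(m[i] for m in per_class) / n for i in range(3))

# Toy gold labels and predictions (0 = NEUTRAL, 1 = TOXIC)
metrics = macro_metrics([0, 0, 0, 1, 1, 0, 1, 0], [0, 0, 1, 1, 1, 0, 0, 0])
print(tuple(round(m, 4) for m in metrics))  # → (0.7333, 0.7333, 0.7333)
```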