---
language: en
tags:
- text-classification
- roberta
- custom
datasets:
- google/jigsaw_toxicity_pred
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
---

Hugging Face link: https://huggingface.co/dorian20/roberta_base_6000_sl

# Fine-tuned RoBERTa-base model for detecting toxicity in text

The model detects toxicity in a text. For each of the categories below, it predicts a score giving the probability that the text belongs to that category.

Categories: toxic, severe_toxic, obscene, threat, insult, identity_hate

The dataset used is Google's jigsaw_toxicity_pred. This version of the model was trained on a subset of it.
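
The per-category scoring described above can be sketched as follows. This is a minimal illustration, not the author's inference code: it assumes the classification head emits one logit per category in the order listed in this card, that an independent sigmoid turns each logit into a score, and that the repository ships tokenizer files. If the head was trained to output scores directly, the sigmoid should be dropped. The helper names `logits_to_scores` and `score_text` are hypothetical.

```python
import math

# Category order taken from the list in this card; that the model's output
# indices follow the same order is an assumption.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def logits_to_scores(logits):
    """Map one raw logit per category to an independent score in [0, 1]."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    # Sigmoid per label (assumed): categories are not mutually exclusive,
    # so the scores need not sum to 1.
    return {label: 1.0 / (1.0 + math.exp(-z)) for label, z in zip(LABELS, logits)}


def score_text(text, model_id="dorian20/roberta_base_6000_sl"):
    """Download the checkpoint and score one text (needs transformers and torch)."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return logits_to_scores(logits)
```

Calling `score_text("some comment")` would return a dictionary with one score per category. Because the scores are computed independently, a text can score high in several categories at once.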

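# Training parameters

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="epoch",     # evaluate every epoch (named eval_strategy in recent transformers releases)
        save_strategy="epoch",           # checkpoint every epoch, matching the evaluation schedule
        learning_rate=2e-5,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=16,
        num_train_epochs=10,
        weight_decay=0.01,
        save_total_limit=5,              # keep at most 5 checkpoints on disk
        logging_dir="./logs",
        logging_steps=50,                # log every 50 steps
        load_best_model_at_end=True,     # reload the best checkpoint once training ends
        gradient_accumulation_steps=4,   # 32 * 4 = 128 samples per optimizer update, per device
        dataloader_num_workers=8,
        dataloader_pin_memory=True,
        fp16=True,                       # mixed-precision training (needs a supported GPU)
    )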
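
The interaction between `per_device_train_batch_size` and `gradient_accumulation_steps` can be worked out with a little arithmetic. The sketch below assumes a single device. The subset size of 6000 is a hypothetical value chosen only for illustration, since the card does not state how many examples were used. The exact number of updates the `Trainer` performs may differ slightly depending on how a given version handles the last partial accumulation group.

```python
import math


def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Samples contributing to each optimizer update."""
    return per_device_batch * accumulation_steps * num_devices


def approx_updates_per_epoch(num_samples, per_device_batch, accumulation_steps, num_devices=1):
    """Approximate optimizer updates per epoch (last partial group counted)."""
    batches = math.ceil(num_samples / (per_device_batch * num_devices))
    return math.ceil(batches / accumulation_steps)


# Values from the TrainingArguments above, single device assumed.
print(effective_batch_size(32, 4))            # 128
# 6000 is a hypothetical subset size, used only for illustration.
print(approx_updates_per_epoch(6000, 32, 4))  # 47
```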

# Mean absolute error per category on the training dataset

- toxic: 0.1266
- severe_toxic: 0.0386
- obscene: 0.0673
- threat: 0.0437
- insult: 0.0832
- identity_hate: 0.0513
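
A per-category mean absolute error of this kind could be computed as in the sketch below. The card does not say how the figures above were obtained, so this assumes the error is the absolute difference between the predicted score and the 0/1 label, averaged over examples. The function name `mae_per_category` and the sample values are made up for illustration.

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def mae_per_category(predictions, targets):
    """Mean absolute error per category.

    predictions, targets: equal-length lists of rows, one value per label.
    """
    if not predictions or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and the same length")
    n = len(predictions)
    return {
        label: sum(abs(p[i] - t[i]) for p, t in zip(predictions, targets)) / n
        for i, label in enumerate(LABELS)
    }


# Made-up scores and labels for two texts, purely illustrative.
preds = [
    [0.9, 0.1, 0.8, 0.0, 0.7, 0.1],
    [0.2, 0.0, 0.1, 0.0, 0.3, 0.0],
]
gold = [
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

for label, err in mae_per_category(preds, gold).items():
    print(f"{label}: {err:.4f}")
```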