ruanchaves/hatebr
Updated • 285 • 18
How to use ruanchaves/mdeberta-v3-base-hatebr with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="ruanchaves/mdeberta-v3-base-hatebr") # Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("ruanchaves/mdeberta-v3-base-hatebr")
model = AutoModelForSequenceClassification.from_pretrained("ruanchaves/mdeberta-v3-base-hatebr")This is the microsoft/mdeberta-v3-base model finetuned for Offensive Language Detection with the HateBR dataset. This model is suitable for Portuguese.
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
import numpy as np
import torch
from scipy.special import softmax
model_name = "ruanchaves/mdeberta-v3-base-hatebr"
s1 = "Quem não deve não teme!!"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
model_input = tokenizer(*([s1],), padding=True, return_tensors="pt")
with torch.no_grad():
output = model(**model_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
l = config.id2label[ranking[i]]
s = scores[ranking[i]]
print(f"{i+1}) Label: {l} Score: {np.round(float(s), 4)}")
The HateBR dataset, including all its components, is provided strictly for academic and research purposes. The use of the dataset for any commercial or non-academic purpose is expressly prohibited without the prior written consent of SINCH.
Our research is ongoing, and we are currently working on describing our experiments in a paper, which will be published soon. In the meanwhile, if you would like to cite our work or models before the publication of the paper, please cite our GitHub repository:
@software{Chaves_Rodrigues_eplm_2023,
author = {Chaves Rodrigues, Ruan and Tanti, Marc and Agerri, Rodrigo},
doi = {10.5281/zenodo.7781848},
month = {3},
title = {{Evaluation of Portuguese Language Models}},
url = {https://github.com/ruanchaves/eplm},
version = {1.0.0},
year = {2023}
}