Silly-Machine/TuPy-Bert-Large-Multilabel


	---
	license: mit
	datasets:
	- Silly-Machine/TuPy-Dataset
	language:
	- pt

	pipeline_tag: text-classification
	base_model: neuralmind/bert-base-portuguese-cased
	widget:
	- text: 'Bom dia, flor do dia!!'

	model-index:
	- name: Yi-34B
	  results:
	  - task:
	      type: text-classification
	    dataset:
	      name: Silly-Machine/TuPy-Dataset
	      type: Silly-Machine/TuPy-Dataset
	    metrics:
	    - name: AI2 Reasoning Challenge (25-Shot)
	      type: AI2 Reasoning Challenge (25-Shot)
	      value: 64.59
	    source:
	      name: Open LLM Leaderboard
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
	---

	## Introduction


	Tupi-BERT-Base is a BERT model fine-tuned for binary classification of hate speech in Portuguese. It is derived from [BERTimbau base](https://huggingface.co/neuralmind/bert-base-portuguese-cased) and refined specifically for hate speech detection.
	For more details or specific inquiries, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).

	The performance of language models can vary notably when the domain shifts between training and test data. To build a Portuguese language model specialized for hate speech classification, the original BERTimbau model was fine-tuned on the [TuPi Hate Speech DataSet](https://huggingface.co/datasets/FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary), which is sourced from diverse social networks.

	## Available models

	| Model | Arch. | #Layers | #Params |
	| ---------------------------------------- | ---------- | ------- | ------- |
	| `FpOliveira/tupi-bert-base-portuguese-cased` | BERT-Base | 12 | 109M |
	| `FpOliveira/tupi-bert-large-portuguese-cased` | BERT-Large | 24 | 334M |
	| `FpOliveira/tupi-bert-base-portuguese-cased-multiclass-multilabel` | BERT-Base | 12 | 109M |
	| `FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel` | BERT-Large | 24 | 334M |
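
The parameter counts in the table can be roughly reproduced from the layer counts alone. The sketch below is a back-of-the-envelope estimate, not taken from the model code: it assumes the standard BERT layout (hidden sizes 768 and 1024, feed-forward width 4x the hidden size, 512 positions, a pooler layer) and a vocabulary of 29,794 tokens, which is our assumption about the BERTimbau tokenizer. The helper name `bert_param_count` is hypothetical, and the small classification head is ignored.

```python
def bert_param_count(num_layers, hidden, vocab_size=29794,
                     max_positions=512, type_vocab=2):
    """Estimate the parameter count of a standard BERT encoder with pooler.

    vocab_size=29794 is an assumed value for the BERTimbau vocabulary.
    """
    intermediate = 4 * hidden   # feed-forward width in standard BERT
    layer_norm = 2 * hidden     # scale and bias vectors

    # Word, position, and token-type embeddings plus one LayerNorm
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Query, key, value, and output projections (weights + biases) plus LayerNorm
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers (weights + biases) plus LayerNorm
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)

    # Dense pooler applied to the [CLS] token
    pooler = hidden * hidden + hidden

    return embeddings + num_layers * (attention + feed_forward) + pooler


base = bert_param_count(num_layers=12, hidden=768)
large = bert_param_count(num_layers=24, hidden=1024)
print(f"BERT-Base:  {base / 1e6:.0f}M")
print(f"BERT-Large: {large / 1e6:.0f}M")
```

Under these assumptions the estimates round to the 109M and 334M figures listed above.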

	## Example usage

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
	import torch
	import numpy as np
	from scipy.special import softmax

	def classify_hate_speech(model_name, text):
	    # Load the fine-tuned model, its tokenizer, and its config (for label names)
	    model = AutoModelForSequenceClassification.from_pretrained(model_name)
	    tokenizer = AutoTokenizer.from_pretrained(model_name)
	    config = AutoConfig.from_pretrained(model_name)

	    # Tokenize input text and prepare model input
	    model_input = tokenizer(text, padding=True, return_tensors="pt")

	    # Get model output scores (no gradients are needed for inference)
	    with torch.no_grad():
	        output = model(**model_input)
	    scores = softmax(output.logits.numpy(), axis=1)
	    # Sort label indices from highest to lowest score
	    ranking = np.argsort(scores[0])[::-1]

	    # Print the results
	    for i, rank in enumerate(ranking):
	        label = config.id2label[int(rank)]
	        score = scores[0, rank]
	        print(f"{i + 1}) Label: {label} Score: {score:.4f}")

	# Example usage
	model_name = "Silly-Machine/TuPy-Bert-Large-Multilabel"
	text = "Bom dia, flor do dia!!"
	classify_hate_speech(model_name, text)

	```
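
The example above normalizes the logits with a softmax, so the scores compete and sum to 1. For a multilabel checkpoint, a common alternative is to score each label independently with a sigmoid and apply a threshold. The sketch below illustrates that post-processing step on plain numbers. It is an assumption that this checkpoint was trained with a multi-label objective (the `problem_type` field of the model config is one place to check), and the helper `multilabel_predictions`, the label names, and the 0.5 threshold are all hypothetical.

```python
import math

def multilabel_predictions(logits, id2label, threshold=0.5):
    """Map raw logits to independent per-label probabilities and decisions."""
    results = {}
    for idx, logit in enumerate(logits):
        # Sigmoid: each label is scored on its own, scores need not sum to 1
        prob = 1.0 / (1.0 + math.exp(-logit))
        results[id2label[idx]] = (prob, prob >= threshold)
    return results


# Hypothetical logits and label names, for illustration only
logits = [2.0, -1.0, 0.0]
id2label = {0: "label_a", 1: "label_b", 2: "label_c"}

preds = multilabel_predictions(logits, id2label)
for label, (prob, active) in preds.items():
    print(f"Label: {label} Score: {prob:.4f} Active: {active}")
```

With real model output, `logits` would be one row of `output.logits` converted to a list, and `id2label` would come from the model config as in the example above.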