Fabio Passos

	# MISO-BR Misogyny Classifier

	This model classifies Brazilian Portuguese text as misogynistic or non-misogynistic. It was trained on the [MISO-BR dataset](https://huggingface.co/datasets/fabiopassos/miso-br).

	## Model Details

	- Model Type: TF-IDF + RandomForest classifier
	- Language: Portuguese (Brazil)
	- Task: Binary classification (misogynistic vs non-misogynistic content)
	- Framework: scikit-learn
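
As a rough illustration of what a TF-IDF + RandomForest classifier looks like in scikit-learn, the sketch below builds one as a `Pipeline`. The `build_pipeline` helper, the hyperparameters, and the toy texts are assumptions made for illustration; they are not the configuration or data used to train this model.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


def build_pipeline(random_state=42):
    """Return an untrained TF-IDF + RandomForest text classifier (illustrative defaults)."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),  # turns each text into TF-IDF weighted term counts
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
    ])


# Toy, already-preprocessed texts with made-up labels (not taken from MISO-BR).
toy_texts = [
    "texto exemplo um",
    "outro texto exemplo",
    "frase diferente aqui",
    "mais frase diferente",
]
toy_labels = [0, 0, 1, 1]

pipeline = build_pipeline()
pipeline.fit(toy_texts, toy_labels)

# Predicted label and class probabilities for one new text.
print(pipeline.predict(["texto exemplo"]))
print(pipeline.predict_proba(["texto exemplo"]))
```

Wrapping both steps in one `Pipeline` means a single object accepts raw strings in `predict` and `predict_proba`, which matches how the model is called in the Usage section.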

	## Performance

	The model was evaluated on a test set and achieved:

	- F1 Score (macro): 0.6758
	- Accuracy: 0.6778
	- AUC: 0.7314
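
For reference, the three metrics above can be computed with `sklearn.metrics` as in the sketch below. The labels and scores are made-up toy values, not the model's actual test-set predictions, so the printed numbers will not match the reported results.

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Toy ground truth, hard predictions, and positive-class scores (hypothetical values).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.2, 0.6, 0.8, 0.7, 0.4, 0.1]

# Macro F1 averages the per-class F1 scores with equal weight for each class.
macro_f1 = f1_score(y_true, y_pred, average="macro")
# Accuracy is the fraction of predictions that match the true label.
accuracy = accuracy_score(y_true, y_pred)
# AUC is computed from the positive-class scores, not from the hard predictions.
auc = roc_auc_score(y_true, y_score)

print(f"F1 (macro): {macro_f1:.4f}")
print(f"Accuracy:   {accuracy:.4f}")
print(f"AUC:        {auc:.4f}")
```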

	## Requirements

	This project requires the following libraries:

	- `scikit-learn==1.7.0`
	- `spacy==3.7.2`
	- `joblib>=1.3.0`
	- `pt_core_news_sm`, the spaCy Portuguese model (version 3.7.0 is available as a [wheel from the spaCy models releases](https://github.com/explosion/spacy-models/releases/download/pt_core_news_sm-3.7.0/pt_core_news_sm-3.7.0-py3-none-any.whl))

	Install the dependencies using the `requirements.txt` file:

	```bash
	pip install -r requirements.txt
	```
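
After installing, a quick way to see which versions ended up in the environment is to query the package metadata. This is only a convenience sketch: the `installed_versions` helper and the `PACKAGES` list are illustrative names, and `huggingface_hub` is included because the usage example imports it.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names to check (huggingface_hub is needed by the usage example).
PACKAGES = ["scikit-learn", "spacy", "joblib", "huggingface_hub"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```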

	## Usage

	```python
	from huggingface_hub import hf_hub_download
	import joblib
	import spacy

	# Download the model from Hugging Face Hub
	model_path = hf_hub_download(
	    repo_id="fabiopassos/miso-br-classifier",
	    filename="models/miso_br_rf_classifier.joblib",
	)

	# Load the model
	model = joblib.load(model_path)

	# Load spaCy for Portuguese
	nlp = spacy.load("pt_core_news_sm")

	# Preprocess: lemmatize and lowercase, dropping stop words, punctuation, and non-alphabetic tokens
	def preprocess_text(text):
	    doc = nlp(text)
	    tokens = [
	        token.lemma_.lower()
	        for token in doc
	        if not token.is_stop and not token.is_punct and token.is_alpha
	    ]
	    return " ".join(tokens)

	# Example text (placeholder: replace with the Brazilian Portuguese text to classify)
	text = "Seu texto para classificar aqui"
	processed_text = preprocess_text(text)

	# Predict the label (1 = misogynistic) and the probability of class 1
	prediction = model.predict([processed_text])[0]
	probability = model.predict_proba([processed_text])[0][1]

	print(f"Text: {text}")
	print(f"Misogynistic: {'Yes' if prediction == 1 else 'No'}")
	print(f"Probability: {probability:.4f}")
	```
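
To classify several texts at once, the same steps can be wrapped in a small helper. The sketch below is one possible way to do it: `classify_batch` and its `threshold` parameter are hypothetical names, not part of this repository. It assumes, as in the example above, that column 1 of `predict_proba` is the misogynistic class.

```python
def classify_batch(texts, model, preprocess, threshold=0.5):
    """Classify a list of texts and return one result dict per text.

    `model` is any object with a scikit-learn style `predict_proba`, and
    `preprocess` is a function applied to each text before prediction.
    """
    processed = [preprocess(text) for text in texts]
    # Column 1 is assumed to hold the probability of the misogynistic class.
    probabilities = [float(row[1]) for row in model.predict_proba(processed)]
    return [
        {"text": text, "probability": prob, "misogynistic": prob > threshold}
        for text, prob in zip(texts, probabilities)
    ]
```

With the objects from the usage example, a call would look like `classify_batch(["primeiro texto", "segundo texto"], model, preprocess_text)`. Applying a fixed threshold to the probability is a design choice that makes the decision rule explicit and adjustable; it may not match `model.predict` exactly in every edge case.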