Fabio Passos
# MISO-BR Misogyny Classifier
This model classifies text in Brazilian Portuguese as misogynistic or non-misogynistic. It's trained on the [MISO-BR dataset](https://huggingface.co/datasets/fabiopassos/miso-br).
## Model Details
- **Model Type**: TF-IDF + RandomForest classifier
- **Language**: Portuguese (Brazil)
- **Task**: Binary classification (misogynistic vs non-misogynistic content)
- **Framework**: scikit-learn
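The model details describe TF-IDF features feeding a random forest. Below is a minimal scikit-learn sketch of that kind of pipeline. The hyperparameters (`ngram_range`, `n_estimators`, `random_state`) and the toy training strings are assumptions for illustration. They are not the settings or data used to train this model.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


def build_pipeline(random_state: int = 42) -> Pipeline:
    """TF-IDF vectorizer followed by a random forest (hypothetical hyperparameters)."""
    return Pipeline(
        [
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ]
    )


# Toy, already-preprocessed strings: placeholders only, NOT MISO-BR samples.
train_texts = [
    "exemplo ofensivo um",
    "exemplo neutro um",
    "exemplo ofensivo dois",
    "exemplo neutro dois",
]
train_labels = [1, 0, 1, 0]  # 1 = misogynistic, 0 = non-misogynistic

pipeline = build_pipeline()
pipeline.fit(train_texts, train_labels)

new_texts = ["exemplo ofensivo tres", "exemplo neutro tres"]
print(pipeline.predict(new_texts))        # predicted labels
print(pipeline.predict_proba(new_texts))  # one column per class in pipeline.classes_
```

The vectorizer lives inside the pipeline, so a fitted object of this shape accepts plain strings directly. This matches how the usage example calls `model.predict` on preprocessed text.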
## Performance
The model was evaluated on a test set and achieved:
- **F1 Score (macro)**: 0.6758
- **Accuracy**: 0.6778
- **AUC**: 0.7314
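To make the three metrics concrete, here is a plain-Python sketch of how each is defined. It assumes that "AUC" means ROC AUC computed from the predicted probability of the misogynistic class, which the README does not state. The labels and scores are made up and are not the actual test set.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores (0.0 for a class with no support)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: true labels, hard predictions, and positive-class probabilities.
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]
y_score = [0.9, 0.2, 0.4, 0.6]

print(accuracy(y_true, y_pred))  # 0.75
print(macro_f1(y_true, y_pred))  # (0.8 + 2/3) / 2, about 0.7333
print(roc_auc(y_true, y_score))  # 0.75
```

Macro F1 averages the per-class F1 scores without weighting them by class frequency. The misogynistic and non-misogynistic classes therefore count equally, whatever their share of the test set.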
## Requirements
This project requires the following libraries:
- `scikit-learn==1.7.0`
- `spacy==3.7.2`
- `joblib>=1.3.0`
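- `pt_core_news_sm` 3.7.0, the spaCy Portuguese pipeline (installable from its [release wheel](https://github.com/explosion/spacy-models/releases/download/pt_core_news_sm-3.7.0/pt_core_news_sm-3.7.0-py3-none-any.whl))
- `huggingface_hub` (used in the usage example to download the model)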
Install the dependencies using the `requirements.txt` file:
```bash
pip install -r requirements.txt
```
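The model is a joblib-serialized scikit-learn object. Loading it with a scikit-learn version other than the pinned one can raise `InconsistentVersionWarning` or fail outright. The sketch below compares installed versions against the exact pins listed above, using `importlib.metadata` from the standard library. The helper `find_version_mismatches` is hypothetical and is not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Exact pins from the Requirements section (joblib is ">=", so it is left out).
PINNED: Dict[str, str] = {
    "scikit-learn": "1.7.0",
    "spacy": "3.7.2",
    "pt_core_news_sm": "3.7.0",
}


def find_version_mismatches(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed_version} for every pin that is not met.

    A value of None means the package is not installed at all.
    """
    mismatches: Dict[str, Optional[str]] = {}
    for name, expected in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = find_version_mismatches(PINNED)
    if problems:
        for name, installed in problems.items():
            print(f"{name}: expected {PINNED[name]}, found {installed or 'not installed'}")
    else:
        print("All pinned packages match.")
```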
## Usage
```python
from huggingface_hub import hf_hub_download
import joblib
import spacy

# Download the model from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="fabiopassos/miso-br-classifier",
    filename="models/miso_br_rf_classifier.joblib",
)

# Load the model. It is called on plain strings below, so the TF-IDF
# vectorization step is part of the saved object.
model = joblib.load(model_path)

# Load the spaCy Portuguese pipeline
nlp = spacy.load("pt_core_news_sm")


def preprocess_text(text):
    """Lemmatize and lowercase, dropping stop words, punctuation and non-alphabetic tokens."""
    doc = nlp(text)
    tokens = [
        token.lemma_.lower()
        for token in doc
        if not token.is_stop and not token.is_punct and token.is_alpha
    ]
    return " ".join(tokens)


# Example input ("Your text to classify here"); the model expects Brazilian Portuguese
text = "Seu texto para classificar aqui"
processed_text = preprocess_text(text)

# Predict the label and the probability of the misogynistic class (label 1)
prediction = model.predict([processed_text])[0]
positive_index = list(model.classes_).index(1)
probability = model.predict_proba([processed_text])[0][positive_index]

print(f"Text: {text}")
print(f"Misogynistic: {'Yes' if prediction == 1 else 'No'}")
print(f"Probability: {probability:.4f}")
```
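To score several texts at once with a chosen decision threshold, you can wrap the steps above in a helper like the one below. `classify_batch` and its `threshold` parameter are hypothetical and not part of the repository. The helper looks up the misogynistic-class column in `model.classes_` instead of hard-coding index 1. Like the usage example, it assumes that label `1` means misogynistic.

```python
from typing import Callable, List, Sequence, Tuple


def classify_batch(
    model,
    texts: Sequence[str],
    preprocess: Callable[[str], str],
    positive_label: int = 1,
    threshold: float = 0.5,
) -> List[Tuple[str, bool, float]]:
    """Return (text, is_misogynistic, probability) for each input text.

    `model` is anything exposing scikit-learn's `classes_` and `predict_proba`.
    """
    processed = [preprocess(text) for text in texts]
    probabilities = model.predict_proba(processed)
    # Column of the positive class, looked up rather than assumed to be 1.
    positive_col = list(model.classes_).index(positive_label)
    results = []
    for text, row in zip(texts, probabilities):
        probability = float(row[positive_col])
        # Strictly greater: a 0.5 tie is labeled non-misogynistic.
        results.append((text, probability > threshold, probability))
    return results
```

With the objects from the usage example, a call could look like `classify_batch(model, ["texto um", "texto dois"], preprocess_text, threshold=0.7)`. Raising the threshold typically trades recall on the misogynistic class for precision. For large batches, spaCy's `nlp.pipe` processes documents in batches and is usually faster than calling `nlp` once per text.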