# MISO-BR Misogyny Classifier

This model classifies Brazilian Portuguese text as misogynistic or non-misogynistic. It was trained on the [MISO-BR dataset](https://huggingface.co/datasets/fabiopassos/miso-br).

## Model Details

- **Model Type**: TF-IDF + RandomForest classifier
- **Language**: Portuguese (Brazil)
- **Task**: Binary classification (misogynistic vs non-misogynistic content)
- **Framework**: scikit-learn
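
As a rough illustration of the model type listed above, the following sketch chains a TF-IDF vectorizer and a random forest in a scikit-learn `Pipeline`. It is not the training code for this model: the toy texts, the labels, and the hyperparameters (`n_estimators=50`, `random_state=0`) are placeholders chosen for the example, and the exact structure of the published artifact is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Placeholder, already-preprocessed texts and labels (not MISO-BR data).
texts = ["alfa beta gama", "alfa beta delta", "sigma tau omega", "sigma tau lambda"]
labels = [1, 1, 0, 0]

# TF-IDF features feed directly into the random forest, so the fitted
# pipeline accepts raw strings in predict() and predict_proba().
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),  # assumed values
])
pipeline.fit(texts, labels)

# One row per input text, one column per class in pipeline.classes_.
proba = pipeline.predict_proba(["alfa beta"])
print(pipeline.classes_, proba)
```

Bundling the vectorizer with the classifier is what lets a single saved object take strings as input, which matches how `model.predict` is called in the Usage section.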

## Performance

The model was evaluated on a test set and achieved:

- **F1 Score (macro)**: 0.6758
- **Accuracy**: 0.6778
- **AUC**: 0.7314
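
For readers who want to compute the same three metrics on their own labelled data, here is a minimal sketch using `sklearn.metrics`. The labels and probabilities below are made-up values for illustration, not MISO-BR results, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Made-up ground truth and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.2, 0.6, 0.7, 0.4, 0.9, 0.1]

# Hard labels from an assumed 0.5 threshold.
y_pred = [int(p >= 0.5) for p in y_prob]

# Macro F1 averages the per-class F1 scores with equal weight.
f1_macro = f1_score(y_true, y_pred, average="macro")
accuracy = accuracy_score(y_true, y_pred)
# AUC is computed from the probabilities, not the thresholded labels.
auc = roc_auc_score(y_true, y_prob)

print(f"F1 (macro): {f1_macro:.4f}")
print(f"Accuracy:   {accuracy:.4f}")
print(f"AUC:        {auc:.4f}")
```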

## Requirements

This project requires the following packages:

- `scikit-learn==1.7.0`
- `spacy==3.7.2`
- `joblib>=1.3.0`
- `huggingface_hub` (imported by the usage example to download the model; no version is pinned here)
- `pt_core_news_sm`, the spaCy model for Portuguese ([3.7.0 wheel](https://github.com/explosion/spacy-models/releases/download/pt_core_news_sm-3.7.0/pt_core_news_sm-3.7.0-py3-none-any.whl))

Install the dependencies using the `requirements.txt` file:

```bash
pip install -r requirements.txt
```
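
Models saved with joblib can behave differently, or emit warnings, when loaded under a different scikit-learn version than the one that saved them, so it can help to confirm the pinned versions before loading. The sketch below does this with the standard library's `importlib.metadata`; the `check_versions` helper and the `PINNED` dictionary are hypothetical names introduced for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins taken from the requirements list above.
PINNED = {"scikit-learn": "1.7.0", "spacy": "3.7.2"}


def check_versions(pinned):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, installed == expected)
    return report


for name, (installed, ok) in check_versions(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: installed={installed} -> {status}")
```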

## Usage

```python
from huggingface_hub import hf_hub_download
import joblib
import spacy

# Download the model from Hugging Face Hub
model_path = hf_hub_download(
    repo_id="fabiopassos/miso-br-classifier",
    filename="models/miso_br_rf_classifier.joblib",
)

# Load the model
model = joblib.load(model_path)

# Load spaCy for Portuguese
nlp = spacy.load("pt_core_news_sm")

# Preprocess: lemmatize and lowercase, dropping stop words, punctuation,
# and non-alphabetic tokens
def preprocess_text(text):
    doc = nlp(text)
    tokens = [token.lemma_.lower() for token in doc 
              if not token.is_stop and not token.is_punct and token.is_alpha]
    return " ".join(tokens)

# Example text
text = "Seu texto para classificar aqui"
processed_text = preprocess_text(text)

# Predict the label (1 = misogynistic) and the probability of class 1
prediction = model.predict([processed_text])[0]
probability = model.predict_proba([processed_text])[0][1]

print(f"Texto: {text}")
print(f"É misógino: {'Sim' if prediction == 1 else 'Não'}")
print(f"Probabilidade: {probability:.4f}")
```
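
To classify several texts at once, the single-text example above can be wrapped in a small helper. The sketch below is one possible shape for it: `classify_batch`, its parameters, and the 0.5 threshold are hypothetical choices, and `StubModel` is a stand-in so the snippet runs without downloading anything. In practice you would pass the loaded `model` and the `preprocess_text` function instead. It looks up the probability column through `classes_` rather than assuming class 1 sits at index 1.

```python
def classify_batch(model, texts, preprocess=lambda t: t,
                   positive_label=1, threshold=0.5):
    """Classify a list of texts; returns one result dict per text."""
    processed = [preprocess(t) for t in texts]
    proba = model.predict_proba(processed)
    # Find the column that corresponds to the positive (misogynistic) label.
    col = list(model.classes_).index(positive_label)
    return [
        {
            "text": text,
            "probability": float(row[col]),
            "is_misogynistic": bool(row[col] >= threshold),
        }
        for text, row in zip(texts, proba)
    ]


class StubModel:
    """Stand-in for the real classifier; always returns the same scores."""
    classes_ = [0, 1]

    def predict_proba(self, texts):
        return [[0.8, 0.2] for _ in texts]


results = classify_batch(StubModel(), ["primeiro texto", "segundo texto"])
for r in results:
    print(r)
```

Calling `predict_proba` once on the whole list avoids repeating the vectorization step for each text.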