# MISO-BR Misogyny Classifier

This model classifies text in Brazilian Portuguese as misogynistic or non-misogynistic. It's trained on the [MISO-BR dataset](https://huggingface.co/datasets/fabiopassos/miso-br).

## Model Details

- **Model Type**: TF-IDF + RandomForest classifier
- **Language**: Portuguese (Brazil)
- **Task**: Binary classification (misogynistic vs non-misogynistic content)
- **Framework**: scikit-learn

## Performance

The model was evaluated on a test set and achieved:

- **F1 Score (macro)**: 0.6758
- **Accuracy**: 0.6778
- **AUC**: 0.7314

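For readers unfamiliar with the metrics above, the following is a minimal pure-Python sketch of how macro F1 and accuracy are defined. The helper names and the toy labels are made up for illustration and are not MISO-BR data or the author's evaluation code; in practice `sklearn.metrics.f1_score(..., average="macro")` and `sklearn.metrics.accuracy_score` compute the same quantities.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for this example (1 = misogynistic, 0 = non-misogynistic)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

print(f"Macro F1: {macro_f1(y_true, y_pred):.4f}")
print(f"Accuracy: {accuracy(y_true, y_pred):.4f}")
```

Macro averaging gives both classes equal weight regardless of how many examples each has, which is why it is often reported alongside accuracy for binary tasks like this one.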
## Requirements

This project requires the following libraries:

- `scikit-learn==1.7.0`
- `spacy==3.7.2`
- `joblib>=1.3.0`
- `pt_core_news_sm` (spaCy Portuguese model, available as a [wheel for version 3.7.0](https://github.com/explosion/spacy-models/releases/download/pt_core_news_sm-3.7.0/pt_core_news_sm-3.7.0-py3-none-any.whl))

Install the dependencies using the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

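Because models saved with joblib can be sensitive to the scikit-learn version used to load them, it may help to confirm that the exact pins are what is actually installed. The sketch below uses only the standard library; `check_pins` and `PINNED` are hypothetical names introduced here, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the requirements list
PINNED = {"scikit-learn": "1.7.0", "spacy": "3.7.2"}


def check_pins(pinned):
    """Map each package to (installed_version_or_None, expected_version, matches)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment
            installed = None
        report[name] = (installed, expected, installed == expected)
    return report


for name, (installed, expected, ok) in check_pins(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: installed={installed}, expected={expected} [{status}]")
```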
## Usage

The example below also needs `huggingface_hub` (`pip install huggingface_hub`) to download the model file.

```python
from huggingface_hub import hf_hub_download
import joblib
import spacy

# Download the model from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="fabiopassos/miso-br-classifier",
    filename="models/miso_br_rf_classifier.joblib",
)

# Load the model
model = joblib.load(model_path)

# Load the spaCy pipeline for Portuguese
nlp = spacy.load("pt_core_news_sm")


def preprocess_text(text):
    """Lemmatize and lowercase, dropping stop words, punctuation, and non-alphabetic tokens."""
    doc = nlp(text)
    tokens = [
        token.lemma_.lower()
        for token in doc
        if not token.is_stop and not token.is_punct and token.is_alpha
    ]
    return " ".join(tokens)


# Example text (Portuguese for "Your text to classify here")
text = "Seu texto para classificar aqui"
processed_text = preprocess_text(text)

# Predict the label and the probability of class index 1 (treated here as misogynistic)
prediction = model.predict([processed_text])[0]
probability = model.predict_proba([processed_text])[0][1]

print(f"Text: {text}")
print(f"Misogynistic: {'Yes' if prediction == 1 else 'No'}")
print(f"Probability: {probability:.4f}")
```
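
To classify several texts at once or apply a decision threshold other than the default, the usage snippet above could be wrapped as follows. This is a sketch under stated assumptions: `classify_batch` and `StubModel` are hypothetical names, and the stub only stands in for the loaded classifier so the example runs without downloading anything. With the real model, pass `model` and `preprocess_text` from the usage snippet instead.

```python
def classify_batch(texts, model, preprocess, threshold=0.5):
    """Return (text, probability, is_misogynistic) tuples for a list of texts."""
    processed = [preprocess(t) for t in texts]
    # Column 1 of predict_proba is taken as the misogynistic class, as in the usage snippet
    probabilities = [row[1] for row in model.predict_proba(processed)]
    return [(t, p, p >= threshold) for t, p in zip(texts, probabilities)]


class StubModel:
    """Hypothetical stand-in for the loaded classifier (illustration only)."""

    def predict_proba(self, docs):
        # Fake score: longer inputs get a higher probability, capped at 1.0
        return [[1 - min(len(d), 10) / 10, min(len(d), 10) / 10] for d in docs]


results = classify_batch(["abc", "abcdefghij"], StubModel(), str.lower, threshold=0.5)
for text, prob, flagged in results:
    print(f"{text!r}: probability={prob:.2f}, flagged={flagged}")
```

Raising `threshold` trades recall for precision, which may be worth tuning given the reported scores.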