# MISO-BR Misogyny Classifier

This model classifies Brazilian Portuguese text as misogynistic or non-misogynistic. It was trained on the [MISO-BR dataset](https://huggingface.co/datasets/fabiopassos/miso-br).

## Model Details

- **Model Type**: TF-IDF + RandomForest classifier
- **Language**: Portuguese (Brazil)
- **Task**: Binary classification (misogynistic vs. non-misogynistic content)
- **Framework**: scikit-learn

## Performance

On the held-out test set, the model achieved:

- **F1 Score (macro)**: 0.6758
- **Accuracy**: 0.6778
- **AUC**: 0.7314

## Requirements

This project requires the following libraries:

- `scikit-learn==1.7.0`
- `spacy==3.7.2`
- `joblib>=1.3.0`
- `pt_core_news_sm` (downloadable from [here](https://github.com/explosion/spacy-models/releases/download/pt_core_news_sm-3.7.0/pt_core_news_sm-3.7.0-py3-none-any.whl))

Install the dependencies using the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

## Usage

```python
from huggingface_hub import hf_hub_download
import joblib
import spacy

# Download the model from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="fabiopassos/miso-br-classifier",
    filename="models/miso_br_rf_classifier.joblib",
)

# Load the trained model
model = joblib.load(model_path)

# Load the spaCy pipeline for Portuguese
nlp = spacy.load("pt_core_news_sm")


def preprocess_text(text):
    """Lemmatize and lowercase, dropping stop words, punctuation, and non-alphabetic tokens."""
    doc = nlp(text)
    tokens = [
        token.lemma_.lower()
        for token in doc
        if not token.is_stop and not token.is_punct and token.is_alpha
    ]
    return " ".join(tokens)


# Example text (placeholder; replace with the Portuguese text to classify)
text = "Seu texto para classificar aqui"
processed_text = preprocess_text(text)

# Predict the label and the probability of class 1 (misogynistic)
prediction = model.predict([processed_text])[0]
probability = model.predict_proba([processed_text])[0][1]

print(f"Text: {text}")
print(f"Misogynistic: {'Yes' if prediction == 1 else 'No'}")
print(f"Probability: {probability:.4f}")
```
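
To classify several texts at once, the steps from the usage example can be wrapped in a small helper. The sketch below is illustrative only: `classify_batch` and its `threshold` parameter are hypothetical names that are not part of this repository. It assumes the loaded model exposes `predict_proba` and that column 1 holds the probability of the misogynistic class, as in the example above.

```python
from typing import Callable, Iterable, List, Tuple


def classify_batch(
    texts: Iterable[str],
    model,
    preprocess: Callable[[str], str],
    threshold: float = 0.5,
) -> List[Tuple[str, bool, float]]:
    """Return (text, is_misogynistic, probability) for each input text.

    Hypothetical helper: `model` is any object with a scikit-learn style
    `predict_proba`, and `preprocess` is a function such as `preprocess_text`.
    """
    texts = list(texts)
    if not texts:
        return []

    # Apply the same preprocessing to every text before vectorization
    processed = [preprocess(t) for t in texts]

    # Assumption: column 1 is the probability of class 1 (misogynistic)
    probabilities = [float(row[1]) for row in model.predict_proba(processed)]

    # Flag a text when its probability reaches the chosen threshold
    return [(t, p >= threshold, p) for t, p in zip(texts, probabilities)]
```

With the objects from the usage example, a call might look like `classify_batch(["primeiro texto", "segundo texto"], model, preprocess_text)`. Raising `threshold` above 0.5 trades recall for precision, which may be worth considering given the reported accuracy of 0.6778.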