Fabio Passos committed on
Commit 3402a30 · 1 Parent(s): 8465782

first commit

README.md CHANGED
@@ -1,3 +1,54 @@
- ---
- license: mit
- ---
+ # MISO-BR Misogyny Classifier
+
+ This model classifies text in Brazilian Portuguese as misogynistic or non-misogynistic. It's trained on the [MISO-BR dataset](https://huggingface.co/datasets/fabiopassos/miso-br).
+
+ ## Model Details
+
+ - **Model Type**: TF-IDF + RandomForest classifier
+ - **Language**: Portuguese (Brazil)
+ - **Task**: Binary classification (misogynistic vs non-misogynistic content)
+ - **Framework**: scikit-learn
+
+ ## Performance
+
+ The model was evaluated on a test set and achieved:
+
+ - **F1 Score (macro)**: 0.6758
+ - **Accuracy**: 0.6778
+ - **AUC**: 0.7314
+
+ ## Usage
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ import joblib
+ import spacy
+
+ # Download the model from Hugging Face Hub
+ model_path = hf_hub_download(repo_id="fabiopassos/miso-br-classifier",
+                              filename="models/miso_br_rf_classifier.joblib")
+
+ # Load the model
+ model = joblib.load(model_path)
+
+ # Load spaCy for Portuguese
+ nlp = spacy.load("pt_core_news_sm")
+
+ # Preprocess function
+ def preprocess_text(text):
+     doc = nlp(text)
+     tokens = [token.lemma_.lower() for token in doc
+               if not token.is_stop and not token.is_punct and token.is_alpha]
+     return " ".join(tokens)
+
+ # Example text
+ text = "Seu texto para classificar aqui"
+ processed_text = preprocess_text(text)
+
+ # Predict
+ prediction = model.predict([processed_text])[0]
+ probability = model.predict_proba([processed_text])[0][1]
+
+ print(f"Texto: {text}")
+ print(f"É misógino: {'Sim' if prediction == 1 else 'Não'}")
+ print(f"Probabilidade: {probability:.4f}")
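Note on the README's Performance section: it reports macro F1 alongside accuracy. As a reading aid, here is a minimal pure-Python sketch of how those two metrics are defined for a binary task. The labels are made-up toy values, not MISO-BR data, and the helper names (`f1_for_class`, `macro_f1`, `accuracy`) are hypothetical. The commit does not include the evaluation code, so this only illustrates the definitions.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, treating `cls` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, imbalanced labels (1 = positive class); purely illustrative.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
```

Because macro F1 weights both classes equally, it drops below accuracy when the minority class is predicted poorly, as in the toy labels above. With scikit-learn installed, `f1_score(y_true, y_pred, average="macro")` and `accuracy_score(y_true, y_pred)` compute the same quantities.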
models/miso_br_rf_classifier.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cad11eeaffb1bc4112bd47c86791a6dcd5f81a363d4ddaacc36770f5483b1095
+ size 1177125
models/preprocessing_config.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d7203b61175a2eabd286bdb156f984d12e47e2ac8bf61b8ffde9677f5b21cc4
+ size 3498
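The two files above are stored as Git LFS pointers: each records a spec version, the SHA-256 digest of the real file (`oid`), and its size in bytes. The following sketch shows one way a downloaded artifact could be checked against such a pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository; the pointer text is copied from the `models/preprocessing_config.pkl` entry.

```python
import hashlib


def parse_lfs_pointer(pointer_text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in pointer_text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash-algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and digest the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:0d7203b61175a2eabd286bdb156f984d12e47e2ac8bf61b8ffde9677f5b21cc4
size 3498
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

To use it, read the file fetched with `hf_hub_download` in binary mode and pass the bytes to `matches_pointer`. A mismatch would suggest a truncated or altered download, which is worth ruling out before unpickling anything.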
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ scikit-learn==1.7.0
+ spacy==3.7.2
+ joblib>=1.3.0
+ pt_core_news_sm @ https://github.com/explosion/spacy-models/releases/download/pt_core_news_sm-3.7.0/pt_core_news_sm-3.7.0-py3-none-any.whl
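Pickled scikit-learn estimators are generally tied to the library version that produced them, which is presumably why `scikit-learn` and `spacy` are pinned exactly above. A small sketch, assuming only the standard library, of how an environment could be checked against those exact pins before loading the model. `PINS` is copied by hand from the two `==` lines, `check_pins` is a hypothetical helper, and the `joblib>=1.3.0` range is deliberately left out.

```python
from importlib import metadata

# Exact pins copied from requirements.txt (the '==' entries only).
PINS = {"scikit-learn": "1.7.0", "spacy": "3.7.2"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in check_pins(PINS).items():
    print(f"{name}: expected {expected}, found {installed}")
```

The `get_version` parameter exists only so the check can be exercised without touching the real environment; by default it queries the installed distributions.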