---
license: apache-2.0
---
---
license: mit
language: en
tags:
- sklearn
- text-classification
- psychology
- mbti
---

# MBTI Personality Predictor

This repository contains scikit-learn models for predicting MBTI personality types from text.

## Model Details

This system consists of a `TfidfVectorizer` and four separate `LogisticRegression` models, one for each of the MBTI dimensions:

* **Mind:** Introversion (I) vs. Extraversion (E)
* **Energy:** Intuition (N) vs. Sensing (S)
* **Nature:** Thinking (T) vs. Feeling (F)
* **Tactics:** Judging (J) vs. Perceiving (P)
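
The four binary predictions can be combined into a single four-letter type. The following is a minimal sketch, not the author's implementation: the `predict_type` helper, the `AXES` table, and the 0/1 label encoding are assumptions made for illustration. This card does not state how labels were encoded during training, so the mapping should be verified against the actual models. The dictionary keys mirror the model file names used in this card.

```python
from typing import Any, Mapping

# Hypothetical label encoding: prediction 0 -> first letter, 1 -> second letter.
# The card does not document the real encoding; check it before relying on this.
AXES = (
    ("ie", ("I", "E")),
    ("ns", ("N", "S")),
    ("ft", ("F", "T")),
    ("jp", ("J", "P")),
)


def predict_type(text: str, vectorizer: Any, models: Mapping[str, Any]) -> str:
    """Run one binary model per axis and join the letters into a type string."""
    # The text should already be cleaned the same way as the training data.
    features = vectorizer.transform([text])
    letters = []
    for axis, pair in AXES:
        label = int(models[axis].predict(features)[0])
        letters.append(pair[label])
    return "".join(letters)
```

With the vectorizer and models loaded, a call would look like `predict_type(text, vectorizer, {"ie": model_ie, "ns": model_ns, "ft": model_ft, "jp": model_jp})`.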

## Intended Use

These models are intended for educational purposes and to demonstrate building an NLP classification system. They can be used to predict an MBTI type from a block of English text. **This is not a clinical or diagnostic tool.**

## Training Data

The models were trained on the [Myers-Briggs Personality Type Dataset](https://www.kaggle.com/datasets/datasnaek/mbti-type) from Kaggle, which contains over 8,600 entries of text from social media forums.

## Training Procedure

Text was cleaned by removing URLs and punctuation, lemmatizing, and removing stopwords. The cleaned text was then vectorized with TF-IDF (`max_features=5000`, `ngram_range=(1, 2)`). Each `LogisticRegression` model was trained with `class_weight='balanced'` to counteract the class imbalance in the dataset.
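
The procedure above can be sketched roughly as follows for a single dimension. This is an illustration under stated assumptions, not the original training script: the `clean_text` helper and its regular expressions are hypothetical, lemmatization is omitted, scikit-learn's built-in English stop-word list stands in for whichever list was actually used, and the toy corpus and its 0/1 labels are made up. Only `max_features=5000`, `ngram_range=(1, 2)`, and `class_weight='balanced'` come from this card.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

URL_RE = re.compile(r"https?://\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Lowercase, drop URLs, and replace non-letter characters with spaces."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())


# Made-up toy corpus; hypothetical encoding 0 = I, 1 = E.
texts = [
    "I stayed home reading quietly all weekend https://example.com/a",
    "Quiet evening with a book and some tea.",
    "Reading at home recharges me.",
    "Went to a loud party and met lots of people!",
    "Loved the party, talking with people all night.",
    "Big crowd, loud music, met new people.",
]
labels = [0, 0, 0, 1, 1, 1]

# TF-IDF settings from the card; stop_words="english" is an assumption.
vectorizer = TfidfVectorizer(
    max_features=5000, ngram_range=(1, 2), stop_words="english"
)
X = vectorizer.fit_transform([clean_text(t) for t in texts])

# class_weight="balanced" reweights classes inversely to their frequency.
model_ie = LogisticRegression(class_weight="balanced", max_iter=1000)
model_ie.fit(X, labels)
```

The same steps would be repeated with a different label column for each of the other three dimensions, reusing the fitted vectorizer.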

### Evaluation Results

Macro-averaged F1 scores on the test set:

* **I/E Model:** Macro F1-Score: ~0.79
* **N/S Model:** Macro F1-Score: [Add Your Score]
* **F/T Model:** Macro F1-Score: [Add Your Score]
* **J/P Model:** Macro F1-Score: [Add Your Score]
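
For reference, macro F1 is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The sketch below computes it by hand; the helper names and the example labels are made up for illustration and are not results from these models.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Illustrative labels only: F1(I) = 0.8, F1(E) = 2/3.
y_true = ["I", "I", "I", "E"]
y_pred = ["I", "I", "E", "E"]
score = macro_f1(y_true, y_pred)
```

scikit-learn provides the same metric as `f1_score(y_true, y_pred, average="macro")`.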

## How to Use

```python
import joblib
from huggingface_hub import hf_hub_download

# Define the repo ID
repo_id = "YOUR_USERNAME/mbti-personality-predictor"

# Download all the model files
vectorizer = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_vectorizer.joblib"))
model_ie = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_ie.joblib"))
model_ns = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_ns.joblib"))
model_ft = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_ft.joblib"))
model_jp = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_jp.joblib"))

# You can now use these objects for prediction...