---
license: apache-2.0
---
---
license: mit
language: en
tags:
- sklearn
- text-classification
- psychology
- mbti
---

# MBTI Personality Predictor

This repository contains scikit-learn models for predicting MBTI personality types from text.

## Model Details

This system consists of a `TfidfVectorizer` and four separate `LogisticRegression` models, one for each of the MBTI dimensions:

* **Mind:** Introversion (I) vs. Extraversion (E)
* **Energy:** Intuition (N) vs. Sensing (S)
* **Nature:** Thinking (T) vs. Feeling (F)
* **Tactics:** Judging (J) vs. Perceiving (P)
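Because each dimension has its own binary classifier, a full prediction is four independent letter choices joined together. The sketch below assumes each model's output has already been mapped to a letter; the card does not state the label encoding the models use. `DIMENSIONS`, `assemble_type`, and `ALL_TYPES` are hypothetical names for illustration only.

```python
from itertools import product
from typing import Sequence

# Hypothetical constant: the two possible letters for each dimension, in the
# order listed above (Mind, Energy, Nature, Tactics).
DIMENSIONS = (("I", "E"), ("N", "S"), ("T", "F"), ("J", "P"))


def assemble_type(letters: Sequence[str]) -> str:
    """Join one predicted letter per dimension into a four-letter MBTI type."""
    if len(letters) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} letters, got {len(letters)}")
    for letter, pair in zip(letters, DIMENSIONS):
        if letter not in pair:
            raise ValueError(f"{letter!r} is not one of {pair}")
    return "".join(letters)


# Four binary choices are enough to cover all 2**4 = 16 MBTI types.
ALL_TYPES = ["".join(combo) for combo in product(*DIMENSIONS)]

print(assemble_type(["I", "N", "F", "P"]))  # INFP
print(len(ALL_TYPES))  # 16
```

Splitting the problem this way means no single model has to learn 16 classes, at the cost of treating the four dimensions as independent of one another.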

## Intended Use

These models are intended for educational purposes and to demonstrate building an NLP classification system. They can be used to predict an MBTI type from a block of English text. **This is not a clinical or diagnostic tool.**

## Training Data

The models were trained on the [Myers-Briggs Personality Type Dataset](https://www.kaggle.com/datasets/datasnaek/mbti-type) from Kaggle, which contains over 8,600 entries of text from social media forums.

## Training Procedure

Text was cleaned by removing URLs and punctuation, lemmatizing, and removing stopwords. The text was then vectorized using TF-IDF (`max_features=5000`, `ngram_range=(1, 2)`). Each `LogisticRegression` model was trained with `class_weight='balanced'` to counteract class imbalance in the dataset.
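The cleaning step and the effect of `class_weight='balanced'` can be sketched in plain Python. This is a minimal illustration, not the author's preprocessing code: the stopword list is a hypothetical, abbreviated stand-in (the card does not say which list was used), lemmatization is left out because the card does not name the lemmatizer, and `clean_text` and `balanced_class_weights` are hypothetical helper names. The weighting function reproduces the formula scikit-learn documents for `'balanced'`: `n_samples / (n_classes * count_of_class)`.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stopword list; the card does not say which list was used.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "at", "for", "i", "in", "is", "it", "me",
    "my", "of", "on", "s", "t", "that", "the", "this", "to", "with", "you",
})
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and punctuation, and remove stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove URLs before stripping punctuation
    text = NON_LETTER_RE.sub(" ", text)  # replace punctuation (and digits) with spaces
    # Lemmatization is omitted in this sketch; a real pipeline would lemmatize
    # each token here (the card does not say which lemmatizer was used).
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


def balanced_class_weights(labels):
    """Per-class weights using the 'balanced' formula:
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


print(clean_text("Check this out: https://example.com/page and www.test.org NOW!"))
# check out now
print(balanced_class_weights(["I", "I", "I", "E"]))
# I ≈ 0.667, E = 2.0: each minority-class example counts three times as much
```

In scikit-learn itself, the cleaned strings would go to `TfidfVectorizer(max_features=5000, ngram_range=(1, 2))` and each dimension's labels to its own `LogisticRegression(class_weight='balanced')`, which applies this weighting internally.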

### Evaluation Results

Macro-averaged F1 scores on the test set:

* **I/E model:** ~0.79
* **N/S model:** [Add Your Score]
* **F/T model:** [Add Your Score]
* **J/P model:** [Add Your Score]
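Macro F1 is the unweighted mean of the per-class F1 scores, so the minority class in each dimension counts as much as the majority class. The pure-Python sketch below shows the computation; `macro_f1` is a hypothetical helper, and in practice `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores ('macro' averaging)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        raise ValueError("inputs must not be empty")
    scores = []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN); use 0 when undefined.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["I", "I", "I", "E"]
print(round(macro_f1(y_true, ["I", "I", "E", "E"]), 3))  # 0.733
# Always predicting the majority class is 75% accurate but scores poorly:
print(round(macro_f1(y_true, ["I", "I", "I", "I"]), 3))  # 0.429
```

The second example shows why macro F1 is a more informative metric than accuracy for the imbalanced dimensions: a model that ignores the minority class gets an F1 of 0 for that class, which halves its macro score.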

## How to Use

```python
import joblib
from huggingface_hub import hf_hub_download

# Define the repo ID
repo_id = "YOUR_USERNAME/mbti-personality-predictor"

# Download each file from the Hub (cached locally) and load it with joblib
vectorizer = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_vectorizer.joblib"))
model_ie = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_ie.joblib"))
model_ns = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_ns.joblib"))
model_ft = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_ft.joblib"))
model_jp = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_jp.joblib"))
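
# Assumption: for best results, clean new text the same way as the training data
# (URLs and punctuation removed, lemmatized, stopwords removed) before vectorizing.
# The raw string below is only for illustration.
text = "I spent the whole weekend reading and thinking about new ideas."
features = vectorizer.transform([text])

# Each model predicts one MBTI dimension. The label values depend on how the
# models were trained; inspect e.g. model_ie.classes_ to see them.
models = (model_ie, model_ns, model_ft, model_jp)
predictions = [model.predict(features)[0] for model in models]
print(predictions)
```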