---
license: mit
language: en
tags:
- sklearn
- text-classification
- psychology
- mbti
---

# MBTI Personality Predictor

This repository contains scikit-learn models for predicting MBTI personality types from text.

## Model Details

This system consists of a `TfidfVectorizer` and four separate `LogisticRegression` models, one for each MBTI dimension:

* **Mind:** Introversion (I) vs. Extraversion (E)
* **Energy:** Intuition (N) vs. Sensing (S)
* **Nature:** Thinking (T) vs. Feeling (F)
* **Tactics:** Judging (J) vs. Perceiving (P)

Each model makes one binary prediction, and the four predicted letters together form the full type (for example, I + N + T + J gives INTJ).

## Intended Use

These models are intended for educational purposes and to demonstrate how to build an NLP classification system. They can be used to predict an MBTI type from a block of English text. **This is not a clinical or diagnostic tool.**

## Training Data

The models were trained on the [Myers-Briggs Personality Type Dataset](https://www.kaggle.com/datasets/datasnaek/mbti-type) from Kaggle, which contains over 8,600 entries of text from social media forums.

## Training Procedure

Text was cleaned by removing URLs and punctuation, lemmatizing, and removing stopwords. The cleaned text was then vectorized with TF-IDF (`max_features=5000`, `ngram_range=(1, 2)`), giving a vocabulary of at most 5,000 unigrams and bigrams. Each `LogisticRegression` model was trained with `class_weight='balanced'`, which weights each class inversely to its frequency, to counteract the class imbalance in the dataset (introverted and intuitive types are heavily over-represented).
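The training procedure above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact script behind these models: `clean_text` and `train_mbti_models` are hypothetical names, lemmatization is omitted, scikit-learn's built-in English stopword list (`stop_words="english"`) stands in for whatever list was actually used, and `max_iter=1000` is added only to avoid convergence warnings.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Position of each dimension's letter in a four-letter type string such as "INTJ";
# the keys mirror the model file names (mbti_model_ie.joblib, ...).
DIMENSIONS = {"ie": 0, "ns": 1, "ft": 2, "jp": 3}


def clean_text(text: str) -> str:
    """Simplified (hypothetical) cleaner: strips URLs and punctuation.

    Lemmatization is omitted; stopwords are removed by the vectorizer below.
    """
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^\w\s]", " ", text)  # drop punctuation
    return " ".join(text.split())  # collapse repeated whitespace


def train_mbti_models(texts, types):
    """Fit one shared TF-IDF vectorizer and one binary classifier per dimension."""
    cleaned = [clean_text(t) for t in texts]
    vectorizer = TfidfVectorizer(
        max_features=5000,  # keep the 5,000 most frequent terms
        ngram_range=(1, 2),  # unigrams and bigrams
        stop_words="english",  # assumption: sklearn's built-in stopword list
    )
    X = vectorizer.fit_transform(cleaned)

    models = {}
    for name, pos in DIMENSIONS.items():
        # Binary target for this dimension, e.g. "I" or "E" for the ie model
        y = [mbti_type[pos] for mbti_type in types]
        model = LogisticRegression(class_weight="balanced", max_iter=1000)
        models[name] = model.fit(X, y)
    return vectorizer, models
```

With the Kaggle data, `texts` would presumably be the `posts` column and `types` the `type` column. Sharing one vectorizer across all four classifiers means a single `mbti_vectorizer.joblib` file can serve every model, and each fitted object can be saved with `joblib.dump(obj, filename)`.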
### Evaluation Results

Macro-averaged F1 scores on the test set:

* **I/E Model:** ~0.79
* **N/S Model:** [Add Your Score]
* **F/T Model:** [Add Your Score]
* **J/P Model:** [Add Your Score]

## How to Use

```python
import joblib
from huggingface_hub import hf_hub_download

# Define the repo ID
repo_id = "YOUR_USERNAME/mbti-personality-predictor"

# Download each file from the Hub (cached locally) and load it with joblib
vectorizer = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_vectorizer.joblib"))
model_ie = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_ie.joblib"))
model_ns = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_ns.joblib"))
model_ft = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_ft.joblib"))
model_jp = joblib.load(hf_hub_download(repo_id=repo_id, filename="mbti_model_jp.joblib"))

# You can now use these objects for prediction.
# The vectorizer expects text cleaned the same way as the training data.
```
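Once the objects are loaded, a prediction might look like the sketch below. It rests on assumptions this card does not confirm: each classifier is assumed to predict the letters themselves (e.g. `"I"`/`"E"`) rather than 0/1 codes, and the input is assumed to be cleaned the same way as the training text, since no preprocessing file is among those listed above. `predict_mbti` is a hypothetical helper name.

```python
def predict_mbti(text, vectorizer, models):
    """Combine four binary classifiers into a four-letter MBTI type (sketch).

    `models` must be ordered I/E, N/S, F/T, J/P. Returns the type string and,
    for each dimension, the probability of the predicted letter.
    """
    X = vectorizer.transform([text])  # single-row TF-IDF matrix
    letters, confidences = [], []
    for model in models:
        # One probability per class, in the order of model.classes_
        proba = model.predict_proba(X)[0]
        best = int(proba.argmax())
        letters.append(str(model.classes_[best]))
        confidences.append(float(proba[best]))
    return "".join(letters), confidences
```

With the objects from the loading code, a call would be `predict_mbti(cleaned_text, vectorizer, [model_ie, model_ns, model_ft, model_jp])`, returning the four-letter type together with the four per-dimension probabilities. Reporting those probabilities shows when a letter is close to a coin flip, which a bare four-letter label hides.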