Model Card: Phi-3 Mini MBTI Classifier (LoRA Fine-Tuned)


library_name: transformers
tags: mbti, personality-prediction, phi3, lora, finetuned-model


🧠 Model Summary

This repository contains a LoRA fine‑tuned version of microsoft/Phi-3-mini-4k-instruct for predicting MBTI personality types from text input. The model was trained with Lightning AI on L40S GPUs and supports lightweight inference on T4 GPUs with 4‑bit quantization.

The model predicts one of the 16 MBTI types, such as INTJ, ENFP, or ISTP.


📌 Model Details

Model Description

This model adapts Phi-3-mini-4k-instruct using PEFT LoRA for efficient fine-tuning on MBTI classification from social media / profile text. It analyzes writing patterns, behaviors, and linguistic signals to output the most likely MBTI type.

  • Developed by: Md Al Amin (alam1n)
  • Model type: Causal LM (LoRA fine‑tuned)
  • Language: English
  • License: Same as base model (Phi-3-mini-4k-instruct)
  • Finetuned from: microsoft/Phi-3-mini-4k-instruct

Model Sources


🔧 Intended Uses

Direct Use

  • Predicting MBTI type from user-generated text.
  • Personality insight for research, educational, and experimental applications.

Downstream Use

  • Integrating into RAG systems.
  • Social media analytics pipelines.
  • Psychological pattern analysis tools.

❌ Out-of-Scope / Limitations

  • Not suitable for clinical diagnosis.
  • Not intended for decisions with real-world consequences.
  • May produce unreliable or skewed predictions when the input text is too short or ambiguous.

⚠️ Bias, Risks & Limitations

The model is trained on a dataset derived from behavior/social text and may inherit biases such as:

  • Overgeneralization based on linguistic style.
  • Cultural or regional bias depending on dataset distribution.
  • Input shorter than 20–30 words may produce inaccurate predictions.

Recommendations

  • Always treat predictions as probabilistic, not absolute.
  • Use with human oversight.
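
The length caveat and the oversight recommendation can be applied before a prediction is requested. The sketch below is a hypothetical pre-check, not part of the released code: `MIN_WORDS`, `word_count`, and `needs_review` are illustrative names, and the threshold of 30 words is an assumption taken from the upper end of the 20–30 word range mentioned above.

```python
# Hypothetical input-length guard; names and threshold are assumptions.
MIN_WORDS = 30  # upper end of the 20-30 word range noted in this card


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in the input text."""
    return len(text.split())


def needs_review(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True when the text is too short for a reliable prediction."""
    return word_count(text) < min_words


sample = "I love analyzing systems and optimizing code."
print(word_count(sample), needs_review(sample))
```

Inputs flagged this way could be routed to a human reviewer or rejected outright, depending on the application.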

🚀 How to Use

Here is the exact inference code used for this model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel, LoraConfig
import json
from huggingface_hub import hf_hub_download
import os

# Check versions
import transformers, peft
print(f"Transformers: {transformers.__version__}")
print(f"PEFT: {peft.__version__}")

# Model paths
model_name = "microsoft/Phi-3-mini-4k-instruct"
model_path = "alam1n/phi3-mbti-lora"

# Step 1: Download and fix the config file
print("Downloading and fixing adapter config...")
config_file = hf_hub_download(repo_id=model_path, filename="adapter_config.json")

with open(config_file, 'r') as f:
    config_data = json.load(f)

print(f"Original config keys: {list(config_data.keys())}")

# Create clean config
clean_config = {
    "base_model_name_or_path": config_data.get("base_model_name_or_path", model_name),
    "bias": config_data.get("bias", "none"),
    "fan_in_fan_out": config_data.get("fan_in_fan_out", False),
    "inference_mode": True,
    "init_lora_weights": config_data.get("init_lora_weights", True),
    "lora_alpha": config_data.get("lora_alpha", 32),
    "lora_dropout": config_data.get("lora_dropout", 0.05),
    "modules_to_save": config_data.get("modules_to_save"),
    "peft_type": "LORA",
    "r": config_data.get("r", 16),
    "target_modules": config_data.get("target_modules", []),
    "task_type": config_data.get("task_type", "CAUSAL_LM")
}

with open(config_file, 'w') as f:
    json.dump(clean_config, f, indent=2)

print("Config fixed!")

# Step 2: Load model with quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

print("Loading base model...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(model, model_path)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("Model loaded successfully!")

# MBTI prediction function
def predict_mbti(person_text):
    model.eval()
    prompt = f"""<|system|>
You are an expert in MBTI personality analysis. Return ONLY the MBTI type.
<|end|>
<|user|>
Analyze this person's posts and determine their MBTI type:
"{person_text}"<|end|>
<|assistant|>
"""

    input_ids = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **input_ids,
            max_new_tokens=10,
            do_sample=False,
            # Stop at Phi-3's end-of-turn token and reuse it for padding
            eos_token_id=tokenizer.convert_tokens_to_ids(["<|end|>"])[0],
            pad_token_id=tokenizer.convert_tokens_to_ids(["<|end|>"])[0]
        )

    generated_text = tokenizer.decode(outputs[:, input_ids['input_ids'].shape[-1]:][0], skip_special_tokens=False)
    return generated_text.split("<|end|>")[0].strip()

# Test example
print(predict_mbti("I love analyzing systems and optimizing code."))
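
Because the model generates free text, the returned string is not guaranteed to be a valid type. A minimal post-processing sketch follows, assuming only the 16 standard four-letter codes should be accepted; `extract_mbti` is a hypothetical helper and not part of the original inference code.

```python
import re
from itertools import product
from typing import Optional

# The 16 MBTI types are every combination of the four letter pairs.
MBTI_TYPES = {"".join(p) for p in product("EI", "NS", "FT", "JP")}

# Match a standalone four-letter code, ignoring case.
_MBTI_PATTERN = re.compile(r"\b[EI][NS][FT][JP]\b", re.IGNORECASE)


def extract_mbti(generated_text: str) -> Optional[str]:
    """Return the first valid MBTI code in the text, or None if absent."""
    match = _MBTI_PATTERN.search(generated_text)
    return match.group(0).upper() if match else None


print(extract_mbti("INTJ<|end|>"))
print(extract_mbti("I am not sure."))
```

Returning `None` instead of raising lets the caller decide whether to retry, fall back to a default, or flag the input for review.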

πŸ‹οΈ Training Details

Training Data

  • Custom dataset of user posts & profile text mapped to MBTI labels.
  • Text cleaned and tokenized using AutoTokenizer from Phi-3.

Training Procedure

  • LoRA fine‑tuning using Lightning AI.
  • Precision: BF16 mixed precision.
  • Optimizer: AdamW.
  • Scheduler: Cosine Annealing.

Hardware

  • Training: NVIDIA L40S
  • Inference: NVIDIA T4 (4‑bit quantized)
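
A back-of-the-envelope estimate of the weight footprint on the inference GPU is sketched below. It assumes the base model's published size of about 3.8 billion parameters and counts weight storage only; activations, the KV cache, the LoRA adapter, and quantization metadata all add to these figures.

```python
# Approximate weight memory for the base model at different precisions.
# Assumption: ~3.8 billion parameters, as published for Phi-3-mini.
NUM_PARAMS = 3.8e9


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in (("FP16", 16), ("4-bit NF4", 4)):
    print(f"{label}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions, 4‑bit weights take roughly a quarter of the FP16 footprint, which leaves most of a T4's 16 GB free for activations and batching.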

📊 Evaluation

Metrics

  • Accuracy
  • F1‑Score (macro)
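
Both metrics can be computed from paired lists of true and predicted labels. The following is a dependency-free sketch with made-up labels for illustration; it is not the evaluation script used for this model, and no results are implied.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only.
y_true = ["INTJ", "INTJ", "ENFP", "ISTP"]
y_pred = ["INTJ", "ENFP", "ENFP", "ISTP"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Macro averaging weights every type equally, so it is more sensitive than accuracy to poor performance on rarely seen types.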

Summary

The model performs best on longer inputs (more than roughly 40–50 words). Very short inputs tend to reduce accuracy.


🌱 Environmental Impact

  • Hardware: L40S
  • Cloud Provider: Lightning AI
  • Training Duration: On the order of hours (depends on dataset size)

Carbon emissions can be estimated using the ML CO₂ Impact calculator.


📚 Citation

BibTeX:

@misc{phi3_mbti_lora,
  title={Phi-3 Mini MBTI Classifier},
  author={Md Al Amin},
  year={2025},
  publisher={HuggingFace}
}

👀 Model Card Author

Md Al Amin (alam1n)


📬 Contact

For questions/issues: Open an issue in this repository.
