Introduction

Commercial and academic AI mental-health tools are growing in popularity, yet testing and red-teaming these high-stakes systems is still difficult. Working with real users under proper ethical and clinical oversight is essential but costly, and it often fails to capture the full range of mental-health presentations. MH-Persona aims to close this gap by fine-tuning a Llama-3.2-1B model to mimic the style and behavior of a real user with mental health issues interacting with a chatbot.

Many existing LLMs fall short for testing mental-health applications. They default to “helpful assistant” mode with polished formatting, bullet points, and responses filled with the dreaded em dash. Safety training adds another limitation, as some models refuse to engage with topics like suicide or self-harm. Both the style and the content limitations can undermine an LLM’s usefulness for testing mental health applications.

MH-Persona uses an open-source dataset of mental health social media posts to fine-tune a non-instruction-tuned version of Llama-3.2-1B, avoiding the pitfalls of using a standard LLM for mental health user testing and red-teaming. The model is designed to respond in the style of real users, specifically without the highly stylized, rich formatting that characterizes many instruction-tuned LLMs. It also focuses on mental health topics, including suicide/self-harm, anxiety, depression, OCD, and PTSD. The model performs well in LLM-As-a-Judge testing and maintains performance on the HellaSwag benchmark relative to the underlying base model.

Data

Training Data: solomonk/reddit_mental_health_posts was used for fine-tuning. Additional cleaning was done to remove deleted content, and the dataset was filtered to include only content relating to suicide/self-harm, anxiety, depression, OCD, and PTSD.
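
For reference, a minimal sketch of the cleaning and filtering step, assuming the Hugging Face datasets library. The column names (body, subreddit), deletion markers, and subreddit labels below are assumptions based on typical Reddit dumps, not details reported in this card:

from datasets import load_dataset

# Load the raw posts (column names below are assumptions).
ds = load_dataset("solomonk/reddit_mental_health_posts", split="train")

# Target subreddit names are assumptions covering the topics listed above.
KEEP = {"SuicideWatch", "selfharm", "Anxiety", "depression", "OCD", "ptsd"}

def keep_example(ex):
    # Drop deleted/removed content and keep only the target topics.
    body = (ex.get("body") or "").strip()
    return body not in {"", "[deleted]", "[removed]"} and ex.get("subreddit") in KEEP

ds = ds.filter(keep_example)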

Data Split

  • Train: 80%
  • Test: 20%
  • Random seed: 42 (reproduced in the sketch below)
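
The split can be reproduced with the built-in splitter from the datasets library; a minimal sketch, continuing from the filtered ds above:

# 80/20 train-test split with a fixed seed for reproducibility.
splits = ds.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]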

Methodology

The model was fine-tuned using Low-Rank Adaptation (LoRA). LoRA was a good fit here because it is a parameter-efficient fine-tuning method that can be as effective as full fine-tuning without the higher compute cost and risk of catastrophic forgetting.

Fine-tuning Hyperparameters (see the configuration sketch after this list):

  • LoRA R = 96
  • LoRA Alpha = 96
  • LoRA Dropout = 0.05
  • Learning rate = 0.00001
  • Epochs = 1
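
A sketch of how these hyperparameters map onto a peft configuration; the target modules, task type, and remaining trainer settings are assumptions, as the card does not report them:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

lora_config = LoraConfig(
    r=96,
    lora_alpha=96,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption; not reported in the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="mh-persona-lora",
    learning_rate=1e-5,
    num_train_epochs=1,
    # batch size, scheduler, and other settings are not reported in the card
)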

Evaluation

Model                                       LLM-As-a-Judge   HellaSwag
meta-llama/Llama-3.2-1B-Instruct            ~0.58            ~0.47
meta-llama/Llama-3.2-1B                     ~0.60            ~0.44
meta-llama/Llama-3.2-1B with prompt tuning  ~0.70            ~0.41
MH-Persona                                  ~0.87            ~0.44

An LLM-As-a-Judge was used to evaluate both user style and mental health content in the model's responses. The HellaSwag benchmark was also used to check for catastrophic forgetting of commonsense reasoning as a result of the fine-tuning. The models evaluated were Llama-3.2-1B-Instruct, Llama-3.2-1B (without instruction tuning), a prompt-tuned version of Llama-3.2-1B, and the final version of MH-Persona. Together they provide a useful comparison of Llama’s official instruction tuning, the base model, and different fine-tuning techniques.

The instruction-tuned version of Llama-3.2-1B scored worse with the LLM-As-a-Judge than the non-instruction-tuned version, likely because the instruct version responds in the style of a “helpful assistant” and has been optimized for rich formatting. As a result of the instruction tuning, the instruct version performs best on HellaSwag; however, for the purposes of user style and mental health content, it is actually counter to the goals of this project. A prompt-tuned version of Llama-3.2-1B was evaluated in part to provide a comparison against LoRA. Prompt tuning yields a significant improvement over the base model and has the benefit of being an extremely efficient fine-tuning technique. Still, the largest improvement comes from fine-tuning with LoRA on the solomonk/reddit_mental_health_posts dataset. Further training could likely raise these scores, and evaluating with a more diverse set of LLM-As-a-Judge models would make the evaluation more robust.
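
For illustration, a minimal LLM-As-a-Judge sketch. The judge model choice, rubric wording, and 1-10 scale below are assumptions; the card does not specify how the judge was configured:

import re
from transformers import pipeline

# Judge model choice is an assumption; any capable instruct model could serve.
judge = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct", device_map="auto")

RUBRIC = (
    "Rate the following response from 1 to 10 on how closely it resembles a "
    "real user describing a mental health concern (informal style, no rich "
    "formatting, on-topic content). Reply with only the number.\n\n"
    "Response:\n{response}\n\nScore:"
)

def judge_score(response: str) -> float:
    out = judge(
        RUBRIC.format(response=response),
        max_new_tokens=5,
        do_sample=False,
        return_full_text=False,
    )
    match = re.search(r"\d+", out[0]["generated_text"])
    # Normalize to 0-1 so scores are comparable to the table above.
    return int(match.group()) / 10 if match else 0.0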

Usage and Intended Uses

This model is intended to assist with testing and red-teaming mental health AI tools, such as chatbots, and to support academic research on mental health issues. The model is designed to respond to input as a real user would and does not require any specific formatting. The code below shows a basic way to load and query the model and inspect its responses.

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-1B"
adapter = "OliviaKK/MH-Persona-Model"

# Load the tokenizer and base model, then attach the MH-Persona LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)

# Llama has no pad token by default; reuse the EOS token and pad on the left.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id

Prompting the Model

The model is designed to respond to prompts as a real user, so no specific prompt format is required. In a testing or red-teaming use case, the prompt can easily be replaced by the output of another model or wired into a pipeline with the AI tool under test (see the loop sketch at the end of this section).

from transformers import pipeline

# The model and tokenizer were already loaded (and placed on device) above,
# so no dtype or device_map arguments are needed here.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=250,
    do_sample=False,
)
prompt = "How are you feeling today?"

Expected Output Format

The model returns free-form text mimicking a real user expressing mental health concerns.

# return_full_text=False returns only the persona's reply, not the echoed prompt.
text = pipe(prompt, return_full_text=False)
print(text[0]["generated_text"])

Example output:

I’m feeling a little better today. I’m not sure if it’s because I’m getting better or because I’m not as tired.
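
For red-teaming, the hard-coded prompt can be swapped for live output from the tool under test. A minimal loop sketch follows; query_chatbot is a hypothetical placeholder for the system being tested, and note that multi-turn behavior has not been evaluated (see Limitations):

def query_chatbot(user_message: str) -> str:
    # Hypothetical placeholder for the mental health tool under test.
    raise NotImplementedError

message = "How are you feeling today?"
for turn in range(3):
    # MH-Persona plays the user; the tool under test plays the assistant.
    persona_reply = pipe(message, return_full_text=False)[0]["generated_text"]
    message = query_chatbot(persona_reply)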

Limitations

This is an experimental, fine-tuned model and should be used with caution. The output has not been evaluated by any mental health professional, and nothing about the model should be construed as medical advice. The main limitation is that using a small, non-instruction-tuned model limits controllability, and the model can struggle to predict a stop token, which can lead to long and repetitive responses. Furthermore, there was no testing or evaluation on multi-turn conversations; more testing and further training is likely required to make MH-Persona useful in longer conversations and with longer context windows. Additional fine-tuning on datasets drawn from a mix of sources (including data from different countries, ages, genders, etc.) would also help ensure the model’s responses accurately reflect different kinds of users.
