---
library_name: transformers
tags:
- parenting
- empathy
license: apache-2.0
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
pipeline_tag: text-generation
---

# Model Card for ParentPalAI

**ParentPalAI** fine-tunes `mistralai/Mistral-7B-Instruct-v0.3` using **Direct Preference Optimization (DPO)** combined with **Parameter-Efficient Fine-Tuning (PEFT)** via **Quantized Low-Rank Adaptation (QLoRA)**.  
The goal is to enhance **empathy and emotional resonance** in parenting-related conversations while studying the trade-offs between emotional alignment, clarity, and factual quality.


## Model Details

### Model Description

**Goal:** Improve the empathy and emotional resonance of parenting-focused LLM responses while analyzing the impact of alignment techniques on overall quality.

**Action:** Fine-tuned Mistral-7B-Instruct on ~1K synthetic preference pairs using Direct Preference Optimization (DPO) with Parameter-Efficient Fine-Tuning (PEFT), specifically Quantized Low-Rank Adaptation (QLoRA). Built a complete alignment workflow covering prompt engineering, preference-pair generation, QLoRA fine-tuning, and LLM-as-a-Judge (GPT-4o) evaluation with custom empathy and quality metrics.

**Result:** Drove a +65-point increase in empathy win rate (11% to 76%) and revealed meaningful trade-offs between emotional alignment on one side and clarity and overall quality on the other, informing subsequent multi-objective fine-tuning strategies.

- **Developed by:** Prerna Chikersal
- **Model type:** PEFT
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** Mistral-7B-Instruct-v0.3

### Model Sources

- **Repository:** https://github.com/prernaa/ParentPalAI (includes sample responses)

## Uses

ParentPalAI was developed for research and educational purposes — primarily to explore:

- How fine-tuning on synthetic preference pairs affects empathy, tone, and relatability in LLM responses.
- The trade-off between emotional resonance and clarity/helpfulness in aligned models.
- Methods for enhancing warmth and naturalness in conversational AI through DPO and PEFT (QLoRA).

Researchers, educators, and ML practitioners can use this model to:

- Study fine-tuning effects on emotional style and alignment.
- Prototype empathy-driven LLMs for social or psychological dialogue settings.

### Direct Use

You can use ParentPalAI to:

- Generate empathetic, supportive, and warm responses to parenting-related prompts.
- Experiment with style transfer and tone control in conversational AI.
- Test LLM evaluation metrics (e.g., LLM-as-a-Judge) for empathy, tone, and clarity.

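Example (assumes `model` and `tokenizer` are already loaded as described in "How to Get Started with the Model"):
```python
prompt = "My toddler cries every night before bed. What should I do?"
# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=250)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```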


### Out-of-Scope Use

This model is not suitable for:

- Clinical, medical, or therapeutic advice.
- Real-world parenting counseling or behavioral guidance.
- Any deployment scenario involving high-stakes decision-making, mental health support, or childcare recommendations.
- Content moderation, bias-free generation, or factual question answering — the Reddit dataset may contain noisy or biased language.

## Bias, Risks, and Limitations

The model should not be used for real parenting, psychological, or medical guidance. Instead, it serves as a research tool for exploring empathy and tone in language models, and all outputs should be reviewed critically before use.

### Recommendations

- Always pair this adapter with the base model mistralai/Mistral-7B-Instruct-v0.3.
- Use bfloat16 precision and FlashAttention 2 on A100 or H100 GPUs for optimal speed.
- Evaluate generations qualitatively for empathy, clarity, and factual accuracy before any downstream use.
- For production or sensitive domains, fine-tune further using curated, high-quality data or Direct Preference Optimization (DPO) to balance warmth and helpfulness.

## How to Get Started with the Model

This repository only contains PEFT adapter weights — not the full 7B model.
To use the model, you must load the base Mistral model and apply this adapter.

- Base model: mistralai/Mistral-7B-Instruct-v0.3
- Fine-tuning method: QLoRA (PEFT)
- Training data: synthetic preference pairs generated by GPT
- Goal: Explore how DPO, optimizing for either empathy or overall quality, affects empathy and warmth in responses.

```python
# LOAD THE BASE MODEL IN 4-BIT PRECISION WITH DOUBLE QUANTIZATION

import os

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"
# Assumption: the Hugging Face access token is stored in the HF_TOKEN environment variable
HF_TOKEN = os.environ.get("HF_TOKEN")

torch.backends.cuda.matmul.allow_tf32 = True
torch.set_float32_matmul_precision("high")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,  # loads base model in 4-bit precision
    bnb_4bit_use_double_quant=True,  # double quantization saves VRAM
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the tokenizer; reuse EOS as the pad token if none is defined
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, token=HF_TOKEN)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    dtype=torch.bfloat16,  # older transformers releases expect torch_dtype= instead
    attn_implementation="flash_attention_2",  # FA2 is fastest on A100
    token=HF_TOKEN,  # authenticates with Hugging Face
)

model.config.pad_token_id = tokenizer.pad_token_id
model.generation_config.pad_token_id = tokenizer.pad_token_id

## Load the ParentPalAI PEFT Model
model = PeftModel.from_pretrained(model, "prernac1/parentpalai")

## Inference
prompt = """You’re a supportive parent responding to another parent who is struggling with toddler tantrums."""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=True,  # required for temperature and top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

- V1 (optimizing for empathy): https://github.com/prernaa/ParentPalAI/blob/main/data_to_share/dpo_dataset_dpo_labels_v1.jsonl
- V2 (optimizing for overall quality): https://github.com/prernaa/ParentPalAI/blob/main/data_to_share/dpo_dataset_dpo_labels_v2.jsonl

### Training Procedure

PEFT with QLoRA (4-bit precision) on an A100 GPU in Google Colab.
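For readers unfamiliar with DPO, the per-pair objective can be sketched in plain Python as below. This is the standard DPO loss from the literature, not code taken from the ParentPalAI repository. The function name `dpo_loss` and the value `beta=0.1` are illustrative assumptions, since the card does not state the beta used in training.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the trainable policy and the frozen reference model.
    beta=0.1 is an assumed value, not one reported in this card.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# When the policy matches the reference, the loss equals log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# The loss drops once the policy favors the chosen response more than the reference does
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

With QLoRA, the reference model is typically the frozen base model with the adapter disabled, so only the LoRA weights receive gradients.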

#### Training Hyperparameters

ParentPalAI was fine-tuned using Quantized Low-Rank Adaptation (QLoRA) on the base model `mistralai/Mistral-7B-Instruct-v0.3`. The setup uses 4-bit NF4 quantization with double quantization and bfloat16 compute, optimized for VRAM efficiency on T4 and A100 GPUs. The released adapter was trained on an A100.

- **Training method:** QLoRA (Parameter-Efficient Fine-Tuning)
- **Precision:** 4-bit quantization (NF4) with double quantization, compute in bfloat16
- **Optimizer:** paged_adamw_8bit
- **Scheduler:** Cosine learning rate decay with 3% warmup
- **Batching:** Effective batch size of 24 (per_device_train_batch_size=6, gradient_accumulation_steps=4)
- **Epochs:** 1–2 (best checkpoint after 1 epoch, ~40 steps)
- **Dropout:** 0.15 (LoRA)
- **LoRA rank:** 8 (r=8), scaling factor alpha=32
- **Trainable parameters:** ~0.18% of total model parameters
- **Gradient checkpointing:** Enabled
- **Attention implementation:** FlashAttention 2
- **Mixed precision:** bfloat16 mixed precision
- **Base precision (non-quantized runs):** bfloat16
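The batching and step figures above can be cross-checked with a little arithmetic. The sketch below uses the values stated in this card, but assumes exactly 1,000 training pairs (the card says ~1K), a single GPU, and rounding up for partial steps; the variable names are illustrative only.

```python
import math

# Values stated in this card
per_device_train_batch_size = 6
gradient_accumulation_steps = 4
warmup_ratio = 0.03
lora_r, lora_alpha = 8, 32

# Assumption: exactly 1,000 preference pairs on a single GPU
num_pairs = 1000

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_pairs / effective_batch_size)
warmup_steps = math.ceil(warmup_ratio * steps_per_epoch)
lora_scaling = lora_alpha / lora_r  # LoRA updates are scaled by alpha / r

print(f"effective batch size: {effective_batch_size}")  # 24
print(f"steps per epoch:      {steps_per_epoch}")       # 42
print(f"warmup steps:         {warmup_steps}")          # 2
print(f"LoRA scaling factor:  {lora_scaling}")          # 4.0
```

Under these assumptions one epoch is about 42 optimizer steps, which is consistent with the "~40 steps" reported for the best checkpoint.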

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The test dataset, generated by GPT-4o: https://github.com/prernaa/ParentPalAI/blob/main/data_to_share/dpo_dataset_test.jsonl

#### Metrics
ParentPalAI was evaluated using **GPT-4o as an LLM-as-a-Judge**, comparing its responses (System B) to those of the base model `mistralai/Mistral-7B-Instruct-v0.3` (System A).  
Each response pair was judged on six qualitative dimensions (*empathy, clarity, comprehensiveness, practicality, adoptability,* and *overall quality*) across 100 GPT-generated parenting prompts.  
Two variants of ParentPalAI were tested to understand alignment trade-offs.
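As a rough illustration of how per-dimension win rates can be derived from judge verdicts, here is a minimal sketch. The column names mirror the result tables (`winner_empathy`, `winner_clarity`, `winner_overall`), but the verdict labels `"A"`, `"B"`, and `"tie"`, the toy data, and the `win_rates` helper are assumptions for illustration and not the repository's evaluation code.

```python
from collections import Counter

def win_rates(verdicts, column):
    """Share of comparisons won by each system (or tied) for one judge column."""
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    counts = Counter(row[column] for row in verdicts)
    total = len(verdicts)
    return {label: counts[label] / total for label in ("A", "B", "tie")}

# Toy verdicts for four prompts (hypothetical data, not the real judge output)
verdicts = [
    {"winner_empathy": "B", "winner_clarity": "A", "winner_overall": "A"},
    {"winner_empathy": "B", "winner_clarity": "A", "winner_overall": "B"},
    {"winner_empathy": "tie", "winner_clarity": "A", "winner_overall": "A"},
    {"winner_empathy": "A", "winner_clarity": "B", "winner_overall": "A"},
]

for column in ("winner_empathy", "winner_clarity", "winner_overall"):
    print(column, win_rates(verdicts, column))
```

If the judge is allowed to declare ties, the System A and System B rates for a dimension need not sum to 1.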


### Results

#### Version 1 (Empathy-Focused DPO)  
*(optimized for empathy but considers overall quality)*  

| System | winner_empathy | winner_clarity | winner_overall |
|:-------|:---------------:|:---------------:|:---------------:|
| **System A** | 0.1066 | **0.8883** | **0.7462** |
| **System B (ParentPalAI V1)** | **0.7640** | 0.1117 | 0.2538 |

**Findings:**  
- ParentPalAI V1 dramatically increased the *empathy* win rate (+65 points, from ~11% to ~76%).  
- The model produced a noticeably warmer, more supportive tone but with reduced *clarity* and *practical helpfulness*.  
- Despite lower clarity, some responses were judged as more *relatable* and *emotionally resonant*, showing that empathic alignment can enhance perceived authenticity even when utility drops.

#### Version 2 (Overall-Quality-Focused DPO)  
*(optimized only for overall win rate)*  

| System | winner_empathy | winner_clarity | winner_overall |
|:-------|:---------------:|:---------------:|:---------------:|
| **System A** | 0.4340 | **0.8604** | **0.6371** |
| **System B (ParentPalAI V2)** | 0.2843 | 0.1371 | 0.3629 |

**Findings:**  
- Optimizing purely for *overall quality* partially recovered clarity and practicality but reduced empathic warmth: V2 won on empathy in ~28% of comparisons, versus ~43% for the base model.  
- The model balanced tone and coherence better than V1 but sounded less emotionally attuned.  
- This highlights a core alignment tension: maximizing clarity and factual strength can come at the expense of empathy and perceived connection.


#### Summary
- **V1**: Highest empathy and relatability, weaker clarity -> ideal for exploring affective alignment.  
- **V2**: More balanced but emotionally flatter -> better for generalized instruction following.  
- Empathy and clarity appear inversely correlated when optimizing with single-objective DPO.  
- Future work will explore **multi-objective DPO** and **reinforcement from human preferences** to jointly optimize warmth, clarity, and factual helpfulness.
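
The gaps discussed above can be recomputed directly from the two result tables. The sketch below copies the reported win rates and derives, for each dimension, the System B minus System A gap in percentage points and the share of comparisons assigned to neither system. Reading that remainder as ties is an assumption, and the `summarize` helper is illustrative.

```python
# (System A, System B) win rates copied from the result tables
results = {
    "V1": {"empathy": (0.1066, 0.7640), "clarity": (0.8883, 0.1117), "overall": (0.7462, 0.2538)},
    "V2": {"empathy": (0.4340, 0.2843), "clarity": (0.8604, 0.1371), "overall": (0.6371, 0.3629)},
}

def summarize(rate_a, rate_b):
    """Gap in percentage points (B minus A) and the share won by neither system."""
    return {
        "gap_points": round((rate_b - rate_a) * 100, 2),
        "unassigned": round(1 - rate_a - rate_b, 4),  # assumed to be ties
    }

for version, dims in results.items():
    for dim, (rate_a, rate_b) in dims.items():
        print(version, dim, summarize(rate_a, rate_b))
```

A positive gap means the fine-tuned variant won more often than the base model on that dimension; only V1 empathy is positive in these tables.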

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** A100
- **Hours used:** 5
- **Cloud Provider:** Google Colab
- **Compute Region:** USA
- **Carbon Emitted:** [More Information Needed]

## Citation

```
@misc{chikersal2025parentpalai,
author = {Prerna Chikersal},
title = {ParentPalAI — Empathic Fine-Tuning of LLMs using Direct Preference Optimization (DPO) with QLoRA},
year = {2025},
publisher = {GitHub},
howpublished = {\url{https://github.com/prernaa/ParentPalAI}},
note = {Hugging Face Model: https://huggingface.co/prernac1/parentpalai}
}
```

## Model Card Contact

Prerna Chikersal: pchikersal@gmail.com