---
language:
- en
license: apache-2.0
tags:
- text-classification
- multilabel-classification
- behavioral-coding
- motivational-interviewing
- modernbert
- transformers
base_model: answerdotai/ModernBERT-base
metrics:
- f1
- precision
- recall
- exact_match
- hamming_loss
model-index:
- name: bc-multilabel-classifier
  results:
  - task:
      type: text-classification
      name: Multilabel Text Classification
    metrics:
    - name: Exact Match
      type: exact_match
      value: 0.8563
    - name: Hamming Loss
      type: hamming_loss
      value: 0.0579
    - name: F1 Macro
      type: f1_macro
      value: 0.8666
    - name: F1 Micro
      type: f1_micro
      value: 0.9246
    - name: Adherent F1
      type: f1
      value: 0.7429
    - name: Non-Adherent F1
      type: f1
      value: 0.8932
    - name: Neutral F1
      type: f1
      value: 0.9639
widget:
- text: "That's a great step you're taking to improve your health."
- text: "You really should stop smoking, it's bad for you."
- text: "What do you think about trying to quit?"
---

# Behavioral Coding Multilabel Classifier

## Model Description

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) for multilabel classification of Motivational Interviewing (MI) behavioral codes. It classifies utterances into three non-mutually-exclusive categories used in behavioral coding of therapeutic conversations.

**Developed by:** Lekhansh

**Model type:** Multilabel Text Classification

**Language:** English

**Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

**License:** Apache 2.0

## Intended Uses

### Primary Use Case

This model is designed for automated behavioral coding in Motivational Interviewing contexts, predicting three types of MI-consistent and MI-inconsistent behaviors:

- **Adherent:** MI-adherent behaviors (e.g., affirmations, seeking collaboration)
- **Non-Adherent:** MI-non-adherent behaviors (e.g., confrontation, persuading without permission)
- **Neutral:** Neutral behaviors (e.g., giving information, questions, reflections)

### Key Features

- **Multilabel Classification:** Utterances can carry multiple labels simultaneously
- **Therapeutic Context:** Trained specifically on Motivational Interviewing conversations
- **Context-Aware:** Each input includes the three preceding utterances as context
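
Because the exact input format is not documented in this card, the sketch below shows one plausible way to prepend the three preceding utterances as context; the `[SEP]` delimiter and the ordering are illustrative assumptions, not the released preprocessing code.

```python
def build_input(utterance, history, sep=" [SEP] "):
    """Prepend up to three preceding utterances as conversational context.

    NOTE: the separator and concatenation order are assumptions for
    illustration; the training-time format is not documented here.
    """
    context = list(history)[-3:]  # keep at most the three most recent turns
    return sep.join(context + [utterance])

example = build_input(
    "What do you think about trying to quit?",
    ["I smoke about a pack a day.", "It helps me relax.", "My wife wants me to stop."],
)
```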

### Potential Applications

- Automated analysis of therapy session transcripts
- Training and feedback for MI practitioners
- Quality assurance in behavioral health interventions
- Research on therapeutic communication patterns

## Model Performance

### Test Set Metrics

The model was evaluated on a held-out test set of 3,235 coded utterances.

#### Overall Performance

| Metric | Score |
|--------|------:|
| **Exact Match Accuracy** | **85.63%** |
| **Hamming Loss** | **0.0579** |
| **F1 Macro** | **86.66%** |
| **F1 Micro** | **92.46%** |
| **Precision Macro** | 86.53% |
| **Precision Micro** | 93.47% |
| **Recall Macro** | 86.84% |
| **Recall Micro** | 91.48% |

**Exact Match:** Percentage of examples for which all three labels are predicted correctly
**Hamming Loss:** Average fraction of labels that are incorrectly predicted (lower is better)
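
Both aggregate metrics are standard and can be reproduced with scikit-learn, where `accuracy_score` applied to the full label matrix is exactly multilabel exact-match (subset) accuracy. A toy illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, f1_score

# Toy predictions over 4 examples and 3 labels
y_true = np.array([[0, 0, 1], [1, 0, 0], [1, 0, 1], [0, 1, 0]])
y_pred = np.array([[0, 0, 1], [1, 0, 1], [1, 0, 1], [0, 1, 0]])

exact = accuracy_score(y_true, y_pred)   # subset accuracy = exact match
ham = hamming_loss(y_true, y_pred)       # fraction of wrong label decisions
f1m = f1_score(y_true, y_pred, average="macro")
```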

#### Per-Label Performance

| Label | F1 Score | Precision | Recall | Accuracy |
|-------|----------|-----------|--------|----------|
| **Adherent** | 74.29% | 74.47% | 74.10% | 90.26% |
| **Non-Adherent** | 89.32% | 87.34% | 91.39% | 98.98% |
| **Neutral** | 96.39% | 97.77% | 95.04% | 93.38% |

### Class Distribution

The training data exhibits class imbalance, addressed through positive class weighting:

- **Neutral:** Most common (majority class)
- **Non-Adherent:** Moderate frequency
- **Adherent:** Least common (minority class)

## Training Details

### Training Data

- **Source:** Multilabel behavioral coding dataset from Motivational Interviewing transcripts
- **Preprocessing:**
  - Excluded utterances marked as "not_coded" (no MI codes assigned)
  - Included context from the three preceding utterances
  - Stratified splitting to maintain label distribution
- **Split:** 70% train, 15% validation, 15% test
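
The card does not specify how stratification was done for multilabel data. One simple approximation, sketched below, is to stratify on the label-combination string; dedicated iterative-stratification libraries are more robust when some combinations are rare.

```python
from sklearn.model_selection import train_test_split

# Toy multilabel matrix; each row is [adherent, non_adherent, neutral]
y = [[0, 0, 1], [0, 0, 1], [1, 0, 0], [1, 0, 0],
     [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]

# Treat each label combination as a single stratification class.
# This only works when every combination appears often enough to split.
combos = ["".join(map(str, row)) for row in y]
idx = list(range(len(y)))
train_idx, test_idx = train_test_split(
    idx, test_size=0.5, stratify=combos, random_state=0
)
```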

### Training Procedure

**Hardware:**
- GPU training with CUDA
- Mixed precision (BFloat16) training

**Hyperparameters:**

| Parameter | Value |
|-----------|-------|
| Learning Rate | 6e-5 |
| Batch Size (per device) | 12 |
| Gradient Accumulation | 2 steps |
| Effective Batch Size | 24 |
| Max Sequence Length | 3000 tokens |
| Epochs | 20 (early stopped at epoch 14) |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.1 |
| LR Scheduler | Cosine |
| Optimizer | AdamW |
| Dropout | 0.1 |

**Training Features:**
- **Positive Class Weighting:** BCEWithLogitsLoss with pos_weight computed per label
- **Early Stopping:** Patience of 3 epochs on validation F1 macro
- **Gradient Checkpointing:** Enabled for memory efficiency
- **Flash Attention 2:** For efficient attention computation
- **Best Model Selection:** Based on validation F1 macro score

**Loss Function:** Binary Cross-Entropy with Logits Loss (BCEWithLogitsLoss) with per-label positive class weights
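
A common way to derive the per-label `pos_weight` is the negative-to-positive count ratio, which up-weights rare labels such as "adherent". The sketch below illustrates that convention; the exact computation used during training is not documented in this card.

```python
import torch
import torch.nn as nn

# Toy label matrix: one row per utterance, columns = [adherent, non_adherent, neutral]
labels = torch.tensor([
    [1., 0., 0.],
    [0., 0., 1.],
    [0., 0., 1.],
    [0., 1., 0.],
    [0., 0., 1.],
    [1., 0., 1.],
])

pos_counts = labels.sum(dim=0)                       # positives per label
neg_counts = labels.shape[0] - pos_counts            # negatives per label
pos_weight = neg_counts / pos_counts.clamp(min=1.)   # rare labels weighted up

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.zeros(labels.shape)                   # placeholder model outputs
loss = criterion(logits, labels)
```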

### Model Architecture

The model uses a custom classification head on top of ModernBERT:

```
ModernBERT-base (encoder)
    → [CLS] token extraction
    → Dropout (0.1)
    → Linear layer (hidden_size → 3)
    → Sigmoid activation (applied during inference)
```

## Usage

### Direct Use

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Define the model class
class MultiLabelBERTModel(nn.Module):
    def __init__(self, model_name, num_labels=3, dropout=0.1):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)
        self.num_labels = num_labels

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.last_hidden_state[:, 0, :]  # [CLS] token
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits

# Load model and tokenizer
model_name = "Lekhansh/bc-multilabel-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize model architecture
model = MultiLabelBERTModel(model_name, num_labels=3)

# Note: you still need to load the fine-tuned weights from the saved
# checkpoint before inference, e.g.:
# model.load_state_dict(torch.load("path/to/checkpoint.pt", map_location="cpu"))
model.eval()

# Prepare input
text = "That's a wonderful goal you've set for yourself."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)

# Get predictions
with torch.no_grad():
    logits = model(inputs['input_ids'], inputs['attention_mask'])
    probs = torch.sigmoid(logits)
    predictions = (probs > 0.5).int()

# Interpret results
labels = ['adherent', 'non_adherent', 'neutral']
print(f"Text: {text}")
print("\nPredictions:")
for i, label in enumerate(labels):
    if predictions[0][i]:
        print(f"  ✓ {label} (confidence: {probs[0][i]:.2%})")
```

### Batch Prediction with Confidence Scores

```python
def predict_multilabel(texts, model, tokenizer, threshold=0.5):
    """
    Predict multiple labels for each text with confidence scores.

    Args:
        texts: List of input texts
        model: The multilabel classification model
        tokenizer: The tokenizer
        threshold: Probability threshold for a positive prediction (default: 0.5)

    Returns:
        List of dicts with predictions and probabilities
    """
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        max_length=3000,
        padding=True
    )

    with torch.no_grad():
        logits = model(inputs['input_ids'], inputs['attention_mask'])
        probs = torch.sigmoid(logits)

    labels = ['adherent', 'non_adherent', 'neutral']
    results = []

    for i in range(len(texts)):
        predictions = (probs[i] > threshold).int()
        result = {
            'text': texts[i],
            'labels': {},
            'probabilities': {}
        }

        for j, label in enumerate(labels):
            result['labels'][label] = bool(predictions[j])
            result['probabilities'][label] = float(probs[i][j])

        results.append(result)

    return results

# Example usage
utterances = [
    "I hear you saying that you want to change but you're not sure how.",
    "You need to stop making excuses and just do it.",
    "How many cigarettes do you smoke per day?"
]

results = predict_multilabel(utterances, model, tokenizer)
for r in results:
    print(f"\nText: {r['text'][:60]}...")
    print("Predicted labels:")
    for label in ['adherent', 'non_adherent', 'neutral']:
        status = "✓" if r['labels'][label] else "✗"
        conf = r['probabilities'][label]
        print(f"  {status} {label}: {conf:.2%}")
```

### Custom Threshold Tuning

```python
# Adjust thresholds for the precision/recall trade-off
def predict_with_custom_threshold(text, model, tokenizer, thresholds):
    """
    Predict with a different threshold for each label.

    Args:
        thresholds: Dict with keys 'adherent', 'non_adherent', 'neutral'
    """
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)

    with torch.no_grad():
        logits = model(inputs['input_ids'], inputs['attention_mask'])
        probs = torch.sigmoid(logits)

    labels_list = ['adherent', 'non_adherent', 'neutral']
    predictions = {}

    for i, label in enumerate(labels_list):
        threshold = thresholds.get(label, 0.5)
        predictions[label] = {
            'predicted': bool(probs[0][i] > threshold),
            'probability': float(probs[0][i]),
            'threshold': threshold
        }

    return predictions

# Example: higher threshold for 'adherent' (favors precision)
custom_thresholds = {
    'adherent': 0.6,
    'non_adherent': 0.5,
    'neutral': 0.5
}

result = predict_with_custom_threshold(
    "What are your thoughts on reducing your drinking?",
    model,
    tokenizer,
    custom_thresholds
)
```
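
Instead of hand-picking thresholds, each label's cutoff can be tuned on validation data to maximize F1. The helper below is an illustrative sketch (not part of the released code), assuming `probs` is an array of sigmoid outputs with one column per label.

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(probs, y_true, grid=None):
    """For each label, pick the threshold that maximizes F1 on a
    validation set (ties broken toward the higher threshold)."""
    if grid is None:
        grid = np.arange(0.1, 0.91, 0.05)
    best = []
    for j in range(probs.shape[1]):
        scored = [
            (f1_score(y_true[:, j], (probs[:, j] > t).astype(int), zero_division=0), t)
            for t in grid
        ]
        best.append(max(scored)[1])
    return best

# Toy validation outputs for a single label
probs = np.array([[0.9], [0.2], [0.55], [0.3]])
y_true = np.array([[1], [0], [1], [0]])
best = tune_thresholds(probs, y_true)
```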

## Limitations and Bias

### Limitations

1. **Domain Specificity:** Trained on Motivational Interviewing data; may not generalize to other therapeutic modalities
2. **Context Dependency:** Performance may degrade on utterances lacking proper conversational context
3. **Class Imbalance:** Lower performance on the "adherent" label due to class imbalance in the training data
4. **Multilabel Complexity:** Some utterances may have ambiguous or overlapping codes
5. **Context Length:** Maximum 3000 tokens; longer inputs are truncated
6. **Language:** Trained on English text only

### Potential Biases

- Training data may reflect biases from the original coding framework and human coders
- Performance may vary across MI contexts (e.g., substance use vs. health behavior change)
- Cultural and linguistic variation in therapeutic communication may affect predictions
- The model is likely more accurate on populations and contexts similar to the training data

### Recommended Use

- Use as a screening tool or for preliminary analysis, not as definitive behavioral coding
- Validate predictions with human expert review, especially for critical applications
- Consider adjusting prediction thresholds to suit your precision/recall trade-off
- Be aware that multilabel predictions may occasionally conflict with clinical judgment
|
## Technical Specifications

### Model Architecture

- **Base:** ModernBERT-base (encoder-only transformer)
- **Custom Head:** Dropout (0.1) + Linear layer (hidden_size → 3 labels)
- **Activation:** Sigmoid (for independent label probabilities)
- **Attention:** Flash Attention 2 implementation
- **Parameters:** ~149M (ModernBERT-base plus the classification head)
- **Precision:** BFloat16

### Compute Infrastructure

- **Training:** Single GPU with CUDA
- **Inference:** CPU or GPU compatible
- **Memory:** ~500MB model size

### Label Format

```python
# Output format
{
    "adherent": 0 or 1,
    "non_adherent": 0 or 1,
    "neutral": 0 or 1
}

# Example: an utterance can carry multiple labels
# "I hear that you're struggling, and I believe you can overcome this."
# → adherent=1, non_adherent=0, neutral=0
```

## Environmental Impact

Training used mixed precision to reduce resource usage. The exact carbon footprint was not measured.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{lekhansh2025bcmultilabel,
  author = {Lekhansh},
  title = {Behavioral Coding Multilabel Classifier for Motivational Interviewing},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Lekhansh/bc-multilabel-classifier}}
}
```

## References

For more information on Motivational Interviewing behavioral coding:

- Miller, W. R., & Rollnick, S. (2013). *Motivational Interviewing: Helping People Change* (3rd ed.)
- Moyers, T. B., et al. (2016). *Motivational Interviewing Treatment Integrity Coding Manual 4.2.1*

## Model Card Authors

Lekhansh

## Model Card Contact

drlekhansh@gmail.com