---
language:
- en
license: apache-2.0
tags:
- text-classification
- multilabel-classification
- behavioral-coding
- motivational-interviewing
- modernbert
- transformers
base_model: answerdotai/ModernBERT-base
metrics:
- f1
- precision
- recall
- exact_match
- hamming_loss
model-index:
- name: bc-multilabel-classifier
  results:
  - task:
      type: text-classification
      name: Multilabel Text Classification
    metrics:
    - name: Exact Match
      type: exact_match
      value: 0.8563
    - name: Hamming Loss
      type: hamming_loss
      value: 0.0579
    - name: F1 Macro
      type: f1_macro
      value: 0.8666
    - name: F1 Micro
      type: f1_micro
      value: 0.9246
    - name: Adherent F1
      type: f1
      value: 0.7429
    - name: Non-Adherent F1
      type: f1
      value: 0.8932
    - name: Neutral F1
      type: f1
      value: 0.9639
widget:
- text: "That's a great step you're taking to improve your health."
- text: "You really should stop smoking, it's bad for you."
- text: "What do you think about trying to quit?"
---

# Behavioral Coding Multilabel Classifier

## Model Description

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) for multilabel classification of Motivational Interviewing (MI) behavioral codes. It classifies utterances into three non-mutually-exclusive categories used in behavioral coding of therapeutic conversations.

**Developed by:** Lekhansh

**Model type:** Multilabel Text Classification

**Language:** English

**Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

**License:** Apache 2.0

## Intended Uses

### Primary Use Case

This model is designed for automated behavioral coding in Motivational Interviewing contexts, predicting three types of MI-consistent and MI-inconsistent behaviors:

- **Adherent:** MI-adherent behaviors (e.g., affirmations, seeking collaboration)
- **Non-Adherent:** MI-non-adherent behaviors (e.g., confrontation, persuading without permission)
- **Neutral:** Neutral behaviors (e.g., giving information, questions, reflections)

### Key Features

- **Multilabel Classification:** Utterances can carry multiple labels simultaneously
- **Therapeutic Context:** Trained specifically on Motivational Interviewing conversations
- **Context-Aware:** Each input includes the three preceding utterances as context
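
Because the exact input format is not documented in this card, the sketch below shows one plausible way to prepend the three preceding utterances as context; the `[SEP]` delimiter and the ordering are illustrative assumptions, not the released preprocessing code.

```python
def build_input(utterance, history, sep=" [SEP] "):
    """Prepend up to three preceding utterances as conversational context.

    NOTE: the separator and concatenation order are assumptions for
    illustration; the training-time format is not documented here.
    """
    context = list(history)[-3:]  # keep at most the three most recent turns
    return sep.join(context + [utterance])

example = build_input(
    "What do you think about trying to quit?",
    ["I smoke about a pack a day.", "It helps me relax.", "My wife wants me to stop."],
)
```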

### Potential Applications

- Automated analysis of therapy session transcripts
- Training and feedback for MI practitioners
- Quality assurance in behavioral health interventions
- Research on therapeutic communication patterns

## Model Performance

### Test Set Metrics

The model was evaluated on a held-out test set of 3,235 coded utterances.

#### Overall Performance

| Metric | Score |
|--------|------:|
| **Exact Match Accuracy** | **85.63%** |
| **Hamming Loss** | **0.0579** |
| **F1 Macro** | **86.66%** |
| **F1 Micro** | **92.46%** |
| **Precision Macro** | 86.53% |
| **Precision Micro** | 93.47% |
| **Recall Macro** | 86.84% |
| **Recall Micro** | 91.48% |

**Exact Match:** Percentage of examples for which all three labels are predicted correctly
**Hamming Loss:** Average fraction of labels that are incorrectly predicted (lower is better)
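
Both aggregate metrics are standard and can be reproduced with scikit-learn, where `accuracy_score` applied to the full label matrix is exactly multilabel exact-match (subset) accuracy. A toy illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, f1_score

# Toy predictions over 4 examples and 3 labels
y_true = np.array([[0, 0, 1], [1, 0, 0], [1, 0, 1], [0, 1, 0]])
y_pred = np.array([[0, 0, 1], [1, 0, 1], [1, 0, 1], [0, 1, 0]])

exact = accuracy_score(y_true, y_pred)   # subset accuracy = exact match
ham = hamming_loss(y_true, y_pred)       # fraction of wrong label decisions
f1m = f1_score(y_true, y_pred, average="macro")
```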

#### Per-Label Performance

| Label | F1 Score | Precision | Recall | Accuracy |
|-------|----------|-----------|--------|----------|
| **Adherent** | 74.29% | 74.47% | 74.10% | 90.26% |
| **Non-Adherent** | 89.32% | 87.34% | 91.39% | 98.98% |
| **Neutral** | 96.39% | 97.77% | 95.04% | 93.38% |

### Class Distribution

The training data exhibits class imbalance, addressed through positive class weighting:

- **Neutral:** Most common (majority class)
- **Non-Adherent:** Moderate frequency
- **Adherent:** Least common (minority class)

## Training Details

### Training Data

- **Source:** Multilabel behavioral coding dataset from Motivational Interviewing transcripts
- **Preprocessing:**
  - Excluded utterances marked as "not_coded" (no MI codes assigned)
  - Included context from the three preceding utterances
  - Stratified splitting to maintain label distribution
- **Split:** 70% train, 15% validation, 15% test
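
The card does not specify how stratification was done for multilabel data. One simple approximation, sketched below, is to stratify on the label-combination string; dedicated iterative-stratification libraries are more robust when some combinations are rare.

```python
from sklearn.model_selection import train_test_split

# Toy multilabel matrix; each row is [adherent, non_adherent, neutral]
y = [[0, 0, 1], [0, 0, 1], [1, 0, 0], [1, 0, 0],
     [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]

# Treat each label combination as a single stratification class.
# This only works when every combination appears often enough to split.
combos = ["".join(map(str, row)) for row in y]
idx = list(range(len(y)))
train_idx, test_idx = train_test_split(
    idx, test_size=0.5, stratify=combos, random_state=0
)
```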

### Training Procedure

**Hardware:**
- GPU training with CUDA
- Mixed precision (BFloat16) training

**Hyperparameters:**

| Parameter | Value |
|-----------|-------|
| Learning Rate | 6e-5 |
| Batch Size (per device) | 12 |
| Gradient Accumulation | 2 steps |
| Effective Batch Size | 24 |
| Max Sequence Length | 3000 tokens |
| Epochs | 20 (early stopped at epoch 14) |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.1 |
| LR Scheduler | Cosine |
| Optimizer | AdamW |
| Dropout | 0.1 |

**Training Features:**
- **Positive Class Weighting:** BCEWithLogitsLoss with pos_weight computed per label
- **Early Stopping:** Patience of 3 epochs on validation F1 macro
- **Gradient Checkpointing:** Enabled for memory efficiency
- **Flash Attention 2:** For efficient attention computation
- **Best Model Selection:** Based on validation F1 macro score

**Loss Function:** Binary Cross-Entropy with Logits Loss (BCEWithLogitsLoss) with per-label positive class weights
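
A common way to derive the per-label `pos_weight` is the negative-to-positive count ratio, which up-weights rare labels such as "adherent". The sketch below illustrates that convention; the exact computation used during training is not documented in this card.

```python
import torch
import torch.nn as nn

# Toy label matrix: one row per utterance, columns = [adherent, non_adherent, neutral]
labels = torch.tensor([
    [1., 0., 0.],
    [0., 0., 1.],
    [0., 0., 1.],
    [0., 1., 0.],
    [0., 0., 1.],
    [1., 0., 1.],
])

pos_counts = labels.sum(dim=0)                       # positives per label
neg_counts = labels.shape[0] - pos_counts            # negatives per label
pos_weight = neg_counts / pos_counts.clamp(min=1.)   # rare labels weighted up

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.zeros(labels.shape)                   # placeholder model outputs
loss = criterion(logits, labels)
```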

### Model Architecture

The model uses a custom classification head on top of ModernBERT:

```
ModernBERT-base (encoder)
    → [CLS] token extraction
    → Dropout (0.1)
    → Linear layer (hidden_size → 3)
    → Sigmoid activation (applied during inference)
```

## Usage

### Direct Use

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Define the model class
class MultiLabelBERTModel(nn.Module):
    def __init__(self, model_name, num_labels=3, dropout=0.1):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)
        self.num_labels = num_labels

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.last_hidden_state[:, 0, :]  # [CLS] token
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits

# Load model and tokenizer
model_name = "Lekhansh/bc-multilabel-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize model architecture
model = MultiLabelBERTModel(model_name, num_labels=3)

# Note: you still need to load the fine-tuned weights from the saved
# checkpoint before inference, e.g.:
# model.load_state_dict(torch.load("path/to/checkpoint.pt", map_location="cpu"))
model.eval()

# Prepare input
text = "That's a wonderful goal you've set for yourself."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)

# Get predictions
with torch.no_grad():
    logits = model(inputs['input_ids'], inputs['attention_mask'])
    probs = torch.sigmoid(logits)
    predictions = (probs > 0.5).int()

# Interpret results
labels = ['adherent', 'non_adherent', 'neutral']
print(f"Text: {text}")
print("\nPredictions:")
for i, label in enumerate(labels):
    if predictions[0][i]:
        print(f"  ✓ {label} (confidence: {probs[0][i]:.2%})")
```

### Batch Prediction with Confidence Scores

```python
def predict_multilabel(texts, model, tokenizer, threshold=0.5):
    """
    Predict multiple labels for each text with confidence scores.

    Args:
        texts: List of input texts
        model: The multilabel classification model
        tokenizer: The tokenizer
        threshold: Probability threshold for a positive prediction (default: 0.5)

    Returns:
        List of dicts with predictions and probabilities
    """
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        max_length=3000,
        padding=True
    )

    with torch.no_grad():
        logits = model(inputs['input_ids'], inputs['attention_mask'])
        probs = torch.sigmoid(logits)

    labels = ['adherent', 'non_adherent', 'neutral']
    results = []

    for i in range(len(texts)):
        predictions = (probs[i] > threshold).int()
        result = {
            'text': texts[i],
            'labels': {},
            'probabilities': {}
        }

        for j, label in enumerate(labels):
            result['labels'][label] = bool(predictions[j])
            result['probabilities'][label] = float(probs[i][j])

        results.append(result)

    return results

# Example usage
utterances = [
    "I hear you saying that you want to change but you're not sure how.",
    "You need to stop making excuses and just do it.",
    "How many cigarettes do you smoke per day?"
]

results = predict_multilabel(utterances, model, tokenizer)
for r in results:
    print(f"\nText: {r['text'][:60]}...")
    print("Predicted labels:")
    for label in ['adherent', 'non_adherent', 'neutral']:
        status = "✓" if r['labels'][label] else "✗"
        conf = r['probabilities'][label]
        print(f"  {status} {label}: {conf:.2%}")
```

### Custom Threshold Tuning

```python
# Adjust thresholds for the precision/recall trade-off
def predict_with_custom_threshold(text, model, tokenizer, thresholds):
    """
    Predict with a different threshold for each label.

    Args:
        thresholds: Dict with keys 'adherent', 'non_adherent', 'neutral'
    """
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)

    with torch.no_grad():
        logits = model(inputs['input_ids'], inputs['attention_mask'])
        probs = torch.sigmoid(logits)

    labels_list = ['adherent', 'non_adherent', 'neutral']
    predictions = {}

    for i, label in enumerate(labels_list):
        threshold = thresholds.get(label, 0.5)
        predictions[label] = {
            'predicted': bool(probs[0][i] > threshold),
            'probability': float(probs[0][i]),
            'threshold': threshold
        }

    return predictions

# Example: higher threshold for 'adherent' (favors precision)
custom_thresholds = {
    'adherent': 0.6,
    'non_adherent': 0.5,
    'neutral': 0.5
}

result = predict_with_custom_threshold(
    "What are your thoughts on reducing your drinking?",
    model,
    tokenizer,
    custom_thresholds
)
```
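
Instead of hand-picking thresholds, each label's cutoff can be tuned on validation data to maximize F1. The helper below is an illustrative sketch (not part of the released code), assuming `probs` is an array of sigmoid outputs with one column per label.

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(probs, y_true, grid=None):
    """For each label, pick the threshold that maximizes F1 on a
    validation set (ties broken toward the higher threshold)."""
    if grid is None:
        grid = np.arange(0.1, 0.91, 0.05)
    best = []
    for j in range(probs.shape[1]):
        scored = [
            (f1_score(y_true[:, j], (probs[:, j] > t).astype(int), zero_division=0), t)
            for t in grid
        ]
        best.append(max(scored)[1])
    return best

# Toy validation outputs for a single label
probs = np.array([[0.9], [0.2], [0.55], [0.3]])
y_true = np.array([[1], [0], [1], [0]])
best = tune_thresholds(probs, y_true)
```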

## Limitations and Bias

### Limitations

1. **Domain Specificity:** Trained on Motivational Interviewing data; may not generalize to other therapeutic modalities
2. **Context Dependency:** Performance may degrade on utterances lacking proper conversational context
3. **Class Imbalance:** Lower performance on the "adherent" label due to class imbalance in the training data
4. **Multilabel Complexity:** Some utterances may have ambiguous or overlapping codes
5. **Context Length:** Maximum 3000 tokens; longer inputs are truncated
6. **Language:** Trained on English text only

### Potential Biases

- Training data may reflect biases from the original coding framework and human coders
- Performance may vary across MI contexts (e.g., substance use vs. health behavior change)
- Cultural and linguistic variation in therapeutic communication may affect predictions
- The model is likely more accurate on populations and contexts similar to the training data

### Recommended Use

- Use as a screening tool or for preliminary analysis, not as definitive behavioral coding
- Validate predictions with human expert review, especially for critical applications
- Consider adjusting prediction thresholds to suit your precision/recall trade-off
- Be aware that multilabel predictions may occasionally conflict with clinical judgment
|
## Technical Specifications

### Model Architecture

- **Base:** ModernBERT-base (encoder-only transformer)
- **Custom Head:** Dropout (0.1) + Linear layer (hidden_size → 3 labels)
- **Activation:** Sigmoid (for independent label probabilities)
- **Attention:** Flash Attention 2 implementation
- **Parameters:** ~149M (ModernBERT-base plus the classification head)
- **Precision:** BFloat16

### Compute Infrastructure

- **Training:** Single GPU with CUDA
- **Inference:** CPU or GPU compatible
- **Memory:** ~500MB model size

### Label Format

```python
# Output format
{
    "adherent": 0 or 1,
    "non_adherent": 0 or 1,
    "neutral": 0 or 1
}

# Example: an utterance can carry multiple labels
# "I hear that you're struggling, and I believe you can overcome this."
# → adherent=1, non_adherent=0, neutral=0
```

## Environmental Impact

Training used mixed precision to reduce resource usage. The exact carbon footprint was not measured.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{lekhansh2025bcmultilabel,
  author = {Lekhansh},
  title = {Behavioral Coding Multilabel Classifier for Motivational Interviewing},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Lekhansh/bc-multilabel-classifier}}
}
```

## References

For more information on Motivational Interviewing behavioral coding:

- Miller, W. R., & Rollnick, S. (2013). *Motivational Interviewing: Helping People Change* (3rd ed.)
- Moyers, T. B., et al. (2016). *Motivational Interviewing Treatment Integrity Coding Manual 4.2.1*

## Model Card Authors

Lekhansh

## Model Card Contact

drlekhansh@gmail.com