---
language:
- en
license: apache-2.0
tags:
- text-classification
- binary-classification
- behavioral-coding
- modernbert
- transformers
base_model: answerdotai/ModernBERT-base
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: bc-not-coded-classifier
  results:
  - task:
      type: text-classification
      name: Binary Text Classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9642
    - name: F1 (Not Coded)
      type: f1
      value: 0.8584
    - name: Precision (Not Coded)
      type: precision
      value: 0.8742
    - name: Recall (Not Coded)
      type: recall
      value: 0.8431
    - name: F1 Macro
      type: f1_macro
      value: 0.9189
widget:
- text: "I don't understand what you're asking me to do."
- text: "Let me help you with that problem by explaining the steps."
- text: "Okay, I see."
---

# Behavior Coding Not-Coded Classifier

## Model Description

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) for binary classification of behavioral coding utterances. It identifies whether utterances should be coded or marked as "not_coded" in behavioral analysis workflows.

**Developed by:** Lekhansh

**Model type:** Binary Text Classification

**Language:** English

**Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

**License:** Apache 2.0

## Intended Uses

### Primary Use Case

This model is designed to automatically filter utterances in behavioral coding tasks, distinguishing between:

- **Coded (Label 0):** Utterances suitable for behavioral code assignment
- **Not Coded (Label 1):** Utterances that should not receive behavioral codes

### Potential Applications

- Pre-filtering in behavioral coding pipelines
- Quality control for behavioral analysis datasets
- Automated utterance classification in conversation analysis
- Research in human behavior and communication patterns

## Model Performance

### Test Set Metrics

The model was evaluated on a held-out test set of 3,713 examples with the following class distribution:

- Coded samples: 3,235 (87.1%)
- Not Coded samples: 478 (12.9%)

| Metric | Score |
|--------|------:|
| **Overall Accuracy** | **96.42%** |
| **F1 (Not Coded)** | **85.84%** |
| **Precision (Not Coded)** | 87.42% |
| **Recall (Not Coded)** | 84.31% |
| **F1 (Coded)** | 97.95% |
| **Precision (Coded)** | 97.69% |
| **Recall (Coded)** | 98.21% |
| **Macro F1** | 91.89% |

### Confusion Matrix

| | Predicted Coded | Predicted Not Coded |
|-----------|----------------:|--------------------:|
| **Actual Coded** | 3,177 | 58 |
| **Actual Not Coded** | 75 | 403 |

The model performs well on both classes. It is most accurate on the majority class (coded utterances, F1 97.95%) and reaches an F1 of 85.84% on the minority class (not coded utterances). In absolute terms, it misses 75 of the 478 not-coded utterances and flags 58 of the 3,235 coded utterances as not coded.

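The reported scores can be re-derived from the confusion matrix. The following is a minimal sketch that uses only the four counts in the table; the helper name `prf` is introduced here for illustration and is not part of the model's code.

```python
# Confusion-matrix counts copied from the table in this card.
tn, fp = 3177, 58   # actual coded: predicted coded / predicted not coded
fn, tp = 75, 403    # actual not coded: predicted coded / predicted not coded


def prf(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


accuracy = (tp + tn) / (tp + tn + fp + fn)
not_coded = prf(tp, fp, fn)   # "not coded" treated as the positive class
coded = prf(tn, fn, fp)       # roles swap when "coded" is the positive class
macro_f1 = (not_coded[2] + coded[2]) / 2

print(f"accuracy  = {accuracy:.4f}")
print(f"not coded = P {not_coded[0]:.4f}, R {not_coded[1]:.4f}, F1 {not_coded[2]:.4f}")
print(f"coded     = P {coded[0]:.4f}, R {coded[1]:.4f}, F1 {coded[2]:.4f}")
print(f"macro F1  = {macro_f1:.4f}")
```

The printed values should agree with the metrics table to four decimal places.
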
## Training Details

### Training Data

- Source: Multilabel behavioral coding dataset reframed as binary classification (coded vs. not coded)
- Split: 70% train, 15% validation, 15% test (stratified)
- Preprocessing: Stratified splitting so that each split keeps the same class proportions
- Context size: Three preceding utterances

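To illustrate what a stratified 70/15/15 split does, here is a minimal pure-Python sketch. It is not the splitting code used for this model, which is not published in this card. The function `stratified_split`, the seed, and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split example indices into train/val/test, preserving label proportions."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]   # remainder goes to the test split
    return train, val, test


# Toy labels with the 87:13 imbalance reported in this card (0 = coded, 1 = not coded).
labels = [0] * 1740 + [1] * 260
train, val, test = stratified_split(labels)

for name, split in (("train", train), ("val", val), ("test", test)):
    share = sum(labels[i] for i in split) / len(split)
    print(f"{name}: {len(split)} examples, {share:.1%} not coded")
```

Each split keeps the same minority-class share because the shuffle and cut are applied per class rather than to the pooled data.
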
### Training Procedure

**Hardware:**

- GPU training with CUDA
- Mixed precision (BFloat16) training

**Hyperparameters:**

| Parameter | Value |
|-----------|-------|
| Learning Rate | 6e-5 |
| Batch Size (per device) | 12 |
| Gradient Accumulation | 2 steps |
| Effective Batch Size | 24 |
| Max Sequence Length | 3000 tokens |
| Epochs | 20 (early stopped at epoch 13) |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.1 |
| LR Scheduler | Cosine |
| Optimizer | AdamW |

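Two of the table entries follow from simple arithmetic: the effective batch size is the per-device batch size times the gradient accumulation steps (on a single GPU), and the learning rate follows a warmup-then-cosine curve. The sketch below shows the usual form of that schedule: linear warmup followed by cosine decay to zero. The function `lr_at_step` and the step count `total_steps` are assumptions for illustration, since the card does not state the real number of optimizer steps.

```python
import math


def lr_at_step(step, total_steps, peak_lr=6e-5, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


per_device_batch_size = 12
gradient_accumulation_steps = 2
# Single-GPU training, so no multiplication by a device count.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

total_steps = 1000  # hypothetical; the real step count is not stated in this card
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at_step(step, total_steps):.2e}")
print(f"effective batch size = {effective_batch_size}")
```

With these assumed 1,000 steps, the rate peaks at 6e-5 after the first 100 steps and is halfway down the cosine curve (3e-5) at step 550.
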
**Training Features:**

- **Class Weighting:** Balanced weights to address class imbalance (87:13 ratio)
- **Early Stopping:** Patience of 3 epochs on validation F1
- **Gradient Checkpointing:** Enabled for memory efficiency
- **Flash Attention 2:** For efficient attention computation
- **Best Model Selection:** Based on validation F1 score

**Loss Function:** Weighted Cross-Entropy Loss

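As a rough illustration of how balanced class weights enter a weighted cross-entropy loss, the sketch below uses the common "balanced" heuristic, `n_samples / (n_classes * class_count)`. Whether the author used exactly this formula is an assumption. The counts are the test-set counts from this card, used only because the training-set counts are not listed, and the loss is shown for a single example without any batch reduction.

```python
import math


def balanced_class_weights(counts):
    """weight_c = n_samples / (n_classes * count_c)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


def weighted_cross_entropy(logits, target, weights):
    """Weighted negative log-likelihood of `target` under softmax(logits), one example."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_norm = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return -weights[target] * (logits[target] - log_norm)


# 0 = coded, 1 = not coded; test-set counts stand in for the unlisted training counts.
counts = {0: 3235, 1: 478}
weights = balanced_class_weights(counts)
print({label: round(w, 4) for label, w in weights.items()})

# The same uncertain prediction costs more when the true label is the rare class.
logits = [0.0, 0.0]
print(weighted_cross_entropy(logits, 0, weights))
print(weighted_cross_entropy(logits, 1, weights))
```

With an 87:13 split, a mistake on a not-coded example is weighted roughly 6.8 times more heavily than a mistake on a coded one, which counteracts the pull toward always predicting the majority class.
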
## Usage

### Direct Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "lekhansh/bc-not-coded-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Your utterance text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1)

# Interpret result
label = "Not Coded" if prediction.item() == 1 else "Coded"
print(f"Prediction: {label}")
```

### Batch Prediction with Probabilities

```python
import torch

# Assumes `model` and `tokenizer` are already loaded as in "Direct Use".
def classify_utterances(texts, model, tokenizer):
    """
    Classify multiple utterances with confidence scores.

    Returns:
        List of dicts with predictions and probabilities
    """
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        max_length=3000,
        padding=True
    )

    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        predictions = torch.argmax(outputs.logits, dim=-1)

    results = []
    for i in range(len(texts)):
        results.append({
            'text': texts[i],
            'label': 'not_coded' if predictions[i].item() == 1 else 'coded',
            'confidence': probs[i][predictions[i]].item(),
            'probabilities': {
                'coded': probs[i][0].item(),
                'not_coded': probs[i][1].item()
            }
        })

    return results

# Example
utterances = [
    "I don't know what to say.",
    "Let me explain the process step by step.",
    "Mmm-hmm."
]

results = classify_utterances(utterances, model, tokenizer)
for r in results:
    print(f"Text: {r['text']}")
    print(f"  Label: {r['label']} (confidence: {r['confidence']:.2%})")
```

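For pre-filtering, the per-class probabilities returned by `classify_utterances` can be compared against a custom threshold instead of taking the argmax. The sketch below is one possible way to do that. The function `split_by_threshold`, the threshold values, and the hand-written probabilities are assumptions for illustration, not outputs of the model.

```python
def split_by_threshold(results, not_coded_threshold=0.5):
    """Partition classifier results into (keep_for_coding, filtered_out)."""
    keep, drop = [], []
    for r in results:
        if r["probabilities"]["not_coded"] >= not_coded_threshold:
            drop.append(r)
        else:
            keep.append(r)
    return keep, drop


# Hand-written stand-ins shaped like classify_utterances output; probabilities are made up.
example_results = [
    {"text": "Let me explain the process step by step.",
     "probabilities": {"coded": 0.97, "not_coded": 0.03}},
    {"text": "Mmm-hmm.",
     "probabilities": {"coded": 0.35, "not_coded": 0.65}},
]

keep_default, drop_default = split_by_threshold(example_results)
keep_strict, drop_strict = split_by_threshold(example_results, not_coded_threshold=0.8)
print(len(keep_default), len(drop_default))
print(len(keep_strict), len(drop_strict))
```

Raising the threshold discards fewer utterances, which trades a lower risk of dropping codable utterances for more non-codable ones reaching human coders. A suitable value would need to be chosen on validation data.
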
### Pipeline Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="lekhansh/bc-not-coded-classifier",
    tokenizer="lekhansh/bc-not-coded-classifier"
)

result = classifier("Your utterance here", truncation=True, max_length=3000)
print(result)
# Example output (score is illustrative; label strings depend on the model's id2label config):
# [{'label': 'coded', 'score': 0.98}]
```

## Limitations and Bias

### Limitations

1. **Domain Specificity:** The model is trained on behavioral coding data and may not generalize well to other text classification tasks
2. **Class Imbalance:** Training data has 87% coded vs 13% not coded examples, which may affect performance on datasets with different distributions
3. **Context Length:** Maximum sequence length is 3000 tokens; longer texts will be truncated
4. **Language:** Trained on English text only

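Because inputs longer than 3000 tokens are silently truncated, it can help to flag them before classification. The sketch below counts tokens without truncation and compares the count to the limit. The helper `exceeds_max_length` and the `WhitespaceTokenizer` stand-in are assumptions added so the example runs without downloading anything; in practice the tokenizer loaded with `AutoTokenizer.from_pretrained` would be passed instead.

```python
def exceeds_max_length(text, tokenizer, max_length=3000):
    """Return True if `text` tokenizes to more than `max_length` tokens."""
    n_tokens = len(tokenizer(text, truncation=False)["input_ids"])
    return n_tokens > max_length


class WhitespaceTokenizer:
    """Stand-in so the sketch runs offline; it is not the model's real tokenizer."""

    def __call__(self, text, truncation=False):
        return {"input_ids": text.split()}


toy_tokenizer = WhitespaceTokenizer()
short_flag = exceeds_max_length("Okay, I see.", toy_tokenizer)
long_flag = exceeds_max_length("word " * 3001, toy_tokenizer)
print(short_flag, long_flag)
```

Flagged inputs could then be shortened deliberately, for example by trimming older context, rather than left to default truncation.
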
### Potential Biases

- The model's performance may vary depending on the specific behavioral coding framework used
- Biases present in the training data may be reflected in predictions
- Performance may differ across different conversation types or domains

## Technical Specifications

### Model Architecture

- **Base:** ModernBERT-base (encoder-only transformer)
- **Classification Head:** Linear layer for binary classification
- **Attention:** Flash Attention 2 implementation
- **Parameters:** ~149M (inherited from base model)
- **Precision:** BFloat16

### Compute Infrastructure

- **Training:** Single GPU with CUDA
- **Inference:** CPU or GPU compatible
- **Memory:** ~500MB model size

## Environmental Impact

Training was conducted using mixed precision to optimize resource usage. Exact carbon footprint was not measured.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{lekhansh2025bcnotcoded,
  author = {Lekhansh},
  title = {Behavior Coding Not-Coded Classifier},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lekhansh/bc-not-coded-classifier}}
}
```

## Model Card Authors

Lekhansh

## Model Card Contact

[Your contact information or GitHub profile]