---
language:
- en
license: apache-2.0
tags:
- text-classification
- binary-classification
- behavioral-coding
- modernbert
- transformers
base_model: answerdotai/ModernBERT-base
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: bc-not-coded-classifier
results:
- task:
type: text-classification
name: Binary Text Classification
metrics:
- name: Accuracy
type: accuracy
value: 0.9642
- name: F1 (Not Coded)
type: f1
value: 0.8584
- name: Precision (Not Coded)
type: precision
value: 0.8742
- name: Recall (Not Coded)
type: recall
value: 0.8431
- name: F1 Macro
type: f1_macro
value: 0.9189
widget:
- text: "I don't understand what you're asking me to do."
- text: "Let me help you with that problem by explaining the steps."
- text: "Okay, I see."
---
# Behavior Coding Not-Coded Classifier
## Model Description
This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) for binary classification of behavioral coding utterances. It identifies whether utterances should be coded or marked as "not_coded" in behavioral analysis workflows.
**Developed by:** Lekhansh
**Model type:** Binary Text Classification
**Language:** English
**Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
**License:** Apache 2.0
## Intended Uses
### Primary Use Case
This model is designed to automatically filter utterances in behavioral coding tasks, distinguishing between:
- **Coded (Label 0):** Utterances suitable for behavioral code assignment
- **Not Coded (Label 1):** Utterances that should not receive behavioral codes
### Potential Applications
- Pre-filtering in behavioral coding pipelines
- Quality control for behavioral analysis datasets
- Automated utterance classification in conversation analysis
- Research in human behavior and communication patterns
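As a sketch of the pre-filtering use case, a pipeline might route utterances based on the classifier's label and confidence. The dicts below mirror the output of the `classify_utterances` helper shown later in this card; the 0.90 confidence threshold is an illustrative assumption, not a tuned value.

```python
# Hypothetical pre-filtering step: route utterances based on classifier output.
# The 0.90 threshold is an example choice; tune it on your own validation data.

def prefilter(results, threshold=0.90):
    to_code, skipped, needs_review = [], [], []
    for r in results:
        if r["confidence"] < threshold:
            needs_review.append(r["text"])   # low confidence: send to a human
        elif r["label"] == "coded":
            to_code.append(r["text"])        # forward to behavioral coding
        else:
            skipped.append(r["text"])        # drop from the coding queue
    return to_code, skipped, needs_review

# Example with mock classifier output
mock = [
    {"text": "Let me explain the steps.", "label": "coded", "confidence": 0.99},
    {"text": "Mmm-hmm.", "label": "not_coded", "confidence": 0.97},
    {"text": "Okay, I see.", "label": "not_coded", "confidence": 0.62},
]
to_code, skipped, review = prefilter(mock)
```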
## Model Performance
### Test Set Metrics
The model was evaluated on a held-out test set of 3,713 examples with the following class distribution:
- Coded samples: 3,235 (87.1%)
- Not Coded samples: 478 (12.9%)
| Metric | Score |
|--------|------:|
| **Overall Accuracy** | **96.42%** |
| **F1 (Not Coded)** | **85.84%** |
| **Precision (Not Coded)** | 87.42% |
| **Recall (Not Coded)** | 84.31% |
| **F1 (Coded)** | 97.95% |
| **Precision (Coded)** | 97.69% |
| **Recall (Coded)** | 98.21% |
| **Macro F1** | 91.89% |
### Confusion Matrix
| | Predicted Coded | Predicted Not Coded |
|-----------|----------------:|--------------------:|
| **Actual Coded** | 3,177 | 58 |
| **Actual Not Coded** | 75 | 403 |
The model shows strong performance on both classes, with particularly high accuracy on the majority class (coded utterances) while maintaining good F1 score (85.84%) on the minority class (not coded utterances).
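The reported scores can be reproduced directly from the confusion matrix as a quick sanity check:

```python
# Recompute the reported metrics from the confusion matrix above.
tp_nc = 403  # Not Coded correctly predicted
fn_nc = 75   # Not Coded misclassified as Coded
tp_c = 3177  # Coded correctly predicted
fn_c = 58    # Coded misclassified as Not Coded

def f1(p, r):
    return 2 * p * r / (p + r)

accuracy = (tp_c + tp_nc) / (tp_c + tp_nc + fn_c + fn_nc)
prec_nc = tp_nc / (tp_nc + fn_c)
rec_nc = tp_nc / (tp_nc + fn_nc)
prec_c = tp_c / (tp_c + fn_nc)
rec_c = tp_c / (tp_c + fn_c)
macro_f1 = (f1(prec_nc, rec_nc) + f1(prec_c, rec_c)) / 2
```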
## Training Details
### Training Data
- Source: Multilabel behavioral coding dataset reframed as binary classification
- Split: 70% train, 15% validation, 15% test (stratified)
- Preprocessing: Stratified splitting to maintain class balance across splits
- Context: each target utterance is classified together with its three preceding utterances
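The card does not specify how the context window is serialized into a single input string. A minimal sketch, assuming the preceding utterances are concatenated before the target with a separator token (check the tokenizer's actual `sep_token` if reproducing):

```python
# Hypothetical context construction: join the last n_context preceding
# utterances with the target utterance. The " [SEP] " separator is an
# assumption, not a documented part of the training setup.

def build_input(history, target, n_context=3, sep=" [SEP] "):
    context = history[-n_context:]  # keep only the most recent utterances
    return sep.join(context + [target])

dialogue = ["Hi there.", "How are you?", "Fine, thanks.", "What brings you in?"]
text = build_input(dialogue[:-1], dialogue[-1])
```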
### Training Procedure
**Hardware:**
- GPU training with CUDA
- Mixed precision (BFloat16) training
**Hyperparameters:**
| Parameter | Value |
|-----------|-------|
| Learning Rate | 6e-5 |
| Batch Size (per device) | 12 |
| Gradient Accumulation | 2 steps |
| Effective Batch Size | 24 |
| Max Sequence Length | 3000 tokens |
| Epochs | 20 (early stopped at epoch 13) |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.1 |
| LR Scheduler | Cosine |
| Optimizer | AdamW |
**Training Features:**
- **Class Weighting:** Balanced weights to address class imbalance (87:13 ratio)
- **Early Stopping:** Patience of 3 epochs on validation F1
- **Gradient Checkpointing:** Enabled for memory efficiency
- **Flash Attention 2:** For efficient attention computation
- **Best Model Selection:** Based on validation F1 score
**Loss Function:** Weighted Cross-Entropy Loss
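The weighted loss can be reproduced with standard PyTorch. The sketch below uses sklearn-style "balanced" weights, `n_samples / (n_classes * class_count)`; the exact weighting scheme used in training is an assumption.

```python
import torch
import torch.nn as nn

# Class counts mirroring the ~87:13 imbalance (test-set counts used here
# for illustration; the training counts are not published).
counts = torch.tensor([3235.0, 478.0])           # [coded, not_coded]
weights = counts.sum() / (len(counts) * counts)  # "balanced" weighting

loss_fn = nn.CrossEntropyLoss(weight=weights)

# Dummy batch: 4 examples, 2 logits each
logits = torch.randn(4, 2)
labels = torch.tensor([0, 0, 1, 0])
loss = loss_fn(logits, labels)
```

The minority class receives roughly a 6.8x larger weight than the majority class, counteracting the imbalance in the gradient signal.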
## Usage
### Direct Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "lekhansh/bc-not-coded-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Prepare input
text = "Your utterance text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)
# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1)
# Interpret result
label = "Not Coded" if prediction.item() == 1 else "Coded"
print(f"Prediction: {label}")
```
### Batch Prediction with Probabilities
```python
def classify_utterances(texts, model, tokenizer):
    """
    Classify multiple utterances with confidence scores.

    Returns:
        List of dicts with predictions and probabilities
    """
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        max_length=3000,
        padding=True
    )
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        predictions = torch.argmax(outputs.logits, dim=-1)
    results = []
    for i in range(len(texts)):
        results.append({
            'text': texts[i],
            'label': 'not_coded' if predictions[i].item() == 1 else 'coded',
            'confidence': probs[i][predictions[i]].item(),
            'probabilities': {
                'coded': probs[i][0].item(),
                'not_coded': probs[i][1].item()
            }
        })
    return results

# Example
utterances = [
    "I don't know what to say.",
    "Let me explain the process step by step.",
    "Mmm-hmm."
]
results = classify_utterances(utterances, model, tokenizer)
for r in results:
    print(f"Text: {r['text']}")
    print(f"  Label: {r['label']} (confidence: {r['confidence']:.2%})")
```
### Pipeline Usage
```python
from transformers import pipeline
classifier = pipeline(
    "text-classification",
    model="lekhansh/bc-not-coded-classifier",
    tokenizer="lekhansh/bc-not-coded-classifier"
)

result = classifier("Your utterance here", truncation=True, max_length=3000)
print(result)
# Example output: [{'label': 'coded', 'score': 0.98}]
```
## Limitations and Bias
### Limitations
1. **Domain Specificity:** The model is trained on behavioral coding data and may not generalize well to other text classification tasks
2. **Class Imbalance:** Training data has 87% coded vs 13% not coded examples, which may affect performance on datasets with different distributions
3. **Context Length:** Maximum sequence length is 3000 tokens; longer texts will be truncated
4. **Language:** Trained on English text only
### Potential Biases
- The model's performance may vary depending on the specific behavioral coding framework used
- Biases present in the training data may be reflected in predictions
- Performance may differ across different conversation types or domains
## Technical Specifications
### Model Architecture
- **Base:** ModernBERT-base (encoder-only transformer)
- **Classification Head:** Linear layer for binary classification
- **Attention:** Flash Attention 2 implementation
- **Parameters:** ~110M (inherited from base model)
- **Precision:** BFloat16
### Compute Infrastructure
- **Training:** Single GPU with CUDA
- **Inference:** CPU or GPU compatible
- **Memory:** ~500MB model size
## Environmental Impact
Training was conducted using mixed precision to optimize resource usage. Exact carbon footprint was not measured.
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{lekhansh2025bcnotcoded,
  author       = {Lekhansh},
  title        = {Behavior Coding Not-Coded Classifier},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/lekhansh/bc-not-coded-classifier}}
}
```
## Model Card Authors
Lekhansh