---
language:
- en
license: apache-2.0
tags:
- text-classification
- binary-classification
- behavioral-coding
- modernbert
- transformers
base_model: answerdotai/ModernBERT-base
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: bc-not-coded-classifier
  results:
  - task:
      type: text-classification
      name: Binary Text Classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9642
    - name: F1 (Not Coded)
      type: f1
      value: 0.8584
    - name: Precision (Not Coded)
      type: precision
      value: 0.8742
    - name: Recall (Not Coded)
      type: recall
      value: 0.8431
    - name: F1 Macro
      type: f1_macro
      value: 0.9189
widget:
- text: "I don't understand what you're asking me to do."
- text: "Let me help you with that problem by explaining the steps."
- text: "Okay, I see."
---

# Behavior Coding Not-Coded Classifier

## Model Description

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) for binary classification of behavioral coding utterances. It identifies whether an utterance should receive a behavioral code or be marked as "not_coded" in behavioral analysis workflows.

- **Developed by:** Lekhansh
- **Model type:** Binary text classification
- **Language:** English
- **Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **License:** Apache 2.0

## Intended Uses

### Primary Use Case

This model is designed to automatically filter utterances in behavioral coding tasks, distinguishing between:

- **Coded (label 0):** utterances suitable for behavioral code assignment
- **Not Coded (label 1):** utterances that should not receive behavioral codes

### Potential Applications

- Pre-filtering in behavioral coding pipelines
- Quality control for behavioral analysis datasets
- Automated utterance classification in conversation analysis
- Research on human behavior and communication patterns

## Model Performance

### Test Set Metrics

The model was evaluated on a held-out test set of 3,713 examples with the following class distribution:

- Coded samples: 3,235 (87.1%)
- Not Coded samples: 478 (12.9%)

| Metric | Score |
|--------|------:|
| **Overall Accuracy** | **96.42%** |
| **F1 (Not Coded)** | **85.84%** |
| **Precision (Not Coded)** | 87.42% |
| **Recall (Not Coded)** | 84.31% |
| **F1 (Coded)** | 97.95% |
| **Precision (Coded)** | 97.69% |
| **Recall (Coded)** | 98.21% |
| **Macro F1** | 91.89% |

### Confusion Matrix

|  | Predicted Coded | Predicted Not Coded |
|-----------|----------------:|--------------------:|
| **Actual Coded** | 3,177 | 58 |
| **Actual Not Coded** | 75 | 403 |

The model performs well on both classes: accuracy is highest on the majority class (coded utterances), while F1 on the minority class (not-coded utterances) remains a solid 85.84%.

## Training Details

### Training Data

- Source: a multilabel behavioral coding dataset reframed as binary classification
- Split: 70% train, 15% validation, 15% test (stratified)
- Preprocessing: stratified splitting to maintain the class balance across all splits
- Context: each target utterance is paired with its three preceding utterances (see the sketch below)
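The exact preprocessing code is not published with this card, so the following is a minimal sketch of how a three-utterance context window might be assembled. The `build_input` helper and the plain-text `[SEP]` separator are illustrative assumptions, not the documented pipeline.

```python
from typing import List

def build_input(utterances: List[str], index: int, context_size: int = 3) -> str:
    """Join up to `context_size` preceding utterances with the target utterance.

    NOTE: the helper name and the literal " [SEP] " separator are assumptions
    made for illustration; the card does not specify the training-time format.
    """
    start = max(0, index - context_size)
    context = utterances[start:index]           # up to three prior turns
    target = utterances[index]                  # the utterance to classify
    return " [SEP] ".join(context + [target])   # single string for the tokenizer

conversation = [
    "How have things been since our last session?",
    "Honestly, pretty rough.",
    "Can you tell me more about that?",
    "I don't really know what to say.",
]
print(build_input(conversation, index=3))
```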
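Relatedly, the training procedure below applies balanced class weights. As a sketch of how such weights follow from the class counts (the published test-split counts are used here because the training-split counts are not given):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative counts mirroring the reported 87:13 split (test-set numbers);
# the actual training-split counts are not published.
labels = np.array([0] * 3235 + [1] * 478)  # 0 = coded, 1 = not_coded

weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=labels)
# balanced weight for class c = n_samples / (n_classes * count_c)
print(dict(zip(["coded", "not_coded"], weights.round(3))))
# -> {'coded': 0.574, 'not_coded': 3.884}
```

Weights like these would typically be passed to `torch.nn.CrossEntropyLoss(weight=...)`, matching the weighted cross-entropy loss described in the next section.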
### Training Procedure

**Hardware:**

- GPU training with CUDA
- Mixed-precision (BFloat16) training

**Hyperparameters:**

| Parameter | Value |
|-----------|-------|
| Learning Rate | 6e-5 |
| Batch Size (per device) | 12 |
| Gradient Accumulation | 2 steps |
| Effective Batch Size | 24 |
| Max Sequence Length | 3000 tokens |
| Epochs | 20 (early stopped at epoch 13) |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.1 |
| LR Scheduler | Cosine |
| Optimizer | AdamW |

**Training Features:**

- **Class Weighting:** balanced weights to address the class imbalance (87:13 ratio)
- **Early Stopping:** patience of 3 epochs on validation F1
- **Gradient Checkpointing:** enabled for memory efficiency
- **Flash Attention 2:** for efficient attention computation
- **Best Model Selection:** based on validation F1 score

**Loss Function:** Weighted cross-entropy loss

## Usage

### Direct Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "lekhansh/bc-not-coded-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Your utterance text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1)

# Interpret result
label = "Not Coded" if prediction.item() == 1 else "Coded"
print(f"Prediction: {label}")
```

### Batch Prediction with Probabilities

```python
def classify_utterances(texts, model, tokenizer):
    """
    Classify multiple utterances with confidence scores.

    Returns:
        List of dicts with predictions and probabilities
    """
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        max_length=3000,
        padding=True
    )

    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        predictions = torch.argmax(outputs.logits, dim=-1)

    results = []
    for i in range(len(texts)):
        results.append({
            'text': texts[i],
            'label': 'not_coded' if predictions[i].item() == 1 else 'coded',
            'confidence': probs[i][predictions[i]].item(),
            'probabilities': {
                'coded': probs[i][0].item(),
                'not_coded': probs[i][1].item()
            }
        })
    return results

# Example
utterances = [
    "I don't know what to say.",
    "Let me explain the process step by step.",
    "Mmm-hmm."
]
results = classify_utterances(utterances, model, tokenizer)
for r in results:
    print(f"Text: {r['text']}")
    print(f"  Label: {r['label']} (confidence: {r['confidence']:.2%})")
```

### Pipeline Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="lekhansh/bc-not-coded-classifier",
    tokenizer="lekhansh/bc-not-coded-classifier"
)

result = classifier("Your utterance here", truncation=True, max_length=3000)
print(result)
# Output: [{'label': 'coded', 'score': 0.98}]
```
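### Adjusting the Decision Threshold

The examples above take the argmax of the softmax output, i.e., an implicit 0.5 threshold. Because the training data is imbalanced (roughly 87% coded vs. 13% not coded), deployments that prioritize catching not-coded utterances may prefer thresholding the `not_coded` probability directly. The sketch below shows the pattern; the 0.35 value is purely illustrative and should be tuned on your own validation data.

```python
import torch

def classify_with_threshold(texts, model, tokenizer, threshold=0.35):
    """Flag an utterance as not_coded when P(not_coded) >= `threshold`.

    Thresholds below 0.5 trade precision for recall on the minority
    (not_coded) class; 0.35 is an illustrative value, not a tuned one.
    """
    inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                       max_length=3000, padding=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)
    not_coded_probs = probs[:, 1].tolist()  # column 1 = not_coded per this card's labels
    return [
        {"text": t,
         "label": "not_coded" if p >= threshold else "coded",
         "p_not_coded": p}
        for t, p in zip(texts, not_coded_probs)
    ]
```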
## Limitations and Bias

### Limitations

1. **Domain Specificity:** The model is trained on behavioral coding data and may not generalize well to other text classification tasks.
2. **Class Imbalance:** The training data is 87% coded vs. 13% not coded, which may affect performance on datasets with different distributions (see the threshold sketch above for one mitigation).
3. **Context Length:** The maximum sequence length is 3000 tokens; longer texts will be truncated.
4. **Language:** Trained on English text only.

### Potential Biases

- The model's performance may vary depending on the specific behavioral coding framework used
- Biases present in the training data may be reflected in predictions
- Performance may differ across conversation types and domains

## Technical Specifications

### Model Architecture

- **Base:** ModernBERT-base (encoder-only transformer)
- **Classification Head:** linear layer for binary classification
- **Attention:** Flash Attention 2 implementation
- **Parameters:** ~149M (inherited from the base model)
- **Precision:** BFloat16

### Compute Infrastructure

- **Training:** single GPU with CUDA
- **Inference:** CPU or GPU compatible
- **Memory:** ~500 MB model size

## Environmental Impact

Training was conducted using mixed precision to optimize resource usage. The exact carbon footprint was not measured.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{lekhansh2025bcnotcoded,
  author       = {Lekhansh},
  title        = {Behavior Coding Not-Coded Classifier},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lekhansh/bc-not-coded-classifier}}
}
```

## Model Card Authors

Lekhansh

## Model Card Contact

[Your contact information or GitHub profile]