---
language:
- en
license: apache-2.0
tags:
- text-classification
- binary-classification
- behavioral-coding
- modernbert
- transformers
base_model: answerdotai/ModernBERT-base
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: bc-not-coded-classifier
  results:
  - task:
      type: text-classification
      name: Binary Text Classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9642
    - name: F1 (Not Coded)
      type: f1
      value: 0.8584
    - name: Precision (Not Coded)
      type: precision
      value: 0.8742
    - name: Recall (Not Coded)
      type: recall
      value: 0.8431
    - name: F1 Macro
      type: f1_macro
      value: 0.9189
widget:
- text: "I don't understand what you're asking me to do."
- text: "Let me help you with that problem by explaining the steps."
- text: "Okay, I see."
---

# Behavior Coding Not-Coded Classifier

## Model Description

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) for binary classification of behavioral coding utterances. It identifies whether utterances should be coded or marked as "not_coded" in behavioral analysis workflows.

**Developed by:** Lekhansh

**Model type:** Binary Text Classification

**Language:** English

**Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

**License:** Apache 2.0

## Intended Uses

### Primary Use Case

This model is designed to automatically filter utterances in behavioral coding tasks, distinguishing between:
- **Coded (Label 0):** Utterances suitable for behavioral code assignment
- **Not Coded (Label 1):** Utterances that should not receive behavioral codes
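These labels correspond to integer class ids in the model configuration; a minimal sketch of the assumed mapping (verify against the `id2label` field in the model's `config.json`):

```python
# Assumed label mapping for this classifier; the model's config.json
# id2label field is the authoritative source.
ID2LABEL = {0: "coded", 1: "not_coded"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}
```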

### Potential Applications

- Pre-filtering in behavioral coding pipelines
- Quality control for behavioral analysis datasets
- Automated utterance classification in conversation analysis
- Research in human behavior and communication patterns

## Model Performance

### Test Set Metrics

The model was evaluated on a held-out test set of 3,713 examples with the following class distribution:
- Coded samples: 3,235 (87.1%)
- Not Coded samples: 478 (12.9%)

| Metric | Score |
|--------|------:|
| **Overall Accuracy** | **96.42%** |
| **F1 (Not Coded)** | **85.84%** |
| **Precision (Not Coded)** | 87.42% |
| **Recall (Not Coded)** | 84.31% |
| **F1 (Coded)** | 97.95% |
| **Precision (Coded)** | 97.69% |
| **Recall (Coded)** | 98.21% |
| **Macro F1** | 91.89% |

### Confusion Matrix

|           | Predicted Coded | Predicted Not Coded |
|-----------|----------------:|--------------------:|
| **Actual Coded** | 3,177 | 58 |
| **Actual Not Coded** | 75 | 403 |
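The per-class metrics reported above follow directly from this matrix. As a sanity check in plain Python, treating "Not Coded" as the positive class (cell values copied from the table):

```python
# Confusion-matrix cells from the table above (positive class: not_coded)
tp = 403   # actual not_coded, predicted not_coded
fp = 58    # actual coded, predicted not_coded
fn = 75    # actual not_coded, predicted coded
tn = 3177  # actual coded, predicted coded

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # precision for the not_coded class
recall = tp / (tp + fn)      # recall for the not_coded class
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.9642 precision=0.8742 recall=0.8431 f1=0.8584
```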

The model shows strong performance on both classes, with particularly high accuracy on the majority class (coded utterances) while maintaining a good F1 score (85.84%) on the minority class (not-coded utterances).

## Training Details

### Training Data

- Source: Multilabel behavioral coding dataset reframed as binary classification
- Split: 70% train, 15% validation, 15% test (stratified)
- Preprocessing: Stratified splitting to maintain class balance across splits
- Context size: three preceding utterances
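The exact serialization of the context window is not documented here. A minimal sketch of one plausible format, assuming the three preceding utterances are simply prepended to the target utterance (the separator string and ordering are illustrative assumptions, not the documented training format):

```python
def build_input(history, target, n_context=3, sep=" [SEP] "):
    """Prepend up to n_context preceding utterances to the target.

    The separator and window size are illustrative assumptions;
    check the original preprocessing code for the actual format.
    """
    context = history[-n_context:]
    return sep.join(context + [target])

example = build_input(
    ["How was your week?", "Pretty rough, honestly.", "Tell me more."],
    "Work has been overwhelming.",
)
print(example)
# How was your week? [SEP] Pretty rough, honestly. [SEP] Tell me more. [SEP] Work has been overwhelming.
```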

### Training Procedure

**Hardware:**
- GPU training with CUDA
- Mixed precision (BFloat16) training

**Hyperparameters:**

| Parameter | Value |
|-----------|-------|
| Learning Rate | 6e-5 |
| Batch Size (per device) | 12 |
| Gradient Accumulation | 2 steps |
| Effective Batch Size | 24 |
| Max Sequence Length | 3000 tokens |
| Epochs | 20 (early stopped at epoch 13) |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.1 |
| LR Scheduler | Cosine |
| Optimizer | AdamW |

**Training Features:**
- **Class Weighting:** Balanced weights to address class imbalance (87:13 ratio)
- **Early Stopping:** Patience of 3 epochs on validation F1
- **Gradient Checkpointing:** Enabled for memory efficiency
- **Flash Attention 2:** For efficient attention computation
- **Best Model Selection:** Based on validation F1 score

**Loss Function:** Weighted Cross-Entropy Loss
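Balanced class weights can be derived directly from the class counts using the usual `n_samples / (n_classes * class_count)` convention (as in scikit-learn's `compute_class_weight("balanced")`). A sketch in plain Python, using the test-set counts from above as illustrative stand-ins for the training counts, which this card does not list:

```python
# Illustrative class counts (test-set distribution from above; the
# actual training-set counts are not listed in this card).
counts = {"coded": 3235, "not_coded": 478}
n_samples = sum(counts.values())
n_classes = len(counts)

# Balanced weighting: n_samples / (n_classes * count_per_class)
weights = {label: n_samples / (n_classes * c) for label, c in counts.items()}
print(weights)
```

The minority "not_coded" class receives a proportionally larger weight, which is then passed to the loss during training (e.g. `torch.nn.CrossEntropyLoss(weight=...)`).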

## Usage

### Direct Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "lekhansh/bc-not-coded-classifier"  
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Your utterance text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1)

# Interpret result
label = "Not Coded" if prediction.item() == 1 else "Coded"
print(f"Prediction: {label}")
```

### Batch Prediction with Probabilities

```python
def classify_utterances(texts, model, tokenizer):
    """
    Classify multiple utterances with confidence scores.

    Returns:
        List of dicts with predictions and probabilities
    """
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        max_length=3000,
        padding=True
    )

    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        predictions = torch.argmax(outputs.logits, dim=-1)

    results = []
    for i in range(len(texts)):
        results.append({
            'text': texts[i],
            'label': 'not_coded' if predictions[i].item() == 1 else 'coded',
            'confidence': probs[i][predictions[i]].item(),
            'probabilities': {
                'coded': probs[i][0].item(),
                'not_coded': probs[i][1].item()
            }
        })

    return results

# Example
utterances = [
    "I don't know what to say.",
    "Let me explain the process step by step.",
    "Mmm-hmm."
]

results = classify_utterances(utterances, model, tokenizer)
for r in results:
    print(f"Text: {r['text']}")
    print(f"  Label: {r['label']} (confidence: {r['confidence']:.2%})")
```

### Pipeline Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="lekhansh/bc-not-coded-classifier",  
    tokenizer="lekhansh/bc-not-coded-classifier"
)

result = classifier("Your utterance here", truncation=True, max_length=3000)
print(result)
# Example output: [{'label': 'coded', 'score': 0.98}]
```

## Limitations and Bias

### Limitations

1. **Domain Specificity:** The model is trained on behavioral coding data and may not generalize well to other text classification tasks
2. **Class Imbalance:** Training data has 87% coded vs 13% not coded examples, which may affect performance on datasets with different distributions
3. **Context Length:** Maximum sequence length is 3000 tokens; longer texts will be truncated
4. **Language:** Trained on English text only

### Potential Biases

- The model's performance may vary depending on the specific behavioral coding framework used
- Biases present in the training data may be reflected in predictions
- Performance may differ across different conversation types or domains

## Technical Specifications

### Model Architecture

- **Base:** ModernBERT-base (encoder-only transformer)
- **Classification Head:** Linear layer for binary classification
- **Attention:** Flash Attention 2 implementation
- **Parameters:** ~110M (inherited from base model)
- **Precision:** BFloat16

### Compute Infrastructure

- **Training:** Single GPU with CUDA
- **Inference:** CPU or GPU compatible
- **Memory:** ~500MB model size
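The ~500MB figure is consistent with storing roughly 110M parameters in float32; a quick back-of-envelope check (the parameter count is the card's approximate figure):

```python
# Rough parameter-memory estimate, assuming ~110M parameters
n_params = 110e6
fp32_mb = n_params * 4 / 1e6   # 4 bytes per parameter in float32
bf16_mb = n_params * 2 / 1e6   # 2 bytes per parameter in bfloat16
print(f"fp32 ~{fp32_mb:.0f} MB, bf16 ~{bf16_mb:.0f} MB")
# fp32 ~440 MB, bf16 ~220 MB
```

This excludes activation and optimizer memory, so actual training memory is substantially higher; the BF16 figure applies when weights are loaded in half precision.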

## Environmental Impact

Training was conducted using mixed precision to optimize resource usage. Exact carbon footprint was not measured.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{lekhansh2025bcnotcoded,
  author = {Lekhansh},
  title = {Behavior Coding Not-Coded Classifier},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lekhansh/bc-not-coded-classifier}}
}
```

## Model Card Authors

Lekhansh

## Model Card Contact

[Your contact information or GitHub profile]