---
language:
- en
license: apache-2.0
tags:
- text-classification
- binary-classification
- behavioral-coding
- modernbert
- transformers
base_model: answerdotai/ModernBERT-base
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: bc-not-coded-classifier
  results:
  - task:
      type: text-classification
      name: Binary Text Classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9642
    - name: F1 (Not Coded)
      type: f1
      value: 0.8584
    - name: Precision (Not Coded)
      type: precision
      value: 0.8742
    - name: Recall (Not Coded)
      type: recall
      value: 0.8431
    - name: F1 Macro
      type: f1_macro
      value: 0.9189
widget:
- text: "I don't understand what you're asking me to do."
- text: "Let me help you with that problem by explaining the steps."
- text: "Okay, I see."
---

# Behavior Coding Not-Coded Classifier

## Model Description

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) for binary classification of behavioral coding utterances. It identifies whether an utterance should receive a behavioral code or be marked as "not_coded" in behavioral analysis workflows.

- **Developed by:** Lekhansh
- **Model type:** Binary text classification
- **Language:** English
- **Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **License:** Apache 2.0

## Intended Uses

### Primary Use Case

This model is designed to automatically filter utterances in behavioral coding tasks, distinguishing between:

- **Coded (label 0):** utterances suitable for behavioral code assignment
- **Not Coded (label 1):** utterances that should not receive behavioral codes

### Potential Applications

- Pre-filtering in behavioral coding pipelines
- Quality control for behavioral analysis datasets
- Automated utterance classification in conversation analysis
- Research on human behavior and communication patterns

## Model Performance

### Test Set Metrics

The model was evaluated on a held-out test set of 3,713 examples with the following class distribution:

- Coded samples: 3,235 (87.1%)
- Not Coded samples: 478 (12.9%)

| Metric | Score |
|--------|------:|
| **Overall Accuracy** | **96.42%** |
| **F1 (Not Coded)** | **85.84%** |
| **Precision (Not Coded)** | 87.42% |
| **Recall (Not Coded)** | 84.31% |
| **F1 (Coded)** | 97.95% |
| **Precision (Coded)** | 97.69% |
| **Recall (Coded)** | 98.21% |
| **Macro F1** | 91.89% |

### Confusion Matrix

|  | Predicted Coded | Predicted Not Coded |
|-----------|----------------:|--------------------:|
| **Actual Coded** | 3,177 | 58 |
| **Actual Not Coded** | 75 | 403 |

The model performs well on both classes: accuracy is highest on the majority class (coded utterances), while F1 on the minority class (not-coded utterances) remains a solid 85.84%.

## Training Details

### Training Data

- Source: a multilabel behavioral coding dataset reframed as binary classification
- Split: 70% train, 15% validation, 15% test (stratified)
- Preprocessing: stratified splitting to maintain the class balance across all splits
- Context: each target utterance is paired with its three preceding utterances (see the sketch below)
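The exact preprocessing code is not published with this card, so the following is a minimal sketch of how a three-utterance context window might be assembled. The `build_input` helper and the plain-text `[SEP]` separator are illustrative assumptions, not the documented pipeline.

```python
from typing import List

def build_input(utterances: List[str], index: int, context_size: int = 3) -> str:
    """Join up to `context_size` preceding utterances with the target utterance.

    NOTE: the helper name and the literal " [SEP] " separator are assumptions
    made for illustration; the card does not specify the training-time format.
    """
    start = max(0, index - context_size)
    context = utterances[start:index]           # up to three prior turns
    target = utterances[index]                  # the utterance to classify
    return " [SEP] ".join(context + [target])   # single string for the tokenizer

conversation = [
    "How have things been since our last session?",
    "Honestly, pretty rough.",
    "Can you tell me more about that?",
    "I don't really know what to say.",
]
print(build_input(conversation, index=3))
```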
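Relatedly, the training procedure below applies balanced class weights. As a sketch of how such weights follow from the class counts (the published test-split counts are used here because the training-split counts are not given):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative counts mirroring the reported 87:13 split (test-set numbers);
# the actual training-split counts are not published.
labels = np.array([0] * 3235 + [1] * 478)  # 0 = coded, 1 = not_coded

weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=labels)
# balanced weight for class c = n_samples / (n_classes * count_c)
print(dict(zip(["coded", "not_coded"], weights.round(3))))
# -> {'coded': 0.574, 'not_coded': 3.884}
```

Weights like these would typically be passed to `torch.nn.CrossEntropyLoss(weight=...)`, matching the weighted cross-entropy loss described in the next section.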
### Training Procedure

**Hardware:**

- GPU training with CUDA
- Mixed-precision (BFloat16) training

**Hyperparameters:**

| Parameter | Value |
|-----------|-------|
| Learning Rate | 6e-5 |
| Batch Size (per device) | 12 |
| Gradient Accumulation | 2 steps |
| Effective Batch Size | 24 |
| Max Sequence Length | 3000 tokens |
| Epochs | 20 (early stopped at epoch 13) |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.1 |
| LR Scheduler | Cosine |
| Optimizer | AdamW |

**Training Features:**

- **Class Weighting:** balanced weights to address the class imbalance (87:13 ratio)
- **Early Stopping:** patience of 3 epochs on validation F1
- **Gradient Checkpointing:** enabled for memory efficiency
- **Flash Attention 2:** for efficient attention computation
- **Best Model Selection:** based on validation F1 score

**Loss Function:** Weighted cross-entropy loss

## Usage

### Direct Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "lekhansh/bc-not-coded-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Your utterance text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1)

# Interpret result
label = "Not Coded" if prediction.item() == 1 else "Coded"
print(f"Prediction: {label}")
```

### Batch Prediction with Probabilities

```python
def classify_utterances(texts, model, tokenizer):
    """
    Classify multiple utterances with confidence scores.

    Returns:
        List of dicts with predictions and probabilities
    """
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        max_length=3000,
        padding=True
    )

    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        predictions = torch.argmax(outputs.logits, dim=-1)

    results = []
    for i in range(len(texts)):
        results.append({
            'text': texts[i],
            'label': 'not_coded' if predictions[i].item() == 1 else 'coded',
            'confidence': probs[i][predictions[i]].item(),
            'probabilities': {
                'coded': probs[i][0].item(),
                'not_coded': probs[i][1].item()
            }
        })
    return results

# Example
utterances = [
    "I don't know what to say.",
    "Let me explain the process step by step.",
    "Mmm-hmm."
]
results = classify_utterances(utterances, model, tokenizer)
for r in results:
    print(f"Text: {r['text']}")
    print(f"  Label: {r['label']} (confidence: {r['confidence']:.2%})")
```

### Pipeline Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="lekhansh/bc-not-coded-classifier",
    tokenizer="lekhansh/bc-not-coded-classifier"
)

result = classifier("Your utterance here", truncation=True, max_length=3000)
print(result)
# Output: [{'label': 'coded', 'score': 0.98}]
```
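### Adjusting the Decision Threshold

The examples above take the argmax of the softmax output, i.e., an implicit 0.5 threshold. Because the training data is imbalanced (roughly 87% coded vs. 13% not coded), deployments that prioritize catching not-coded utterances may prefer thresholding the `not_coded` probability directly. The sketch below shows the pattern; the 0.35 value is purely illustrative and should be tuned on your own validation data.

```python
import torch

def classify_with_threshold(texts, model, tokenizer, threshold=0.35):
    """Flag an utterance as not_coded when P(not_coded) >= `threshold`.

    Thresholds below 0.5 trade precision for recall on the minority
    (not_coded) class; 0.35 is an illustrative value, not a tuned one.
    """
    inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                       max_length=3000, padding=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)
    not_coded_probs = probs[:, 1].tolist()  # column 1 = not_coded per this card's labels
    return [
        {"text": t,
         "label": "not_coded" if p >= threshold else "coded",
         "p_not_coded": p}
        for t, p in zip(texts, not_coded_probs)
    ]
```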
## Limitations and Bias

### Limitations

1. **Domain Specificity:** The model is trained on behavioral coding data and may not generalize well to other text classification tasks.
2. **Class Imbalance:** The training data is 87% coded vs. 13% not coded, which may affect performance on datasets with different distributions (see the threshold sketch above for one mitigation).
3. **Context Length:** The maximum sequence length is 3000 tokens; longer texts will be truncated.
4. **Language:** Trained on English text only.

### Potential Biases

- The model's performance may vary depending on the specific behavioral coding framework used
- Biases present in the training data may be reflected in predictions
- Performance may differ across conversation types and domains

## Technical Specifications

### Model Architecture

- **Base:** ModernBERT-base (encoder-only transformer)
- **Classification Head:** linear layer for binary classification
- **Attention:** Flash Attention 2 implementation
- **Parameters:** ~149M (inherited from the base model)
- **Precision:** BFloat16

### Compute Infrastructure

- **Training:** single GPU with CUDA
- **Inference:** CPU or GPU compatible
- **Memory:** ~500 MB model size

## Environmental Impact

Training was conducted using mixed precision to optimize resource usage. The exact carbon footprint was not measured.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{lekhansh2025bcnotcoded,
  author       = {Lekhansh},
  title        = {Behavior Coding Not-Coded Classifier},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lekhansh/bc-not-coded-classifier}}
}
```

## Model Card Authors

Lekhansh

## Model Card Contact

[Your contact information or GitHub profile]