---
language:
- en
license: apache-2.0
tags:
- text-classification
- binary-classification
- behavioral-coding
- modernbert
- transformers
base_model: answerdotai/ModernBERT-base
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: bc-not-coded-classifier
results:
- task:
type: text-classification
name: Binary Text Classification
metrics:
- name: Accuracy
type: accuracy
value: 0.9642
- name: F1 (Not Coded)
type: f1
value: 0.8584
- name: Precision (Not Coded)
type: precision
value: 0.8742
- name: Recall (Not Coded)
type: recall
value: 0.8431
- name: F1 Macro
type: f1_macro
value: 0.9189
widget:
- text: "I don't understand what you're asking me to do."
- text: "Let me help you with that problem by explaining the steps."
- text: "Okay, I see."
---
# Behavior Coding Not-Coded Classifier
## Model Description
This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) for binary classification of behavioral coding utterances. It identifies whether utterances should be coded or marked as "not_coded" in behavioral analysis workflows.
**Developed by:** Lekhansh
**Model type:** Binary Text Classification
**Language:** English
**Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
**License:** Apache 2.0
## Intended Uses
### Primary Use Case
This model is designed to automatically filter utterances in behavioral coding tasks, distinguishing between:
- **Coded (Label 0):** Utterances suitable for behavioral code assignment
- **Not Coded (Label 1):** Utterances that should not receive behavioral codes
### Potential Applications
- Pre-filtering in behavioral coding pipelines
- Quality control for behavioral analysis datasets
- Automated utterance classification in conversation analysis
- Research in human behavior and communication patterns
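As a sketch of the pre-filtering use case, a pipeline might route utterances based on the classifier's label and confidence. The dicts below mirror the output of the `classify_utterances` helper shown later in this card; the 0.90 confidence threshold is an illustrative assumption, not a tuned value.

```python
# Hypothetical pre-filtering step: route utterances based on classifier output.
# The 0.90 threshold is an example choice; tune it on your own validation data.

def prefilter(results, threshold=0.90):
    to_code, skipped, needs_review = [], [], []
    for r in results:
        if r["confidence"] < threshold:
            needs_review.append(r["text"])   # low confidence: send to a human
        elif r["label"] == "coded":
            to_code.append(r["text"])        # forward to behavioral coding
        else:
            skipped.append(r["text"])        # drop from the coding queue
    return to_code, skipped, needs_review

# Example with mock classifier output
mock = [
    {"text": "Let me explain the steps.", "label": "coded", "confidence": 0.99},
    {"text": "Mmm-hmm.", "label": "not_coded", "confidence": 0.97},
    {"text": "Okay, I see.", "label": "not_coded", "confidence": 0.62},
]
to_code, skipped, review = prefilter(mock)
```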
## Model Performance
### Test Set Metrics
The model was evaluated on a held-out test set of 3,713 examples with the following class distribution:
- Coded samples: 3,235 (87.1%)
- Not Coded samples: 478 (12.9%)
| Metric | Score |
|--------|------:|
| **Overall Accuracy** | **96.42%** |
| **F1 (Not Coded)** | **85.84%** |
| **Precision (Not Coded)** | 87.42% |
| **Recall (Not Coded)** | 84.31% |
| **F1 (Coded)** | 97.95% |
| **Precision (Coded)** | 97.69% |
| **Recall (Coded)** | 98.21% |
| **Macro F1** | 91.89% |
### Confusion Matrix
| | Predicted Coded | Predicted Not Coded |
|-----------|----------------:|--------------------:|
| **Actual Coded** | 3,177 | 58 |
| **Actual Not Coded** | 75 | 403 |
The model shows strong performance on both classes, with particularly high accuracy on the majority class (coded utterances) while maintaining good F1 score (85.84%) on the minority class (not coded utterances).
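The reported scores can be reproduced directly from the confusion matrix as a quick sanity check:

```python
# Recompute the reported metrics from the confusion matrix above.
tp_nc = 403  # Not Coded correctly predicted
fn_nc = 75   # Not Coded misclassified as Coded
tp_c = 3177  # Coded correctly predicted
fn_c = 58    # Coded misclassified as Not Coded

def f1(p, r):
    return 2 * p * r / (p + r)

accuracy = (tp_c + tp_nc) / (tp_c + tp_nc + fn_c + fn_nc)
prec_nc = tp_nc / (tp_nc + fn_c)
rec_nc = tp_nc / (tp_nc + fn_nc)
prec_c = tp_c / (tp_c + fn_nc)
rec_c = tp_c / (tp_c + fn_c)
macro_f1 = (f1(prec_nc, rec_nc) + f1(prec_c, rec_c)) / 2
```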
## Training Details
### Training Data
- Source: Multilabel behavioral coding dataset reframed as binary classification
- Split: 70% train, 15% validation, 15% test (stratified)
- Preprocessing: Stratified splitting to maintain class balance across splits
- Context: each target utterance is classified together with its three preceding utterances
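The card does not specify how the context window is serialized into a single input string. A minimal sketch, assuming the preceding utterances are concatenated before the target with a separator token (check the tokenizer's actual `sep_token` if reproducing):

```python
# Hypothetical context construction: join the last n_context preceding
# utterances with the target utterance. The " [SEP] " separator is an
# assumption, not a documented part of the training setup.

def build_input(history, target, n_context=3, sep=" [SEP] "):
    context = history[-n_context:]  # keep only the most recent utterances
    return sep.join(context + [target])

dialogue = ["Hi there.", "How are you?", "Fine, thanks.", "What brings you in?"]
text = build_input(dialogue[:-1], dialogue[-1])
```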
### Training Procedure
**Hardware:**
- GPU training with CUDA
- Mixed precision (BFloat16) training
**Hyperparameters:**
| Parameter | Value |
|-----------|-------|
| Learning Rate | 6e-5 |
| Batch Size (per device) | 12 |
| Gradient Accumulation | 2 steps |
| Effective Batch Size | 24 |
| Max Sequence Length | 3000 tokens |
| Epochs | 20 (early stopped at epoch 13) |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.1 |
| LR Scheduler | Cosine |
| Optimizer | AdamW |
**Training Features:**
- **Class Weighting:** Balanced weights to address class imbalance (87:13 ratio)
- **Early Stopping:** Patience of 3 epochs on validation F1
- **Gradient Checkpointing:** Enabled for memory efficiency
- **Flash Attention 2:** For efficient attention computation
- **Best Model Selection:** Based on validation F1 score
**Loss Function:** Weighted Cross-Entropy Loss
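The weighted loss can be reproduced with standard PyTorch. The sketch below uses sklearn-style "balanced" weights, `n_samples / (n_classes * class_count)`; the exact weighting scheme used in training is an assumption.

```python
import torch
import torch.nn as nn

# Class counts mirroring the ~87:13 imbalance (test-set counts used here
# for illustration; the training counts are not published).
counts = torch.tensor([3235.0, 478.0])           # [coded, not_coded]
weights = counts.sum() / (len(counts) * counts)  # "balanced" weighting

loss_fn = nn.CrossEntropyLoss(weight=weights)

# Dummy batch: 4 examples, 2 logits each
logits = torch.randn(4, 2)
labels = torch.tensor([0, 0, 1, 0])
loss = loss_fn(logits, labels)
```

The minority class receives roughly a 6.8x larger weight than the majority class, counteracting the imbalance in the gradient signal.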
## Usage
### Direct Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "lekhansh/bc-not-coded-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Prepare input
text = "Your utterance text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)
# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1)
# Interpret result
label = "Not Coded" if prediction.item() == 1 else "Coded"
print(f"Prediction: {label}")
```
### Batch Prediction with Probabilities
```python
def classify_utterances(texts, model, tokenizer):
    """
    Classify multiple utterances with confidence scores.

    Returns:
        List of dicts with predictions and probabilities
    """
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        max_length=3000,
        padding=True
    )
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        predictions = torch.argmax(outputs.logits, dim=-1)
    results = []
    for i in range(len(texts)):
        results.append({
            'text': texts[i],
            'label': 'not_coded' if predictions[i].item() == 1 else 'coded',
            'confidence': probs[i][predictions[i]].item(),
            'probabilities': {
                'coded': probs[i][0].item(),
                'not_coded': probs[i][1].item()
            }
        })
    return results

# Example
utterances = [
    "I don't know what to say.",
    "Let me explain the process step by step.",
    "Mmm-hmm."
]
results = classify_utterances(utterances, model, tokenizer)
for r in results:
    print(f"Text: {r['text']}")
    print(f"  Label: {r['label']} (confidence: {r['confidence']:.2%})")
```
### Pipeline Usage
```python
from transformers import pipeline
classifier = pipeline(
    "text-classification",
    model="lekhansh/bc-not-coded-classifier",
    tokenizer="lekhansh/bc-not-coded-classifier"
)

result = classifier("Your utterance here", truncation=True, max_length=3000)
print(result)
# Example output: [{'label': 'coded', 'score': 0.98}]
```
## Limitations and Bias
### Limitations
1. **Domain Specificity:** The model is trained on behavioral coding data and may not generalize well to other text classification tasks
2. **Class Imbalance:** Training data has 87% coded vs 13% not coded examples, which may affect performance on datasets with different distributions
3. **Context Length:** Maximum sequence length is 3000 tokens; longer texts will be truncated
4. **Language:** Trained on English text only
### Potential Biases
- The model's performance may vary depending on the specific behavioral coding framework used
- Biases present in the training data may be reflected in predictions
- Performance may differ across different conversation types or domains
## Technical Specifications
### Model Architecture
- **Base:** ModernBERT-base (encoder-only transformer)
- **Classification Head:** Linear layer for binary classification
- **Attention:** Flash Attention 2 implementation
- **Parameters:** ~110M (inherited from base model)
- **Precision:** BFloat16
### Compute Infrastructure
- **Training:** Single GPU with CUDA
- **Inference:** CPU or GPU compatible
- **Memory:** ~500MB model size
## Environmental Impact
Training was conducted using mixed precision to optimize resource usage. Exact carbon footprint was not measured.
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{lekhansh2025bcnotcoded,
  author       = {Lekhansh},
  title        = {Behavior Coding Not-Coded Classifier},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/lekhansh/bc-not-coded-classifier}}
}
```
## Model Card Authors
Lekhansh