Upload folder using huggingface_hub

Files changed (7)

README.md +434 -0
metrics.json +56 -0
model.safetensors +3 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +945 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,434 @@

+---
+language:
+- en
+license: apache-2.0
+tags:
+- text-classification
+- multilabel-classification
+- behavioral-coding
+- motivational-interviewing
+- modernbert
+- transformers
+base_model: answerdotai/ModernBERT-base
+metrics:
+- f1
+- precision
+- recall
+- exact_match
+- hamming_loss
+model-index:
+- name: bc-multilabel-classifier
+  results:
+  - task:
+      type: text-classification
+      name: Multilabel Text Classification
+    metrics:
+    - name: Exact Match
+      type: exact_match
+      value: 0.8563
+    - name: Hamming Loss
+      type: hamming_loss
+      value: 0.0579
+    - name: F1 Macro
+      type: f1_macro
+      value: 0.8666
+    - name: F1 Micro
+      type: f1_micro
+      value: 0.9246
+    - name: Adherent F1
+      type: f1
+      value: 0.7429
+    - name: Non-Adherent F1
+      type: f1
+      value: 0.8932
+    - name: Neutral F1
+      type: f1
+      value: 0.9639
+widget:
+- text: "That's a great step you're taking to improve your health."
+- text: "You really should stop smoking, it's bad for you."
+- text: "What do you think about trying to quit?"
+---
+# Behavioral Coding Multilabel Classifier
+## Model Description
+This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) for multilabel classification of Motivational Interviewing (MI) behavioral codes. It classifies utterances into three non-mutually-exclusive categories used in behavioral coding of therapeutic conversations.
+**Developed by:** Lekhansh
+**Model type:** Multilabel Text Classification
+**Language:** English
+**Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
+**License:** Apache 2.0
+## Intended Uses
+### Primary Use Case
+This model is designed for automated behavioral coding in Motivational Interviewing contexts. It assigns each utterance one or more of three behavior labels:
+- **Adherent:** MI-adherent behaviors (e.g., affirming, seeking collaboration)
+- **Non-Adherent:** MI-non-adherent behaviors (e.g., confronting, persuading without permission)
+- **Neutral:** Neutral behaviors (e.g., giving information, questions, reflections)
+### Key Features
+- **Multilabel Classification:** Utterances can have multiple labels simultaneously
+- **Therapeutic Context:** Specifically trained on Motivational Interviewing conversations
+- **Context-Aware:** Trained with the three preceding utterances included as context for each coded utterance
+### Potential Applications
+- Automated analysis of therapy session transcripts
+- Training and feedback for MI practitioners
+- Quality assurance in behavioral health interventions
+- Research in therapeutic communication patterns
+## Model Performance
+### Test Set Metrics
+The model was evaluated on a held-out test set of 3,235 coded utterances.
+#### Overall Performance
+| Metric | Score |
+|--------|------:|
+| **Exact Match Accuracy** | **85.63%** |
+| **Hamming Loss** | **0.0579** |
+| **F1 Macro** | **86.66%** |
+| **F1 Micro** | **92.46%** |
+| **Precision Macro** | 86.53% |
+| **Precision Micro** | 93.47% |
+| **Recall Macro** | 86.84% |
+| **Recall Micro** | 91.48% |
+**Exact Match:** Percentage of examples where all labels are predicted correctly
+**Hamming Loss:** Average fraction of labels that are incorrectly predicted (lower is better)
+#### Per-Label Performance
+| Label | F1 Score | Precision | Recall | Accuracy |
+|-------|----------|-----------|--------|----------|
+| **Adherent** | 74.29% | 74.47% | 74.10% | 90.26% |
+| **Non-Adherent** | 89.32% | 87.34% | 91.39% | 98.98% |
+| **Neutral** | 96.39% | 97.77% | 95.04% | 93.38% |
+### Class Distribution
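+The training data exhibits class imbalance, addressed through positive class weighting. The test-set precision, recall, and accuracy in metrics.json imply about 3,003 neutral, 614 adherent, and 151 non-adherent positives among the 3,235 test utterances, and the stratified split should give the training set similar proportions:
+- **Neutral:** Most common (majority class)
+- **Adherent:** Moderate frequency
+- **Non-Adherent:** Least common (minority class)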
+## Training Details
+### Training Data
+- **Source:** Multilabel behavioral coding dataset from Motivational Interviewing transcripts
+- **Preprocessing:**
+  - Excluded utterances marked as "not_coded" (no MI codes assigned)
+  - Included context from three preceding utterances
+  - Stratified splitting to maintain label distribution
+- **Split:** 70% train, 15% validation, 15% test
+### Training Procedure
+**Hardware:**
+- GPU training with CUDA
+- Mixed precision (BFloat16) training
+**Hyperparameters:**
+| Parameter | Value |
+|-----------|-------|
+| Learning Rate | 6e-5 |
+| Batch Size (per device) | 12 |
+| Gradient Accumulation | 2 steps |
+| Effective Batch Size | 24 |
+| Max Sequence Length | 3000 tokens |
+| Epochs | 20 (early stopped at epoch 14) |
+| Weight Decay | 0.01 |
+| Warmup Ratio | 0.1 |
+| LR Scheduler | Cosine |
+| Optimizer | AdamW |
+| Dropout | 0.1 |
+**Training Features:**
+- **Positive Class Weighting:** BCEWithLogitsLoss with computed pos_weights for each label
+- **Early Stopping:** Patience of 3 epochs on validation F1 macro
+- **Gradient Checkpointing:** Enabled for memory efficiency
+- **Flash Attention 2:** For efficient attention computation
+- **Best Model Selection:** Based on validation F1 macro score
+**Loss Function:** Binary Cross-Entropy with Logits Loss (BCEWithLogitsLoss) with per-label positive class weights
+### Model Architecture
+The model uses a custom architecture on top of ModernBERT:
+```
+ModernBERT-base (encoder)
+  → [CLS] token extraction
+  → Dropout (0.1)
+  → Linear layer (hidden_size → 3)
+  → Sigmoid activation (applied during inference)
+```
+## Usage
+### Direct Use
+```python
+import torch
+import torch.nn as nn
+from huggingface_hub import hf_hub_download
+from safetensors.torch import load_file
+from transformers import AutoTokenizer, AutoModel
+# Define the model class
+class MultiLabelBERTModel(nn.Module):
+    def __init__(self, base_model_name, num_labels=3, dropout=0.1):
+        super().__init__()
+        self.bert = AutoModel.from_pretrained(base_model_name)
+        self.dropout = nn.Dropout(dropout)
+        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)
+        self.num_labels = num_labels
+    def forward(self, input_ids, attention_mask):
+        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
+        pooled_output = outputs.last_hidden_state[:, 0, :]  # [CLS] token
+        pooled_output = self.dropout(pooled_output)
+        logits = self.classifier(pooled_output)
+        return logits
+# Load the tokenizer from this repository
+model_name = "Lekhansh/bc-multilabel-classifier"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# Initialize model architecture from the base encoder
+# (this commit uploads no config.json, so AutoModel cannot read one from model_name)
+model = MultiLabelBERTModel("answerdotai/ModernBERT-base", num_labels=3)
+# Load trained weights
+# Assumption: parameter names in model.safetensors match this class (bert.*, classifier.*);
+# adjust the keys if load_state_dict reports missing or unexpected entries
+weights_path = hf_hub_download(repo_id=model_name, filename="model.safetensors")
+model.load_state_dict(load_file(weights_path))
+model.eval()
+# Prepare input
+text = "That's a wonderful goal you've set for yourself."
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)
+# Get predictions
+with torch.no_grad():
+    logits = model(inputs['input_ids'], inputs['attention_mask'])
+    probs = torch.sigmoid(logits)
+    predictions = (probs > 0.5).int()
+# Interpret results
+labels = ['adherent', 'non_adherent', 'neutral']
+print(f"Text: {text}")
+print("\nPredictions:")
+for i, label in enumerate(labels):
+    if predictions[0][i]:
+        print(f"  ✓ {label} (confidence: {probs[0][i]:.2%})")
+```
+### Batch Prediction with Confidence Scores
+```python
+def predict_multilabel(texts, model, tokenizer, threshold=0.5):
+    """
+    Predict multiple labels for each text with confidence scores.
+    Args:
+        texts: List of input texts
+        model: The multilabel classification model
+        tokenizer: The tokenizer
+        threshold: Probability threshold for positive prediction (default: 0.5)
+    Returns:
+        List of dicts with predictions and probabilities
+    """
+    inputs = tokenizer(
+        texts,
+        return_tensors="pt",
+        truncation=True,
+        max_length=3000,
+        padding=True
+    )
+    with torch.no_grad():
+        logits = model(inputs['input_ids'], inputs['attention_mask'])
+        probs = torch.sigmoid(logits)
+    labels = ['adherent', 'non_adherent', 'neutral']
+    results = []
+    for i in range(len(texts)):
+        predictions = (probs[i] > threshold).int()
+        result = {
+            'text': texts[i],
+            'labels': {},
+            'probabilities': {}
+        }
+        for j, label in enumerate(labels):
+            result['labels'][label] = bool(predictions[j])
+            result['probabilities'][label] = float(probs[i][j])
+        results.append(result)
+    return results
+# Example usage
+utterances = [
+    "I hear you saying that you want to change but you're not sure how.",
+    "You need to stop making excuses and just do it.",
+    "How many cigarettes do you smoke per day?"
+]
+results = predict_multilabel(utterances, model, tokenizer)
+for r in results:
+    print(f"\nText: {r['text'][:60]}...")
+    print("Predicted labels:")
+    for label in ['adherent', 'non_adherent', 'neutral']:
+        status = "✓" if r['labels'][label] else "✗"
+        conf = r['probabilities'][label]
+        print(f"  {status} {label}: {conf:.2%}")
+```
+### Custom Threshold Tuning
+```python
+# Adjust threshold for precision/recall trade-off
+def predict_with_custom_threshold(text, model, tokenizer, thresholds):
+    """
+    Predict with different thresholds for each label.
+    Args:
+        thresholds: Dict with keys 'adherent', 'non_adherent', 'neutral'
+    """
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=3000)
+    with torch.no_grad():
+        logits = model(inputs['input_ids'], inputs['attention_mask'])
+        probs = torch.sigmoid(logits)
+    labels_list = ['adherent', 'non_adherent', 'neutral']
+    predictions = {}
+    for i, label in enumerate(labels_list):
+        threshold = thresholds.get(label, 0.5)
+        predictions[label] = {
+            'predicted': bool(probs[0][i] > threshold),
+            'probability': float(probs[0][i]),
+            'threshold': threshold
+        }
+    return predictions
+# Example: Higher threshold for adherent (higher precision)
+custom_thresholds = {
+    'adherent': 0.6,
+    'non_adherent': 0.5,
+    'neutral': 0.5
+}
+result = predict_with_custom_threshold(
+    "What are your thoughts on reducing your drinking?",
+    model,
+    tokenizer,
+    custom_thresholds
+)
+```
+## Limitations and Bias
+### Limitations
+1. **Domain Specificity:** Trained on Motivational Interviewing data; may not generalize to other therapeutic modalities
+2. **Context Dependency:** Performance may vary with utterances lacking proper conversational context
+3. **Adherent Label:** Lowest performance of the three labels (test F1 of 74.29%); class imbalance in the training data may contribute
+4. **Multilabel Complexity:** Some utterances may have ambiguous or overlapping codes
+5. **Context Length:** Maximum 3000 tokens; longer texts will be truncated
+6. **Language:** Trained on English text only
+### Potential Biases
+- Training data may reflect biases from the original coding framework and human coders
+- Performance may vary across different MI contexts (e.g., substance use vs. health behavior change)
+- Cultural and linguistic variations in therapeutic communication may affect predictions
+- The model may be more accurate on populations/contexts similar to training data
+### Recommended Use
+- Use as a screening tool or preliminary analysis, not as definitive behavioral coding
+- Validate predictions with human expert review, especially for critical applications
+- Consider adjusting prediction thresholds based on your use case (precision vs. recall trade-off)
+- Be aware that multilabel predictions may sometimes conflict with clinical judgment
+## Technical Specifications
+### Model Architecture
+- **Base:** ModernBERT-base (encoder-only transformer)
+- **Custom Head:** Dropout (0.1) + Linear layer (hidden_size → 3 labels)
+- **Activation:** Sigmoid (for independent label probabilities)
+- **Attention:** Flash Attention 2 implementation
+- **Parameters:** ~149M (inherited from base model + classification head)
+- **Precision:** BFloat16
+### Compute Infrastructure
+- **Training:** Single GPU with CUDA
+- **Inference:** CPU or GPU compatible
+- **Memory:** ~300MB model size (model.safetensors is 298,051,748 bytes)
+### Label Format
+```python
+# Output format
+{
+  "adherent": 0 or 1,
+  "non_adherent": 0 or 1,
+  "neutral": 0 or 1
+}
+# Example (only one label is active here; an utterance can also carry several)
+# "I hear that you're struggling, and I believe you can overcome this."
+# → adherent=1, non_adherent=0, neutral=0
+```
+## Environmental Impact
+Training was conducted using mixed precision to optimize resource usage. Exact carbon footprint was not measured.
+## Citation
+If you use this model in your research, please cite:
+```bibtex
+@misc{lekhansh2025bcmultilabel,
+  author = {Lekhansh},
+  title = {Behavioral Coding Multilabel Classifier for Motivational Interviewing},
+  year = {2025},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/Lekhansh/bc-multilabel-classifier}}
+}
+```
+## References
+For more information on Motivational Interviewing behavioral coding:
+- Miller, W. R., & Rollnick, S. (2013). *Motivational Interviewing: Helping People Change* (3rd ed.)
+- Moyers, T. B., et al. (2016). *Motivational Interviewing Treatment Integrity Coding Manual 4.2.1*
+## Model Card Authors
+Lekhansh
+## Model Card Contact
+[drlekhansh@gmail.com]
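The README's Training Features list names BCEWithLogitsLoss with per-label positive class weights but does not say how the weights were computed. The sketch below assumes the common negatives-to-positives ratio, which the source does not confirm, and uses made-up label counts. The per-element formula follows PyTorch's documented `pos_weight` behavior, written in plain Python so it runs without a GPU stack; the function names are hypothetical.

```python
import math

def pos_weight_from_counts(num_pos, num_total):
    """Assumed heuristic: weight = negatives / positives for one label."""
    return (num_total - num_pos) / num_pos

def weighted_bce_with_logits(logit, target, pos_weight):
    """Per-element loss: -[w * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]."""
    log_sig = -math.log1p(math.exp(-logit))       # log(sigmoid(x))
    log_one_minus = -math.log1p(math.exp(logit))  # log(1 - sigmoid(x))
    return -(pos_weight * target * log_sig + (1 - target) * log_one_minus)

# Toy counts (not from the dataset): 50 positives among 1,000 utterances
w = pos_weight_from_counts(num_pos=50, num_total=1000)

# A positive example the model scores low, with and without the weight
missed_positive = weighted_bce_with_logits(logit=-1.0, target=1, pos_weight=w)
unweighted = weighted_bce_with_logits(logit=-1.0, target=1, pos_weight=1.0)

# A negative example the model scores high; pos_weight does not touch this term
false_alarm = weighted_bce_with_logits(logit=1.0, target=0, pos_weight=w)

print(w, missed_positive, unweighted, false_alarm)
```

Under this assumption, a missed positive on a rare label costs `w` times as much as it would unweighted, while errors on negative examples are unchanged. This pushes the classifier toward higher recall on rare labels.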

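The Exact Match and Hamming Loss definitions in the README's Test Set Metrics section can be illustrated with a small sketch. The label matrices are toy data, not the model's predictions, and the function names are hypothetical.

```python
def exact_match(y_true, y_pred):
    """Fraction of examples whose full label vector is predicted correctly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))

# Toy data; columns are [adherent, non_adherent, neutral]
y_true = [[1, 0, 1], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 0, 1], [0, 1, 1], [1, 0, 1]]

em = exact_match(y_true, y_pred)
hl = hamming_loss(y_true, y_pred)
print(f"exact match: {em:.2f}, hamming loss: {hl:.4f}")
```

One wrong label makes the whole example a miss for exact match but costs only one slot out of three for Hamming loss. This is why an exact match of 85.63% can sit alongside a Hamming loss of 0.0579.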
metrics.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "validation": {
+    "eval_loss": 0.5891359448432922,
+    "eval_hamming_loss": 0.056362699639361157,
+    "eval_exact_match": 0.8615146831530139,
+    "eval_adherent_precision": 0.7469244288224957,
+    "eval_adherent_recall": 0.7227891156462585,
+    "eval_adherent_f1": 0.7346585998271391,
+    "eval_adherent_accuracy": 0.9051004636785163,
+    "eval_non_adherent_precision": 0.8695652173913043,
+    "eval_non_adherent_recall": 0.935672514619883,
+    "eval_non_adherent_f1": 0.9014084507042254,
+    "eval_non_adherent_accuracy": 0.9891808346213292,
+    "eval_neutral_precision": 0.978328173374613,
+    "eval_neutral_recall": 0.9524447421299397,
+    "eval_neutral_f1": 0.9652129645341931,
+    "eval_neutral_accuracy": 0.9366306027820711,
+    "eval_precision_macro": 0.8649392731961377,
+    "eval_recall_macro": 0.870302124132027,
+    "eval_f1_macro": 0.8670933383551859,
+    "eval_precision_micro": 0.9368852459016394,
+    "eval_recall_micro": 0.9156208277703605,
+    "eval_f1_micro": 0.9261309925725861,
+    "eval_runtime": 5.213,
+    "eval_samples_per_second": 620.566,
+    "eval_steps_per_second": 51.794,
+    "epoch": 14.0
+  },
+  "test": {
+    "eval_loss": 0.5622028708457947,
+    "eval_hamming_loss": 0.05790829469345698,
+    "eval_exact_match": 0.8562596599690881,
+    "eval_adherent_precision": 0.7446808510638298,
+    "eval_adherent_recall": 0.741042345276873,
+    "eval_adherent_f1": 0.7428571428571429,
+    "eval_adherent_accuracy": 0.9026275115919629,
+    "eval_non_adherent_precision": 0.8734177215189873,
+    "eval_non_adherent_recall": 0.9139072847682119,
+    "eval_non_adherent_f1": 0.8932038834951457,
+    "eval_non_adherent_accuracy": 0.9897990726429675,
+    "eval_neutral_precision": 0.9777321000342583,
+    "eval_neutral_recall": 0.9503829503829504,
+    "eval_neutral_f1": 0.9638635596082404,
+    "eval_neutral_accuracy": 0.9338485316846986,
+    "eval_precision_macro": 0.8652768908723584,
+    "eval_recall_macro": 0.8684441934760118,
+    "eval_f1_macro": 0.8666415286535097,
+    "eval_precision_micro": 0.934652928416486,
+    "eval_recall_micro": 0.9148089171974523,
+    "eval_f1_micro": 0.9246244635193133,
+    "eval_runtime": 5.6695,
+    "eval_samples_per_second": 570.599,
+    "eval_steps_per_second": 47.623,
+    "epoch": 14.0
+  }
+}
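The macro-averaged scores in metrics.json are unweighted means of the per-label scores, and each F1 is the harmonic mean of that label's precision and recall. The sketch below copies the test-split values from the file above and checks that they are internally consistent; the variable and function names are illustrative.

```python
# Per-label test-split scores copied from metrics.json
test_scores = {
    "adherent": {
        "precision": 0.7446808510638298,
        "recall": 0.741042345276873,
        "f1": 0.7428571428571429,
    },
    "non_adherent": {
        "precision": 0.8734177215189873,
        "recall": 0.9139072847682119,
        "f1": 0.8932038834951457,
    },
    "neutral": {
        "precision": 0.9777321000342583,
        "recall": 0.9503829503829504,
        "f1": 0.9638635596082404,
    },
}
reported_f1_macro = 0.8666415286535097
reported_precision_macro = 0.8652768908723584

def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def macro_average(scores, key):
    """Unweighted mean of one metric across labels."""
    return sum(label[key] for label in scores.values()) / len(scores)

f1_macro = macro_average(test_scores, "f1")
precision_macro = macro_average(test_scores, "precision")

for name, s in test_scores.items():
    recomputed = harmonic_f1(s["precision"], s["recall"])
    print(f"{name}: reported F1 {s['f1']:.4f}, recomputed {recomputed:.4f}")
print(f"macro F1: {f1_macro:.4f} (reported {reported_f1_macro:.4f})")
```

Because macro averaging weights every label equally, the lower adherent F1 pulls the macro score well below the micro score, which is dominated by the frequent neutral label.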

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e06d3c7d8664869b9b707e662aac0d663cd80b13c508f9e089f97e45580bc631
+size 298051748
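The `size` field of the Git LFS pointer gives a rough parameter count for the checkpoint. The sketch below assumes the tensors are stored in BFloat16 at 2 bytes per parameter, in line with the precision listed in the README. It ignores the small safetensors header, so the result is a slight overestimate.

```python
checkpoint_bytes = 298_051_748  # "size" field of the Git LFS pointer
bytes_per_param = 2             # assumption: BFloat16 storage

# Upper-bound estimate; the safetensors JSON header takes a small share of the file
approx_params = checkpoint_bytes // bytes_per_param
approx_megabytes = checkpoint_bytes / 1e6

print(f"~{approx_params / 1e6:.0f}M parameters, ~{approx_megabytes:.0f} MB on disk")
```

Under this assumption the estimate is about 149 million parameters, which matches the published size of ModernBERT-base plus a small classification head.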

special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,945 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "|||IP_ADDRESS|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "1": {
+      "content": "<|padding|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50254": {
+      "content": "                        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50255": {
+      "content": "                       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50256": {
+      "content": "                      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50257": {
+      "content": "                     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50258": {
+      "content": "                    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50259": {
+      "content": "                   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50260": {
+      "content": "                  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50261": {
+      "content": "                 ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50262": {
+      "content": "                ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50263": {
+      "content": "               ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50264": {
+      "content": "              ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50265": {
+      "content": "             ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50266": {
+      "content": "            ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50267": {
+      "content": "           ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50268": {
+      "content": "          ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50269": {
+      "content": "         ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50270": {
+      "content": "        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50271": {
+      "content": "       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50272": {
+      "content": "      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50273": {
+      "content": "     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50274": {
+      "content": "    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50275": {
+      "content": "   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50276": {
+      "content": "  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50277": {
+      "content": "|||EMAIL_ADDRESS|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50278": {
+      "content": "|||PHONE_NUMBER|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50279": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50280": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50281": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50282": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50283": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50284": {
+      "content": "[MASK]",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50285": {
+      "content": "[unused0]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50286": {
+      "content": "[unused1]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50287": {
+      "content": "[unused2]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50288": {
+      "content": "[unused3]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50289": {
+      "content": "[unused4]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50290": {
+      "content": "[unused5]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50291": {
+      "content": "[unused6]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50292": {
+      "content": "[unused7]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50293": {
+      "content": "[unused8]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50294": {
+      "content": "[unused9]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50295": {
+      "content": "[unused10]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50296": {
+      "content": "[unused11]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50297": {
+      "content": "[unused12]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50298": {
+      "content": "[unused13]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50299": {
+      "content": "[unused14]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50300": {
+      "content": "[unused15]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50301": {
+      "content": "[unused16]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50302": {
+      "content": "[unused17]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50303": {
+      "content": "[unused18]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50304": {
+      "content": "[unused19]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50305": {
+      "content": "[unused20]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50306": {
+      "content": "[unused21]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50307": {
+      "content": "[unused22]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50308": {
+      "content": "[unused23]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50309": {
+      "content": "[unused24]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50310": {
+      "content": "[unused25]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50311": {
+      "content": "[unused26]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50312": {
+      "content": "[unused27]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50313": {
+      "content": "[unused28]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50314": {
+      "content": "[unused29]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50315": {
+      "content": "[unused30]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50316": {
+      "content": "[unused31]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50317": {
+      "content": "[unused32]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50318": {
+      "content": "[unused33]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50319": {
+      "content": "[unused34]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50320": {
+      "content": "[unused35]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50321": {
+      "content": "[unused36]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50322": {
+      "content": "[unused37]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50323": {
+      "content": "[unused38]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50324": {
+      "content": "[unused39]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50325": {
+      "content": "[unused40]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50326": {
+      "content": "[unused41]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50327": {
+      "content": "[unused42]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50328": {
+      "content": "[unused43]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50329": {
+      "content": "[unused44]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50330": {
+      "content": "[unused45]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50331": {
+      "content": "[unused46]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50332": {
+      "content": "[unused47]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50333": {
+      "content": "[unused48]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50334": {
+      "content": "[unused49]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50335": {
+      "content": "[unused50]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50336": {
+      "content": "[unused51]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50337": {
+      "content": "[unused52]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50338": {
+      "content": "[unused53]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50339": {
+      "content": "[unused54]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50340": {
+      "content": "[unused55]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50341": {
+      "content": "[unused56]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50342": {
+      "content": "[unused57]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50343": {
+      "content": "[unused58]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50344": {
+      "content": "[unused59]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50345": {
+      "content": "[unused60]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50346": {
+      "content": "[unused61]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50347": {
+      "content": "[unused62]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50348": {
+      "content": "[unused63]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50349": {
+      "content": "[unused64]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50350": {
+      "content": "[unused65]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50351": {
+      "content": "[unused66]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50352": {
+      "content": "[unused67]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50353": {
+      "content": "[unused68]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50354": {
+      "content": "[unused69]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50355": {
+      "content": "[unused70]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50356": {
+      "content": "[unused71]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50357": {
+      "content": "[unused72]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50358": {
+      "content": "[unused73]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50359": {
+      "content": "[unused74]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50360": {
+      "content": "[unused75]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50361": {
+      "content": "[unused76]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50362": {
+      "content": "[unused77]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50363": {
+      "content": "[unused78]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50364": {
+      "content": "[unused79]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50365": {
+      "content": "[unused80]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50366": {
+      "content": "[unused81]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50367": {
+      "content": "[unused82]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 8192,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "tokenizer_class": "PreTrainedTokenizerFast",
+  "unk_token": "[UNK]"
+}

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a62fb32dfb4437129f1fe77ede9079753da1af1a80a5d2cc3e88286adfea1970
+size 5368