---
library_name: transformers
license: mit
base_model: bert-base-uncased
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: experiment_labels_bert_base
results: []
datasets:
- ADS509/full_experiment_labels
language:
- en
pipeline_tag: text-classification
---
# Experiment_labels_bert_base
This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on a dataset of social media comments
from five separate sources.
It achieves the following results on the evaluation set:
- Loss: 0.6531
- Accuracy: 0.7444
- **F1 Macro: 0.7295**
- F1 Weighted: 0.7451
## Model description
We retrained the classification layer of BERT Base for a multi-class classification task on our self-labeled data. The description
of the base model can be found at the link above, and the description of the dataset can be found [here](https://huggingface.co/datasets/ADS509/full_experiment_labels). The
fine-tuning parameters are listed below. This was the initial model in our experiment, used to see whether our self-labeling approach showed any promise.
## Intended uses & limitations
This model is intended to help characterize different social media websites and the discourse on each site,
going beyond the usual "positive", "negative", "neutral" sentiment labels of most models. The labels for the commentary data are as follows:
- Argumentative
- Opinion
- Informational
- Expressive
- Neutral
We think there is promise in this approach. Because this is only the initial step towards a deeper understanding of social commentary, there are
several limitations to outline:
- Because there were 70k records in total, the data was primarily labeled by language models. The prompt included correctly labeled examples as well as
  incorrectly labeled examples paired with the correct label. Three language models were tasked with labeling, and only the majority vote labels were
  kept. Three-way tie samples were set aside. Future iterations would benefit from more labeling models and more human-labeled examples.
- When reviewing records that were ambiguous or that the classifier predicted incorrectly, it became clear that the labeling scheme is fuzzy in some instances.
  For instance, many "Opinion" comments can also be viewed as "Expressive" or "Argumentative", leading to ambiguous labeling from the models. It would be worth
  exploring a more nuanced labeling scheme, perhaps splitting "Expressive" into 2-3 labels and "Opinion" into another 1 or 2.
- Due to the nature of the project, the commentary data used for training was subject to the following limitations:
  - Queries were limited to "politics" or "US politics".
  - With one exception, all comment data is dated from Jan 1, 2026 to Feb 12, 2026.
  - We set a floor and a ceiling on the number of comments per post. No posts with fewer than 10 comments were used, and for posts with many comments
    we pulled only the most recent 300.
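
The majority-vote step described in the first limitation can be sketched as follows. This is a minimal illustration rather than the project's labeling code: the function name `majority_label` and the example votes are hypothetical, and it assumes each record receives exactly one label from each of the three models.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of voters, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


# Hypothetical votes from three labeling models for three comments.
votes_per_comment = [
    ["Opinion", "Opinion", "Expressive"],        # 2 of 3 agree
    ["Neutral", "Neutral", "Neutral"],           # unanimous
    ["Opinion", "Expressive", "Argumentative"],  # three-way tie
]

resolved = [majority_label(votes) for votes in votes_per_comment]

# Records with a majority label are kept; ties are set aside for review.
kept = [label for label in resolved if label is not None]
set_aside = [i for i, label in enumerate(resolved) if label is None]
```

With three voters, a strict majority means at least two models agree, so the only records set aside are those where all three models disagree.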
## Training and evaluation data
A full description of the data can be found [here](https://huggingface.co/datasets/ADS509/full_experiment_labels).
## Training procedure
The full code used for training is below:
```python
import numpy as np
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "bert-base-uncased"

# Assumption: the original card does not show how `dataset`, `label2id`, or
# `id2label` were created. The dataset is assumed to load from the Hub with
# 'train', 'valid', and 'test' splits. The id order below is illustrative and
# must match the label encoding actually used in the dataset.
dataset = load_dataset("ADS509/full_experiment_labels")
labels = ["Argumentative", "Opinion", "Informational", "Expressive", "Neutral"]
label2id = {name: i for i, name in enumerate(labels)}
id2label = {i: name for name, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)


# Function to tokenize data with
def tokenize_function(batch):
    return tokenizer(
        batch['text'],
        truncation=True,
        max_length=512  # Can't be greater than model max length
    )


# Tokenize Data
train_data = dataset['train'].map(tokenize_function, batched=True)
test_data = dataset['test'].map(tokenize_function, batched=True)
valid_data = dataset['valid'].map(tokenize_function, batched=True)

# Convert lists to tensors
train_data.set_format("torch", columns=['input_ids', "attention_mask", "label"])
test_data.set_format("torch", columns=['input_ids', "attention_mask", "label"])
valid_data.set_format("torch", columns=['input_ids', "attention_mask", "label"])

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    num_labels=5,  # adjust this based on number of labels you're training on
    device_map='cuda',
    dtype='auto',
    label2id=label2id,
    id2label=id2label
)


# Metric function for evaluation in Trainer
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Pick the highest-scoring class for each sample
    predictions = np.argmax(predictions, axis=1)
    return {
        'accuracy': accuracy_score(labels, predictions),
        'f1_macro': f1_score(labels, predictions, average='macro'),
        'f1_weighted': f1_score(labels, predictions, average='weighted')
    }


# Data collator to handle padding dynamically per batch
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

training_args = TrainingArguments(
    output_dir='./bert-comment',
    num_train_epochs=2,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_steps=300,
    # Evaluation & saving
    eval_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='f1_macro',
    # Logging
    logging_steps=100,
    report_to='tensorboard',
    # Other
    seed=42,
    fp16=torch.cuda.is_available(),  # Mixed precision if GPU available
)

# Set up Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=valid_data,
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

# Train!
trainer.train()

# Evaluate on the validation split
eval_results = trainer.evaluate()
print(eval_results)
```
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH_FUSED`) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- num_epochs: 2
- mixed_precision_training: Native AMP
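
To make the `linear` scheduler with 300 warmup steps concrete, the sketch below computes the learning rate at a given optimizer step. It is a plain-Python illustration of the schedule's shape (linear ramp from 0 to the peak rate, then linear decay to 0), not the library's own implementation. The function name `linear_warmup_lr` is hypothetical, and `total_steps=3080` is an assumption taken from the final step count in the training results table.

```python
def linear_warmup_lr(
    step: int,
    base_lr: float = 2e-5,
    warmup_steps: int = 300,
    total_steps: int = 3080,  # assumption: final step from the results table
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Sample the schedule at a few points across the run.
schedule = {step: linear_warmup_lr(step) for step in (0, 150, 300, 1540, 3080)}
```

Under these assumptions the rate reaches its 2e-05 peak at step 300 and has fallen to roughly 1.1e-05 by the end of the first epoch (step 1540).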
### Training results
As this is a multi-class classification problem with class imbalance, the main metric we use to evaluate this model is `f1_macro`.
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Macro | F1 Weighted |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:-----------:|
| 0.6645 | 1.0 | 1540 | 0.6703 | 0.7275 | 0.7134 | 0.7292 |
| 0.5152 | 2.0 | 3080 | 0.6531 | 0.7444 | 0.7295 | 0.7451 |
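
The difference between macro and weighted F1 under class imbalance can be illustrated with a small pure-Python sketch. The helper names and the toy labels below are hypothetical and are not drawn from the evaluation set. The averaging follows the usual definitions: macro is the unweighted mean of per-class F1, and weighted is the mean weighted by each class's support.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 per class; a class with no true or predicted samples scores 0."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def f1_weighted(y_true, y_pred):
    """Per-class F1 averaged by support: frequent classes dominate."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Toy imbalanced data: a classifier that always predicts the majority class.
y_true = ["Opinion"] * 8 + ["Neutral"] * 2
y_pred = ["Opinion"] * 10

macro = f1_macro(y_true, y_pred)        # about 0.444
weighted = f1_weighted(y_true, y_pred)  # about 0.711
```

In this toy case the classifier never predicts the minority class, yet weighted F1 stays above 0.7 while macro F1 drops below 0.45, which is why macro F1 is the more conservative choice when classes are imbalanced.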
### Framework versions
- Transformers 5.0.0
- Pytorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2