tkbarb10 committed on
Commit 9b1ba8d · verified · 1 Parent(s): faac1a0

Update README.md

Files changed (1):
  1. README.md +132 -9
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  library_name: transformers
- license: apache-2.0
  base_model: bert-base-uncased
  tags:
  - generated_from_trainer
@@ -9,34 +9,155 @@ metrics:
  model-index:
  - name: experiment_labels_bert_base
    results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # experiment_labels_bert_base

- This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.6531
  - Accuracy: 0.7444
- - F1 Macro: 0.7295
  - F1 Weighted: 0.7451

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -44,7 +165,7 @@ The following hyperparameters were used during training:
  - train_batch_size: 32
  - eval_batch_size: 64
  - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_steps: 300
  - num_epochs: 2
@@ -52,6 +173,8 @@ The following hyperparameters were used during training:

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Macro | F1 Weighted |
  |:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:-----------:|
  | 0.6645        | 1.0   | 1540 | 0.6703          | 0.7275   | 0.7134   | 0.7292      |
@@ -63,4 +186,4 @@ The following hyperparameters were used during training:
  - Transformers 5.0.0
  - Pytorch 2.10.0+cu128
  - Datasets 4.0.0
- - Tokenizers 0.22.2
  ---
  library_name: transformers
+ license: mit
  base_model: bert-base-uncased
  tags:
  - generated_from_trainer
  model-index:
  - name: experiment_labels_bert_base
    results: []
+ datasets:
+ - ADS509/full_experiment_labels
+ language:
+ - en
+ pipeline_tag: text-classification
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+ # Experiment_labels_bert_base
+
+ This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on a dataset of social media comments
+ from 5 separate sources.

  It achieves the following results on the evaluation set:
  - Loss: 0.6531
  - Accuracy: 0.7444
+ - **F1 Macro: 0.7295**
  - F1 Weighted: 0.7451

  ## Model description

+ We retrained the classification layer of BERT Base for a multi-class classification task on our self-labeled data. The model description
+ of the base model can be found at the link above, and the description of the dataset can be found [here](https://huggingface.co/datasets/ADS509/full_experiment_labels). The
+ fine-tuning parameters are listed below. This model was the initial model used in our experiment to see whether there was any promise in our self-labeling approach.
 
  ## Intended uses & limitations

+ The intended use of this model is to better understand the nature of different social media websites and the discourse on each site,
+ beyond the usual "positive", "negative", "neutral" sentiment of most models. The labels for the commentary data are as follows:
+
+ - Argumentative
+ - Opinion
+ - Informational
+ - Expressive
+ - Neutral
+
+ We think there is promise in this approach, and as this is the initial step towards a deeper understanding of social commentary, there are
+ several limitations to outline:
+
+ - As there were a total of 70k records, the data was primarily labeled by language models, with the prompt including correctly labeled examples and
+ incorrectly labeled examples alongside the correct label. Three language models were tasked with labeling, and only the majority-vote labels were
+ kept. Three-way-tie samples were set aside. Future iterations would benefit from more models labeling and more human-labeled examples.
+ - When reviewing records that were ambiguous or that the classifier predicted incorrectly, it was clear that the labeling scheme is fuzzy in some instances.
+ For instance, many "Opinion" comments can also be viewed as "Expressive" or "Argumentative", leading to ambiguous labeling from the models. It would be worth
+ exploring a more nuanced labeling scheme, perhaps splitting "Expressive" into 2-3 labels and "Opinion" into another 1 or 2.
+ - Due to the nature of the project, the commentary data used for training was subject to the following limitations:
+   - Queries were isolated to "politics" or "US politics"
+   - With one exception, all comment data is dated from Jan 1, 2026 to Feb 12, 2026
+   - We set a ceiling and a floor for the number of comments per post: no posts with under 10 comments were used, and for posts with many comments
+ we only pulled the most recent 300
+
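The majority-vote step described above can be sketched as follows. This is a minimal illustration, not the project's actual labeling pipeline; the example comments and vote tuples are made up:

```python
from collections import Counter

def majority_vote(votes):
    """Return the majority label from three model votes, or None on a three-way tie."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else None  # three-way tie -> set aside

# Hypothetical votes from three language models for two comments
rows = [
    ("Opinion", "Opinion", "Expressive"),        # 2-1 majority -> kept
    ("Opinion", "Expressive", "Argumentative"),  # three-way tie -> set aside
]
labels = [majority_vote(r) for r in rows]
```

Records where `majority_vote` returns `None` would be the set-aside three-way ties.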
 
  ## Training and evaluation data

+ A full description of the data can be found [here](https://huggingface.co/datasets/ADS509/full_experiment_labels).
 
  ## Training procedure

+ The full code used for training is below:
+
+ ```python
+ import numpy as np
+ import torch
+ from datasets import load_dataset
+ from sklearn.metrics import accuracy_score, f1_score
+ from transformers import (
+     AutoModelForSequenceClassification,
+     AutoTokenizer,
+     DataCollatorWithPadding,
+     Trainer,
+     TrainingArguments,
+ )
+
+ MODEL_ID = "bert-base-uncased"
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+
+ # Dataset with train/test/valid splits (see the dataset card linked above)
+ dataset = load_dataset("ADS509/full_experiment_labels")
+
+ # Function to tokenize data with
+ def tokenize_function(batch):
+     return tokenizer(
+         batch['text'],
+         truncation=True,
+         max_length=512  # Can't be greater than model max length
+     )
+
+ # Tokenize data
+ train_data = dataset['train'].map(tokenize_function, batched=True)
+ test_data = dataset['test'].map(tokenize_function, batched=True)
+ valid_data = dataset['valid'].map(tokenize_function, batched=True)
+
+ # Convert lists to tensors
+ train_data.set_format("torch", columns=['input_ids', 'attention_mask', 'label'])
+ test_data.set_format("torch", columns=['input_ids', 'attention_mask', 'label'])
+ valid_data.set_format("torch", columns=['input_ids', 'attention_mask', 'label'])
+
+ # Label mappings (the id assignment shown here is illustrative)
+ label2id = {'Argumentative': 0, 'Opinion': 1, 'Informational': 2, 'Expressive': 3, 'Neutral': 4}
+ id2label = {v: k for k, v in label2id.items()}
+
+ model = AutoModelForSequenceClassification.from_pretrained(
+     MODEL_ID,
+     num_labels=5,  # adjust this based on the number of labels you're training on
+     device_map='cuda',
+     dtype='auto',
+     label2id=label2id,
+     id2label=id2label
+ )
+
+ # Metric function for evaluation in Trainer
+ def compute_metrics(eval_pred):
+     predictions, labels = eval_pred
+     predictions = np.argmax(predictions, axis=1)
+
+     return {
+         'accuracy': accuracy_score(labels, predictions),
+         'f1_macro': f1_score(labels, predictions, average='macro'),
+         'f1_weighted': f1_score(labels, predictions, average='weighted')
+     }
+
+ # Data collator to handle padding dynamically per batch
+ data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
+
+ training_args = TrainingArguments(
+     output_dir='./bert-comment',
+     num_train_epochs=2,
+     per_device_train_batch_size=32,
+     per_device_eval_batch_size=64,
+     learning_rate=2e-5,
+     weight_decay=0.01,
+     warmup_steps=200,
+
+     # Evaluation & saving
+     eval_strategy='epoch',
+     save_strategy='epoch',
+     load_best_model_at_end=True,
+     metric_for_best_model='f1_macro',
+
+     # Logging
+     logging_steps=100,
+     report_to='tensorboard',
+
+     # Other
+     seed=42,
+     fp16=torch.cuda.is_available(),  # Mixed precision if a GPU is available
+ )
+
+ # Set up Trainer
+ trainer = Trainer(
+     model=model,
+     args=training_args,
+     train_dataset=train_data,
+     eval_dataset=valid_data,
+     processing_class=tokenizer,
+     data_collator=data_collator,
+     compute_metrics=compute_metrics
+ )
+
+ # Train!
+ trainer.train()
+
+ # Evaluate
+ eval_results = trainer.evaluate()
+ print(eval_results)
+ ```
+
  ### Training hyperparameters

  The following hyperparameters were used during training:
  - train_batch_size: 32
  - eval_batch_size: 64
  - seed: 42
+ - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_steps: 300
  - num_epochs: 2
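The optimizer entry above corresponds to PyTorch's fused AdamW. A minimal sketch of the equivalent `torch.optim.AdamW` call (the `lr` and `weight_decay` values are taken from the training code; the tiny parameter tensor is only a placeholder for `model.parameters()`):

```python
import torch

# Placeholder parameter; in real training this would be model.parameters()
params = [torch.nn.Parameter(torch.zeros(4))]

optimizer = torch.optim.AdamW(
    params,
    lr=2e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
    fused=torch.cuda.is_available(),  # the fused kernel requires a CUDA device
)
```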
 
  ### Training results

+ As this is a multi-class classification problem and there is class imbalance, the main metric we evaluate this model by is `f1_macro`.
+
  | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Macro | F1 Weighted |
  |:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:-----------:|
  | 0.6645        | 1.0   | 1540 | 0.6703          | 0.7275   | 0.7134   | 0.7292      |
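To illustrate why macro F1 is the stricter metric under class imbalance, here is a small sketch with made-up labels (the arrays below are illustrative only, not drawn from our evaluation set):

```python
from sklearn.metrics import f1_score

# Illustrative only: a majority class (0) the model handles well,
# and a rare class (1) it misses half the time.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

macro = f1_score(y_true, y_pred, average='macro')        # treats both classes equally (~0.80)
weighted = f1_score(y_true, y_pred, average='weighted')  # dominated by the majority class (~0.89)
```

The weighted score hides the weak rare-class performance, which is why `f1_macro` is the headline metric here.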
 
  - Transformers 5.0.0
  - Pytorch 2.10.0+cu128
  - Datasets 4.0.0
+ - Tokenizers 0.22.2