| | --- |
| | library_name: transformers |
| | license: mit |
| | base_model: bert-base-uncased |
| | tags: |
| | - generated_from_trainer |
| | metrics: |
| | - accuracy |
| | model-index: |
| | - name: experiment_labels_bert_base |
| | results: [] |
| | datasets: |
| | - ADS509/full_experiment_labels |
| | language: |
| | - en |
| | pipeline_tag: text-classification |
| | --- |
| | |
| | # Experiment_labels_bert_base |
| | |
| | This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on a dataset of social media comments |
| | from 5 separate sources. |
| | |
| | It achieves the following results on the evaluation set: |
| | - Loss: 0.6531 |
| | - Accuracy: 0.7444 |
| | - **F1 Macro: 0.7295** |
| | - F1 Weighted: 0.7451 |
| | |
| | ## Model description |
| | |
| | We fine-tuned BERT base, with a newly initialized classification layer, for a multi-class (single-label) classification task on our self-labeled data. The model description |
| | of the base model can be found at the link above, and the description of the dataset can be found [here](https://huggingface.co/datasets/ADS509/full_experiment_labels). The |
| | fine-tuning parameters are listed below. This was the initial model in our experiment, used to see whether our self-labeling approach showed any promise. |
| | |
| | ## Intended uses & limitations |
| | |
| | This model is intended to help characterize the discourse on different social media sites |
| | beyond the usual "positive", "negative", and "neutral" sentiment labels of most models. The labels for the commentary data are as follows: |
| | |
| | - Argumentative |
| | - Opinion |
| | - Informational |
| | - Expressive |
| | - Neutral |
| | |
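Because each comment receives exactly one of the five labels, a prediction amounts to taking the highest-scoring class. The sketch below shows that step on made-up logits. The id order in `ID2LABEL` is an assumption that simply follows the list above; the actual mapping lives in the model's config and may differ.

```python
import math

# Assumption: this id order follows the label list above. The real
# id2label mapping is stored in the model config and may be different.
ID2LABEL = {
    0: "Argumentative",
    1: "Opinion",
    2: "Informational",
    3: "Expressive",
    4: "Neutral",
}


def softmax(logits):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, id2label=ID2LABEL):
    """Return the single most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits for one comment, for illustration only.
label, prob = predict_label([0.2, 2.1, -0.5, 0.9, -1.3])
print(label, round(prob, 3))
```

In practice the logits would come from the fine-tuned model's forward pass rather than a hand-written list.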
| | We think this approach is promising, but as this is an initial step towards a deeper understanding of social commentary, there are |
| | several limitations to outline: |
| | |
| | - As there were a total of 70k records, data was primarily labeled by language models, with the prompt including correctly labeled examples and |
| | incorrectly labeled examples paired with the correct label. Three language models were tasked with labeling, and only the majority-vote labels were |
| | kept. Samples with a three-way tie were set aside. Future iterations would benefit from more labeling models and more human-labeled examples. |
| | - When reviewing records that were ambiguous or that the classifier predicted incorrectly, it became clear that the labeling scheme is fuzzy in some instances. |
| | For instance, many "Opinion" comments can also be viewed as "Expressive" or "Argumentative", leading to ambiguous labeling from models. It would be worth |
| | exploring a more nuanced labeling scheme, perhaps splitting "Expressive" into 2-3 labels and "Opinion" into another 1 or 2. |
| | - Due to the nature of the project, the commentary data used for training was subject to the following limitations: |
| |   - Queries were isolated to "politics" or "US politics" |
| |   - With one exception, all comment data is dated from Jan 1, 2026 to Feb 12, 2026 |
| |   - We set a ceiling and a floor for the number of comments per post. No posts with under 10 comments were used, and for posts with many comments, |
| |     we pulled only the 300 most recent |
| | |
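The majority-vote step described in the first limitation can be sketched as follows. This is a minimal illustration, not the labeling pipeline itself: the function names and the example records are hypothetical, and it assumes each record carries exactly three model votes.

```python
from collections import Counter


def majority_label(votes):
    """Return the label at least two of the three labelers agree on, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else None


def split_by_agreement(records):
    """Separate records with a majority label from three-way ties."""
    kept, set_aside = [], []
    for text, votes in records:
        label = majority_label(votes)
        if label is None:
            set_aside.append((text, votes))  # three-way tie
        else:
            kept.append((text, label))
    return kept, set_aside


# Hypothetical records: (comment text, votes from three labeling models).
records = [
    ("comment a", ["Opinion", "Opinion", "Expressive"]),
    ("comment b", ["Neutral", "Informational", "Argumentative"]),
    ("comment c", ["Expressive", "Expressive", "Expressive"]),
]
kept, set_aside = split_by_agreement(records)
print(len(kept), "kept;", len(set_aside), "set aside")
```

With more than three labelers, the agreement threshold would need to be revisited, since a plurality is no longer the same thing as a majority.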
| | |
| | ## Training and evaluation data |
| | |
| | A full description of the data can be found [here](https://huggingface.co/datasets/ADS509/full_experiment_labels). |
| | |
| | ## Training procedure |
| | |
| | The full code used for training is below: |
| | |
| | ```python |
| | import numpy as np |
| | import torch |
| | from sklearn.metrics import accuracy_score, f1_score |
| | from transformers import ( |
| |     AutoModelForSequenceClassification, |
| |     AutoTokenizer, |
| |     DataCollatorWithPadding, |
| |     Trainer, |
| |     TrainingArguments, |
| | ) |
| | |
| | # `dataset` (a DatasetDict with 'train', 'test' and 'valid' splits) and the |
| | # `label2id` / `id2label` mappings are assumed to be defined before this point. |
| | MODEL_ID = "bert-base-uncased" |
| | tokenizer = AutoTokenizer.from_pretrained(MODEL_ID) |
| | |
| | # Tokenize a batch of texts, truncating to the model's maximum length |
| | def tokenize_function(batch): |
| |     return tokenizer( |
| |         batch['text'], |
| |         truncation=True, |
| |         max_length=512  # Can't be greater than model max length |
| |     ) |
| | |
| | # Tokenize Data |
| | train_data = dataset['train'].map(tokenize_function, batched=True) |
| | test_data = dataset['test'].map(tokenize_function, batched=True) |
| | valid_data = dataset['valid'].map(tokenize_function, batched=True) |
| | |
| | # Convert lists to tensors |
| | train_data.set_format("torch", columns=['input_ids', "attention_mask", "label"]) |
| | test_data.set_format("torch", columns=['input_ids', "attention_mask", "label"]) |
| | valid_data.set_format("torch", columns=['input_ids', "attention_mask", "label"]) |
| | |
| | model = AutoModelForSequenceClassification.from_pretrained( |
| |     MODEL_ID, |
| |     num_labels=5,  # adjust this based on number of labels you're training on |
| |     device_map='cuda', |
| |     dtype='auto', |
| |     label2id=label2id, |
| |     id2label=id2label |
| | ) |
| | |
| | # Metric function for evaluation in Trainer |
| | def compute_metrics(eval_pred): |
| |     predictions, labels = eval_pred |
| |     # Pick the highest-scoring class for each example |
| |     predictions = np.argmax(predictions, axis=1) |
| | |
| |     return { |
| |         'accuracy': accuracy_score(labels, predictions), |
| |         'f1_macro': f1_score(labels, predictions, average='macro'), |
| |         'f1_weighted': f1_score(labels, predictions, average='weighted') |
| |     } |
| | |
| | # Data collator to handle padding dynamically per batch |
| | data_collator = DataCollatorWithPadding(tokenizer=tokenizer) |
| | |
| | training_args = TrainingArguments( |
| |     output_dir='./bert-comment', |
| |     num_train_epochs=2, |
| |     per_device_train_batch_size=32, |
| |     per_device_eval_batch_size=64, |
| |     learning_rate=2e-5, |
| |     weight_decay=0.01, |
| |     warmup_steps=300, |
| | |
| |     # Evaluation & saving |
| |     eval_strategy='epoch', |
| |     save_strategy='epoch', |
| |     load_best_model_at_end=True, |
| |     metric_for_best_model='f1_macro', |
| | |
| |     # Logging |
| |     logging_steps=100, |
| |     report_to='tensorboard', |
| | |
| |     # Other |
| |     seed=42, |
| |     fp16=torch.cuda.is_available(),  # Mixed precision if GPU available |
| | ) |
| | |
| | # Set up Trainer |
| | trainer = Trainer( |
| |     model=model, |
| |     args=training_args, |
| |     train_dataset=train_data, |
| |     eval_dataset=valid_data, |
| |     processing_class=tokenizer, |
| |     data_collator=data_collator, |
| |     compute_metrics=compute_metrics |
| | ) |
| | |
| | # Train! |
| | trainer.train() |
| | |
| | # Evaluate on the validation set |
| | eval_results = trainer.evaluate() |
| | print(eval_results) |
| | ``` |
| | |
| | ### Training hyperparameters |
| | |
| | The following hyperparameters were used during training: |
| | - learning_rate: 2e-05 |
| | - train_batch_size: 32 |
| | - eval_batch_size: 64 |
| | - seed: 42 |
| | - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 |
| | - lr_scheduler_type: linear |
| | - lr_scheduler_warmup_steps: 300 |
| | - num_epochs: 2 |
| | - mixed_precision_training: Native AMP |
| | |
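To make the `linear` scheduler with 300 warmup steps concrete, here is a rough sketch of how the learning rate would evolve over the run. It is a plain-Python approximation of a linear warmup followed by linear decay, not the library's own code, and it assumes a total of 3080 optimizer steps, taken from the final step in the training results table.

```python
def linear_lr(step, base_lr=2e-5, warmup_steps=300, total_steps=3080):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup period
        return base_lr * step / warmup_steps
    # Decay from base_lr at the end of warmup down to 0 at the last step
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Learning rate at a few points in the run
for step in (0, 150, 300, 1540, 3080):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the rate peaks at 2e-05 at step 300, so the first epoch (1540 steps) already runs mostly in the decay phase.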
| | ### Training results |
| | |
| | As this is a multi-class classification problem with class imbalance, the main metric we use to evaluate this model is `f1_macro`. |
| | |
| | | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Macro | F1 Weighted | |
| | |:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:-----------:| |
| | | 0.6645 | 1.0 | 1540 | 0.6703 | 0.7275 | 0.7134 | 0.7292 | |
| | | 0.5152 | 2.0 | 3080 | 0.6531 | 0.7444 | 0.7295 | 0.7451 | |
| | |
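The reason for preferring `f1_macro` under class imbalance can be seen in a small example. The sketch below reimplements macro and weighted F1 in plain Python purely for illustration (the training code uses scikit-learn's `f1_score`), and the toy labels are made up: a classifier that always predicts the majority class looks acceptable on weighted F1 but is penalized heavily on macro F1.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 score for every label that appears in y_true or y_pred."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def f1_weighted(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's number of true samples."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Toy imbalanced data: 8 "Opinion" comments, 2 "Neutral" comments,
# and a classifier that always predicts the majority class.
y_true = ["Opinion"] * 8 + ["Neutral"] * 2
y_pred = ["Opinion"] * 10

macro = f1_macro(y_true, y_pred)
weighted = f1_weighted(y_true, y_pred)
print(f"macro={macro:.4f} weighted={weighted:.4f}")
```

Macro F1 treats the rare "Neutral" class as equally important, so ignoring it roughly halves the score, whereas weighted F1 is dominated by the majority class.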
| | |
| | ### Framework versions |
| | |
| | - Transformers 5.0.0 |
| | - Pytorch 2.10.0+cu128 |
| | - Datasets 4.0.0 |
| | - Tokenizers 0.22.2 |