
Text Classification Model: Do Two Sentences Share the Same Meaning?

Introduction

This is a fine-tuned version of google-bert/bert-base-uncased for paraphrase detection, i.e., deciding whether two sentences share the same meaning, trained on the GLUE MRPC (Microsoft Research Paraphrase Corpus) dataset. Training ran for 4 epochs, and the checkpoint saved after epoch 3 was kept because it gave the best validation accuracy and F1, balancing efficiency and accuracy for real-world text analysis.

Key Features

  • Efficient Fine-Tuning: Trained with a low-resource setup (a single NVIDIA GeForce RTX 3070 GPU with 8 GB of VRAM, 4 epochs).
  • Strong Performance: Achieves 88.0% accuracy and 91.7% F1 on the validation set (83.0% accuracy and 87.9% F1 on the test set), rivaling strong models for this task on such a small dataset.

Training Details

Even though BERT already incorporates dropout, adding stronger regularization (higher dropout rates and weight decay, listed below) improved performance. A linear learning-rate warmup over the first 10% of training steps also helped. Finally, training for 4 epochs but keeping the weights from epoch 3 gave better results than using the final epoch.

  • Hyperparameters:
    • Batch Size: 8
    • Learning Rate: 3e-5
    • Learning-Rate Schedule (linear, 10% warmup): lr_scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=int(0.1 * num_training_steps), num_training_steps=num_training_steps)
    • Weight Decay: 0.03
    • Dropout: hidden_dropout_prob=0.3, attention_probs_dropout_prob=0.2, classifier_dropout=0.2
    • Epochs: 4 (saved after 3)
    • Optimizer: AdamW
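The warmup schedule and regularization settings above can be sketched as follows. This is a minimal sketch, not the author's training script: it assumes the standard GLUE MRPC training split of 3,668 pairs, assumes `num_training_steps` is epochs × batches per epoch (as in the common Hugging Face fine-tuning loop), and uses `torch.optim.AdamW` as a stand-in for whichever AdamW implementation was actually used. `linear_warmup_multiplier` reproduces the multiplier of the `"linear"` scheduler returned by `get_scheduler`; `build_model_and_optimizer` is a hypothetical helper that needs `torch` and `transformers` and downloads weights, so it is only defined here, not called.

```python
import math

# Hyperparameters from the list above
PEAK_LR = 3e-5
BATCH_SIZE = 8
EPOCHS = 4
WEIGHT_DECAY = 0.03
WARMUP_FRACTION = 0.1

# Assumption: the standard GLUE MRPC training split has 3,668 sentence pairs
TRAIN_SIZE = 3668

steps_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)  # the last partial batch is kept
num_training_steps = EPOCHS * steps_per_epoch
num_warmup_steps = int(WARMUP_FRACTION * num_training_steps)


def linear_warmup_multiplier(step, warmup_steps, total_steps):
    """Learning-rate multiplier for linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to the peak learning rate
        return step / max(1, warmup_steps)
    # Decay: fall linearly from the peak to 0 at the final step
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def lr_at(step):
    """Learning rate the optimizer would use at a given training step."""
    return PEAK_LR * linear_warmup_multiplier(step, num_warmup_steps, num_training_steps)


def build_model_and_optimizer(model_name="google-bert/bert-base-uncased"):
    """Hypothetical helper: BERT with the raised dropout rates plus AdamW with weight decay."""
    # Deferred imports: requires torch + transformers and downloads the base weights
    import torch
    from transformers import AutoModelForSequenceClassification

    # Extra keyword arguments override the corresponding fields of the model config
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=2,
        hidden_dropout_prob=0.3,
        attention_probs_dropout_prob=0.2,
        classifier_dropout=0.2,
    )
    # Applies weight decay to every parameter for simplicity (an assumption)
    optimizer = torch.optim.AdamW(model.parameters(), lr=PEAK_LR, weight_decay=WEIGHT_DECAY)
    return model, optimizer
```

Under these assumptions there are 459 batches per epoch and 1,836 steps in total, so the learning rate climbs from 0 to 3e-5 over the first 183 steps and then decays linearly to 0.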

Training Progress

Figure: Training and Validation Loss; Validation Accuracy and F1 per Epoch

Validation Metrics Summary:

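| Epoch | Training Loss | Validation Loss | Validation Accuracy | Validation F1 |
|-------|---------------|-----------------|---------------------|---------------|
| 1     | 0.5929        | 0.4899          | 0.7647              | 0.8514        |
| 2     | 0.4233        | 0.4552          | 0.8113              | 0.8760        |
| 3     | 0.2679        | 0.3728          | 0.8799              | 0.9171        |
| 4     | 0.1774        | 0.4750          | 0.8554              | 0.9031        |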
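The epoch-3 choice can be reproduced from the table. The sketch below is a simple early-stopping-style selection, not necessarily how the author picked the checkpoint: it selects the epoch with the best validation metric and flags the first epoch where training loss keeps falling while validation loss rises, a common sign of overfitting. `best_epoch` and `first_overfit_epoch` are hypothetical helper names.

```python
# Per-epoch metrics copied from the validation table above
history = [
    {"epoch": 1, "train_loss": 0.5929, "val_loss": 0.4899, "accuracy": 0.7647, "f1": 0.8514},
    {"epoch": 2, "train_loss": 0.4233, "val_loss": 0.4552, "accuracy": 0.8113, "f1": 0.8760},
    {"epoch": 3, "train_loss": 0.2679, "val_loss": 0.3728, "accuracy": 0.8799, "f1": 0.9171},
    {"epoch": 4, "train_loss": 0.1774, "val_loss": 0.4750, "accuracy": 0.8554, "f1": 0.9031},
]


def best_epoch(rows, metric="f1", higher_is_better=True):
    """Return the epoch whose validation metric is best."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[metric])["epoch"]


def first_overfit_epoch(rows):
    """First epoch where training loss drops but validation loss rises, else None."""
    for prev, cur in zip(rows, rows[1:]):
        if cur["train_loss"] < prev["train_loss"] and cur["val_loss"] > prev["val_loss"]:
            return cur["epoch"]
    return None
```

Validation F1, validation accuracy, and validation loss all point to epoch 3, and epoch 4 is where validation loss turns upward while training loss keeps dropping.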

Test Data Metrics Summary:

| Test Accuracy | Test F1 Score |
|---------------|---------------|
| 0.8296        | 0.8793        |
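Because the MRPC evaluation splits are small, it helps to translate these accuracies into counts of pairs. This is a minimal sketch assuming the standard GLUE MRPC split sizes (408 validation pairs, 1,725 test pairs), which this card does not state; `correct_count` is a hypothetical helper.

```python
# Assumed split sizes: the standard GLUE MRPC validation and test sets
SPLIT_SIZES = {"validation": 408, "test": 1725}


def correct_count(accuracy, split):
    """Convert a reported accuracy back into the number of correctly classified pairs."""
    return round(accuracy * SPLIT_SIZES[split])


val_epoch3 = correct_count(0.8799, "validation")  # saved checkpoint
val_epoch4 = correct_count(0.8554, "validation")  # final epoch
test_correct = correct_count(0.8296, "test")
one_val_pair = 1 / SPLIT_SIZES["validation"]  # accuracy contributed by a single validation pair
```

Under that assumption, epoch 3 classifies 359 of 408 validation pairs correctly versus 349 for epoch 4 (a 10-pair difference), each validation pair moves accuracy by about 0.25 points, and the saved checkpoint gets about 1,431 of 1,725 test pairs right.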

Here are three test pairs that the model classified correctly:

Correctly Classified Test Sentence Pairs:

  • Idx: 0

    • Sentence 1: PCCW 's chief operating officer , Mike Butcher , and Alex Arena , the chief financial officer , will report directly to Mr So .
    • Sentence 2: Current Chief Operating Officer Mike Butcher and Group Chief Financial Officer Alex Arena will report to So .
    • Prediction: Paraphrase
    • True Label: Paraphrase
    • Correct: True
    • Logits: [-2.2242250442504883, 2.586000680923462]
  • Idx: 1

    • Sentence 1: The world 's two largest automakers said their U.S. sales declined more than predicted last month as a late summer sales frenzy caused more of an industry backlash than expected .
    • Sentence 2: Domestic sales at both GM and No. 2 Ford Motor Co. declined more than predicted as a late summer sales frenzy prompted a larger-than-expected industry backlash .
    • Prediction: Paraphrase
    • True Label: Paraphrase
    • Correct: True
    • Logits: [-0.5495507717132568, 1.2153574228286743]
  • Idx: 2

    • Sentence 1: According to the federal Centers for Disease Control and Prevention ( news - web sites ) , there were 19 reported cases of measles in the United States in 2002 .
    • Sentence 2: The Centers for Disease Control and Prevention said there were 19 reported cases of measles in the United States in 2002 .
    • Prediction: Paraphrase
    • True Label: Paraphrase
    • Correct: True
    • Logits: [-2.498159170150757, 2.9100100994110107]

Here are three test pairs that the model misclassified:

Incorrectly Classified Test Sentence Pairs:

  • Idx: 3

    • Sentence 1: A tropical storm rapidly developed in the Gulf of Mexico Sunday and was expected to hit somewhere along the Texas or Louisiana coasts by Monday night .
    • Sentence 2: A tropical storm rapidly developed in the Gulf of Mexico on Sunday and could have hurricane-force winds when it hits land somewhere along the Louisiana coast Monday night .
    • Prediction: Paraphrase
    • True Label: Not a paraphrase
    • Correct: False
    • Logits: [-1.446120023727417, 2.0181071758270264]
  • Idx: 13

    • Sentence 1: Hong Kong was flat , Australia , Singapore and South Korea lost 0.2-0.4 percent .
    • Sentence 2: Australia was flat , Singapore was down 0.3 percent by midday and South Korea added 0.2 percent .
    • Prediction: Paraphrase
    • True Label: Not a paraphrase
    • Correct: False
    • Logits: [-0.11220673471689224, 0.7146934866905212]
  • Idx: 15

    • Sentence 1: Ballmer has been vocal in the past warning that Linux is a threat to Microsoft .
    • Sentence 2: In the memo , Ballmer reiterated the open-source threat to Microsoft .
    • Prediction: Paraphrase
    • True Label: Not a paraphrase
    • Correct: False
    • Logits: [-1.2966821193695068, 1.964686393737793]

These errors highlight some limitations:

  • Idx 15: "Has been vocal" could refer to a memo, but Ballmer may also have voiced his warnings only in meetings rather than in writing. Likewise, Linux is open source, but "the open-source threat" need not mean Linux.
  • Idx 13: The sentences describe different market moves: sentence 1 says Australia and South Korea lost ground, while sentence 2 says Australia was flat and South Korea added 0.2 percent. Singapore's 0.3 percent drop does fall within sentence 1's 0.2-0.4 percent range, but the second sentence does not account for the full range.
  • Idx 3: The second sentence mentions only the Louisiana coast and drops Texas (it also adds the possibility of hurricane-force winds), so it cannot be considered a paraphrase.
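The Logits fields in the examples are raw, unnormalized scores. Judging from the listed predictions, index 0 corresponds to "Not a paraphrase" and index 1 to "Paraphrase"; this is inferred from the examples, not a documented label map. A softmax turns the logits into probabilities, which shows how confident each error was. `paraphrase_probability` is a hypothetical helper.

```python
import math


def paraphrase_probability(logits):
    """Softmax over [not paraphrase, paraphrase] logits; returns P(paraphrase)."""
    top = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - top) for x in logits]
    return exps[1] / sum(exps)


# Logits copied from the misclassified test pairs above, keyed by Idx
misclassified = {
    3: [-1.446120023727417, 2.0181071758270264],
    13: [-0.11220673471689224, 0.7146934866905212],
    15: [-1.2966821193695068, 1.964686393737793],
}
probs = {idx: paraphrase_probability(logits) for idx, logits in misclassified.items()}
```

With these logits, P(paraphrase) comes out to roughly 0.97 for Idx 3, 0.70 for Idx 13, and 0.96 for Idx 15: the model was confidently wrong on the Texas/Louisiana and Linux pairs, while the stock-market pair was a closer call.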

Four Sentence Pairs Created by Grok to Test the Model:

  • Idx: 102500

    • Sentence 1: The company announced a major merger with its biggest competitor last week.
    • Sentence 2: Last week, the firm revealed plans to merge with its primary rival.
    • Label: 1 (paraphrase)
  • Idx: 102501

    • Sentence 1: The new policy aims to reduce emissions by 20% over the next five years through incentives for green technology.
    • Sentence 2: Over the coming five years, this initiative seeks to cut pollution levels by a fifth by promoting eco-friendly innovations
    • Label: 1 (paraphrase)
  • Idx: 102502

    • Sentence 1: The team won the championship after a dramatic comeback in the final quarter.
    • Sentence 2: The players celebrated their victory following an intense practice session before the big game.
    • Label: 0 (not a paraphrase)
  • Idx: 102503

    • Sentence 1: The legislation seeks to curb carbon emissions by 25% within a decade through tax incentives for renewable energy.
    • Sentence 2: This bill aims to reduce CO2 output by a quarter over ten years by offering tax breaks for green energy solutions.
    • Label: 1 (paraphrase)

  • Results (all four correct):

    • Predicted class for pair 1: 1, true value is 1, and idx is 102500
    • Predicted class for pair 2: 1, true value is 1, and idx is 102501
    • Predicted class for pair 3: 0, true value is 0, and idx is 102502
    • Predicted class for pair 4: 1, true value is 1, and idx is 102503
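To rerun pairs like these, a minimal sketch follows. It assumes you have accepted the repository's access conditions and are logged in to Hugging Face, that the repository ships tokenizer files (if not, the bert-base-uncased tokenizer should substitute, since fine-tuning does not change the vocabulary), and that logit index 1 means paraphrase, as the examples above suggest. `predict_pairs` and `format_results` are hypothetical helpers; the `torch`/`transformers` imports are deferred so the formatting helper works without them installed.

```python
MODEL_ID = "ddipaola/Fine-tune-bert-base-uncased"


def predict_pairs(pairs, model_id=MODEL_ID):
    """Return the predicted class (0 or 1) for each dict with 'sentence1' and 'sentence2'."""
    # Deferred imports: requires torch + transformers and access to the gated repo
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    # Encode each pair jointly as [CLS] sentence1 [SEP] sentence2 [SEP]
    batch = tokenizer(
        [p["sentence1"] for p in pairs],
        [p["sentence2"] for p in pairs],
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    # Assumption: index 1 = paraphrase, index 0 = not a paraphrase
    return logits.argmax(dim=-1).tolist()


def format_results(preds, pairs):
    """Render predictions in the same style as the results listed above."""
    return [
        f"Predicted class for pair {i}: {pred}, true value is {pair['label']}, and idx is {pair['idx']}"
        for i, (pred, pair) in enumerate(zip(preds, pairs), start=1)
    ]
```

Given a list of dicts with `sentence1`, `sentence2`, `label`, and `idx` keys, `format_results(predict_pairs(pairs), pairs)` should produce lines in the same format as the results above.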
Safetensors · Model size: 0.1B params · Tensor type: F32

Model tree for ddipaola/Fine-tune-bert-base-uncased: fine-tuned from google-bert/bert-base-uncased.

Dataset used to train ddipaola/Fine-tune-bert-base-uncased: GLUE MRPC.