Text Classification Model - Do Two Sentences Share the Same Meaning
Introduction
This is a fine-tuned version of google-bert/bert-base-uncased, optimized for text classification (deciding whether two sentences share the same meaning) using the GLUE MRPC data set. Training ran for 4 epochs, and the weights saved after epoch 3 were kept because validation performance peaked there, balancing efficiency and accuracy for real-world text analysis.
Key Features
- Efficient Fine-Tuning: Trained on a low-resource setup (a single NVIDIA GeForce RTX 3070 GPU with 8 GB of memory, 4 epochs).
- Strong Performance: Achieves 88.0% accuracy and a 91.7% F1 score on the validation set (83.0% accuracy and 87.9% F1 on the test set), competitive with strong models for this task given the small data set.
Training Details
Even though the BERT model already incorporates dropout, adding further regularization (see the dropout and weight decay settings below) improved performance. Adding a 10% warm-up to the learning rate schedule also helped. I found that training for 4 epochs but keeping the weights from the 3rd gave the best results.
- Hyperparameters:
  - Batch Size: 8
  - Learning Rate: 3e-5
  - Learning Rate Schedule: `lr_scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=int(0.1 * num_training_steps), num_training_steps=num_training_steps)`
  - Weight Decay: 0.03
  - Dropout: hidden_dropout_prob=0.3, attention_probs_dropout_prob=0.2, classifier_dropout=0.2
  - Epochs: 4 (saved after 3)
  - Optimizer: AdamW
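To make the 10% warm-up concrete, here is a minimal pure-Python sketch of how a linear schedule with warm-up scales the learning rate over training. It mirrors the shape of the "linear" schedule rather than reproducing the training script. The training-set size of 3,668 examples, the absence of gradient accumulation, and the helper names `linear_warmup_factor` and `lr_at` are assumptions made for illustration.

```python
import math


def linear_warmup_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Warm-up phase: ramp linearly from 0 up to 1.
        return step / max(1, num_warmup_steps)
    # Decay phase: fall linearly from 1 down to 0 at the final step.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


BASE_LR = 3e-5
BATCH_SIZE = 8
EPOCHS = 4
TRAIN_EXAMPLES = 3668  # assumption: size of the GLUE MRPC training split

# Assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
num_training_steps = EPOCHS * steps_per_epoch
num_warmup_steps = int(0.1 * num_training_steps)


def lr_at(step: int) -> float:
    """Learning rate in effect at the given optimizer step."""
    return BASE_LR * linear_warmup_factor(step, num_warmup_steps, num_training_steps)


for step in (0, num_warmup_steps // 2, num_warmup_steps, num_training_steps):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate starts at zero, reaches 3e-5 at the end of warm-up, and returns to zero at the last step.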
Training Progress
Validation Metrics Summary:
| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
|---|---|---|---|---|
| 1 | 0.5929 | 0.4899 | 0.7647 | 0.8514 |
| 2 | 0.4233 | 0.4552 | 0.8113 | 0.8760 |
| 3 | 0.2679 | 0.3728 | 0.8799 | 0.9171 |
| 4 | 0.1774 | 0.4750 | 0.8554 | 0.9031 |
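The choice of the epoch 3 checkpoint can be read directly off this table. The sketch below copies the four rows and picks the best epoch by F1 score and by validation loss; the `history` structure and field names are illustrative, not taken from the training script.

```python
# Validation metrics copied from the table above.
history = [
    {"epoch": 1, "train_loss": 0.5929, "val_loss": 0.4899, "accuracy": 0.7647, "f1": 0.8514},
    {"epoch": 2, "train_loss": 0.4233, "val_loss": 0.4552, "accuracy": 0.8113, "f1": 0.8760},
    {"epoch": 3, "train_loss": 0.2679, "val_loss": 0.3728, "accuracy": 0.8799, "f1": 0.9171},
    {"epoch": 4, "train_loss": 0.1774, "val_loss": 0.4750, "accuracy": 0.8554, "f1": 0.9031},
]

# Two common selection criteria: highest F1 and lowest validation loss.
best_by_f1 = max(history, key=lambda row: row["f1"])
best_by_loss = min(history, key=lambda row: row["val_loss"])

print("Best epoch by F1:", best_by_f1["epoch"])
print("Best epoch by validation loss:", best_by_loss["epoch"])

# A widening gap between training and validation loss suggests overfitting.
for row in history:
    gap = row["val_loss"] - row["train_loss"]
    print(f"epoch {row['epoch']}: val_loss - train_loss = {gap:+.4f}")
```

Both criteria point to epoch 3. By epoch 4 the training loss keeps falling while the validation loss rises, which is consistent with the model starting to overfit.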
Test Data Metrics Summary:
| Test Accuracy | Test F1 Score |
|---|---|
| 0.8296 | 0.8793 |
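For readers unfamiliar with how the two reported metrics differ, the following sketch computes accuracy and F1 from lists of labels and predictions. It assumes label 1 ("Paraphrase") is the positive class; the function name `accuracy_and_f1` and the toy labels are made up for illustration and are not model output.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy labels (not real model output): 1 = Paraphrase, 0 = Not a paraphrase.
toy_true = [1, 1, 1, 0, 0, 1]
toy_pred = [1, 1, 1, 1, 0, 0]

acc, f1 = accuracy_and_f1(toy_true, toy_pred)
print(f"accuracy = {acc:.4f}, F1 = {f1:.4f}")
```

F1 ignores true negatives, so on data where most pairs are paraphrases it can sit above accuracy, as it does in the tables above.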
Here are three sentence pairs from the test data that the model classified correctly:
Idx: 0
- Sentence 1: PCCW 's chief operating officer , Mike Butcher , and Alex Arena , the chief financial officer , will report directly to Mr So .
- Sentence 2: Current Chief Operating Officer Mike Butcher and Group Chief Financial Officer Alex Arena will report to So .
- Prediction: Paraphrase
- True Label: Paraphrase
- Correct: True
- Logits: [-2.2242250442504883, 2.586000680923462]
Idx: 1
- Sentence 1: The world 's two largest automakers said their U.S. sales declined more than predicted last month as a late summer sales frenzy caused more of an industry backlash than expected .
- Sentence 2: Domestic sales at both GM and No. 2 Ford Motor Co. declined more than predicted as a late summer sales frenzy prompted a larger-than-expected industry backlash .
- Prediction: Paraphrase
- True Label: Paraphrase
- Correct: True
- Logits: [-0.5495507717132568, 1.2153574228286743]
Idx: 2
- Sentence 1: According to the federal Centers for Disease Control and Prevention ( news - web sites ) , there were 19 reported cases of measles in the United States in 2002 .
- Sentence 2: The Centers for Disease Control and Prevention said there were 19 reported cases of measles in the United States in 2002 .
- Prediction: Paraphrase
- True Label: Paraphrase
- Correct: True
- Logits: [-2.498159170150757, 2.9100100994110107]
Here are three sentence pairs from the test data that the model classified incorrectly:
Idx: 3
- Sentence 1: A tropical storm rapidly developed in the Gulf of Mexico Sunday and was expected to hit somewhere along the Texas or Louisiana coasts by Monday night .
- Sentence 2: A tropical storm rapidly developed in the Gulf of Mexico on Sunday and could have hurricane-force winds when it hits land somewhere along the Louisiana coast Monday night .
- Prediction: Paraphrase
- True Label: Not a paraphrase
- Correct: False
- Logits: [-1.446120023727417, 2.0181071758270264]
Idx: 13
- Sentence 1: Hong Kong was flat , Australia , Singapore and South Korea lost 0.2-0.4 percent .
- Sentence 2: Australia was flat , Singapore was down 0.3 percent by midday and South Korea added 0.2 percent .
- Prediction: Paraphrase
- True Label: Not a paraphrase
- Correct: False
- Logits: [-0.11220673471689224, 0.7146934866905212]
Idx: 15
- Sentence 1: Ballmer has been vocal in the past warning that Linux is a threat to Microsoft .
- Sentence 2: In the memo , Ballmer reiterated the open-source threat to Microsoft .
- Prediction: Paraphrase
- True Label: Not a paraphrase
- Correct: False
- Logits: [-1.2966821193695068, 1.964686393737793]
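The raw logits listed for each pair can be converted to probabilities with a softmax, which makes the model's confidence easier to compare across examples. The sketch below assumes logit index 0 corresponds to "Not a paraphrase" and index 1 to "Paraphrase", which matches the predictions shown above; the `softmax` helper and `examples` dictionary are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the examples above, keyed by Idx.
examples = {
    0: [-2.2242250442504883, 2.586000680923462],     # correct
    1: [-0.5495507717132568, 1.2153574228286743],    # correct
    3: [-1.446120023727417, 2.0181071758270264],     # incorrect
    13: [-0.11220673471689224, 0.7146934866905212],  # incorrect
}

for idx, logits in examples.items():
    # Assumption: index 0 = "Not a paraphrase", index 1 = "Paraphrase".
    p_not, p_para = softmax(logits)
    print(f"Idx {idx}: P(Paraphrase) = {p_para:.3f}")
```

Of the four pairs included here, Idx 13 has the smallest logit gap and is therefore the least confident prediction, while Idx 3 is a confident mistake.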
These errors highlight some limitations:
- Idx 15: Being "vocal" could refer to a memo, but it could equally refer to remarks made in meetings rather than in writing. Likewise, Linux is open source, but "open-source" does not necessarily mean Linux.
- Idx 13: Singapore being down 0.3 percent falls within the 0.2-0.4 percent range, but that range could also describe a loss that fluctuated between 0.2 and 0.4 percent rather than a fixed value. Hence the second sentence does not account for the 0.4 percent.
- Idx 3: The second sentence does not mention Texas and hence cannot be considered a paraphrase.
Four Sentence Pairs Created by Grok to Test the Model:
Sentence_Pairs = [
{'sentence1': 'The company announced a major merger with its biggest competitor last week.',
'sentence2': 'Last week, the firm revealed plans to merge with its primary rival.',
'label': 1,
'idx': 102500},
{'sentence1': 'The new policy aims to reduce emissions by 20% over the next five years through incentives for green technology.',
'sentence2': 'Over the coming five years, this initiative seeks to cut pollution levels by a fifth by promoting eco-friendly innovations',
'label': 1,
'idx': 102501},
{'sentence1': 'The team won the championship after a dramatic comeback in the final quarter.',
'sentence2': 'The players celebrated their victory following an intense practice session before the big game.',
'label': 0,
'idx': 102502},
{'sentence1': 'The legislation seeks to curb carbon emissions by 25% within a decade through tax incentives for renewable energy.',
'sentence2': 'This bill aims to reduce CO2 output by a quarter over ten years by offering tax breaks for green energy solutions.',
'label': 1,
'idx': 102503}]
Here are the results (All Correct):
- Predicted class for pair 1: 1, true value is 1, and idx is 102500
- Predicted class for pair 2: 1, true value is 1, and idx is 102501
- Predicted class for pair 3: 0, true value is 0, and idx is 102502
- Predicted class for pair 4: 1, true value is 1, and idx is 102503
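A check like this one could be reproduced along the following lines. This is a sketch rather than the script used to produce the results above: it assumes the `transformers` and `torch` packages are installed, that the repository `ddipaola/Fine-tune-bert-base-uncased` hosts both the model weights and the tokenizer files, and that label 1 means "Paraphrase". The function name `predict_pairs` and the `ID2LABEL` mapping are hypothetical.

```python
MODEL_ID = "ddipaola/Fine-tune-bert-base-uncased"  # assumption: weights and tokenizer live here

# Assumption: same label convention as GLUE MRPC.
ID2LABEL = {0: "Not a paraphrase", 1: "Paraphrase"}


def predict_pairs(pairs, model_id=MODEL_ID):
    """Return the predicted class id (0 or 1) for each sentence-pair dict."""
    # Imported lazily so the module can be loaded without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    # Encode each pair as a single "[CLS] sentence1 [SEP] sentence2 [SEP]" input.
    encoded = tokenizer(
        [p["sentence1"] for p in pairs],
        [p["sentence2"] for p in pairs],
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**encoded).logits
    return logits.argmax(dim=-1).tolist()
```

Calling `predict_pairs(Sentence_Pairs)` with the list of four pairs would return one class id per pair, which can be mapped to a readable label through `ID2LABEL` and compared against each pair's `label` field.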
Model tree for ddipaola/Fine-tune-bert-base-uncased
Base model
google-bert/bert-base-uncased