# Model Card for Paraphrase Detection Model

This model is fine-tuned for the **paraphrase detection** task on the GLUE MRPC dataset. It determines whether two given sentences are paraphrases, i.e., whether they convey the same meaning. This is a binary classification task with the following labels:

- **1**: Paraphrase
- **0**: Not a paraphrase

## Model Overview

- **Developer**: Parit Kansal
- **Model Type**: Sequence Classification (Binary)
- **Language(s)**: English
- **Pre-trained Model**: BERT (`bert-base-uncased`)

## Intended Use

This model is designed to assess whether two sentences convey the same meaning. It can be applied in various scenarios, including:

- **Duplicate Question Detection**: Identifying similar questions in QA systems.
- **Plagiarism Detection**: Detecting content that has been copied and rephrased.
- **Summarization Alignment**: Matching sentences from summaries to the original content.

## Example Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Parit1/dummy")
tokenizer = AutoTokenizer.from_pretrained("Parit1/dummy")

# Move the model to GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def make_prediction(text1, text2):
    # Encode the sentence pair and move the tensors to the model's device
    inputs = tokenizer(text1, text2, truncation=True, padding=True, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    # argmax over the two logits gives the label: 1 = paraphrase, 0 = not a paraphrase
    return torch.argmax(outputs.logits, dim=-1).item()

# Example usage
text1 = "The quick brown fox jumps over the lazy dog."
text2 = "A fast brown fox leaps over a lazy dog."
prediction = make_prediction(text1, text2)
print(f"Prediction: {prediction}")
```
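
The same check can also be run through the `pipeline` helper, which handles tokenization and device placement itself. A minimal sketch; the exact label string returned depends on the model's `id2label` config, which this card does not specify:

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Parit1/dummy")

# Sentence pairs are passed as a dict with "text" and "text_pair" keys
result = classifier({
    "text": "The quick brown fox jumps over the lazy dog.",
    "text_pair": "A fast brown fox leaps over a lazy dog.",
})
print(result)  # e.g. {'label': 'LABEL_1', 'score': ...}
```
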
## Training Details

### Training Data

The model was fine-tuned on the **GLUE MRPC** dataset, which contains pairs of sentences labeled as either paraphrases or not.

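The dataset can be inspected with the `datasets` library; a quick sketch using the standard GLUE loader:

```python
from datasets import load_dataset

mrpc = load_dataset("glue", "mrpc")
print(mrpc)              # train / validation / test splits
print(mrpc["train"][0])  # {'sentence1': ..., 'sentence2': ..., 'label': ..., 'idx': ...}
print(mrpc["train"].features["label"].names)  # ['not_equivalent', 'equivalent']
```
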
### Training Procedure

- **Number of Epochs**: 2
- **Metrics Used**:
  - Accuracy
  - Precision
  - Recall
  - F1 Score

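A minimal sketch of this fine-tuning setup with the `Trainer` API is shown below; only the base checkpoint, dataset, and epoch count come from this card, and the remaining hyperparameters are illustrative assumptions:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

raw = load_dataset("glue", "mrpc")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # MRPC provides sentence pairs under "sentence1" / "sentence2"
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

encoded = raw.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="mrpc-paraphrase",
    num_train_epochs=2,              # from this card
    per_device_train_batch_size=16,  # assumption
    learning_rate=2e-5,              # assumption
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```
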
#### Training Logs (Summary)

- **Epoch 1**:
  - Training: Avg Loss = 0.5443, Accuracy = 73.45%, Precision = 72.28%, Recall = 73.45%, F1 Score = 70.83%
  - Testing: Avg Loss = 0.3976, Accuracy = 82.60%, Precision = 82.26%, Recall = 82.60%, F1 Score = 81.93%
- **Epoch 2**:
  - Training: Avg Loss = 0.2756, Accuracy = 89.34%, Precision = 89.25%, Recall = 89.34%, F1 Score = 89.27%
  - Testing: Avg Loss = 0.3596, Accuracy = 84.80%, Precision = 84.94%, Recall = 84.80%, F1 Score = 84.87%

## Evaluation

### Performance Metrics

The model's performance was evaluated using the following metrics:

- **Accuracy**: Percentage of correct predictions.
- **Precision**: Proportion of positive identifications that were actually correct.
- **Recall**: Proportion of actual positives that were correctly identified.
- **F1 Score**: The harmonic mean of Precision and Recall.

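These metrics can be computed with scikit-learn, for example. In the sketch below, `average="weighted"` is an assumption: the reported Recall matching Accuracy is consistent with weighted averaging, but the card does not state the averaging mode.

```python
# Illustrative metric computation; the averaging mode is an assumption
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(labels, predictions):
    accuracy = accuracy_score(labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average="weighted"
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```
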
### Test Set Results

- **Epoch 1**:
  - Avg Loss: 0.3976
  - Accuracy: 82.60%
  - Precision: 82.26%
  - Recall: 82.60%
  - F1 Score: 81.93%
- **Epoch 2**:
  - Avg Loss: 0.3596
  - Accuracy: 84.80%
  - Precision: 84.94%
  - Recall: 84.80%
  - F1 Score: 84.87%