# Sarcasm Detection with BERT

This repository contains a BERT model fine-tuned to detect sarcasm in headlines and other short text. On its held-out evaluation split, the model reaches 93.86% accuracy at distinguishing sarcastic from non-sarcastic content.

---

## Model Details

- **Model Name:** BERT-Base-Uncased Fine-tuned for Sarcasm Detection
- **Model Architecture:** BERT Base (~110M parameters)
- **Task:** Binary Classification (Sarcastic vs. Non-Sarcastic)
- **Dataset:** Sarcasm Headlines Dataset
- **Quantization:** Float16 half-precision weights (for lighter deployment)
- **Fine-tuning Framework:** Hugging Face Transformers

---

## Dataset

The model was trained on the **Sarcasm Headlines Dataset**, which contains:

- **Total Samples:** 26,709 headlines
- **Features:**
  - `headline`: the text to classify
  - `is_sarcastic`: binary label (1 = sarcastic, 0 = non-sarcastic)
- **Train/Evaluation Split:** 90% training, 10% evaluation

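
The exact split sizes depend on how the 10% share is rounded. A minimal sketch, assuming the evaluation share is rounded up the way scikit-learn's `train_test_split` does for a float `test_size` (this README does not say which helper was used); `split_sizes` is a hypothetical helper for illustration:

```python
import math

TOTAL = 26_709        # headlines in the Sarcasm Headlines Dataset
TEST_FRACTION = 0.10  # 10% held out for evaluation


def split_sizes(n_samples: int, test_fraction: float) -> tuple[int, int]:
    # Assumption: the evaluation share is rounded up; training gets the remainder.
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


n_train, n_eval = split_sizes(TOTAL, TEST_FRACTION)
print(n_train, n_eval)  # 24038 2671
```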
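---

## Performance Metrics

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.2048 | 0.1821 | 92.96% |
| 2 | 0.1138 | 0.2792 | 91.01% |
| 3 | 0.0586 | 0.2372 | **93.86%** |

Accuracy is measured on the 10% evaluation split. Validation loss is lowest after epoch 1, while accuracy peaks after epoch 3.

**Final Model Performance:**

- **Best Accuracy:** 93.86% (epoch 3)
- **Overall Training Loss:** 0.146 (reported at the end of training; likely an average over all training steps, which is why it is higher than the epoch-3 value of 0.0586)
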
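
Which epoch counts as "best" depends on the selection criterion. The sketch below uses only the numbers from the table and shows that selecting by accuracy and selecting by validation loss disagree. With the Hugging Face `Trainer`, this choice corresponds to `metric_for_best_model` when `load_best_model_at_end=True`, although this README does not state whether those options were used.

```python
# Per-epoch results copied from the Performance Metrics table.
history = [
    {"epoch": 1, "train_loss": 0.2048, "val_loss": 0.1821, "accuracy": 0.9296},
    {"epoch": 2, "train_loss": 0.1138, "val_loss": 0.2792, "accuracy": 0.9101},
    {"epoch": 3, "train_loss": 0.0586, "val_loss": 0.2372, "accuracy": 0.9386},
]

# Higher accuracy is better; lower validation loss is better.
best_by_accuracy = max(history, key=lambda row: row["accuracy"])
best_by_val_loss = min(history, key=lambda row: row["val_loss"])

# Validation minus training loss per epoch: a rough overfitting signal.
gaps = {row["epoch"]: round(row["val_loss"] - row["train_loss"], 4) for row in history}

print(best_by_accuracy["epoch"], best_by_val_loss["epoch"])  # 3 1
print(gaps)
```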
---

## Installation

```bash
pip install transformers datasets evaluate scikit-learn torch
```

---

## Usage

### Quick Start

```python
from transformers import pipeline

# Load the fine-tuned model and tokenizer from the local directory
classifier = pipeline(
    "text-classification",
    model="./sarcasm_model",
    tokenizer="./sarcasm_model",
)

# Test examples
test_inputs = [
    "I'm absolutely thrilled to be stuck in traffic again.",
    "The weather is nice and sunny today.",
    "Oh great, another email from the boss with more tasks.",
]

for sentence in test_inputs:
    result = classifier(sentence)[0]
    # Without a custom id2label in config.json, labels come back as LABEL_0 / LABEL_1
    label = "Sarcastic" if result["label"] == "LABEL_1" else "Not Sarcastic"
    print(f"'{sentence}' → {label} (Confidence: {result['score']:.2f})")
```

### Manual Model Loading

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer (from_pretrained returns the model in eval mode)
model = AutoModelForSequenceClassification.from_pretrained("./sarcasm_model")
tokenizer = AutoTokenizer.from_pretrained("./sarcasm_model")

# Tokenize input, truncating to the 128-token limit used in training
text = "Oh wonderful, another Monday morning!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Inference without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and pick the most likely class
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = probabilities.argmax(dim=-1).item()
label_mapping = {0: "Not Sarcastic", 1: "Sarcastic"}
confidence = probabilities[0, predicted_class].item()
print(f"Prediction: {label_mapping[predicted_class]} (Confidence: {confidence:.2f})")
```

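
The "confidence" printed above is the softmax probability of the predicted class; for two classes this equals a sigmoid of the logit difference, 1 / (1 + exp(-(z1 - z0))). The stdlib-only sketch below reproduces that computation for one pair of logits. The logit values are made up for illustration, and `softmax` and `predict` are hypothetical helpers, not part of this repository.

```python
import math

LABELS = {0: "Not Sarcastic", 1: "Sarcastic"}


def softmax(logits: list[float]) -> list[float]:
    # Subtract the largest logit first so exp() cannot overflow.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: list[float]) -> tuple[str, float]:
    # The predicted class is the argmax; its probability is the reported confidence.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, confidence = predict([-1.2, 2.3])  # hypothetical logits for one input
print(f"Prediction: {label} (Confidence: {confidence:.2f})")  # Prediction: Sarcastic (Confidence: 0.97)
```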
---

## Training Configuration

### Model Parameters

- **Base Model:** `bert-base-uncased`
- **Number of Labels:** 2 (binary classification)
- **Max Sequence Length:** 128 tokens
- **Tokenization:** WordPiece with padding and truncation

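
After WordPiece splitting, truncation and padding turn each headline into a fixed-length sequence. A simplified, stdlib-only sketch of that step, assuming right-side padding to `max_length=128` and the standard `bert-base-uncased` special-token IDs (`[CLS]` = 101, `[SEP]` = 102, `[PAD]` = 0). `encode` is a hypothetical helper; the real tokenizer also performs the WordPiece splitting and returns token type IDs. Note that `padding=True`, as used in the usage examples, pads only to the longest sequence in the batch rather than to 128.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # bert-base-uncased special-token IDs
MAX_LENGTH = 128


def encode(wordpiece_ids: list[int], max_length: int = MAX_LENGTH) -> tuple[list[int], list[int]]:
    # Keep room for [CLS] and [SEP]; drop any tokens beyond that budget.
    body = wordpiece_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Right-pad to max_length; padded positions get attention_mask 0.
    padding = max_length - len(input_ids)
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding


# Hypothetical WordPiece IDs for a short headline and for an over-long input.
short_ids, short_mask = encode([2023, 2003, 2307])
long_ids, long_mask = encode(list(range(1000, 1300)))
print(len(short_ids), sum(short_mask), len(long_ids), sum(long_mask))  # 128 5 128 128
```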
### Training Arguments

- **Learning Rate:** 2e-5
- **Batch Size:** 16 (training), 32 (evaluation)
- **Epochs:** 3
- **Weight Decay:** 0.01
- **Evaluation Strategy:** Every epoch
- **Optimizer:** AdamW (default)

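
A rough sketch of what these settings imply for the number of optimizer steps and the learning rate over time. It assumes a 24,038-example training set (the 90% share with the evaluation share rounded up), a single GPU with no gradient accumulation, and the `Trainer` defaults of a linear learning-rate schedule with no warmup; the README lists only "AdamW (default)", so the schedule is an assumption. `linear_lr` is a hypothetical helper.

```python
import math

N_TRAIN = 24_038  # assumed: 26,709 minus an evaluation share of 2,671 (10%, rounded up)
BATCH_SIZE = 16
EPOCHS = 3
PEAK_LR = 2e-5

# The last partial batch is kept, so steps per epoch round up.
steps_per_epoch = math.ceil(N_TRAIN / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def linear_lr(step: int) -> float:
    # Assumed default schedule: linear decay from PEAK_LR to 0, no warmup.
    return PEAK_LR * max(0.0, (total_steps - step) / total_steps)


print(steps_per_epoch, total_steps)  # 1503 4509
print([f"{linear_lr(s):.2e}" for s in (0, steps_per_epoch, 2 * steps_per_epoch)])
```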
### Hardware Requirements

- **GPU:** NVIDIA Tesla T4 (or equivalent)
- **Memory:** ~4 GB of GPU memory for training
- **Training Time:** ~18 minutes for 3 epochs

---

## Model Architecture

The model uses BERT's transformer encoder with:

- **Encoder Layers:** 12
- **Attention Heads:** 12
- **Hidden Size:** 768
- **Vocabulary Size:** 30,522
- **Classification Head:** Linear layer (768 → 2) applied to the pooled `[CLS]` representation

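
The "110M parameters" figure can be checked from these dimensions. The sketch below counts weights and biases layer by layer. It also uses three standard `bert-base-uncased` configuration values that are not listed above (feed-forward size 3,072, 512 position embeddings, 2 token types), and it counts the pooler plus a classifier on the pooled output, matching how Hugging Face's `BertForSequenceClassification` is assembled. The helper names are illustrative.

```python
VOCAB = 30_522
HIDDEN = 768
LAYERS = 12
NUM_LABELS = 2
INTERMEDIATE = 3_072  # standard bert-base value, not listed in this README
MAX_POSITIONS = 512   # standard bert-base value, not listed in this README
TYPE_VOCAB = 2        # standard bert-base value, not listed in this README


def linear(n_in: int, n_out: int) -> int:
    # Weight matrix plus bias vector.
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    # Learned scale and shift.
    return 2 * dim


embeddings = (VOCAB + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
encoder_layer = (
    4 * linear(HIDDEN, HIDDEN)        # Q, K, V and attention output projections
    + layer_norm(HIDDEN)
    + linear(HIDDEN, INTERMEDIATE)    # feed-forward up-projection
    + linear(INTERMEDIATE, HIDDEN)    # feed-forward down-projection
    + layer_norm(HIDDEN)
)
pooler = linear(HIDDEN, HIDDEN)       # dense layer over the [CLS] vector
classifier = linear(HIDDEN, NUM_LABELS)

total = embeddings + LAYERS * encoder_layer + pooler + classifier
print(f"{total:,}")  # 109,483,778
```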
---

## File Structure

```
sarcasm-detection/
├── sarcasm_model/            # Main fine-tuned model
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── special_tokens_map.json
│   ├── vocab.txt
│   └── tokenizer.json
├── quantized-model/          # Float16 quantized version
│   ├── config.json
│   ├── model.safetensors
│   └── tokenizer files...
├── logs/                     # Training logs
├── sarcasm-detection.ipynb   # Training notebook
└── README.md                 # This file
```

---

## Quantization

A Float16 (half-precision) version of the model is available for deployment:

```python
import torch
from transformers import AutoModelForSequenceClassification

# Load the quantized model and make sure its weights are in Float16
quantized_model = AutoModelForSequenceClassification.from_pretrained("./quantized-model")
quantized_model = quantized_model.to(dtype=torch.float16)
```

**Benefits of Quantization:**

- **Reduced Memory Usage:** about half the model size, since each weight takes 2 bytes instead of 4
- **Faster Inference:** on GPUs with native FP16 support (for example, the Tensor Cores on a T4)
- **Small Accuracy Impact:** FP16 conversion usually changes predictions very little, although the quantized model's accuracy is not reported separately above

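
The ~50% reduction follows directly from bytes per weight. A back-of-the-envelope sketch, using the 109,483,778 parameters of `bert-base-uncased` with a 2-class head and ignoring file-format overhead and tokenizer files; `weight_size_mb` is a hypothetical helper.

```python
NUM_PARAMS = 109_483_778  # bert-base-uncased plus a 768 -> 2 classification head

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_size_mb(num_params: int, dtype: str) -> float:
    # Raw weight storage in megabytes (10**6 bytes), excluding file overhead.
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


fp32_mb = weight_size_mb(NUM_PARAMS, "float32")
fp16_mb = weight_size_mb(NUM_PARAMS, "float16")
print(f"float32: {fp32_mb:.0f} MB, float16: {fp16_mb:.0f} MB, ratio: {fp16_mb / fp32_mb:.2f}")
# float32: 438 MB, float16: 219 MB, ratio: 0.50
```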
---

## Limitations

- **Domain Specificity:** Trained on news headlines; may not generalize well to other text types
- **Context Dependency:** Sarcasm is often context-dependent and subjective, and the model sees only the input text
- **Cultural Nuances:** May miss sarcasm patterns from other cultural contexts
- **Short Text Focus:** Optimized for headline-length text; inputs longer than 128 tokens are truncated

---

## Potential Improvements

- **Data Augmentation:** Include more diverse sarcasm examples
- **Ensemble Methods:** Combine multiple models for better accuracy
- **Context Integration:** Incorporate additional context beyond the headline
- **Multi-language Support:** Extend to other languages
- **Real-time Processing:** Optimize for streaming applications

---

## Applications

- **Social Media Monitoring:** Detect sarcastic comments and posts
- **Content Moderation:** Identify potentially misleading sarcastic content
- **Sentiment Analysis Enhancement:** Improve sentiment classification accuracy by flagging sarcastic text
- **News Analysis:** Analyze editorial tone and bias in headlines
- **Customer Feedback:** Better understand customer sentiment in reviews

---

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{sarcasm_detection_bert,
  title={BERT-based Sarcasm Detection for Headlines},
  author={Your Name},
  year={2025},
  note={Fine-tuned BERT model for binary sarcasm classification}
}
```

---

## Contributing

Contributions are welcome! Feel free to:

- Report bugs or issues
- Suggest improvements
- Add new features
- Improve documentation

---

## License

This project is licensed under the MIT License. The underlying BERT model is released by Google under the Apache 2.0 License.

---

## Acknowledgments

- **Hugging Face** for the Transformers library
- **Google Research** for the original BERT model
- **Kaggle** for hosting the Sarcasm Headlines Dataset
- **PyTorch** for the deep learning framework