# DistilBERT for Twitter Sentiment Analysis 🐦
This model is a fine-tuned version of distilbert-base-uncased for sentiment classification on Twitter/X data using LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning.
## Model Description
- **Base Model:** DistilBERT (66M parameters)
- **Fine-tuning Method:** LoRA/PEFT (only ~1.5M parameters trained)
- **Task:** 3-class sentiment classification
  - 😊 Positive
  - 😐 Neutral
  - 😡 Negative
- **Dataset:** tweet_eval sentiment subset
- **Language:** English
- **Training Framework:** Hugging Face Transformers + PEFT
## 🎯 Performance
The model achieves the following results on the test set:
| Metric | Score |
|---|---|
| Accuracy | 67.84% |
| F1 Score (weighted) | 0.6785 |
### Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Negative 😡 | 0.71 | 0.65 | 0.67 | 3,972 |
| Neutral 😐 | 0.69 | 0.70 | 0.69 | 5,937 |
| Positive 😊 | 0.62 | 0.67 | 0.65 | 2,375 |
| Overall | 0.68 | 0.68 | 0.68 | 12,284 |
### Confusion Matrix

```
                Predicted
              Neg   Neu   Pos
Actual  Neg [2562  1210   200]
        Neu [ 987  4170   780]
        Pos [  77   697  1601]
```
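The per-class numbers in the table above can be recomputed directly from this confusion matrix; a small pure-Python check (rows are actual classes, columns are predicted classes):

```python
# Confusion matrix copied from above: rows = actual, columns = predicted,
# in the order (negative, neutral, positive).
cm = [
    [2562, 1210, 200],   # actual negative
    [987, 4170, 780],    # actual neutral
    [77, 697, 1601],     # actual positive
]
labels = ["negative", "neutral", "positive"]

for i, label in enumerate(labels):
    tp = cm[i][i]
    support = sum(cm[i])                   # all actual examples of this class
    predicted = sum(row[i] for row in cm)  # all predictions of this class
    precision = tp / predicted
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Overall accuracy = diagonal / total, which matches the 67.84% reported above.
total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(3)) / total
print(f"accuracy={accuracy:.2%}")
```

Running this reproduces the precision/recall/F1 columns of the per-class table (rounded to two decimals).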
## 🚀 Usage
### Quick Start (Recommended)
```python
# Install required packages
!pip install transformers peft torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig
import torch

# Load model
model_name = "SeifElislamm/distilbert-sentiment-twitter"

# Load PEFT config
config = PeftConfig.from_pretrained(model_name)

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=3,
    id2label={0: "negative", 1: "neutral", 2: "positive"},
    label2id={"negative": 0, "neutral": 1, "positive": 2}
)

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, model_name)
model = model.merge_and_unload()  # Merge for faster inference
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Predict sentiment
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred_class = torch.argmax(probs).item()
    confidence = probs[0][pred_class].item()
    labels = {0: "negative", 1: "neutral", 2: "positive"}
    return labels[pred_class], confidence

# Test it
text = "I love this product! It's amazing!"
sentiment, confidence = predict_sentiment(text)
print(f"Sentiment: {sentiment.upper()} (confidence: {confidence:.1%})")
```
### Batch Prediction
```python
texts = [
    "I love this so much! 😍",
    "This is terrible. 😡",
    "It's okay, nothing special. 😐"
]

for text in texts:
    sentiment, confidence = predict_sentiment(text)
    print(f"{text} → {sentiment.upper()} ({confidence:.1%})")
```
### Expected Output

```
I love this so much! 😍 → POSITIVE (85.3%)
This is terrible. 😡 → NEGATIVE (79.2%)
It's okay, nothing special. 😐 → NEUTRAL (71.5%)
```
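The confidence shown next to each label is simply the softmax probability of the predicted class, as computed inside `predict_sentiment`. A pure-Python illustration of that step, using hypothetical logit values rather than actual model outputs:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalise to probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["negative", "neutral", "positive"]

# Hypothetical logits for one tweet; real values come from model(**inputs).logits
logits = [-1.2, 0.3, 2.1]
probs = softmax(logits)
pred = max(range(3), key=lambda i: probs[i])
print(labels[pred], f"{probs[pred]:.1%}")
```

With these made-up logits the positive class dominates; the same argmax-plus-softmax pattern produces the percentages in the expected output above.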
## 🧪 Quick Test in Google Colab
Want to test the model immediately? Copy this into a new Colab notebook:
```python
!pip install -q transformers peft torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig
import torch

model_name = "SeifElislamm/distilbert-sentiment-twitter"
config = PeftConfig.from_pretrained(model_name)
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path, num_labels=3
)
model = PeftModel.from_pretrained(base_model, model_name).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained(model_name)

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred = torch.argmax(probs).item()
    labels = {0: "NEGATIVE", 1: "NEUTRAL", 2: "POSITIVE"}
    return labels[pred], probs[0][pred].item()

# Test it!
text = input("Enter text: ")
sentiment, conf = predict(text)
print(f"→ {sentiment} ({conf:.1%})")
```
## 📊 Training Details
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Base Model | distilbert-base-uncased |
| Learning Rate | 2e-5 |
| Batch Size | 32 |
| Epochs | 3 |
| Weight Decay | 0.01 |
| Max Sequence Length | 128 |
| Optimizer | AdamW |
| LR Scheduler | Linear |
### LoRA Configuration
| Parameter | Value |
|---|---|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
| Target Modules | q_lin, v_lin |
| Trainable Parameters | ~1.5M / 66M (2.3%) |
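The table above corresponds roughly to the following PEFT setup. This is a sketch reconstructed from the listed hyperparameters, not the exact training script:

```python
from peft import LoraConfig, TaskType

# Reconstruction of the LoRA setup from the table above (sketch; the exact
# training script is not published with this card).
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                               # LoRA rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention query/value projections
)
```

Passing this config to `get_peft_model(base_model, lora_config)` wraps the base model so that only the low-rank adapter weights (and the new classification head) are trained.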
### Training Results
| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
|---|---|---|---|---|
| 1 | 0.6845 | 0.7014 | 0.6805 | 0.6817 |
| 2 | 0.6841 | 0.6861 | 0.6925 | 0.6936 |
| 3 | 0.6718 | 0.6819 | 0.6975 | 0.6985 |
✅ The model converged, with training loss decreasing and validation metrics improving across all three epochs.
## 📚 Training Data
The model was trained on the tweet_eval sentiment dataset:
| Split | Samples |
|---|---|
| Training | 45,615 |
| Validation | 2,000 |
| Test | 12,284 |
Dataset characteristics:
- Short text (typical tweets: 10-50 words)
- Informal language with emojis, hashtags, and mentions
- Balanced across negative, neutral, and positive sentiments
- Real-world social media data
## 💡 Intended Uses
### ✅ Recommended Uses
- Social Media Monitoring: Analyze sentiment of tweets, posts, and comments
- Customer Feedback Analysis: Classify product reviews and feedback
- Brand Reputation Tracking: Monitor public opinion about brands
- Market Research: Understand customer sentiment trends
- Content Moderation: Flag potentially negative content
- Academic Research: Study sentiment patterns in social media
### ⚠️ Limitations
- **Domain-specific:** Trained on Twitter data; may not generalize well to:
  - Formal documents (legal, academic)
  - Long-form content (articles, essays)
  - Domain-specific language (medical, technical)
- **English only:** Not suitable for other languages
- **Context limitations:**
  - May struggle with sarcasm and irony
  - Limited understanding of cultural context
  - Can misinterpret complex or nuanced sentiments
- **Bias:** May reflect biases present in Twitter data
- **Temporal:** Trained on data up to 2024; may not capture emerging slang
### ❌ Out of Scope
- Multi-lingual sentiment analysis
- Emotion detection beyond positive/neutral/negative
- Aspect-based sentiment analysis
- Spam detection or content classification
- Real-time critical decision making
## 🔧 Technical Details
### Model Architecture

- **Base:** DistilBERT (distilled version of BERT)
- **Layers:** 6 transformer layers
- **Hidden Size:** 768
- **Attention Heads:** 12
- **Parameters:** 66M total, ~1.5M trained (LoRA)
- **Classification Head:** Linear layer (768 → 3)
### Preprocessing

- **Tokenization:** WordPiece
- **Max Length:** 128 tokens
- **Padding:** Dynamic padding to the longest sequence in each batch
- **Truncation:** Enabled for sequences longer than 128 tokens
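As a rough sketch of what dynamic padding does: each batch is padded only to its own longest sequence rather than always to 128. The token-id lists below are hypothetical; DistilBERT's `[PAD]` id is 0.

```python
def pad_batch(sequences, pad_id=0, max_length=128):
    # Truncate to max_length, then pad to the longest sequence in *this* batch.
    truncated = [seq[:max_length] for seq in sequences]
    batch_max = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (batch_max - len(seq)) for seq in truncated]
    # Attention mask: 1 for real tokens, 0 for padding.
    attention_mask = [[1] * len(seq) + [0] * (batch_max - len(seq)) for seq in truncated]
    return input_ids, attention_mask

# Hypothetical token ids for two tweets of different lengths
ids, mask = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(ids)   # both rows padded to length 5, not to 128
```

In practice the tokenizer handles this for you via `tokenizer(texts, padding=True, truncation=True, max_length=128)`; the sketch just shows why short batches cost less compute than the 128-token maximum would suggest.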
### Inference Speed

On GPU (T4):
- Single prediction: ~10-15 ms
- Batch of 32: ~50-80 ms

On CPU:
- Single prediction: ~50-100 ms
- Batch of 32: ~500-800 ms
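A back-of-envelope consequence of these numbers: batching raises throughput by roughly 6x on a T4. The arithmetic below assumes the midpoints of the latency ranges above:

```python
# Midpoints of the T4 latency ranges quoted above (assumed, not re-measured)
single_latency_s = 0.0125   # ~12.5 ms per single prediction
batch_latency_s = 0.065     # ~65 ms for a batch of 32
batch_size = 32

single_throughput = 1 / single_latency_s           # ~80 tweets/sec
batched_throughput = batch_size / batch_latency_s  # ~490 tweets/sec
print(f"{single_throughput:.0f} vs {batched_throughput:.0f} tweets/sec")
```

For high-volume monitoring workloads, this is why the batch-prediction pattern shown earlier is preferable to per-tweet calls.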
## 📝 Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{seif2025distilbert-sentiment,
  author = {Seif Elislam},
  title = {DistilBERT Fine-tuned for Twitter Sentiment Analysis},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/SeifElislamm/distilbert-sentiment-twitter}}
}
```
## 📄 License
This model is released under the Apache 2.0 License. The base DistilBERT model is also Apache 2.0 licensed.
## 🙏 Acknowledgments

- **Base Model:** DistilBERT by Hugging Face
- **Dataset:** tweet_eval by Cardiff NLP
- **Framework:** Hugging Face Transformers
- **PEFT:** Hugging Face PEFT for the LoRA implementation
- **Compute:** Google Colab (free tier with T4 GPU)
## 📧 Contact
For questions or issues, please open an issue on the model's discussion page.
**Model Card Authors:** Seif Elislam
**Last Updated:** November 2025
**Model Version:** 1.0