# DistilBERT for Twitter Sentiment Analysis 🐦
This model is a fine-tuned version of distilbert-base-uncased for sentiment classification on Twitter/X data using LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning.
## Model Description
- **Base Model:** DistilBERT (66M parameters)
- **Fine-tuning Method:** LoRA/PEFT (only ~1.5M parameters trained)
- **Task:** 3-class sentiment classification
  - 😊 Positive
  - 😐 Neutral
  - 😡 Negative
- **Dataset:** tweet_eval sentiment subset
- **Language:** English
- **Training Framework:** Hugging Face Transformers + PEFT
## 🎯 Performance
The model achieves the following results on the test set:
| Metric | Score |
|---|---|
| Accuracy | 67.84% |
| F1 Score (weighted) | 0.6785 |
### Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Negative 😡 | 0.71 | 0.65 | 0.67 | 3,972 |
| Neutral 😐 | 0.69 | 0.70 | 0.69 | 5,937 |
| Positive 😊 | 0.62 | 0.67 | 0.65 | 2,375 |
| Overall | 0.68 | 0.68 | 0.68 | 12,284 |
### Confusion Matrix

```
                Predicted
              Neg   Neu   Pos
Actual  Neg [2562  1210   200]
        Neu [ 987  4170   780]
        Pos [  77   697  1601]
```
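The per-class numbers in the table above can be recomputed directly from this confusion matrix; a small pure-Python check (rows are actual classes, columns are predicted classes):

```python
# Confusion matrix copied from above: rows = actual, columns = predicted,
# in the order (negative, neutral, positive).
cm = [
    [2562, 1210, 200],   # actual negative
    [987, 4170, 780],    # actual neutral
    [77, 697, 1601],     # actual positive
]
labels = ["negative", "neutral", "positive"]

for i, label in enumerate(labels):
    tp = cm[i][i]
    support = sum(cm[i])                   # all actual examples of this class
    predicted = sum(row[i] for row in cm)  # all predictions of this class
    precision = tp / predicted
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Overall accuracy = diagonal / total, which matches the 67.84% reported above.
total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(3)) / total
print(f"accuracy={accuracy:.2%}")
```

Running this reproduces the precision/recall/F1 columns of the per-class table (rounded to two decimals).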
## 🚀 Usage
### Quick Start (Recommended)
```python
# Install required packages
!pip install transformers peft torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig
import torch

# Load model
model_name = "SeifElislamm/distilbert-sentiment-twitter"

# Load PEFT config
config = PeftConfig.from_pretrained(model_name)

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=3,
    id2label={0: "negative", 1: "neutral", 2: "positive"},
    label2id={"negative": 0, "neutral": 1, "positive": 2}
)

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, model_name)
model = model.merge_and_unload()  # Merge for faster inference
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Predict sentiment
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred_class = torch.argmax(probs).item()
    confidence = probs[0][pred_class].item()
    labels = {0: "negative", 1: "neutral", 2: "positive"}
    return labels[pred_class], confidence

# Test it
text = "I love this product! It's amazing!"
sentiment, confidence = predict_sentiment(text)
print(f"Sentiment: {sentiment.upper()} (confidence: {confidence:.1%})")
```
### Batch Prediction
```python
texts = [
    "I love this so much! 😍",
    "This is terrible. 😡",
    "It's okay, nothing special. 😐"
]

for text in texts:
    sentiment, confidence = predict_sentiment(text)
    print(f"{text} → {sentiment.upper()} ({confidence:.1%})")
```
### Expected Output

```
I love this so much! 😍 → POSITIVE (85.3%)
This is terrible. 😡 → NEGATIVE (79.2%)
It's okay, nothing special. 😐 → NEUTRAL (71.5%)
```
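The confidence shown next to each label is simply the softmax probability of the predicted class, as computed inside `predict_sentiment`. A pure-Python illustration of that step, using hypothetical logit values rather than actual model outputs:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalise to probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["negative", "neutral", "positive"]

# Hypothetical logits for one tweet; real values come from model(**inputs).logits
logits = [-1.2, 0.3, 2.1]
probs = softmax(logits)
pred = max(range(3), key=lambda i: probs[i])
print(labels[pred], f"{probs[pred]:.1%}")
```

With these made-up logits the positive class dominates; the same argmax-plus-softmax pattern produces the percentages in the expected output above.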
## 🧪 Quick Test in Google Colab
Want to test the model immediately? Copy this into a new Colab notebook:
```python
!pip install -q transformers peft torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig
import torch

model_name = "SeifElislamm/distilbert-sentiment-twitter"
config = PeftConfig.from_pretrained(model_name)
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path, num_labels=3
)
model = PeftModel.from_pretrained(base_model, model_name).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained(model_name)

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred = torch.argmax(probs).item()
    labels = {0: "NEGATIVE", 1: "NEUTRAL", 2: "POSITIVE"}
    return labels[pred], probs[0][pred].item()

# Test it!
text = input("Enter text: ")
sentiment, conf = predict(text)
print(f"→ {sentiment} ({conf:.1%})")
```
## 📊 Training Details
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Base Model | distilbert-base-uncased |
| Learning Rate | 2e-5 |
| Batch Size | 32 |
| Epochs | 3 |
| Weight Decay | 0.01 |
| Max Sequence Length | 128 |
| Optimizer | AdamW |
| LR Scheduler | Linear |
### LoRA Configuration
| Parameter | Value |
|---|---|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
| Target Modules | q_lin, v_lin |
| Trainable Parameters | ~1.5M / 66M (2.3%) |
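The table above corresponds roughly to the following PEFT setup. This is a sketch reconstructed from the listed hyperparameters, not the exact training script:

```python
from peft import LoraConfig, TaskType

# Reconstruction of the LoRA setup from the table above (sketch; the exact
# training script is not published with this card).
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                               # LoRA rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention query/value projections
)
```

Passing this config to `get_peft_model(base_model, lora_config)` wraps the base model so that only the low-rank adapter weights (and the new classification head) are trained.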
### Training Results
| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
|---|---|---|---|---|
| 1 | 0.6845 | 0.7014 | 0.6805 | 0.6817 |
| 2 | 0.6841 | 0.6861 | 0.6925 | 0.6936 |
| 3 | 0.6718 | 0.6819 | 0.6975 | 0.6985 |
✅ The model converged, with training loss decreasing and validation metrics improving across all three epochs.
## 📚 Training Data
The model was trained on the tweet_eval sentiment dataset:
| Split | Samples |
|---|---|
| Training | 45,615 |
| Validation | 2,000 |
| Test | 12,284 |
Dataset characteristics:
- Short text (typical tweets: 10-50 words)
- Informal language with emojis, hashtags, and mentions
- Balanced across negative, neutral, and positive sentiments
- Real-world social media data
## 💡 Intended Uses
### ✅ Recommended Uses
- Social Media Monitoring: Analyze sentiment of tweets, posts, and comments
- Customer Feedback Analysis: Classify product reviews and feedback
- Brand Reputation Tracking: Monitor public opinion about brands
- Market Research: Understand customer sentiment trends
- Content Moderation: Flag potentially negative content
- Academic Research: Study sentiment patterns in social media
### ⚠️ Limitations
- **Domain-specific:** Trained on Twitter data; may not generalize well to:
  - Formal documents (legal, academic)
  - Long-form content (articles, essays)
  - Domain-specific language (medical, technical)
- **English only:** Not suitable for other languages
- **Context limitations:**
  - May struggle with sarcasm and irony
  - Limited understanding of cultural context
  - Can misinterpret complex or nuanced sentiments
- **Bias:** May reflect biases present in Twitter data
- **Temporal:** Trained on data up to 2024; may not capture emerging slang
### ❌ Out of Scope
- Multi-lingual sentiment analysis
- Emotion detection beyond positive/neutral/negative
- Aspect-based sentiment analysis
- Spam detection or content classification
- Real-time critical decision making
## 🔧 Technical Details
### Model Architecture

- **Base:** DistilBERT (distilled version of BERT)
- **Layers:** 6 transformer layers
- **Hidden Size:** 768
- **Attention Heads:** 12
- **Parameters:** 66M total, ~1.5M trained (LoRA)
- **Classification Head:** Linear layer (768 → 3)
### Preprocessing

- **Tokenization:** WordPiece
- **Max Length:** 128 tokens
- **Padding:** Dynamic padding to the longest sequence in each batch
- **Truncation:** Enabled for sequences longer than 128 tokens
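As a rough sketch of what dynamic padding does: each batch is padded only to its own longest sequence rather than always to 128. The token-id lists below are hypothetical; DistilBERT's `[PAD]` id is 0.

```python
def pad_batch(sequences, pad_id=0, max_length=128):
    # Truncate to max_length, then pad to the longest sequence in *this* batch.
    truncated = [seq[:max_length] for seq in sequences]
    batch_max = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (batch_max - len(seq)) for seq in truncated]
    # Attention mask: 1 for real tokens, 0 for padding.
    attention_mask = [[1] * len(seq) + [0] * (batch_max - len(seq)) for seq in truncated]
    return input_ids, attention_mask

# Hypothetical token ids for two tweets of different lengths
ids, mask = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(ids)   # both rows padded to length 5, not to 128
```

In practice the tokenizer handles this for you via `tokenizer(texts, padding=True, truncation=True, max_length=128)`; the sketch just shows why short batches cost less compute than the 128-token maximum would suggest.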
### Inference Speed

On GPU (T4):
- Single prediction: ~10-15 ms
- Batch of 32: ~50-80 ms

On CPU:
- Single prediction: ~50-100 ms
- Batch of 32: ~500-800 ms
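A back-of-envelope consequence of these numbers: batching raises throughput by roughly 6x on a T4. The arithmetic below assumes the midpoints of the latency ranges above:

```python
# Midpoints of the T4 latency ranges quoted above (assumed, not re-measured)
single_latency_s = 0.0125   # ~12.5 ms per single prediction
batch_latency_s = 0.065     # ~65 ms for a batch of 32
batch_size = 32

single_throughput = 1 / single_latency_s           # ~80 tweets/sec
batched_throughput = batch_size / batch_latency_s  # ~490 tweets/sec
print(f"{single_throughput:.0f} vs {batched_throughput:.0f} tweets/sec")
```

For high-volume monitoring workloads, this is why the batch-prediction pattern shown earlier is preferable to per-tweet calls.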
## 📝 Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{seif2025distilbert-sentiment,
  author = {Seif Elislam},
  title = {DistilBERT Fine-tuned for Twitter Sentiment Analysis},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/SeifElislamm/distilbert-sentiment-twitter}}
}
```
## 📄 License
This model is released under the Apache 2.0 License. The base DistilBERT model is also Apache 2.0 licensed.
## 🙏 Acknowledgments

- **Base Model:** DistilBERT by Hugging Face
- **Dataset:** tweet_eval by Cardiff NLP
- **Framework:** Hugging Face Transformers
- **PEFT:** Hugging Face PEFT for the LoRA implementation
- **Compute:** Google Colab (free tier with T4 GPU)
## 📧 Contact
For questions or issues, please open an issue on the model's discussion page.
**Model Card Authors:** Seif Elislam
**Last Updated:** November 2025
**Model Version:** 1.0