🐦 Tweet Tone Classifier

A fine-tuned DistilBERT model for binary sentiment classification of tweets: it predicts whether a tweet is Positive or Negative.

Part of a larger project that also rewrites tweets in different tones (formal, casual, empathetic, assertive) using the Gemini API.


📊 Model Details

| Property | Details |
| --- | --- |
| Base model | distilbert-base-uncased |
| Task | Binary Sentiment Classification |
| Dataset | Sentiment140 (50,000 samples) |
| Training epochs | 3 |
| Batch size | 32 |
| Max token length | 64 |
| Accuracy | ~87% |
| Language | English |

🚀 Quick Start

Installation

pip install transformers torch

Using the pipeline (easiest)

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="KinSlay3rs/tweet-tone-classifier",
)

result = classifier("I can't believe my flight got cancelled again!!")
print(result)
# [{'label': 'NEGATIVE', 'score': 0.97}]

result = classifier("Just got promoted!! Best day ever 🎉")
print(result)
# [{'label': 'POSITIVE', 'score': 0.98}]

Using the model directly

import torch
from transformers import DistilBertForSequenceClassification, DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained("KinSlay3rs/tweet-tone-classifier")
model = DistilBertForSequenceClassification.from_pretrained("KinSlay3rs/tweet-tone-classifier")
model.eval()  # inference mode: disables dropout

LABELS = {0: "NEGATIVE", 1: "POSITIVE"}

def predict(tweet: str) -> str:
    # Tokenize with the same 64-token limit used during training
    inputs = tokenizer(tweet, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        logits = model(**inputs).logits
    label = LABELS[logits.argmax().item()]
    # Confidence is the largest softmax probability over the two classes
    score = torch.softmax(logits, dim=1).max().item()
    return f"{label} (confidence: {score:.2f})"

print(predict("This is the worst experience I've ever had."))
# NEGATIVE (confidence: 0.96)

print(predict("Absolutely loving the new update!"))
# POSITIVE (confidence: 0.94)
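To make the confidence figure concrete: the score returned by `predict` is the largest softmax probability over the two class logits. The sketch below reproduces that step in plain Python so it can be checked by hand. The logit values are made up for illustration and are not outputs of this model, and `label_and_confidence` is a hypothetical helper name.

```python
import math

LABELS = {0: "NEGATIVE", 1: "POSITIVE"}

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_and_confidence(logits):
    """Return the most probable label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical logits for one tweet, ordered [negative, positive]
label, confidence = label_and_confidence([-1.2, 2.0])
print(f"{label} (confidence: {confidence:.2f})")
```

With only two classes, the confidence depends solely on the gap between the two logits, so it can never fall below 0.5.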


πŸ“ Dataset

Trained on a 50,000-sample subset of the Sentiment140 dataset, which contains 1.6 million tweets labelled as positive or negative.
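The raw Sentiment140 data encodes polarity as 0 (negative) and 4 (positive), so any subset has to be remapped to the 0/1 ids the classifier uses. The card does not say how the 50,000 rows were drawn. The sketch below assumes a class-balanced random sample with a fixed seed; `sample_balanced` and `POLARITY_TO_ID` are illustrative names, not part of the project.

```python
import random

# Sentiment140 polarity codes -> classifier label ids
POLARITY_TO_ID = {0: 0, 4: 1}

def sample_balanced(rows, n_total, seed=42):
    """Draw n_total (label, text) pairs, half per class, from (polarity, text) rows.

    Assumption: a balanced draw. Raises ValueError if a class has
    fewer than n_total // 2 rows.
    """
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for polarity, text in rows:
        label = POLARITY_TO_ID.get(polarity)
        if label is not None:  # skip any polarity other than 0 or 4
            by_label[label].append((label, text))
    per_class = n_total // 2
    subset = []
    for label in (0, 1):
        subset.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(subset)  # avoid all negatives followed by all positives
    return subset

# Toy rows standing in for the real CSV
toy_rows = [(0, f"neg {i}") for i in range(10)] + [(4, f"pos {i}") for i in range(10)]
toy_subset = sample_balanced(toy_rows, n_total=6)
print(len(toy_subset), sum(label for label, _ in toy_subset))
```

For the subset described here, `n_total` would be 50,000, giving 25,000 tweets per class under the balanced-draw assumption.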

Preprocessing applied:

  • Removed URLs (http://...)
  • Removed Twitter handles (@username)
  • Removed special characters
  • Truncated to 64 tokens
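The cleaning steps above can be sketched with a few regular expressions. The exact patterns used for training are not given, so the ones below are assumptions, in particular the definition of "special characters" as anything other than letters, digits, whitespace and basic punctuation. `clean_tweet` is a hypothetical helper name. Truncation to 64 tokens is not handled here because the tokenizer does it (`truncation=True, max_length=64`).

```python
import re

# Assumed patterns; the project's actual regexes are not documented
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"@\w+")
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s.,!?']")
SPACE_RE = re.compile(r"\s+")

def clean_tweet(text: str) -> str:
    """Strip URLs, @handles and special characters, then tidy whitespace."""
    text = URL_RE.sub(" ", text)
    text = HANDLE_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()

print(clean_tweet("@bob check http://t.co/abc this is #great!!"))
```

Note that a character filter like this also removes emojis, which ties in with the emoji-related limitation listed further down.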

πŸ‹οΈ Training Details

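from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    eval_strategy="epoch",  # named evaluation_strategy in transformers releases before 4.41
    save_strategy="epoch",  # must match eval_strategy when load_best_model_at_end=True
    load_best_model_at_end=True,
)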

Hardware used: Intel i5 7th Gen CPU / Kaggle T4 GPU
Training time: ~25 minutes on GPU
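The card reports ~87% accuracy without showing how it was measured. A minimal sketch of an accuracy metric that could be handed to `Trainer` is below; it assumes accuracy was computed on a held-out split by taking the argmax of the logits, which the card does not confirm. It is written in plain Python so it works on nested lists as well as the NumPy arrays `Trainer` supplies.

```python
def compute_metrics(eval_pred):
    """Accuracy from a (logits, labels) pair, as supplied by Trainer during evaluation."""
    logits, labels = eval_pred
    correct = 0
    total = 0
    for row, label in zip(logits, labels):
        # argmax over the class dimension
        predicted = max(range(len(row)), key=lambda i: row[i])
        correct += int(predicted == int(label))
        total += 1
    return {"accuracy": correct / total if total else 0.0}

# Toy check: predictions are 0, 1, 0 against labels 0, 1, 1
toy_logits = [[2.0, -1.0], [0.1, 0.9], [0.3, 0.2]]
toy_labels = [0, 1, 1]
print(compute_metrics((toy_logits, toy_labels)))
```

It would be wired in with `Trainer(..., args=args, compute_metrics=compute_metrics)`, so the accuracy is logged at the end of each epoch alongside the evaluation loss.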


⚠️ Limitations

  • Trained only on English tweets; may not generalize to other languages
  • Sarcasm and irony are often misclassified (a known challenge in sentiment analysis)
  • Trained on tweets from 2009; modern slang and emojis may reduce accuracy
  • Binary classification only; does not detect neutral sentiment
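Because there is no neutral class, every tweet is forced into Positive or Negative. One possible workaround, sketched below, is to flag low-confidence predictions instead of trusting the label. This is an assumption-laden heuristic and not something the project implements: low confidence is not the same as neutral sentiment, the 0.75 threshold is arbitrary, and `with_uncertain` is a hypothetical helper operating on the dicts returned by the pipeline.

```python
def with_uncertain(prediction, threshold=0.75):
    """Relabel a pipeline-style prediction as UNCERTAIN when its score is below threshold."""
    if prediction["score"] < threshold:
        return {"label": "UNCERTAIN", "score": prediction["score"]}
    return dict(prediction)  # copy so the caller's dict is left untouched

# Hand-written dicts in the same shape as the pipeline output
print(with_uncertain({"label": "POSITIVE", "score": 0.98}))
print(with_uncertain({"label": "NEGATIVE", "score": 0.58}))
```

Any threshold would need tuning on labelled data before being relied on.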

🔭 Future Work

  • Add neutral class (3-class classification)
  • Train on more recent tweet data
  • Add emoji-aware preprocessing
  • Multilingual support using xlm-roberta-base

📦 Full Project

This model is part of the Tweet Tone Classifier & Rewriter project which includes:

  • ✅ Sentiment classification (this model)
  • ✅ Tone rewriting using Gemini API (formal / casual / empathetic / assertive)
  • ✅ Gradio web interface
  • ✅ Deployed on Hugging Face Spaces

🔗 GitHub: github.com/KinSlay3rS/GenAI-Projects/Sentement-Analysis-DistilBERT
🔗 Live Demo: huggingface.co/spaces/KinSlay3rs/tweet-tone-classifier


🙋 Author

Made by KinSlay3rs
🔗 Hugging Face Profile
