The model is optimized for Hinglish text combined with emojis, and emojis appear to receive more attention than Hinglish words. This may be because the base model was first trained for emoji classification and only later fine-tuned for sentiment analysis.
The model is therefore best suited to sentiment analysis of inputs that include emojis. It has not been evaluated on text-only data (without emojis).
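Because no text-only evaluation has been done, one way to start such a comparison is to build emoji-free copies of the same inputs and score both versions. The sketch below is an illustration only and is not part of this model's tooling. The `strip_emojis` and `make_text_only_pairs` helpers are hypothetical, and the Unicode ranges are an assumption that only approximates the full emoji set.

```python
import re

# Approximate emoji code-point ranges (an assumption: Unicode defines more
# emoji than these blocks, and a few symbols here are not emoji).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\U00002600-\U000026FF"  # miscellaneous symbols
    "\U00002700-\U000027BF"  # dingbats (includes the heart U+2764)
    "\U0000FE0F"             # variation selector-16 (emoji presentation)
    "]+"
)


def strip_emojis(text: str) -> str:
    """Hypothetical helper: remove emojis and collapse the leftover whitespace."""
    without_emojis = EMOJI_PATTERN.sub(" ", text)
    return " ".join(without_emojis.split())


def make_text_only_pairs(texts):
    """Hypothetical helper: pair each input with its emoji-free variant."""
    return [(t, strip_emojis(t)) for t in texts]


if __name__ == "__main__":
    for original, text_only in make_text_only_pairs(
        ["tu mujhe pasandh heh 😍", "tum kon ho bhai 🤔"]
    ):
        print(f"{original!r} -> {text_only!r}")
```

Both strings of each pair can then be passed to the text-classification pipeline from the usage example to compare scores with and without emojis.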
The model was fine-tuned on the Hugging Face dataset mteb/tweet_sentiment_extraction, with the tweets converted to Hinglish.
On unseen data from this dataset, the model achieves a test loss of 0.6 and an F1 score of 0.74.
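The card does not state how the F1 score was averaged across the three sentiment classes. As an illustration only, the sketch below computes macro-averaged F1, which is the unweighted mean of the per-class F1 scores. The reported 0.74 may have used a different scheme, such as weighted or micro averaging. The `macro_f1` function is a hypothetical, dependency-free helper. scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    f1_scores = []
    for label in labels:
        pred_total = tp[label] + fp[label]
        true_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / true_total if true_total else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Toy labels for illustration, not taken from the model's evaluation.
print(macro_f1(["positive", "neutral", "negative", "neutral"],
               ["positive", "neutral", "neutral", "neutral"]))
```

In this toy example, "negative" is never predicted, so its F1 is 0. That single missed class pulls the macro average well below the 75% accuracy.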
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="pascalrai/hinglish-twitter-roberta-base-sentiment")
pipe("tu mujhe pasandh heh")
[{'label': 'positive', 'score': 0.7615439891815186}]
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("pascalrai/hinglish-twitter-roberta-base-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("pascalrai/hinglish-twitter-roberta-base-sentiment")
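inputs = ["tum kon ho bhai", "tu mujhe pasandh heh"]
# Tokenize the batch (padding so both texts fit one tensor) and run inference without tracking gradients
with torch.no_grad():
    outputs = model(**tokenizer(inputs, return_tensors="pt", padding=True))
# Convert the logits to per-class probabilities
p = torch.nn.functional.softmax(outputs.logits, dim=1)
for index, each in enumerate(p.numpy()):
    print(f"Text: {inputs[index]}")
    # Class order: 0 = negative, 1 = neutral, 2 = positive
    print(f"Negative: {round(float(each[0]), 2)}\nNeutral: {round(float(each[1]), 2)}\nPositive: {round(float(each[2]), 2)}\n")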
Text: tum kon ho bhai
Negative: 0.02
Neutral: 0.91
Positive: 0.07
Text: tu mujhe pasandh heh
Negative: 0.01
Neutral: 0.22
Positive: 0.76
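The softmax step in the example above turns each row of raw logits into the three probabilities printed for every text. The dependency-free sketch below shows the same computation for a single row. The logit values are made up for illustration and were not produced by the model. The negative/neutral/positive index order is inferred from the example above.

```python
import math

# Label order assumed from the example above: 0 = negative, 1 = neutral, 2 = positive.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_probabilities(logits):
    """Map one row of three logits to {label: probability}, rounded to 2 decimals."""
    return {label: round(p, 2) for label, p in zip(LABELS, softmax(logits))}


# Hypothetical logits, not real model output.
print(label_probabilities([-1.0, 2.0, 0.5]))
```

Subtracting the maximum logit leaves the result unchanged, because softmax is invariant to adding a constant to every logit. It also keeps `math.exp` from overflowing on large values.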
Possible Future Direction: