---
title: Comment Sentiment and Toxicity Classifier
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
---

Comment Sentiment and Toxicity Classifier

This Space utilises a custom fine-tuned BERT model to classify the sentiment and toxicity of comments. Developed for academic purposes in Australia.

Comment MTL BERT Model

This is a BERT-based multi-task learning model capable of performing sentiment analysis and toxicity detection simultaneously.

Model Architecture

The model uses the pre-trained bert-base-uncased encoder, shared by two separate classification heads:

Sentiment Analysis Head: 3-class single-label classification (Negative, Neutral, Positive)
Toxicity Detection Head: multi-label classification over 6 labels (toxic, severe_toxic, obscene, threat, insult, identity_hate)
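A minimal sketch of the two-head layout described above. It assumes both heads are linear layers on BERT's pooled [CLS] output with dropout; the project's actual `CommentMTLModel` in `src/model.py` may differ, and the class name `MultiTaskBertSketch` is hypothetical. The encoder is built from a config with random weights so the example runs offline; for real use, `BertModel.from_pretrained("bert-base-uncased")` would supply the pre-trained encoder.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class MultiTaskBertSketch(nn.Module):
    """Hypothetical shared-encoder model with one head per task (illustration only)."""

    def __init__(self, config: BertConfig, num_sentiment_labels: int = 3,
                 num_toxicity_labels: int = 6):
        super().__init__()
        # One BERT encoder shared by both tasks (randomly initialised here).
        self.encoder = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        # Assumption: each head is a single linear layer over the pooled output.
        self.sentiment_head = nn.Linear(config.hidden_size, num_sentiment_labels)
        self.toxicity_head = nn.Linear(config.hidden_size, num_toxicity_labels)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor = None):
        # pooler_output has shape [batch_size, hidden_size].
        pooled = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).pooler_output
        pooled = self.dropout(pooled)
        return {
            "sentiment_logits": self.sentiment_head(pooled),  # [batch, 3]
            "toxicity_logits": self.toxicity_head(pooled),    # [batch, 6]
        }
```

The returned keys mirror the ones read in the inference example (`sentiment_logits`, `toxicity_logits`), so a single forward pass yields both predictions.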

Technical Parameters

Hidden size: 768
Number of attention heads: 12
Number of hidden layers: 12
Vocabulary size: 30522
Maximum position embeddings: 512
Hidden activation function: gelu
Dropout probability: 0.1
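The listed parameters map directly onto fields of the Transformers `BertConfig` class, and they coincide with its defaults (which match bert-base-uncased). Applying the single "Dropout probability" to both `hidden_dropout_prob` and `attention_probs_dropout_prob` is an assumption. The sketch also derives the per-head size, 768 / 12 = 64.

```python
from transformers import BertConfig

# Technical parameters from the list above, expressed as BertConfig fields.
config = BertConfig(
    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=512,
    hidden_act="gelu",
    # Assumption: the single dropout value applies to both dropout sites.
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
)

# Each attention head operates on an equal slice of the hidden state.
head_dim = config.hidden_size // config.num_attention_heads
print(head_dim)  # 64
```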

Usage

Loading the Model

```python
from transformers import AutoTokenizer
from src.model import CommentMTLModel  # model class defined in this project's src/ directory
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load model
model = CommentMTLModel(
    model_name="bert-base-uncased",
    num_sentiment_labels=3,
    num_toxicity_labels=6
)

# Load pre-trained weights onto the CPU
state_dict = torch.load("model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # disable dropout for inference
```

Model Inference

```python
# Prepare input (reuses `tokenizer`, `model` and `torch` from the loading step)
text = "This is a test comment."
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Model inference
with torch.no_grad():
    outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])

# Get results (both tensors have shape [batch_size, num_labels])
sentiment_logits = outputs["sentiment_logits"]
toxicity_logits = outputs["toxicity_logits"]

# Process sentiment analysis results: softmax over the 3 mutually exclusive classes
sentiment_probs = torch.softmax(sentiment_logits, dim=-1)
sentiment_labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
sentiment_prediction = sentiment_labels[sentiment_probs[0].argmax().item()]

# Process toxicity detection results: an independent sigmoid per label (multi-label)
toxicity_probs = torch.sigmoid(toxicity_logits)
toxicity_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
toxicity_results = {label: prob.item() for label, prob in zip(toxicity_cols, toxicity_probs[0])}

print(f"Sentiment: {sentiment_prediction}")
print(f"Toxicity probabilities: {toxicity_results}")
```
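The inference code reports raw toxicity probabilities. Turning them into yes/no flags needs a decision threshold; the sketch below assumes 0.5, which is not specified by the project and may need tuning per label. It repeats the post-processing in plain Python (no torch) to make the softmax-versus-sigmoid distinction explicit; `interpret` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Sequence

SENTIMENT_LABELS = ["Negative", "Neutral", "Positive"]
TOXICITY_LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function for large positive or negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def interpret(sentiment_logits: Sequence[float], toxicity_logits: Sequence[float],
              threshold: float = 0.5) -> Dict[str, object]:
    # Sentiment classes are mutually exclusive, so take the softmax argmax.
    sentiment_probs = softmax(sentiment_logits)
    best = max(range(len(sentiment_probs)), key=sentiment_probs.__getitem__)
    # Toxicity labels are independent, so each gets its own sigmoid and threshold.
    toxicity_probs = {label: sigmoid(x) for label, x in zip(TOXICITY_LABELS, toxicity_logits)}
    flagged = [label for label, p in toxicity_probs.items() if p >= threshold]
    return {
        "sentiment": SENTIMENT_LABELS[best],
        "sentiment_probs": sentiment_probs,
        "toxicity_probs": toxicity_probs,
        "flagged": flagged,
    }


result = interpret([0.1, -0.3, 2.2], [3.0, -3.0, 0.1, -1.0, 2.0, -5.0])
print(result["sentiment"], result["flagged"])  # Positive ['toxic', 'obscene', 'insult']
```

With the model's outputs, the same helper could be called as `interpret(sentiment_logits[0].tolist(), toxicity_logits[0].tolist())`.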

Limitations

This model was trained on English data only and is not suitable for other languages.
The toxicity detection may produce false positives or negatives in edge cases.
Inputs are truncated to 128 tokens (max_length=128 in the usage example), so any text beyond that point is ignored, even though the encoder itself supports up to 512 positions.
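One possible workaround for the 128-token limit, not part of this project, is to split a long comment into overlapping windows and score each window separately. The sketch below assumes the content token ids come from `tokenizer(text, add_special_tokens=False)["input_ids"]` and reserves two positions per window for [CLS] and [SEP]; `sliding_windows` and the 32-token overlap are hypothetical choices.

```python
from typing import List, Sequence


def sliding_windows(token_ids: Sequence[int], max_length: int = 128,
                    overlap: int = 32) -> List[List[int]]:
    """Split content token ids into overlapping windows.

    Each window holds at most max_length - 2 ids, leaving room for the
    [CLS] and [SEP] special tokens that BERT places around every sequence.
    """
    body = max_length - 2
    if not 0 <= overlap < body:
        raise ValueError("overlap must be in [0, max_length - 2)")
    step = body - overlap
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + body]))
        # Stop once the current window reaches the end of the sequence.
        if start + body >= len(token_ids):
            return windows
        start += step
```

Each window would still need its special tokens added before being passed to the model. A simple aggregation is to take, per toxicity label, the maximum probability across windows, so that a toxic passage anywhere in the comment is not diluted.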

Citation

If you use this model, please cite our repository:

```bibtex
@misc{comment-mtl-bert,
  author = {Aseem},
  title = {Comment MTL BERT: Multi-Task Learning for Comment Analysis},
  year = {2023},
  publisher = {GitHub},
  url = {https://huggingface.co/Aseemks07/comment_mtl_bert_best}
}
```