---
title: Comment Sentiment and Toxicity Classifier
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
---

# Comment Sentiment and Toxicity Classifier

This Space utilises a custom fine-tuned BERT model to classify the sentiment and toxicity of comments. It was developed for academic purposes in Australia.

# Comment MTL BERT Model

This is a BERT-based multi-task learning (MTL) model that performs sentiment analysis and toxicity detection simultaneously, using one shared encoder and a separate classification head for each task.

## Model Architecture

The model is built on the pre-trained `bert-base-uncased` encoder, which is shared by two separate classification heads:
- Sentiment Analysis Head: 3-class, single-label classification (Negative, Neutral, Positive)
- Toxicity Detection Head: multi-label classification over 6 categories (toxic, severe_toxic, obscene, threat, insult, identity_hate), so a single comment can receive several labels at once
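The README does not show `src/model.py`, so the following is only a minimal sketch of the two-head layout, assuming each head is a single linear layer applied (after dropout) to a 768-dimensional pooled encoder output. The class name `MultiTaskHeads` is hypothetical, and the actual `CommentMTLModel` may pool or stack layers differently. It uses plain PyTorch so it runs without downloading BERT.

```python
import torch
from torch import nn


class MultiTaskHeads(nn.Module):
    """Hypothetical sketch: two task heads sharing one pooled encoder representation.

    Assumes single linear heads; the real CommentMTLModel may differ.
    """

    def __init__(self, hidden_size: int = 768, num_sentiment_labels: int = 3,
                 num_toxicity_labels: int = 6, dropout: float = 0.1) -> None:
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        # Single-label head: one logit per mutually exclusive sentiment class
        self.sentiment_head = nn.Linear(hidden_size, num_sentiment_labels)
        # Multi-label head: one independent logit per toxicity category
        self.toxicity_head = nn.Linear(hidden_size, num_toxicity_labels)

    def forward(self, pooled: torch.Tensor) -> dict:
        # pooled: (batch, hidden_size) representation from the shared encoder
        x = self.dropout(pooled)
        return {
            "sentiment_logits": self.sentiment_head(x),
            "toxicity_logits": self.toxicity_head(x),
        }


heads = MultiTaskHeads()
heads.eval()  # disable dropout for inference
with torch.no_grad():
    out = heads(torch.randn(2, 768))  # random stand-in for 2 pooled comments
print(out["sentiment_logits"].shape, out["toxicity_logits"].shape)
# torch.Size([2, 3]) torch.Size([2, 6])
```

In a full model, `pooled` would come from the shared encoder, for example the `pooler_output` of a `transformers.BertModel` loaded from `bert-base-uncased`. Because both heads read the same encoder representation, training on both tasks updates one common set of BERT weights, which is what makes this a multi-task setup.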

### Technical Parameters

- Hidden size: 768
- Number of attention heads: 12
- Number of hidden layers: 12
- Vocabulary size: 30522
- Maximum position embeddings: 512
- Hidden activation function: GELU
- Dropout probability: 0.1
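As a worked check on the parameters above, the sketch below totals the trainable parameters of the encoder and the two heads. It assumes values this README does not list: the standard `bert-base` feed-forward size of 3072, two token-type embeddings, a pooler layer, and single linear layers for both heads.

```python
# Worked parameter count for a bert-base encoder plus two classification heads.
# Assumed (not stated in this README): INTERMEDIATE, TYPE_VOCAB, a pooler layer,
# and single linear layers for the sentiment and toxicity heads.
HIDDEN = 768
LAYERS = 12
VOCAB = 30522
MAX_POS = 512
INTERMEDIATE = 3072  # assumed: standard bert-base feed-forward size (4 x hidden)
TYPE_VOCAB = 2       # assumed: standard bert-base token-type vocabulary


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm(n: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


# Word, position and token-type embeddings, followed by one LayerNorm
embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
per_layer = (
    3 * linear(HIDDEN, HIDDEN)        # query, key, value projections
    + linear(HIDDEN, HIDDEN)          # attention output projection
    + layer_norm(HIDDEN)
    + linear(HIDDEN, INTERMEDIATE)    # feed-forward up-projection
    + linear(INTERMEDIATE, HIDDEN)    # feed-forward down-projection
    + layer_norm(HIDDEN)
)
pooler = linear(HIDDEN, HIDDEN)
encoder_total = embeddings + LAYERS * per_layer + pooler

heads_total = linear(HIDDEN, 3) + linear(HIDDEN, 6)  # sentiment + toxicity heads

print(f"encoder: {encoder_total:,}")  # encoder: 109,482,240
print(f"heads:   {heads_total:,}")    # heads:   6,921
```

The 12 attention heads only split each 768-dimensional projection into 64-dimensional slices, and dropout has no weights, so neither changes the count. Under these assumptions almost all of the roughly 109.5 million parameters sit in the shared encoder, with the task-specific heads adding fewer than 7,000.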

## Usage

### Loading the Model

```python
from transformers import AutoTokenizer
from src.model import CommentMTLModel
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load model
model = CommentMTLModel(
    model_name="bert-base-uncased",
    num_sentiment_labels=3,
    num_toxicity_labels=6,
)

# Load pre-trained weights
state_dict = torch.load("model.bin", map_location=torch.device("cpu"))
model.load_state_dict(state_dict)
model.eval()
```

### Model Inference

```python
# Prepare input
text = "This is a test comment."
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Model inference (gradients are not needed)
with torch.no_grad():
    outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])

# Get results
sentiment_logits = outputs["sentiment_logits"]
toxicity_logits = outputs["toxicity_logits"]

# Process sentiment analysis results: softmax over the 3 mutually exclusive classes
sentiment_probs = torch.softmax(sentiment_logits, dim=1)
sentiment_labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
sentiment_prediction = sentiment_labels[sentiment_probs.argmax(dim=1).item()]

# Process toxicity detection results: an independent sigmoid per label (multi-label)
toxicity_probs = torch.sigmoid(toxicity_logits)
toxicity_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
toxicity_results = {label: prob.item() for label, prob in zip(toxicity_cols, toxicity_probs[0])}

print(f"Sentiment: {sentiment_prediction}")
print(f"Toxicity probabilities: {toxicity_results}")
```
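The two heads are decoded differently because the tasks differ: the three sentiment classes are mutually exclusive, so a softmax turns their logits into one distribution, whereas toxicity categories can co-occur, so each logit passes through its own sigmoid. The pure-Python sketch below mirrors those steps for a single comment and adds a yes/no flag per category. The `decode` helper and its 0.5 `threshold` are hypothetical, since this README only reports raw probabilities, and the example logits are made up.

```python
import math

SENTIMENT_LABELS = ["Negative", "Neutral", "Positive"]
TOXICITY_COLS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Numerically stable logistic function for large positive or negative x
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(sentiment_logits, toxicity_logits, threshold=0.5):
    """Hypothetical helper: map one comment's logits to a sentiment label,
    the toxicity categories at or above `threshold`, and all toxicity probabilities."""
    probs = softmax(sentiment_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    tox = {label: sigmoid(z) for label, z in zip(TOXICITY_COLS, toxicity_logits)}
    flagged = [label for label, p in tox.items() if p >= threshold]
    return SENTIMENT_LABELS[best], flagged, tox


# Illustrative logits (made up, not real model output)
label, flagged, _ = decode([-1.2, 0.3, 2.1], [2.0, -3.0, 0.5, -4.0, 1.0, -2.5])
print(label, flagged)  # Positive ['toxic', 'obscene', 'insult']
```

A single global threshold is the simplest choice; in practice a per-category cut-off tuned on validation data may suit rare labels such as `threat` better.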

## Limitations

- The model was trained on English data only and is not suitable for other languages.
- Toxicity detection may produce false positives or false negatives on edge cases.
- Input is truncated to 128 tokens, so any text beyond that limit in a long comment is ignored and cannot influence the predictions.

## Citation

If you use this model, please cite our repository:

```bibtex
@misc{comment-mtl-bert,
  author = {Aseem},
  title = {Comment MTL BERT: Multi-Task Learning for Comment Analysis},
  year = {2023},
  publisher = {GitHub},
  url = {https://huggingface.co/Aseemks07/comment_mtl_bert_best}
}
```