	# Attention-based Sentiment Classifier

	This repository contains an attention-based sentiment classification model that demonstrates how attention mechanisms can enhance interpretability in NLP tasks.

	![Attention Visualization Example](./assets/attention_visualization.png)

	## Model Overview

	This model uses a bidirectional GRU with an attention mechanism to classify text sentiment (positive/negative). The attention mechanism lets the model focus on the most relevant parts of the input text and shows which words most influence the classification.
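
To make the idea concrete, here is a minimal sketch of additive attention pooling in plain Python. The hidden states, the projection `W`, the bias `b`, and the scoring vector `v` are made-up illustrative values, not parameters of this model, and the exact scoring function used by the released checkpoint is an assumption.

```python
import math

def additive_attention(hidden_states, W, b, v):
    """Return (weights, context) using score_t = v . tanh(W h_t + b)."""
    scores = []
    for h in hidden_states:
        # Project the hidden state: u = tanh(W h + b)
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, u)))
    # Softmax over time steps (subtract the max for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context

# Hypothetical 2-dimensional hidden states for three tokens
hidden_states = [[0.1, 0.0], [2.0, 1.5], [0.2, -0.1]]
W = [[1.0, 0.0], [0.0, 1.0]]  # illustrative projection (identity)
b = [0.0, 0.0]
v = [1.0, 1.0]                # illustrative scoring vector

weights, context = additive_attention(hidden_states, W, b, v)
print([round(w, 3) for w in weights])
```

The weights sum to 1, so each one can be read as the share of the pooled representation contributed by that token. In this toy input the second token receives the largest share.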

	### Key Features

	- Bidirectional GRU architecture
	- Additive attention mechanism for interpretability
	- Binary sentiment classification (positive/negative)
	- Visualization tools for attention weights

	## Quick Start

	```python
	from transformers import pipeline
	import matplotlib.pyplot as plt
	import seaborn as sns

	# Load model directly from Hugging Face
	classifier = pipeline(
	    "text-classification",
	    model="ericwei/attention-sentiment-classifier",
	)

	# Standard prediction
	result = classifier("I absolutely loved this movie! The acting was superb.")
	print(f"Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}")

	# For attention visualization, use the model directly
	from transformers import AutoTokenizer, AutoModel
	import torch

	tokenizer = AutoTokenizer.from_pretrained("ericwei/attention-sentiment-classifier")
	model = AutoModel.from_pretrained("ericwei/attention-sentiment-classifier")

	text = "I absolutely loved this movie! The acting was superb."
	inputs = tokenizer(text, return_tensors="pt")

	# Get prediction with attention weights
	model.eval()
	with torch.no_grad():
	    outputs = model(inputs["input_ids"], return_attention=True, return_dict=True)

	# Get prediction results
	logits = outputs["logits"]
	attention_weights = outputs["attention_weights"]

	# Visualize attention
	tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

	plt.figure(figsize=(10, 2))
	sns.heatmap(
	    attention_weights.squeeze(0).cpu().numpy().reshape(1, -1),
	    cmap="YlOrRd",
	    annot=True,
	    fmt=".2f",
	    cbar=False,
	    xticklabels=tokens,
	    yticklabels=["Attention"],
	)
	plt.xticks(rotation=45, ha="right", rotation_mode="anchor")
	plt.title("Attention Weights Visualization")
	plt.tight_layout()
	plt.show()
	```
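
Besides the heatmap, it can be handy to list the most attended tokens as text. The helper below is a small sketch; `top_attended_tokens` is a hypothetical name, the skipped special tokens are an assumption about the tokenizer, and the example tokens and weights are made up for illustration.

```python
def top_attended_tokens(tokens, weights, k=3, skip=("[CLS]", "[SEP]", "[PAD]")):
    """Return the k (token, weight) pairs with the largest attention weights."""
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    # Drop special tokens (assumed names) before ranking
    pairs = [(t, float(w)) for t, w in zip(tokens, weights) if t not in skip]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]

# Made-up tokens and weights, for illustration only
example_tokens = ["i", "absolutely", "loved", "this", "movie", "!"]
example_weights = [0.05, 0.20, 0.45, 0.05, 0.15, 0.10]

top = top_attended_tokens(example_tokens, example_weights, k=2)
for token, weight in top:
    print(f"{token}: {weight:.2f}")
```

With the Quick Start variables, this would be called as `top_attended_tokens(tokens, attention_weights.squeeze(0).tolist())`, assuming `attention_weights` has shape `(1, sequence_length)`.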

	## Demo App

	This model includes a Streamlit demo app that can be launched directly on Hugging Face Spaces.

	## Model Architecture

	The model consists of:

	1. Embedding Layer: Converts token IDs to dense vectors
	2. Bidirectional GRU: Processes the text in both directions
	3. Attention Mechanism: Focuses on the most relevant parts of the text
	4. Classifier Head: Makes the final sentiment prediction
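
The four components above can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the released implementation: the class name `AttentionSentimentSketch`, the layer names, the vocabulary size, the padding index, and the exact form of the attention scoring are all assumptions. Only the embedding and hidden dimensions (100 and 256) come from this README.

```python
import torch
import torch.nn as nn

class AttentionSentimentSketch(nn.Module):
    """Illustrative embedding -> BiGRU -> additive attention -> classifier stack."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=256,
                 num_classes=2, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Additive attention: score_t = v^T tanh(W h_t + b)
        self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim, 1, bias=False)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids):
        mask = input_ids != self.pad_idx                       # (batch, seq)
        states, _ = self.gru(self.embedding(input_ids))        # (batch, seq, 2*hidden)
        scores = self.attn_score(torch.tanh(self.attn_proj(states))).squeeze(-1)
        scores = scores.masked_fill(~mask, float("-inf"))      # ignore padding
        weights = torch.softmax(scores, dim=-1)                # (batch, seq)
        # Attention-weighted sum of the GRU states
        context = torch.bmm(weights.unsqueeze(1), states).squeeze(1)
        return self.classifier(context), weights

sketch = AttentionSentimentSketch(vocab_size=1000)  # vocabulary size is hypothetical
dummy_ids = torch.tensor([[5, 42, 7, 0, 0], [9, 3, 8, 11, 2]])  # 0 = padding
logits, weights = sketch(dummy_ids)
print(logits.shape, weights.shape)
```

Returning the attention weights alongside the logits is what makes the per-token visualization possible. Masking the padded positions before the softmax keeps them from receiving any weight.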

	## Training

	The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset using the following hyperparameters:

	- Learning rate: 1e-3
	- Epochs: 12
	- Optimizer: Adam
	- Loss function: Cross Entropy Loss
	- Embedding dimension: 100
	- Hidden dimension: 256
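
As a rough sketch of how these settings fit together in PyTorch, the loop below wires up Adam, cross-entropy loss, the learning rate, and the epoch count. The `MeanPoolClassifier` stand-in, the vocabulary size, and the random data are assumptions made so the example runs on its own. They do not reproduce the actual training script or the SST-2 data pipeline.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class MeanPoolClassifier(nn.Module):
    """Hypothetical stand-in for the BiGRU + attention model."""

    def __init__(self, vocab_size=1000, embed_dim=100, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.out = nn.Linear(embed_dim, num_classes)

    def forward(self, input_ids):
        # Average the token embeddings, then classify
        return self.out(self.embedding(input_ids).mean(dim=1))

toy_model = MeanPoolClassifier()
optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)  # Adam, lr 1e-3
criterion = nn.CrossEntropyLoss()                              # cross-entropy loss
NUM_EPOCHS = 12                                                # 12 epochs

# Random token IDs and labels standing in for tokenized SST-2 batches
input_ids = torch.randint(0, 1000, (32, 20))
labels = torch.randint(0, 2, (32,))

losses = []
for epoch in range(NUM_EPOCHS):
    # Each "epoch" here is a single full-batch step over the toy data
    toy_model.train()
    optimizer.zero_grad()
    loss = criterion(toy_model(input_ids), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

A real run would iterate over mini-batches of the SST-2 training split inside each epoch and evaluate on the validation split between epochs.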

	## Limitations

	- Trained only on movie reviews, so performance may degrade on other domains or on text that differs significantly from movie reviews
	- English only; not suitable for multilingual content
	- Binary classification only (positive/negative)

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{attention-sentiment-classifier,
	  author = {Lantian Wei},
	  title = {Attention-based Sentiment Classifier},
	  year = {2025},
	  publisher = {Hugging Face},
	  howpublished = {\url{https://huggingface.co/your-username/attention-sentiment-classifier}}
	}
	```

	## License

	This model is licensed under the GNU General Public License v3.0.