# Attention-based Sentiment Classifier

This repository contains an attention-based sentiment classification model that demonstrates how attention mechanisms can enhance interpretability in NLP tasks.

![Attention Visualization Example](./assets/attention_visualization.png)

## Model Overview

This model uses a bidirectional GRU with an attention mechanism to classify text sentiment (positive/negative). The attention mechanism lets the model focus on the most relevant parts of the input text and shows which words most influence the classification.

### Key Features

- Bidirectional GRU architecture
- Additive attention mechanism for interpretability
- Binary sentiment classification (positive/negative)
- Visualization tools for attention weights

## Quick Start

```python
import matplotlib.pyplot as plt
import seaborn as sns
import torch
from transformers import AutoModel, AutoTokenizer, pipeline

MODEL_ID = "ericwei/attention-sentiment-classifier"

# Load model directly from Hugging Face
classifier = pipeline("text-classification", model=MODEL_ID)

# Standard prediction
result = classifier("I absolutely loved this movie! The acting was superb.")
print(f"Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}")

# For attention visualization, use the model directly
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

text = "I absolutely loved this movie! The acting was superb."
inputs = tokenizer(text, return_tensors="pt")

# Get prediction with attention weights
model.eval()
with torch.no_grad():
    outputs = model(inputs["input_ids"], return_attention=True, return_dict=True)

# Get prediction results
logits = outputs["logits"]
attention_weights = outputs["attention_weights"]

# Visualize attention: one heatmap cell per input token
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

plt.figure(figsize=(10, 2))
sns.heatmap(
    attention_weights.squeeze(0).cpu().numpy().reshape(1, -1),
    cmap="YlOrRd",
    annot=True,
    fmt=".2f",
    cbar=False,
    xticklabels=tokens,
    yticklabels=["Attention"],
)
plt.xticks(rotation=45, ha="right", rotation_mode="anchor")
plt.title("Attention Weights Visualization")
plt.tight_layout()
plt.show()
```

## Demo App

This model includes a Streamlit demo app that can be launched directly on Hugging Face Spaces.

## Model Architecture

The model consists of:

1. **Embedding Layer**: Converts token IDs to dense vectors
2. **Bidirectional GRU**: Processes the text in both directions
3. **Attention Mechanism**: Focuses on the most relevant parts of the text
4. **Classifier Head**: Makes the final sentiment prediction

## Training

The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset using the following hyperparameters:

- Learning rate: 1e-3
- Epochs: 12
- Optimizer: Adam
- Loss function: Cross-entropy loss
- Embedding dimension: 100
- Hidden dimension: 256

## Limitations

- Trained only on movie reviews, so it may not generalize to other domains
- Limited to English text
- Binary classification only (positive/negative)
- Not suitable for multilingual content
- Performance may degrade on text that differs significantly from movie reviews

## Citation

If you use this model, please cite:

```
@misc{attention-sentiment-classifier,
  author = {Lantian Wei},
  title = {Attention-based Sentiment Classifier},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/your-username/attention-sentiment-classifier}}
}
```

## License

This model is licensed under the GNU General Public License v3.0.
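
## Architecture Sketch

The four layers listed under Model Architecture can be sketched in PyTorch as shown here. This is a minimal illustration, not the released implementation: the class name `AttentionSentimentSketch`, the vocabulary size, the padding ID, and the exact form of the additive scoring function are assumptions. Only the embedding dimension (100) and hidden dimension (256) come from the Training section.

```python
import torch
import torch.nn as nn


class AttentionSentimentSketch(nn.Module):
    """Hypothetical sketch: embedding -> BiGRU -> additive attention -> classifier."""

    def __init__(self, vocab_size=30522, embed_dim=100, hidden_dim=256,
                 num_classes=2, pad_id=0):
        super().__init__()
        self.pad_id = pad_id  # assumed padding token ID
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_id)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Additive scoring (assumed form): score_t = v^T tanh(W h_t)
        self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim, 1, bias=False)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids):
        # Assumes every sequence has at least one non-padding token.
        mask = input_ids != self.pad_id                       # (batch, seq)
        states, _ = self.gru(self.embedding(input_ids))       # (batch, seq, 2*hidden)
        scores = self.attn_score(torch.tanh(self.attn_proj(states))).squeeze(-1)
        scores = scores.masked_fill(~mask, float("-inf"))     # ignore padding
        weights = torch.softmax(scores, dim=-1)               # (batch, seq)
        # Weighted sum of GRU states gives one context vector per sequence.
        context = torch.bmm(weights.unsqueeze(1), states).squeeze(1)
        return self.classifier(context), weights


# Usage with made-up token IDs; the first sequence ends in two padding tokens.
model = AttentionSentimentSketch(vocab_size=1000)
model.eval()
ids = torch.tensor([[5, 8, 13, 0, 0], [7, 2, 9, 4, 11]])
with torch.no_grad():
    logits, weights = model(ids)
print(logits.shape, weights.shape)
```

Each row of `weights` sums to 1 over the non-padding tokens, which is what makes the weights usable as a per-token heatmap like the one in Quick Start.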