# Attention-based Sentiment Classifier
This repository contains an attention-based sentiment classification model that demonstrates how attention mechanisms can enhance interpretability in NLP tasks.

## Model Overview
This model uses a bidirectional GRU with an attention mechanism to classify text sentiment (positive/negative). The attention mechanism allows the model to focus on the most relevant parts of the input text, providing insight into which words influence the classification the most.
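
For reference, additive (Bahdanau-style) attention over the GRU hidden states is commonly written as follows. This is the textbook formulation and may differ from the exact parameterization used in this repository; the symbols $W$, $b$, $v$, $h_t$, $e_t$, $\alpha_t$, and $c$ are notation introduced here for illustration.

$$
h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t], \qquad
e_t = v^{\top} \tanh(W h_t + b)
$$

$$
\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)}, \qquad
c = \sum_{t=1}^{T} \alpha_t h_t
$$

Here $h_t$ is the concatenation of the forward and backward GRU states at position $t$, $\alpha_t$ is the attention weight assigned to token $t$, and the context vector $c$ is what a classifier head would consume. The weights $\alpha_t$ are non-negative and sum to 1 over the sequence, which is what makes them readable as a per-token importance score.
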
### Key Features
- Bidirectional GRU architecture
- Additive attention mechanism for interpretability
- Binary sentiment classification (positive/negative)
- Visualization tools for attention weights
## Quick Start
```python
import matplotlib.pyplot as plt
import seaborn as sns
import torch
from transformers import AutoModel, AutoTokenizer, pipeline

# Load the model directly from Hugging Face
classifier = pipeline(
    "text-classification",
    model="ericwei/attention-sentiment-classifier",
)

# Standard prediction
result = classifier("I absolutely loved this movie! The acting was superb.")
print(f"Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}")

# For attention visualization, use the model directly
tokenizer = AutoTokenizer.from_pretrained("ericwei/attention-sentiment-classifier")
model = AutoModel.from_pretrained("ericwei/attention-sentiment-classifier")

text = "I absolutely loved this movie! The acting was superb."
inputs = tokenizer(text, return_tensors="pt")

# Get the prediction together with the attention weights
model.eval()
with torch.no_grad():
    outputs = model(inputs["input_ids"], return_attention=True, return_dict=True)

# Unpack the prediction results
logits = outputs["logits"]
attention_weights = outputs["attention_weights"]

# Visualize attention as a single-row heatmap, one cell per token
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
plt.figure(figsize=(10, 2))
sns.heatmap(
    attention_weights.squeeze(0).cpu().numpy().reshape(1, -1),
    cmap="YlOrRd",
    annot=True,
    fmt=".2f",
    cbar=False,
    xticklabels=tokens,
    yticklabels=["Attention"],
)
plt.xticks(rotation=45, ha="right", rotation_mode="anchor")
plt.title("Attention Weights Visualization")
plt.tight_layout()
plt.show()
```
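
Besides the heatmap, it can be handy to list the most strongly attended tokens as text. The helper below is a minimal sketch: `top_attended_tokens` is a hypothetical name, not part of this repository, and it assumes the attention weights hold one value per token, as the `reshape(1, -1)` call in the Quick Start suggests. The tokens and weights are toy values, not real model output.

```python
import torch


def top_attended_tokens(tokens, attention_weights, k=3):
    """Return the k (token, weight) pairs with the largest attention weights."""
    weights = attention_weights.flatten()
    if len(tokens) != weights.numel():
        raise ValueError("tokens and attention_weights must have the same length")
    k = min(k, len(tokens))
    top = torch.topk(weights, k)  # indices come back sorted by descending weight
    return [(tokens[i], weights[i].item()) for i in top.indices.tolist()]


# Toy values for illustration only; not real model output.
tokens = ["i", "loved", "this", "movie"]
weights = torch.tensor([[0.10, 0.60, 0.05, 0.25]])

for token, weight in top_attended_tokens(tokens, weights, k=2):
    print(f"{token}: {weight:.2f}")
# loved: 0.60
# movie: 0.25
```

With the Quick Start variables in scope, the same call would be `top_attended_tokens(tokens, attention_weights)`.
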
## Demo App
This model includes a Streamlit demo app that can be launched directly on Hugging Face Spaces.
## Model Architecture
The model consists of:
1. **Embedding Layer**: Converts token IDs to dense vectors
2. **Bidirectional GRU**: Processes the text in both directions
3. **Attention Mechanism**: Focuses on the most relevant parts of the text
4. **Classifier Head**: Makes the final sentiment prediction
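
The four components above can be sketched in PyTorch roughly as follows. This is an illustrative reconstruction, not the repository's actual code: the class name `AttentionSentimentSketch`, the layer names, the vocabulary size, and the exact form of the attention scoring are all assumptions. The embedding and hidden sizes are taken from the Training section. Padding positions are not masked here, which a real implementation would likely want to handle.

```python
import torch
import torch.nn as nn


class AttentionSentimentSketch(nn.Module):
    """Hypothetical embedding -> BiGRU -> additive attention -> classifier stack."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Additive attention scoring: score_t = v^T tanh(W h_t + b)
        self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim, 1, bias=False)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids):
        embedded = self.embedding(input_ids)  # (batch, seq, embed_dim)
        states, _ = self.gru(embedded)        # (batch, seq, 2 * hidden_dim)
        scores = self.attn_score(torch.tanh(self.attn_proj(states))).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)  # (batch, seq), each row sums to 1
        # Weighted sum of GRU states gives one context vector per example.
        context = torch.bmm(weights.unsqueeze(1), states).squeeze(1)
        logits = self.classifier(context)     # (batch, num_classes)
        return logits, weights


# Smoke test with random token IDs (vocab_size=1000 is an arbitrary choice).
model = AttentionSentimentSketch(vocab_size=1000)
input_ids = torch.randint(0, 1000, (2, 7))
logits, weights = model(input_ids)
print(logits.shape, weights.shape)
```

Returning the attention weights alongside the logits is what makes the per-token visualization in the Quick Start possible.
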
## Training
The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset using the following hyperparameters:
- Learning rate: 1e-3
- Epochs: 12
- Optimizer: Adam
- Loss function: Cross Entropy Loss
- Embedding dimension: 100
- Hidden dimension: 256
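
To show how these settings fit together, here is a minimal training-loop sketch using the listed optimizer, learning rate, loss, and epoch count. It is not the original training script: the stand-in model, the random toy batch, the vocabulary size, and the single full-batch update per epoch are all assumptions made so the snippet runs on its own. A real run would iterate over mini-batches of tokenized SST-2 sentences.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in classifier so the loop is self-contained; swap in the real model.
model = nn.Sequential(nn.EmbeddingBag(1000, 100), nn.Linear(100, 2))

# Random toy batch standing in for tokenized SST-2 sentences and their labels.
input_ids = torch.randint(0, 1000, (32, 12))
labels = torch.randint(0, 2, (32,))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

losses = []
model.train()
for epoch in range(12):
    optimizer.zero_grad()
    loss = criterion(model(input_ids), labels)  # logits (32, 2) vs. labels (32,)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
    print(f"epoch {epoch + 1:2d}  loss {loss.item():.4f}")
```
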
## Limitations
- Trained only on movie reviews (SST-2); performance may degrade on text from other domains
- English only; not suitable for multilingual content
- Binary classification only (positive/negative)
## Citation
If you use this model, please cite:
```
@misc{attention-sentiment-classifier,
  author = {Lantian Wei},
  title = {Attention-based Sentiment Classifier},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/your-username/attention-sentiment-classifier}}
}
```
## License
This model is licensed under the GNU General Public License v3.0.