# Attention-based Sentiment Classifier

This model classifies the sentiment of English text as positive or negative using a bidirectional GRU with an attention mechanism.

## Model Description

- **Developed by:** Lantian Wei
- **Model type:** Sentiment classification
- **Language(s):** English
- **License:** GNU General Public License v3.0
- **Finetuned from model:** None (trained from scratch; uses the pre-trained `bert-base-uncased` tokenizer)

The classifier uses a bidirectional GRU encoder with an attention mechanism that focuses on the most sentiment-relevant parts of a sentence. It was trained on the SST-2 (Stanford Sentiment Treebank) dataset, a collection of movie reviews with binary sentiment labels.

### Model Architecture

- Embedding layer (100 dimensions)
- Bidirectional GRU (256 hidden dimensions)
- Attention mechanism
- Fully connected layers
- Output: 2 classes (positive/negative)

## Intended Uses & Limitations

### Intended Uses

- Sentiment analysis of short to medium-length English text
- Educational purposes, to understand attention mechanisms
- Research on interpretability in NLP models

### Limitations

- Trained only on movie reviews, so it may not generalize to other domains; performance may degrade on text that differs substantially from movie reviews
- English only; not suitable for multilingual content
- Binary classification only (positive/negative)

## Training Data

The model was trained on SST-2, whose movie reviews are labeled as positive or negative. SST-2 is a standard benchmark for sentiment analysis models.
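The card does not say how its training and validation examples were selected from SST-2. As one reproducible way to draw fixed-size subsets, here is a minimal sketch under stated assumptions: it expects rows shaped like GLUE SST-2 records (dicts with `sentence` and `label` keys, where `0` is negative and `1` is positive), and the helpers `sample_disjoint_subsets` and `label_counts` are hypothetical, not part of this project.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# GLUE SST-2 label convention: 0 = negative, 1 = positive
LABEL_NAMES = {0: "negative", 1: "positive"}

Example = Tuple[str, int]


def sample_disjoint_subsets(
    rows: Sequence[Dict], n_train: int, n_val: int, seed: int = 42
) -> Tuple[List[Example], List[Example]]:
    """Draw non-overlapping train/validation subsets of (sentence, label) pairs.

    Hypothetical helper: `rows` is any indexable sequence of dicts with
    "sentence" and "label" keys, such as a GLUE SST-2 split.
    """
    if n_train + n_val > len(rows):
        raise ValueError(
            f"need {n_train + n_val} rows but only {len(rows)} are available"
        )
    rng = random.Random(seed)  # fixed seed -> the same subsets on every run
    # random.sample returns distinct indices, so the two subsets cannot overlap
    picked = rng.sample(range(len(rows)), n_train + n_val)
    pairs = [(rows[i]["sentence"], rows[i]["label"]) for i in picked]
    return pairs[:n_train], pairs[n_train:]


def label_counts(pairs: Sequence[Example]) -> Dict[str, int]:
    """Count examples per sentiment label, e.g. to check class balance."""
    counts = Counter(label for _, label in pairs)
    return {LABEL_NAMES[label]: counts[label] for label in sorted(counts)}


# Toy rows standing in for real SST-2 data
toy_rows = [{"sentence": f"review {i}", "label": i % 2} for i in range(10)]
train, val = sample_disjoint_subsets(toy_rows, n_train=6, n_val=3)
print(len(train), len(val))
print(label_counts(train + val))
```

With the Hugging Face `datasets` library, `load_dataset("glue", "sst2")["train"]` returns rows with `sentence`, `label`, and `idx` fields that could be passed as `rows` (for example with `n_train=30_000, n_val=500`). Drawing both subsets from a single sample keeps them disjoint, which matters if validation examples are taken from the training split.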

- Dataset: SST-2 from the GLUE benchmark
- Training examples: 30,000
- Validation examples: 500

## Training Procedure

### Training Hyperparameters

- Learning rate: 1e-3
- Epochs: 12
- Optimizer: Adam
- Loss function: Cross-entropy loss
- Embedding dimension: 100
- Hidden dimension: 256
- Dropout: 0.3

## Evaluation Results

- Validation accuracy: [Insert your validation accuracy here]
- Test accuracy: [Insert your test accuracy here]

## Visualization Examples

A key feature of this model is interpretability through attention visualization: the model can return its attention weights, which show how strongly it attended to each input token when making its prediction.

![Attention Visualization Example](./assets/attention_visualization.png)

## Usage Examples

```python
import matplotlib.pyplot as plt
import seaborn as sns
import torch
from transformers import AutoTokenizer

from models.huggingface_model import SentimentClassifierConfig, SentimentClassifierForHuggingFace

# Load the model (map_location lets GPU-saved weights load on a CPU-only machine)
config = SentimentClassifierConfig()
model = SentimentClassifierForHuggingFace(config)
model.load_state_dict(torch.load("path_to_weights.pth", map_location="cpu"))
model.eval()

# Load the tokenizer used during training
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")


def predict_with_attention(text):
    """Classify `text` and plot the attention weight assigned to each token."""
    # Tokenize
    tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    input_ids = tokens["input_ids"]

    # Get logits and attention weights
    with torch.no_grad():
        outputs = model(input_ids, return_attention=True, return_dict=True)
    logits = outputs["logits"]
    attention_weights = outputs["attention_weights"]

    # Convert logits to probabilities; label 1 is positive, 0 is negative
    probs = torch.softmax(logits, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0, prediction].item()
    sentiment = "Positive" if prediction == 1 else "Negative"

    # seaborn.heatmap needs a 2-D array, so reshape to (1, seq_len): one row of weights
    attention = attention_weights.squeeze().cpu().numpy().reshape(1, -1)
    tokens_list = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())

    # Plot the attention heatmap, one cell per token
    plt.figure(figsize=(10, 2))
    sns.heatmap(
        attention,
        cmap="YlOrRd",
        annot=True,
        fmt=".2f",
        cbar=False,
        xticklabels=tokens_list,
        yticklabels=["Attention"],
    )
    plt.title(f"Prediction: {sentiment} (Confidence: {confidence:.4f})")
    plt.tight_layout()
    plt.show()

    return {
        "text": text,
        "sentiment": sentiment,
        "confidence": confidence,
        "attention": attention[0],
    }


# Example usage
result = predict_with_attention("I absolutely loved this movie! The acting was superb.")
print(f"Sentiment: {result['sentiment']} (Confidence: {result['confidence']:.4f})")
```

## Citations

```
@inproceedings{socher2013recursive,
  title={Recursive deep models for semantic compositionality over a sentiment treebank},
  author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew Y and Potts, Christopher},
  booktitle={Proceedings of the 2013 conference on empirical methods in natural language processing},
  pages={1631--1642},
  year={2013}
}
```