# Attention-based Sentiment Classifier
This is an attention-based sentiment classifier: a bidirectional GRU with an attention mechanism that labels English text as positive or negative.
## Model Description
- **Developed by:** Lantian Wei
- **Model type:** Sentiment Classification
- **Language(s):** English
- **License:** GNU General Public License v3.0
- **Finetuned from model:** Trained from scratch (uses the pre-trained `bert-base-uncased` tokenizer)
This sentiment classifier uses a bidirectional GRU architecture with an attention mechanism to focus on the most sentiment-relevant parts of a sentence. The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset, a collection of movie reviews with binary sentiment labels.
### Model Architecture
- Embedding layer (100 dimensions)
- Bidirectional GRU (256 hidden dimensions)
- Attention mechanism
- Fully connected layers
- Output: 2 classes (positive/negative)
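
The layers above can be sketched as a minimal PyTorch module. This is a hypothetical reconstruction, not the repository's actual code: the vocabulary size of 30,522 is assumed from the `bert-base-uncased` tokenizer, and the attention layer is modeled as a single linear scoring projection with a softmax over timesteps; the real implementation may differ.

```python
import torch
import torch.nn as nn

class AttentionSentimentClassifier(nn.Module):
    """Hypothetical sketch of the BiGRU + attention architecture described above."""

    def __init__(self, vocab_size=30522, embed_dim=100, hidden_dim=256,
                 num_classes=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                          bidirectional=True)
        # Attention: one score per timestep of the BiGRU output
        self.attention = nn.Linear(2 * hidden_dim, 1)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids, return_attention=False):
        embedded = self.embedding(input_ids)           # (B, T, E)
        outputs, _ = self.gru(embedded)                # (B, T, 2H)
        scores = self.attention(outputs).squeeze(-1)   # (B, T)
        weights = torch.softmax(scores, dim=1)         # (B, T), sums to 1 over T
        # Weighted sum of BiGRU states -> sentence representation
        context = torch.bmm(weights.unsqueeze(1), outputs).squeeze(1)  # (B, 2H)
        logits = self.fc(self.dropout(context))        # (B, num_classes)
        return (logits, weights) if return_attention else logits

model = AttentionSentimentClassifier()
logits, attn = model(torch.randint(0, 30522, (1, 12)), return_attention=True)
print(logits.shape, attn.shape)  # torch.Size([1, 2]) torch.Size([1, 12])
```

The attention weights sum to 1 across the sequence, which is what makes the per-token heatmap in the usage example below interpretable as a distribution over the input.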
## Intended Uses & Limitations
### Intended Uses
- Sentiment analysis of short to medium-length English text
- Educational purposes to understand attention mechanisms
- Research on interpretability in NLP models
### Limitations
- Trained only on movie reviews; may not generalize to other domains
- Limited to English text
- Binary classification only (positive/negative)
- Not suitable for multi-lingual content
- Performance may degrade on texts significantly different from movie reviews
## Training Data
The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset, which consists of movie reviews labeled as positive or negative. The dataset is commonly used as a benchmark for sentiment analysis models.
- Dataset: SST-2 from the GLUE benchmark
- Training examples: 30,000
- Validation examples: 500
## Training Procedure
### Training Hyperparameters
- Learning rate: 1e-3
- Epochs: 12
- Optimizer: Adam
- Loss function: Cross Entropy Loss
- Embedding dimension: 100
- Hidden dimension: 256
- Dropout: 0.3
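
A sketch of a training loop wired to these hyperparameters. The stand-in model and toy tensors below are placeholders for illustration only; in practice the actual classifier and a `DataLoader` over tokenized SST-2 would take their place.

```python
import torch
import torch.nn as nn

# Stand-in model; substitute the actual BiGRU classifier in practice.
model = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                      nn.Dropout(0.3), nn.Linear(256, 2))

# Hyperparameters from the list above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_epoch(batches):
    """Run one epoch and return the mean training loss."""
    model.train()
    total_loss = 0.0
    for features, labels in batches:
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(batches)

# Toy batches standing in for the real data loader
batches = [(torch.randn(8, 100), torch.randint(0, 2, (8,))) for _ in range(4)]
for epoch in range(12):  # 12 epochs, per the hyperparameters
    avg_loss = train_epoch(batches)
```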
## Evaluation Results
- Validation accuracy: [Insert your validation accuracy here]
- Test accuracy: [Insert your test accuracy here]
## Visualization Examples
One of the key features of this model is its interpretability through attention visualization. The model can output attention weights that highlight which parts of the input text it focused on to make its prediction.

## Usage Examples
```python
import torch
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoTokenizer

from models.huggingface_model import SentimentClassifierForHuggingFace, SentimentClassifierConfig

# Load the model
config = SentimentClassifierConfig()
model = SentimentClassifierForHuggingFace(config)
model.load_state_dict(torch.load("path_to_weights.pth"))
model.eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def predict_with_attention(text):
    """Classify `text` and visualize the attention weights."""
    # Tokenize
    tokens = tokenizer(text, return_tensors="pt", padding=True,
                       truncation=True, max_length=128)
    input_ids = tokens["input_ids"]

    # Get prediction and attention weights
    with torch.no_grad():
        outputs = model(input_ids, return_attention=True, return_dict=True)
        logits = outputs["logits"]
        attention_weights = outputs["attention_weights"]

    # Get prediction and confidence
    probs = torch.nn.functional.softmax(logits, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()
    sentiment = "Positive" if prediction == 1 else "Negative"

    # Map token ids back to token strings for the axis labels
    tokens_list = tokenizer.convert_ids_to_tokens(input_ids[0])

    # Reshape to a 1 x seq_len matrix so seaborn renders a single-row heatmap
    attention = attention_weights.squeeze().cpu().numpy().reshape(1, -1)
    plt.figure(figsize=(10, 2))
    sns.heatmap(
        attention,
        cmap="YlOrRd",
        annot=True,
        fmt=".2f",
        cbar=False,
        xticklabels=tokens_list,
        yticklabels=["Attention"],
    )
    plt.title(f"Prediction: {sentiment} (Confidence: {confidence:.4f})")
    plt.tight_layout()
    plt.show()

    return {
        "text": text,
        "sentiment": sentiment,
        "confidence": confidence,
        "attention": attention,
    }

# Example usage
result = predict_with_attention("I absolutely loved this movie! The acting was superb.")
print(f"Sentiment: {result['sentiment']} (Confidence: {result['confidence']:.4f})")
```
## Citations
```
@inproceedings{socher2013recursive,
  title={Recursive deep models for semantic compositionality over a sentiment treebank},
  author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew Y and Potts, Christopher},
  booktitle={Proceedings of the 2013 conference on empirical methods in natural language processing},
  pages={1631--1642},
  year={2013}
}
```