# Attention-based Sentiment Classifier

This repository contains an attention-based sentiment classification model that demonstrates how attention mechanisms can enhance interpretability in NLP tasks.

![Attention Visualization Example](./assets/attention_visualization.png)

## Model Overview

This model uses a bidirectional GRU with an attention mechanism to classify text sentiment (positive/negative). The attention mechanism lets the model weight the most relevant parts of the input text, and the resulting attention weights indicate which words influenced the classification the most.
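To make the idea concrete, here is a minimal pure-Python sketch of additive attention over a toy sequence. The parameter names (`W`, `b`, `v`), the helper functions, and the toy values are illustrative assumptions, not the repository's actual implementation, which would apply learned parameters to GRU hidden states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(hidden_states, W, b, v):
    """Score each hidden state with v . tanh(W h + b), then pool.

    hidden_states: list of T vectors, each of length d
    W: a x d matrix, b: length-a bias, v: length-a scoring vector
    Returns (weights, context): T attention weights and a length-d vector.
    """
    scores = []
    for h in hidden_states:
        projected = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(v_i * p_i for v_i, p_i in zip(v, projected)))
    weights = softmax(scores)
    d = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states
    context = [
        sum(a * h[j] for a, h in zip(weights, hidden_states))
        for j in range(d)
    ]
    return weights, context


# Toy values (hypothetical): three time steps, two-dimensional states
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, -0.1]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
v = [1.0, 1.0]
weights, context = additive_attention(hidden_states, W, b, v)
print(weights)
```

The weights sum to 1, and the time step with the highest score (here the second one) receives the largest share, which is what the heatmap visualization displays per token.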

### Key Features

- Bidirectional GRU architecture
- Additive attention mechanism for interpretability
- Binary sentiment classification (positive/negative)
- Visualization tools for attention weights

## Quick Start

```python
from transformers import pipeline
import matplotlib.pyplot as plt
import seaborn as sns

# Load model directly from Hugging Face
classifier = pipeline(
    "text-classification",
    model="ericwei/attention-sentiment-classifier"
)

# Standard prediction
result = classifier("I absolutely loved this movie! The acting was superb.")
print(f"Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}")

# For attention visualization, use the model directly
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("ericwei/attention-sentiment-classifier")
model = AutoModel.from_pretrained("ericwei/attention-sentiment-classifier")

text = "I absolutely loved this movie! The acting was superb."
inputs = tokenizer(text, return_tensors="pt")

# Get prediction with attention weights
model.eval()
with torch.no_grad():
    outputs = model(inputs["input_ids"], return_attention=True, return_dict=True)

# Get prediction results
logits = outputs["logits"]
attention_weights = outputs["attention_weights"]

# Visualize attention
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

plt.figure(figsize=(10, 2))
sns.heatmap(
    attention_weights.squeeze(0).cpu().numpy().reshape(1, -1),
    cmap="YlOrRd",
    annot=True,
    fmt=".2f",
    cbar=False,
    xticklabels=tokens,
    yticklabels=["Attention"]
)
plt.xticks(rotation=45, ha="right", rotation_mode="anchor")
plt.title("Attention Weights Visualization")
plt.tight_layout()
plt.show()
```
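Besides the heatmap, it can be handy to list the most heavily attended tokens as text. The helper below is a hypothetical sketch: `top_attended_tokens` is not part of this repository, the special-token names in `skip` assume a BERT-style tokenizer, and the example weights are made up for illustration rather than produced by the model.

```python
def top_attended_tokens(tokens, weights, k=3, skip=("[CLS]", "[SEP]", "[PAD]")):
    """Return the k (token, weight) pairs with the largest attention weight."""
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    # Drop special tokens (assumed names) before ranking
    pairs = [(t, w) for t, w in zip(tokens, weights) if t not in skip]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]


# Illustrative values only; not output from the actual model
tokens = ["i", "absolutely", "loved", "this", "movie"]
weights = [0.05, 0.20, 0.55, 0.05, 0.15]
top = top_attended_tokens(tokens, weights, k=2)
print(top)
```

With the Quick Start variables, `weights` could be obtained with something like `attention_weights.squeeze(0).tolist()`, assuming the tensor has shape `(1, sequence_length)`.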

## Demo App

This model includes a Streamlit demo app that can be launched directly on Hugging Face Spaces.

## Model Architecture

The model consists of:

1. **Embedding Layer**: Converts token IDs to dense vectors
2. **Bidirectional GRU**: Processes the text in both directions
3. **Attention Mechanism**: Focuses on the most relevant parts of the text
4. **Classifier Head**: Makes the final sentiment prediction
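
The four components above can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the released code: the class name, layer names, vocabulary size, padding ID, and the choice of 256 hidden units per GRU direction are assumptions. Only the embedding and hidden dimensions are taken from the Training section.

```python
import torch
import torch.nn as nn


class AttentionSentimentSketch(nn.Module):
    """Illustrative BiGRU + additive attention classifier (not the released code)."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=256,
                 num_classes=2, pad_id=0):
        super().__init__()
        self.pad_id = pad_id
        # 1. Embedding layer: token IDs -> dense vectors
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_id)
        # 2. Bidirectional GRU: output size is 2 * hidden_dim
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                          bidirectional=True)
        # 3. Additive attention: score_t = v^T tanh(W h_t + b)
        self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim, 1, bias=False)
        # 4. Classifier head
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids):
        embedded = self.embedding(input_ids)                   # (B, T, E)
        states, _ = self.gru(embedded)                         # (B, T, 2H)
        scores = self.attn_score(
            torch.tanh(self.attn_proj(states))
        ).squeeze(-1)                                          # (B, T)
        # Padding positions get zero attention weight
        scores = scores.masked_fill(input_ids == self.pad_id, float("-inf"))
        weights = torch.softmax(scores, dim=-1)                # (B, T)
        context = torch.bmm(weights.unsqueeze(1), states).squeeze(1)  # (B, 2H)
        logits = self.classifier(context)                      # (B, C)
        return {"logits": logits, "attention_weights": weights}


model = AttentionSentimentSketch(vocab_size=1000)
model.eval()
input_ids = torch.tensor([[5, 42, 7, 0, 0]])  # last two positions are padding
with torch.no_grad():
    out = model(input_ids)
print(out["logits"].shape, out["attention_weights"].shape)
```

The sketch returns one attention weight per input token, which matches the shape the Quick Start visualization expects. It assumes every sequence contains at least one non-padding token; a fully padded row would produce NaN weights.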

## Training

The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset using the following hyperparameters:

- Learning rate: 1e-3
- Epochs: 12
- Optimizer: Adam
- Loss function: Cross Entropy Loss
- Embedding dimension: 100
- Hidden dimension: 256

## Limitations

- Trained only on movie reviews (SST-2), so performance may degrade on text from other domains
- English only; not suitable for multilingual content
- Binary classification only (positive/negative)

## Citation

If you use this model, please cite:

```bibtex
@misc{attention-sentiment-classifier,
  author = {Lantian Wei},
  title = {Attention-based Sentiment Classifier},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/your-username/attention-sentiment-classifier}}
}
```

## License

This model is licensed under the GNU General Public License v3.0.