# Attention-based Sentiment Classifier

An attention-based sentiment classifier that uses a bidirectional GRU with an attention mechanism to classify English text as positive or negative.

## Model Description

- **Developed by:** Lantian Wei
- **Model type:** Sentiment Classification
- **Language(s):** English
- **License:** GNU General Public License v3.0
- **Finetuned from model:** None; trained from scratch, using the pre-trained `bert-base-uncased` tokenizer

This sentiment classifier uses a bidirectional GRU with an attention mechanism that focuses on the most sentiment-relevant parts of a sentence. It was trained on SST-2 (Stanford Sentiment Treebank, binary version), a collection of sentences from movie reviews with binary sentiment labels.

### Model Architecture

- Embedding layer (100 dimensions)
- Bidirectional GRU (256 hidden dimensions)
- Attention mechanism
- Fully connected layers
- Output: 2 classes (positive/negative)
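The card does not specify the attention formulation. A common choice for this kind of model, assumed here, is additive attention pooling: each BiGRU output is projected, passed through `tanh`, scored against a learned vector, softmax-normalized over tokens, and used to take a weighted sum. The NumPy sketch below shows that step; `attention_pool`, `proj`, `score_vector`, and the 64-unit attention size are hypothetical, and the 512-dimensional features assume the 256 hidden units are per GRU direction, which the card does not state.

```python
from typing import Optional, Tuple

import numpy as np


def attention_pool(
    hidden_states: np.ndarray,
    proj: np.ndarray,
    score_vector: np.ndarray,
    mask: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, np.ndarray]:
    """Additive attention pooling over one sentence (hypothetical sketch).

    hidden_states: (seq_len, feature_dim) BiGRU outputs.
    proj: (feature_dim, attn_dim) projection matrix.
    score_vector: (attn_dim,) learned scoring vector.
    mask: optional (seq_len,) boolean array, True for real tokens.
    Returns the pooled (feature_dim,) vector and the (seq_len,) weights.
    """
    scores = np.tanh(hidden_states @ proj) @ score_vector  # one score per token
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # padding gets zero weight
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()  # softmax over tokens
    pooled = weights @ hidden_states  # attention-weighted sum of token vectors
    return pooled, weights


# Toy example with random features (not real model activations).
rng = np.random.default_rng(0)
seq_len, feature_dim, attn_dim = 6, 2 * 256, 64  # assumes 256 units per direction
hidden = rng.standard_normal((seq_len, feature_dim))
proj = rng.standard_normal((feature_dim, attn_dim)) * 0.05
score_vector = rng.standard_normal(attn_dim)
mask = np.array([True, True, True, True, False, False])  # last two are padding
pooled, weights = attention_pool(hidden, proj, score_vector, mask)
print(weights.round(3), pooled.shape)
```

The per-token `weights` returned here are the kind of values the attention heatmap in this card visualizes; the pooled vector would then feed the fully connected layers.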

## Intended Uses & Limitations

### Intended Uses

- Sentiment analysis of short to medium-length English text
- Educational purposes to understand attention mechanisms
- Research on interpretability in NLP models

### Limitations

- Trained only on movie-review sentences; may not generalize to other domains, and performance may degrade on text that differs substantially from movie reviews
- English only; not suitable for multilingual content
- Binary classification only (positive/negative); no neutral or fine-grained labels

## Training Data

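The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset, which consists of sentences from movie reviews labeled as positive or negative and is a standard benchmark for sentiment analysis. The GLUE release of SST-2 contains 67,349 training and 872 validation sentences, so the example counts below are subsets of it.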

- Dataset: SST-2 from the GLUE benchmark
- Training examples: 30,000
- Validation examples: 500
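The card gives the subset sizes but not how the examples were selected. If they were drawn at random, a seeded draw such as the following would make the selection reproducible. `subsample` and the toy lists are hypothetical stand-ins for the GLUE SST-2 splits (67,349 training and 872 validation sentences), not the author's actual data pipeline.

```python
import random


def subsample(examples, n, seed=0):
    """Draw n distinct examples without replacement using a fixed seed."""
    if n > len(examples):
        raise ValueError(f"requested {n} examples but only {len(examples)} exist")
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    return rng.sample(examples, n)


# Toy stand-ins for the GLUE SST-2 splits: (sentence, label) pairs.
toy_train = [(f"train sentence {i}", i % 2) for i in range(67_349)]
toy_val = [(f"validation sentence {i}", i % 2) for i in range(872)]

train_subset = subsample(toy_train, 30_000, seed=0)
val_subset = subsample(toy_val, 500, seed=0)
print(len(train_subset), len(val_subset))  # 30000 500
```

Recording the seed alongside the hyperparameters is what allows the same subset to be regenerated later.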

## Training Procedure

### Training Hyperparameters

- Learning rate: 1e-3
- Epochs: 12
- Optimizer: Adam
- Loss function: Cross-entropy loss
- Embedding dimension: 100
- Hidden dimension: 256
- Dropout: 0.3
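For a two-class classifier, cross-entropy is the negative log of the softmax probability assigned to the correct class, averaged over the batch. The NumPy sketch below mirrors the default behavior of PyTorch's `nn.CrossEntropyLoss` (mean reduction, no class weights or label smoothing); the helper name `cross_entropy` and the toy logits are illustrative only.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross-entropy of raw logits (batch, classes) against integer labels."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    shifted = logits - logits.max(axis=1, keepdims=True)  # stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick each row's log-probability for its gold label, negate, and average
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Row 1: uniform logits, gold label 1 -> ln 2 ~ 0.6931
# Row 2: logits [2, 0], gold label 0 -> ln(1 + e^-2) ~ 0.1269
loss = cross_entropy([[0.0, 0.0], [2.0, 0.0]], [1, 0])
print(round(loss, 4))  # 0.41
```

A confident correct prediction (row 2) contributes much less loss than an uninformative one (row 1), which is what drives training toward sharper, correct outputs.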

## Evaluation Results

- Validation accuracy: [Insert your validation accuracy here]
- Test accuracy: [Insert your test accuracy here]
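Accuracy here is the fraction of examples whose highest-scoring class matches the gold label. The minimal NumPy sketch below uses toy logits rather than real model outputs and the label convention from the usage example (1 = positive, 0 = negative); `accuracy` is a hypothetical helper. Note that the official GLUE SST-2 test labels are not publicly released, so a test-accuracy figure requires either a GLUE leaderboard submission or a separately held-out split.

```python
import numpy as np


def accuracy(logits, labels):
    """Fraction of rows whose argmax class equals the gold label."""
    predictions = np.asarray(logits).argmax(axis=1)
    return float((predictions == np.asarray(labels)).mean())


# Toy logits for four examples; predictions are [1, 0, 0, 1]
toy_logits = [[0.2, 1.1], [2.0, -1.0], [0.3, 0.1], [-0.5, 0.4]]
toy_labels = [1, 0, 1, 1]
acc = accuracy(toy_logits, toy_labels)
print(acc)  # 0.75
```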

## Visualization Examples

A key feature of this model is interpretability through attention visualization: the model can return one attention weight per input token, highlighting which parts of the text it focused on when making its prediction.

![Attention Visualization Example](./assets/attention_visualization.png)
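Besides a heatmap, the attention weights can be summarized by listing the most heavily weighted tokens. The sketch below is a hypothetical helper, `top_attended_tokens`; it skips the BERT special tokens `[CLS]`, `[SEP]`, and `[PAD]` produced by the `bert-base-uncased` tokenizer, and the weights in the example are made up for illustration rather than taken from the model.

```python
def top_attended_tokens(tokens, weights, k=3, skip=("[CLS]", "[SEP]", "[PAD]")):
    """Return the k (token, weight) pairs with the largest attention weights."""
    pairs = [(tok, float(w)) for tok, w in zip(tokens, weights) if tok not in skip]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


# Illustrative tokens and weights (not real model output)
tokens = ["[CLS]", "i", "loved", "this", "movie", "!", "[SEP]"]
weights = [0.02, 0.05, 0.55, 0.06, 0.20, 0.10, 0.02]
top = top_attended_tokens(tokens, weights)
print(top)  # [('loved', 0.55), ('movie', 0.2), ('!', 0.1)]
```

Passing the `tokens_list` and the returned attention array from the usage example below into a helper like this gives a compact, text-only view of the same information the heatmap shows.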

## Usage Examples

```python
from transformers import AutoTokenizer
# Repository-local module that defines the model and its config
from models.huggingface_model import SentimentClassifierForHuggingFace, SentimentClassifierConfig
import torch
import matplotlib.pyplot as plt
import seaborn as sns

# Load the model (replace the path with the location of the trained weights)
config = SentimentClassifierConfig()
model = SentimentClassifierForHuggingFace(config)
model.load_state_dict(torch.load("path_to_weights.pth", map_location="cpu"))
model.eval()  # disable dropout for inference

# Load the tokenizer used during training
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def predict_with_attention(text):
    """Classify `text` and plot the attention weight assigned to each token."""
    # Tokenize; sequences longer than 128 tokens are truncated
    encoded = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    input_ids = encoded["input_ids"]

    # Forward pass without gradient tracking
    with torch.no_grad():
        outputs = model(input_ids, return_attention=True, return_dict=True)

    logits = outputs["logits"]
    # Flatten to one weight per token, regardless of extra singleton dimensions
    attention = outputs["attention_weights"].squeeze().cpu().numpy().reshape(-1)

    # Convert logits to class probabilities (label 1 = positive, 0 = negative)
    probs = torch.softmax(logits, dim=-1)
    prediction = torch.argmax(probs, dim=-1).item()
    confidence = probs[0, prediction].item()
    sentiment = "Positive" if prediction == 1 else "Negative"

    # Token strings for the x-axis labels
    tokens_list = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())

    # sns.heatmap requires 2-D input, so plot the weights as a single row
    plt.figure(figsize=(10, 2))
    sns.heatmap(
        attention.reshape(1, -1),
        cmap="YlOrRd",
        annot=True,
        fmt=".2f",
        cbar=False,
        xticklabels=tokens_list,
        yticklabels=["Attention"],
    )
    plt.title(f"Prediction: {sentiment} (Confidence: {confidence:.4f})")
    plt.tight_layout()
    plt.show()

    return {
        "text": text,
        "sentiment": sentiment,
        "confidence": confidence,
        "attention": attention,
    }

# Example usage
result = predict_with_attention("I absolutely loved this movie! The acting was superb.")
print(f"Sentiment: {result['sentiment']} (Confidence: {result['confidence']:.4f})")
```

## Citations

```bibtex
@inproceedings{socher2013recursive,
  title={Recursive deep models for semantic compositionality over a sentiment treebank},
  author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew Y and Potts, Christopher},
  booktitle={Proceedings of the 2013 conference on empirical methods in natural language processing},
  pages={1631--1642},
  year={2013}
}
```