Initial model upload
- .gitattributes +3 -34
- MODEL_CARD.md +147 -0
- README.md +121 -0
- app.py +129 -0
- assets/attention_visualization.png +3 -0
- assets/example_negative.png +0 -0
- assets/exmaple_positive.png +0 -0
- config.json +16 -0
- inference_example.py +34 -0
- model-index.json +15 -0
- model.safetensors +3 -0
- special_tokens_map.json +7 -0
- tokenizer.json +0 -0
- tokenizer_config.json +56 -0
- vocab.txt +0 -0
.gitattributes
CHANGED
@@ -1,35 +1,4 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/*.pb filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+assets/attention_visualization.png filter=lfs diff=lfs merge=lfs -text
+model.safetensors filter=lfs diff=lfs merge=lfs -text
MODEL_CARD.md
ADDED
@@ -0,0 +1,147 @@
# Attention-based Sentiment Classifier

This is an attention-based sentiment classifier: a bidirectional GRU with an attention mechanism that labels English text as positive or negative.

## Model Description

- **Developed by:** Lantian Wei
- **Model type:** Sentiment Classification
- **Language(s):** English
- **License:** GNU General Public License v3.0
- **Finetuned from model:** Trained from scratch, using the pre-trained BERT tokenizer

The classifier uses a bidirectional GRU architecture with an attention mechanism to focus on the most sentiment-relevant parts of a sentence. It was trained on the SST-2 (Stanford Sentiment Treebank) dataset, a collection of movie reviews with binary sentiment labels.

### Model Architecture

- Embedding layer (100 dimensions)
- Bidirectional GRU (256 hidden dimensions)
- Attention mechanism
- Fully connected layers
- Output: 2 classes (positive/negative)
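
A minimal PyTorch sketch of this architecture, for orientation only: the module names, the additive-attention parametrization, and the defaults (taken from config.json) are assumptions, since the actual implementation lives in `models/huggingface_model.py`, which is not part of this upload.

```python
import torch
import torch.nn as nn

class AttentionSentimentClassifier(nn.Module):
    """Sketch of the layers listed above; not the uploaded implementation."""

    def __init__(self, vocab_size=30522, embedding_dim=100, hidden_dim=256,
                 output_dim=2, dropout=0.3, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * hidden_dim, 1)  # one additive score per token
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(2 * hidden_dim, output_dim)

    def forward(self, input_ids, return_attention=False):
        embedded = self.dropout(self.embedding(input_ids))            # (B, T, E)
        hidden, _ = self.gru(embedded)                                # (B, T, 2H)
        scores = self.attention(torch.tanh(hidden)).squeeze(-1)      # (B, T)
        weights = torch.softmax(scores, dim=1)                        # (B, T)
        context = torch.bmm(weights.unsqueeze(1), hidden).squeeze(1)  # (B, 2H)
        logits = self.fc(self.dropout(context))                       # (B, 2)
        return (logits, weights) if return_attention else logits
```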

## Intended Uses & Limitations

### Intended Uses

- Sentiment analysis of short to medium-length English text
- Educational purposes, to understand attention mechanisms
- Research on interpretability in NLP models

### Limitations

- Trained only on movie reviews; may not generalize to other domains
- Limited to English text
- Binary classification only (positive/negative)
- Not suitable for multilingual content
- Performance may degrade on texts that differ significantly from movie reviews

## Training Data

The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset, which consists of movie reviews labeled as positive or negative and is commonly used as a benchmark for sentiment analysis models.

- Dataset: SST-2 from the GLUE benchmark
- Training examples: 30,000
- Validation examples: 500

## Training Procedure

### Training Hyperparameters

- Learning rate: 1e-3
- Epochs: 12
- Optimizer: Adam
- Loss function: cross-entropy loss
- Embedding dimension: 100
- Hidden dimension: 256
- Dropout: 0.3
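
A sketch of the corresponding training setup under the hyperparameters above; `train_loader` is a stand-in for a DataLoader over tokenized SST-2 batches, and the model class is the sketch from the architecture section (the actual training script is not included in this upload):

```python
import torch.nn as nn
import torch.optim as optim

model = AttentionSentimentClassifier()               # sketch class from above
criterion = nn.CrossEntropyLoss()                    # cross-entropy loss
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # Adam, lr 1e-3

for epoch in range(12):                              # 12 epochs
    for input_ids, labels in train_loader:           # assumed SST-2 DataLoader
        optimizer.zero_grad()
        loss = criterion(model(input_ids), labels)
        loss.backward()
        optimizer.step()
```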

## Evaluation Results

- Validation accuracy: [Insert your validation accuracy here]
- Test accuracy: [Insert your test accuracy here]

## Visualization Examples

One of the key features of this model is its interpretability through attention visualization. The model can output attention weights that highlight which parts of the input text it focused on to make its prediction.

![Attention visualization](assets/attention_visualization.png)

## Usage Examples

```python
from transformers import AutoTokenizer
from models.huggingface_model import SentimentClassifierForHuggingFace, SentimentClassifierConfig
import torch
import matplotlib.pyplot as plt
import seaborn as sns

# Load the model
config = SentimentClassifierConfig()
model = SentimentClassifierForHuggingFace(config)
model.load_state_dict(torch.load("path_to_weights.pth"))
model.eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Function to make predictions with attention visualization
def predict_with_attention(text):
    # Tokenize
    tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    input_ids = tokens["input_ids"]

    # Get prediction and attention weights
    with torch.no_grad():
        outputs = model(input_ids, return_attention=True, return_dict=True)

    logits = outputs["logits"]
    attention_weights = outputs["attention_weights"]

    # Get prediction and confidence
    probs = torch.nn.functional.softmax(logits, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()
    sentiment = "Positive" if prediction == 1 else "Negative"

    # Token strings for visualization
    tokens_list = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())

    # Plot attention heatmap (reshaped to 2-D, since seaborn expects a matrix)
    plt.figure(figsize=(10, 2))
    sns.heatmap(
        attention_weights.squeeze(0).cpu().numpy().reshape(1, -1),
        cmap="YlOrRd",
        annot=True,
        fmt=".2f",
        cbar=False,
        xticklabels=tokens_list,
        yticklabels=["Attention"]
    )
    plt.title(f"Prediction: {sentiment} (Confidence: {confidence:.4f})")
    plt.tight_layout()
    plt.show()

    return {
        "text": text,
        "sentiment": sentiment,
        "confidence": confidence,
        "attention": attention_weights.squeeze(0).cpu().numpy()
    }

# Example usage
result = predict_with_attention("I absolutely loved this movie! The acting was superb.")
print(f"Sentiment: {result['sentiment']} (Confidence: {result['confidence']:.4f})")
```

## Citations

```bibtex
@inproceedings{socher2013recursive,
  title={Recursive deep models for semantic compositionality over a sentiment treebank},
  author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew Y and Potts, Christopher},
  booktitle={Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing},
  pages={1631--1642},
  year={2013}
}
```
README.md
ADDED
@@ -0,0 +1,121 @@
# Attention-based Sentiment Classifier

This repository contains an attention-based sentiment classification model that demonstrates how attention mechanisms can enhance interpretability in NLP tasks.

![Attention visualization example](assets/attention_visualization.png)

## Model Overview

This model uses a bidirectional GRU with an attention mechanism to classify text sentiment (positive/negative). The attention mechanism allows the model to focus on the most relevant parts of the input text, providing insight into which words influence the classification the most.

### Key Features

- Bidirectional GRU architecture
- Additive attention mechanism for interpretability
- Binary sentiment classification (positive/negative)
- Visualization tools for attention weights

## Quick Start

```python
from transformers import pipeline
import matplotlib.pyplot as plt
import seaborn as sns

# Load the model directly from Hugging Face
classifier = pipeline(
    "text-classification",
    model="your-username/attention-sentiment-classifier"
)

# Standard prediction
result = classifier("I absolutely loved this movie! The acting was superb.")
print(f"Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}")

# For attention visualization, use the model directly
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("your-username/attention-sentiment-classifier")
model = AutoModel.from_pretrained("your-username/attention-sentiment-classifier")

text = "I absolutely loved this movie! The acting was superb."
inputs = tokenizer(text, return_tensors="pt")

# Get prediction with attention weights
model.eval()
with torch.no_grad():
    outputs = model(inputs["input_ids"], return_attention=True, return_dict=True)

# Get prediction results
logits = outputs["logits"]
attention_weights = outputs["attention_weights"]

# Visualize attention
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

plt.figure(figsize=(10, 2))
sns.heatmap(
    attention_weights.squeeze(0).cpu().numpy().reshape(1, -1),
    cmap="YlOrRd",
    annot=True,
    fmt=".2f",
    cbar=False,
    xticklabels=tokens,
    yticklabels=["Attention"]
)
plt.xticks(rotation=45, ha="right", rotation_mode="anchor")
plt.title("Attention Weights Visualization")
plt.tight_layout()
plt.show()
```

## Demo App

This model includes a Streamlit demo app (`app.py`) that can be launched directly on Hugging Face Spaces.

## Model Architecture

The model consists of:

1. **Embedding Layer**: Converts token IDs to dense vectors
2. **Bidirectional GRU**: Processes the text in both directions
3. **Attention Mechanism**: Focuses on the most relevant parts of the text
4. **Classifier Head**: Makes the final sentiment prediction
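
As a rough guide, the tensor shapes through these stages for a single input of T tokens, assuming the dimensions recorded in config.json:

```python
# Illustrative shape walkthrough (batch size 1, T tokens); the stage names
# follow the list above, and the dimensions come from config.json.
# input_ids:         (1, T)        token IDs from the BERT tokenizer
# embedding layer:   (1, T, 100)   embedding_dim = 100
# bidirectional GRU: (1, T, 512)   2 directions x hidden_dim = 256
# attention:         (1, T)        weights, pooled into a (1, 512) context vector
# classifier head:   (1, 2)        logits for negative / positive
```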

## Training

The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset using the following hyperparameters:

- Learning rate: 1e-3
- Epochs: 12
- Optimizer: Adam
- Loss function: cross-entropy loss
- Embedding dimension: 100
- Hidden dimension: 256

## Limitations

- Trained only on movie reviews; may not generalize to other domains
- Limited to English text
- Binary classification only (positive/negative)
- Not suitable for multilingual content
- Performance may degrade on texts that differ significantly from movie reviews

## Citation

If you use this model, please cite:

```bibtex
@misc{attention-sentiment-classifier,
  author = {Lantian Wei},
  title = {Attention-based Sentiment Classifier},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/your-username/attention-sentiment-classifier}}
}
```

## License

This model is licensed under the GNU General Public License v3.0.
app.py
ADDED
@@ -0,0 +1,129 @@
import streamlit as st
import torch
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoTokenizer
from models.huggingface_model import SentimentClassifierForHuggingFace
import numpy as np
import io
from PIL import Image

# Load model and tokenizer
@st.cache_resource
def load_model():
    model = SentimentClassifierForHuggingFace.from_pretrained("./")
    tokenizer = AutoTokenizer.from_pretrained("./")
    return model, tokenizer

def predict_sentiment(text, model, tokenizer):
    # Tokenize the input
    tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    input_ids = tokens["input_ids"]

    # Run inference
    model.eval()
    with torch.no_grad():
        outputs = model(input_ids, return_attention=True, return_dict=True)

    # Get prediction results
    logits = outputs["logits"]
    attention_weights = outputs["attention_weights"]

    # Convert to probabilities and get prediction
    probs = torch.nn.functional.softmax(logits, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()
    sentiment = "Positive" if prediction == 1 else "Negative"

    # Get token strings for visualization
    tokens_list = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())

    # Create visualization
    fig, ax = plt.subplots(figsize=(10, 2))
    sns.heatmap(
        attention_weights.squeeze(0).cpu().numpy().reshape(1, -1),
        cmap="YlOrRd",
        annot=True,
        fmt=".2f",
        cbar=False,
        xticklabels=tokens_list,
        yticklabels=["Attention"],
        ax=ax
    )

    # Rotate x-axis labels for better readability
    plt.xticks(rotation=45, ha="right", rotation_mode="anchor")
    plt.title(f"Prediction: {sentiment} (Confidence: {confidence:.4f})")
    plt.tight_layout()

    # Convert plot to image
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=150, bbox_inches="tight")
    buf.seek(0)
    img = Image.open(buf)
    plt.close(fig)

    return sentiment, confidence, img

# Streamlit app
def main():
    st.set_page_config(
        page_title="Sentiment Analysis with Attention",
        page_icon="🧠",
        layout="wide"
    )

    st.title("Sentiment Analysis with Attention Visualization")
    st.write("This model classifies text sentiment as positive or negative and visualizes which parts of the text it focused on using an attention mechanism.")

    # Load model and tokenizer
    try:
        model, tokenizer = load_model()
        model_loaded = True
    except Exception as e:
        st.error(f"Error loading model: {e}")
        model_loaded = False

    # Text input
    text_input = st.text_area(
        "Enter text to analyze:",
        value="I absolutely loved this movie! The acting was superb.",
        height=100,
    )

    # Process when the button is clicked
    if st.button("Analyze Sentiment") and model_loaded:
        with st.spinner("Analyzing..."):
            sentiment, confidence, viz_img = predict_sentiment(text_input, model, tokenizer)

        # Display results
        col1, col2 = st.columns([1, 3])

        with col1:
            st.subheader("Prediction:")
            sentiment_color = "#5FD068" if sentiment == "Positive" else "#D21312"
            st.markdown(
                f"<div style='background-color:{sentiment_color}; padding:10px; border-radius:5px;"
                f"color:white; text-align:center; font-size:24px;'>{sentiment}</div>",
                unsafe_allow_html=True
            )
            st.metric("Confidence", f"{confidence:.2%}")

        with col2:
            st.subheader("Attention Visualization:")
            st.image(viz_img, use_column_width=True)
            st.caption("The heatmap shows which words the model focused on most when making its prediction.")

    st.markdown("---")
    st.subheader("How to interpret the visualization:")
    st.write(
        "The attention heatmap shows the weight assigned to each token in the text. "
        "Darker colors indicate where the model focused more attention when making its prediction. "
        "This can help identify which parts of the text were most influential for sentiment classification."
    )

if __name__ == "__main__":
    main()
assets/attention_visualization.png
ADDED
(stored with Git LFS)
assets/example_negative.png
ADDED
assets/exmaple_positive.png
ADDED
config.json
ADDED
@@ -0,0 +1,16 @@
{
  "architectures": [
    "SentimentClassifierForHuggingFace"
  ],
  "bidirectional": true,
  "dropout": 0.3,
  "embedding_dim": 100,
  "hidden_dim": 256,
  "model_type": "sentiment_classifier",
  "n_layers": 1,
  "output_dim": 2,
  "pad_idx": 0,
  "torch_dtype": "float32",
  "transformers_version": "4.51.2",
  "vocab_size": 30522
}
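
These fields map one-to-one onto a `PretrainedConfig` subclass. A hedged sketch of what the `SentimentClassifierConfig` referenced in MODEL_CARD.md presumably looks like (the real class ships in `models/huggingface_model.py`, which is not shown in this upload):

```python
from transformers import PretrainedConfig

class SentimentClassifierConfig(PretrainedConfig):
    # Sketch only: field names mirror config.json, defaults match the upload.
    model_type = "sentiment_classifier"

    def __init__(self, vocab_size=30522, embedding_dim=100, hidden_dim=256,
                 n_layers=1, bidirectional=True, dropout=0.3, output_dim=2,
                 pad_idx=0, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.bidirectional = bidirectional
        self.dropout = dropout
        self.output_dim = output_dim
        self.pad_idx = pad_idx
```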
inference_example.py
ADDED
@@ -0,0 +1,34 @@
import torch
from transformers import AutoTokenizer
from models.huggingface_model import SentimentClassifierForHuggingFace

# Load the model and tokenizer
model = SentimentClassifierForHuggingFace.from_pretrained("./")
tokenizer = AutoTokenizer.from_pretrained("./")

# Prepare text input
text = "I absolutely loved this movie! The acting was superb."
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Run inference
model.eval()
with torch.no_grad():
    outputs = model(inputs["input_ids"], return_attention=True, return_dict=True)

# Process results
logits = outputs["logits"]
attention_weights = outputs["attention_weights"]

# Get prediction and confidence
probs = torch.nn.functional.softmax(logits, dim=1)
prediction = torch.argmax(probs, dim=1).item()
confidence = probs[0][prediction].item()
sentiment = "Positive" if prediction == 1 else "Negative"

print(f"Text: {text}")
print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.4f}")

# To visualize attention weights, add matplotlib and seaborn imports
# and use attention_weights to create a heatmap
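
Picking up the script's closing comment, a minimal heatmap in the same style as README.md and app.py, reusing the variables defined above:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Tokens corresponding to each attention weight
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

plt.figure(figsize=(10, 2))
sns.heatmap(
    attention_weights.squeeze(0).cpu().numpy().reshape(1, -1),  # seaborn expects a 2-D matrix
    cmap="YlOrRd", annot=True, fmt=".2f", cbar=False,
    xticklabels=tokens, yticklabels=["Attention"],
)
plt.tight_layout()
plt.show()
```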
model-index.json
ADDED
@@ -0,0 +1,15 @@
{
  "name": "attention-sentiment-classifier",
  "description": "Attention-based sentiment classification model that visualizes which parts of text influence predictions",
  "tags": [
    "pytorch",
    "text-classification",
    "sentiment-analysis",
    "attention-mechanism",
    "english",
    "sst2"
  ],
  "license": "gpl-3.0",
  "library_name": "transformers",
  "pipeline_tag": "text-classification"
}
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:914de6219991e18c1a6be6138555ff8cfe4c5b3d238eca3cb8c920dfc261968d
size 17040676
special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt
ADDED
The diff for this file is too large to render.