|
|
--- |
|
|
license: mit |
|
|
language: |
|
|
- en |
|
|
- vi |
|
|
metrics: |
|
|
- accuracy |
|
|
- f1 |
|
|
- recall |
|
|
- precision |
|
|
pipeline_tag: text-classification |
|
|
tags: |
|
|
- analysis |
|
|
- sentiment |
|
|
- text-classification |
|
|
--- |
|
|
# Sentiment Analysis Using LSTM and CNN |
|
|
This project implements a hybrid deep learning model combining **Long Short-Term Memory (LSTM)** networks and **Convolutional Neural Networks (CNN)** for sentiment analysis. The architecture leverages the strengths of both LSTM and CNN to process textual data and classify sentiments effectively. |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Architecture |
|
|
 |
|
|
The architecture consists of two parallel branches that process the input text sequences and merge their outputs for final classification: |
|
|
|
|
|
### **Branch 1: CNN-Based Processing** |
|
|
1. **Embedding Layer**: Converts input sequences into dense vector representations. |
|
|
2. **Conv1D + Activation**: Extracts local features from the text using convolutional filters. |
|
|
3. **MaxPooling1D**: Reduces the spatial dimensions while retaining the most important features. |
|
|
4. **BatchNormalization**: Normalizes the activations to stabilize and accelerate training. |
|
|
5. **Conv1D + MaxPooling1D + BatchNormalization**: Repeats the convolution and pooling process to extract deeper features. |
|
|
6. **Flatten**: Converts the 2D feature maps into a 1D vector. |
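The CNN branch above can be sketched with the Keras functional API. The vocabulary size, sequence length, embedding dimension, and filter counts below are illustrative assumptions, not the trained model's exact hyperparameters:

```python
import tensorflow as tf
from tensorflow.keras import layers

MAX_LEN = 100       # assumed padded sequence length
VOCAB_SIZE = 10000  # assumed vocabulary size

inputs = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB_SIZE, 128)(inputs)    # dense word vectors
x = layers.Conv1D(64, 5, activation="relu")(x)   # local n-gram features
x = layers.MaxPooling1D(pool_size=2)(x)          # keep strongest activations
x = layers.BatchNormalization()(x)               # stabilize training
x = layers.Conv1D(32, 3, activation="relu")(x)   # deeper local features
x = layers.MaxPooling1D(pool_size=2)(x)
x = layers.BatchNormalization()(x)
cnn_features = layers.Flatten()(x)               # 1D feature vector

cnn_branch = tf.keras.Model(inputs, cnn_features)
cnn_branch.summary()
```

With these assumed sizes, two valid convolutions and two pooling steps reduce the 100-step sequence to 23 steps of 32 channels, so the flattened feature vector has 736 elements.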
|
|
|
|
|
### **Branch 2: LSTM-Based Processing** |
|
|
1. **Embedding Layer**: Similar to the CNN branch, converts input sequences into dense vector representations. |
|
|
2. **Bidirectional LSTM**: Captures long-term dependencies in the text by processing it in both forward and backward directions. |
|
|
3. **LayerNormalization**: Normalizes the outputs of the LSTM layer. |
|
|
4. **Bidirectional GRU**: Further processes the sequence with Gated Recurrent Units for efficiency. |
|
|
5. **LayerNormalization**: Normalizes the GRU outputs. |
|
|
6. **Flatten**: Converts the sequence outputs into a 1D vector. |
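A matching sketch of the LSTM branch, again with assumed sizes (128-dimensional embeddings, 64 LSTM units and 32 GRU units per direction):

```python
import tensorflow as tf
from tensorflow.keras import layers

MAX_LEN = 100       # assumed padded sequence length
VOCAB_SIZE = 10000  # assumed vocabulary size

inputs = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB_SIZE, 128)(inputs)                    # dense word vectors
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)  # forward + backward context
x = layers.LayerNormalization()(x)
x = layers.Bidirectional(layers.GRU(32, return_sequences=True))(x)   # lighter recurrent pass
x = layers.LayerNormalization()(x)
rnn_features = layers.Flatten()(x)                               # 1D feature vector

rnn_branch = tf.keras.Model(inputs, rnn_features)
rnn_branch.summary()
```

The bidirectional GRU emits 64 values per time step (32 per direction), so flattening the 100-step output yields a 6400-element vector.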
|
|
|
|
|
### **Merging and Classification** |
|
|
1. **Concatenate**: Combines the outputs of the CNN and LSTM branches. |
|
|
2. **Dense Layers with Dropout**: Fully connected layers with ReLU activation and dropout for regularization. |
|
|
3. **Output Layer**: A dense layer with a softmax activation function to classify the sentiment into three categories: Positive, Neutral, and Negative. |
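The full two-branch model can then be assembled by concatenating the branch outputs. This sketch abbreviates each branch to a single block for brevity; all layer sizes are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

MAX_LEN = 100       # assumed padded sequence length
VOCAB_SIZE = 10000  # assumed vocabulary size

inputs = layers.Input(shape=(MAX_LEN,))

# CNN branch (abbreviated)
c = layers.Embedding(VOCAB_SIZE, 128)(inputs)
c = layers.Conv1D(64, 5, activation="relu")(c)
c = layers.MaxPooling1D(pool_size=2)(c)
c = layers.BatchNormalization()(c)
c = layers.Flatten()(c)

# LSTM branch (abbreviated)
r = layers.Embedding(VOCAB_SIZE, 128)(inputs)
r = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(r)
r = layers.LayerNormalization()(r)
r = layers.Flatten()(r)

# Merge and classify
merged = layers.Concatenate()([c, r])
h = layers.Dense(64, activation="relu")(merged)
h = layers.Dropout(0.5)(h)                         # regularization
outputs = layers.Dense(3, activation="softmax")(h)  # Negative / Neutral / Positive

model = tf.keras.Model(inputs, outputs)
```

Both branches read the same token-ID input but learn separate embeddings, matching the parallel structure described above.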
|
|
|
|
|
--- |
|
|
|
|
|
## Why LSTM + CNN for Sentiment Analysis? |
|
|
|
|
|
### **LSTM Strengths** |
|
|
- LSTMs are well-suited for capturing long-term dependencies in sequential data, such as text. |
|
|
- They excel at understanding the context and relationships between words in a sentence. |
|
|
|
|
|
### **CNN Strengths** |
|
|
- CNNs are effective at extracting local patterns and features, such as n-grams, from text data. |
|
|
- They are computationally efficient and can process data in parallel. |
|
|
|
|
|
### **Hybrid Approach** |
|
|
By combining LSTM and CNN, the model benefits from: |
|
|
- **Contextual Understanding**: LSTM captures the sequential nature of text. |
|
|
- **Feature Extraction**: CNN identifies important local patterns. |
|
|
- **Robustness**: The merged architecture tends to generalize better on sentiment classification tasks than either branch alone.
|
|
|
|
|
--- |
|
|
|
|
|
## Applications |
|
|
This model can be used for: |
|
|
- Social media sentiment analysis (e.g., Twitter, Reddit). |
|
|
- Customer feedback classification. |
|
|
- Opinion mining in reviews and surveys. |
|
|
|
|
|
--- |
|
|
|
|
|
## Training and Evaluation |
|
|
The model is trained on labeled datasets with text and sentiment labels. It uses: |
|
|
- **Sparse Categorical Crossentropy** as the loss function. |
|
|
- **AdamW Optimizer** (Adam with decoupled weight decay) for stable, efficient training.
|
|
- **Early Stopping** and **Model Checkpoints** to prevent overfitting and save the best model. |
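The training setup above can be sketched as follows. The tiny stand-in model and the hyperparameters (learning rate, patience, checkpoint filename) are illustrative; real training would use the full LSTM + CNN model and actual data:

```python
import tensorflow as tf

# Tiny stand-in model so the snippet is self-contained
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",  # integer labels, no one-hot needed
    metrics=["accuracy"],
)

callbacks = [
    # Stop when validation loss plateaus and roll back to the best weights
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True),
    # Save only the best-performing model seen so far
    tf.keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True),
]

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=20, callbacks=callbacks)
```

Sparse categorical crossentropy accepts integer class labels (0, 1, 2) directly, which avoids one-hot encoding the three sentiment classes.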
|
|
|
|
|
Performance is evaluated with accuracy, precision, recall, and F1 score, reported via a confusion matrix and a classification report.
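These metrics can be computed with scikit-learn; the labels below are a toy example, not results from the model:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Toy labels for illustration: 0 = Negative, 1 = Neutral, 2 = Positive
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])  # one Positive example misclassified as Neutral

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
report = classification_report(
    y_true, y_pred, target_names=["Negative", "Neutral", "Positive"]
)

print(f"Accuracy: {acc:.3f}")
print(cm)
print(report)
```

The classification report breaks precision, recall, and F1 down per class, which is useful for spotting whether the Neutral class (often the hardest) lags behind.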
|
|
|
|
|
--- |
|
|
|
|
|
## Conclusion |
|
|
The hybrid LSTM + CNN architecture provides a powerful framework for sentiment analysis, combining the strengths of sequential modeling and feature extraction. This approach is versatile and can be adapted to various text classification tasks. |
|
|
|
|
|
## License


MIT License