
🇻🇳 Vietnamese Emotion Recognition (PhoBERT-based)

This repository provides a full pipeline for Vietnamese emotion recognition, including:

  • 📊 Processed datasets
  • 🧠 Training scripts for multiple models
  • 💾 Trained model checkpoints
  • 📈 Evaluation results

📌 Overview

This project focuses on emotion classification in Vietnamese using both traditional and deep learning models, with a strong emphasis on PhoBERT-base-v2. Key contributions:

  • Build a high-quality Vietnamese emotion dataset
  • Handle class imbalance via oversampling
  • Compare multiple models (SVM → RNN → BiLSTM → CNN-LSTM → PhoBERT)
  • Achieve 94.22% accuracy with PhoBERT-base-v2

📂 Repository Structure

.
├── bilstm_emotion_model/        # Saved BiLSTM model
├── cnn_lstm_emotion_model/      # Saved CNN-LSTM model
├── phobert_emotion_model/       # Saved PhoBERT model
├── rnn_emotion_model/           # Saved RNN model
├── svm_emotion_model/           # Saved SVM model
├── flagged/                     # Flagged or filtered samples

├── bilstm_best.keras            # Best BiLSTM checkpoint
├── cnn_lstm_best.keras          # Best CNN-LSTM checkpoint

├── main_BILSTM.py               # Train BiLSTM
├── main_RNN_CNN-LSTM.py         # Train RNN & CNN-LSTM
├── main_lstm.py                 # LSTM training script
├── main_phobert.py              # Train PhoBERT
├── main_svm.py                  # Train SVM
├── main_v1.py                   # Legacy / combined script
├── run.py                       # Main runner script

├── processed.xlsx               # Main processed dataset
├── processed_phobert.xlsx       # Dataset for PhoBERT
├── processed_svm.xlsx           # Dataset for SVM
├── train.xlsx                   # Training data

├── abbreviations.json           # Text normalization rules
├── word2vec_vi_syllables_100dims.txt   # Word embeddings

├── requirements.txt
└── README.md

📊 Dataset

🔹 Sources

  • Social media
  • Product reviews
  • Conversations

🔹 Format

sentence,emotion
"Tôi rất vui hôm nay",enjoyment

🔹 Labels

  • enjoyment
  • anger
  • sadness
  • disgust
  • fear
  • surprise
  • other
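The label set and the sentence,emotion format above can be checked with a short script before training. This is a minimal sketch, assuming the data has been exported to CSV with exactly these two columns (the repository itself ships .xlsx files, so the export step is an assumption). The label-to-index order used here is illustrative and may not match the encoding inside the training scripts.

```python
import csv
import io

# The seven emotion labels listed in this README.
LABELS = ["enjoyment", "anger", "sadness", "disgust", "fear", "surprise", "other"]
# Illustrative label -> index mapping (assumption: the training scripts may order labels differently).
LABEL_TO_ID = {label: i for i, label in enumerate(LABELS)}


def read_rows(csv_text):
    """Parse 'sentence,emotion' CSV text into (sentence, label_id) pairs.

    Raises ValueError on an unknown label so bad rows are caught before training.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        label = row["emotion"].strip().lower()
        if label not in LABEL_TO_ID:
            raise ValueError(f"line {reader.line_num}: unknown label {label!r}")
        rows.append((row["sentence"].strip(), LABEL_TO_ID[label]))
    return rows


sample = 'sentence,emotion\n"Tôi rất vui hôm nay",enjoyment\n'
print(read_rows(sample))
```

For the shipped spreadsheets, reading them with pandas.read_excel and applying the same label check is an equivalent starting point, provided the columns are named sentence and emotion as in the format above.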

🔹 Preprocessing

  • Text cleaning and normalization
  • Abbreviation expansion (abbreviations.json)
  • Tokenization (Vietnamese-specific)
  • Oversampling for class balance
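The cleaning, abbreviation expansion and oversampling steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the training scripts: it assumes abbreviations.json is a flat JSON object mapping lowercase abbreviations to their expansions, splits text on whitespace into syllables (Vietnamese separates syllables with spaces, matching the syllable-level word2vec file), and implements oversampling as plain random duplication of minority-class samples. The abbreviation entries and helper names are hypothetical, the project's actual rules and oversampling method may differ, and word segmentation is not covered.

```python
import json
import random
import re
import unicodedata
from collections import Counter

# Hypothetical entries; assumes abbreviations.json is a flat {"abbreviation": "expansion"} object.
ABBREVIATIONS_JSON = '{"ko": "không", "dc": "được", "vs": "với"}'


def normalize(text, abbreviations):
    """Clean a sentence and expand single-syllable abbreviations."""
    text = unicodedata.normalize("NFC", text).lower()  # one Unicode form for accented letters
    text = re.sub(r"https?://\S+", " ", text)           # drop URLs
    text = re.sub(r"[^\w\s]", " ", text)                # punctuation/emoji -> space; \w keeps Vietnamese letters
    syllables = text.split()                            # whitespace split = syllable tokens
    return " ".join(abbreviations.get(s, s) for s in syllables)


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == label]
        extra = rng.choices(pool, k=target - count)
        out_x.extend(extra)
        out_y.extend([label] * len(extra))
    return out_x, out_y


abbreviations = json.loads(ABBREVIATIONS_JSON)
print(normalize("Hôm nay ko vui!!! https://example.com", abbreviations))  # hôm nay không vui
X, y = random_oversample(["a", "b", "c"], ["enjoyment", "enjoyment", "anger"])
print(Counter(y))
```

Oversampling belongs on the training split only, after the train/test split; duplicating sentences before splitting lets copies land in both sets and inflates the measured accuracy.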

🧠 Models

Model      Script                  Output folder
SVM        main_svm.py             svm_emotion_model/
RNN        main_RNN_CNN-LSTM.py    rnn_emotion_model/
BiLSTM     main_BILSTM.py          bilstm_emotion_model/
CNN-LSTM   main_RNN_CNN-LSTM.py    cnn_lstm_emotion_model/
PhoBERT    main_phobert.py         phobert_emotion_model/

👉 PhoBERT gives the best results (see Results), which we attribute to its contextual representations pre-trained on large-scale Vietnamese text.
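The neural baselines (RNN, BiLSTM, CNN-LSTM) can be initialized from the bundled word2vec_vi_syllables_100dims.txt. This is a minimal sketch, assuming the file uses the standard word2vec text format (an optional "count dim" header line, then one token followed by its 100 values per line) and that the scripts map tokens to 1-based indices with 0 reserved for padding. The function name and vocabulary are hypothetical, and whether the training scripts load the vectors this way is not stated here.

```python
import io

import numpy as np


def build_embedding_matrix(lines, vocab, dim=100):
    """Build a (len(vocab) + 1, dim) embedding matrix from word2vec text-format lines.

    `vocab` maps token -> row index starting at 1; row 0 stays all-zero for padding,
    and tokens missing from the embedding file also keep all-zero rows.
    """
    matrix = np.zeros((len(vocab) + 1, dim), dtype=np.float32)
    for line in lines:
        parts = line.rstrip().split()
        if len(parts) != dim + 1:  # skips the optional "count dim" header and malformed lines
            continue
        idx = vocab.get(parts[0])
        if idx is not None:
            matrix[idx] = np.array([float(v) for v in parts[1:]], dtype=np.float32)
    return matrix


# Tiny in-memory stand-in for the real file (3 dimensions instead of 100).
fake_file = io.StringIO("2 3\nvui 0.1 0.2 0.3\nbuồn 0.4 0.5 0.6\n")
vocab = {"vui": 1, "không": 2}
embeddings = build_embedding_matrix(fake_file, vocab, dim=3)
print(embeddings.shape)  # (3, 3)
```

With the real file, pass an open handle (opened with encoding="utf-8") instead of the StringIO. The matrix could then seed a Keras Embedding layer through embeddings_initializer=tf.keras.initializers.Constant(matrix), with trainable=False if the vectors should stay frozen.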

🚀 Training

🔧 Install dependencies

pip install -r requirements.txt

▶️ Run models

PhoBERT

python main_phobert.py

SVM

python main_svm.py

BiLSTM

python main_BILSTM.py

RNN / CNN-LSTM

python main_RNN_CNN-LSTM.py

Run all (if configured)

python run.py

📈 Results

Model      Accuracy
PhoBERT    94.22%
SVM        78.69%
CNN-LSTM   62.47%
BiLSTM     59.56%
RNN        30.02%
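Accuracy gaps are often easier to compare as error rates. The short calculation below uses only the accuracies in the results table above; the relative error reduction is derived here and is not a figure reported by the authors.

```python
# Accuracies (%) copied from the results table.
ACCURACY = {"PhoBERT": 94.22, "SVM": 78.69, "CNN-LSTM": 62.47, "BiLSTM": 59.56, "RNN": 30.02}


def error_rate(acc):
    """Error rate in percent for an accuracy given in percent."""
    return 100.0 - acc


def relative_error_reduction(better, worse):
    """Fraction of the weaker model's errors that the stronger model avoids."""
    return 1.0 - error_rate(better) / error_rate(worse)


for name, acc in ACCURACY.items():
    print(f"{name:<8} error {error_rate(acc):5.2f}%")

gain = relative_error_reduction(ACCURACY["PhoBERT"], ACCURACY["SVM"])
print(f"PhoBERT vs SVM: {gain:.1%} fewer errors")
```

For PhoBERT against the strongest baseline (SVM), the error rate drops from 21.31% to 5.78%, roughly 72.9% fewer misclassifications.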

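💾 Checkpoints

Trained models are saved in the per-model folders:

*_emotion_model/

The best BiLSTM and CNN-LSTM checkpoints are also stored in the repository root as bilstm_best.keras and cnn_lstm_best.keras.

Example: loading the best BiLSTM checkpoint with Keras: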

from tensorflow.keras.models import load_model

model = load_model("bilstm_best.keras")

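🧪 Example

text = "Hôm nay tôi rất vui"  # "Today I am very happy"
# predict() expects the numeric input the model was trained on, not a raw string:
# apply the same cleaning, tokenization and padding as the training script first.
x = encode_like_training([text])  # hypothetical helper mirroring that preprocessing
probabilities = model.predict(x)
print(probabilities.argmax(axis=-1))  # index of the most likely emotion class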
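For the PhoBERT model, a minimal inference sketch with Hugging Face transformers follows. It assumes phobert_emotion_model/ was written with save_pretrained() (tokenizer files and config included) and that the config carries the emotion id2label mapping; neither is confirmed in this README, and the function names are hypothetical. PhoBERT was pre-trained on word-segmented Vietnamese, where multi-syllable words are joined by underscores (e.g. Hôm_nay), so inputs should be segmented the same way main_phobert.py does during training; that step is left out here.

```python
def top_label(probabilities, id2label):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return id2label[best], probabilities[best]


def predict_emotion(texts, model_dir="phobert_emotion_model"):
    """Classify word-segmented Vietnamese sentences with a fine-tuned PhoBERT checkpoint.

    Assumes `model_dir` was saved with save_pretrained() (hypothetical layout). torch and
    transformers are imported inside the function so top_label() works without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()  # disable dropout for inference
    # PhoBERT accepts at most 256 tokens per sequence.
    batch = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**batch).logits, dim=-1)
    return [top_label(row.tolist(), model.config.id2label) for row in probs]
```

Once the folder is available locally, predict_emotion(["Hôm_nay tôi rất vui"]) returns one (label, probability) pair per sentence. If the saved config has no custom id2label, labels come back as generic names (LABEL_0 to LABEL_6) and must be mapped to the emotion names manually.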

📚 Publication

If you use this work, please cite:

Advancing Emotion Recognition in Vietnamese: A PhoBERT-Based Approach for Enhanced Interaction
📄 DOI: https://doi.org/10.34238/tnu-jst.12889

🔮 Future Work

  • Multimodal emotion recognition (text + speech)
  • Larger and more diverse datasets
  • Real-time optimization
  • Deployment in production systems

📄 License

MIT License
