PaulTran/distil_multilabel_vi_essay_categorizer

Text Classification

This is a fine-tuned DistilBERT model that classifies sentences from Vietnamese essays by essay category.

Overview

At the primary level of education in Vietnam, students are introduced to 5 categories of essays:
- Argumentative - Nghị luận
- Anecdote - Biểu cảm
- Descriptive - Miêu tả
- Narrative - Tự sự
- Expository - Thuyết minh
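The model classifies individual sentences into these 5 categories. Because the classification head is multi-label, a single sentence can be assigned more than one category.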
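As a rough illustration of what multi-label output means here, the sketch below turns five raw logits into a set of categories by applying an independent sigmoid to each one and keeping every score at or above a threshold. The label order in `CATEGORIES`, the 0.5 threshold, and the helper name `decode_multilabel` are assumptions made for this example; the actual order is defined by the model's configuration and may differ.

```python
import math

# ASSUMPTION: label order chosen for illustration only. The real mapping
# lives in the model's config (id2label) and may be different.
CATEGORIES = ["Argumentative", "Anecdote", "Descriptive", "Narrative", "Expository"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, labels=CATEGORIES, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score meets the threshold.

    Each label is scored independently, so zero, one, or several labels
    can be returned for the same sentence.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    scored = [(label, sigmoid(z)) for label, z in zip(labels, logits)]
    return [(label, p) for label, p in scored if p >= threshold]


if __name__ == "__main__":
    # Hypothetical logits for one sentence, not real model output.
    for label, p in decode_multilabel([2.0, -3.0, -0.5, 1.0, -1.0]):
        print(f"{label}: {p:.2f}")
```

Unlike a softmax, the per-label sigmoid does not force the scores to sum to 1, which is what allows a sentence to be, for example, both narrative and descriptive.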

Pretrained model used in this pipeline:

This pipeline combines the pre-trained distilbert-base-multilingual-cased model with a multi-label classification head trained on 8,000 manually labeled sentences taken from sample essays.
The dataset can be found on Kaggle.
Usage notes for distilbert-base-multilingual-cased can be found on Hugging Face.
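A minimal sketch of how the model might be loaded and queried with the `transformers` library. It assumes the checkpoint is published under the repository id `PaulTran/distil_multilabel_vi_essay_categorizer` (taken from this page's title) and that it loads with `AutoModelForSequenceClassification`; neither is confirmed by this card. The function name `classify` and the 0.5 threshold are also choices made for this example.

```python
# ASSUMPTION: repository id inferred from the page title.
MODEL_ID = "PaulTran/distil_multilabel_vi_essay_categorizer"


def classify(sentences, threshold=0.5, model_id=MODEL_ID):
    """Return, for each sentence, a dict of {label: probability} above threshold."""
    # Imported here so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # ASSUMPTION: the checkpoint is compatible with the standard Auto classes.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Multi-label head: score each label independently with a sigmoid.
    probs = torch.sigmoid(logits)

    # Read label names from the model config rather than hard-coding an order.
    id2label = model.config.id2label
    return [
        {id2label[i]: p for i, p in enumerate(row) if p >= threshold}
        for row in probs.tolist()
    ]


if __name__ == "__main__":
    # Downloads the model on first use; requires network access.
    print(classify(["Hôm nay trời rất đẹp."]))
```

Reading the label names from `model.config.id2label` avoids guessing how the five categories are ordered in the classification head.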

Citation:

@article{Sanh2019DistilBERTAD,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.01108}
}
