---
language:
- vi
license: mit
library_name: transformers
pipeline_tag: text-classification
base_model: intfloat/multilingual-e5-small
tags:
- vietnamese
- custom-code
- transformers
- multilingual-e5
- uni_vsfc
- uit-vsfc
- education
- multitask
- text-classification
---
# m-e5-small-uit-vsfc-uni

## Overview

A Vietnamese multi-task text classification model for student feedback. The model jointly predicts a sentiment label and a topic label from a single sentence.
## Model Details

- Base model: `intfloat/multilingual-e5-small`
- Architecture: `uni_vsfc`
- Checkpoint source: `uit-vsfc-uni-e5-small-best.pt`
- Maximum sequence length (training and inference): `256` tokens
- Tasks: `sentiment`, `topic`
## Label Schema

- `sentiment`: `0 = negative`, `1 = neutral`, `2 = positive`
- `topic`: `0 = lecturer`, `1 = training_program`, `2 = facility`, `3 = others`
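
The label schema can be written as plain lookup tables for post-processing. This is an illustrative sketch: `SENTIMENT_ID2LABEL`, `TOPIC_ID2LABEL`, and `ids_to_labels` are hypothetical helpers, not names exposed by the model's config or custom code, which may ship its own mapping.

```python
# Label schema from this model card. These names are hypothetical helpers,
# not attributes of the model's config.
SENTIMENT_ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}
TOPIC_ID2LABEL = {0: "lecturer", 1: "training_program", 2: "facility", 3: "others"}

ID2LABEL_BY_TASK = {
    "sentiment": SENTIMENT_ID2LABEL,
    "topic": TOPIC_ID2LABEL,
}


def ids_to_labels(pred_ids: dict[str, int]) -> dict[str, str]:
    """Map one predicted class id per task to its label name."""
    return {task: ID2LABEL_BY_TASK[task][idx] for task, idx in pred_ids.items()}


print(ids_to_labels({"sentiment": 2, "topic": 1}))
# {'sentiment': 'positive', 'topic': 'training_program'}
```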
## Task Heads

- `sentiment`: `3` classes
- `topic`: `4` classes
## Dataset

- Dataset: `Vietnamese Students' Feedback Corpus (UIT-VSFC)`

The Vietnamese Students' Feedback Corpus (UIT-VSFC) contains more than 16,000 human-annotated student feedback sentences, each labeled with a sentiment and a topic.
### Data Format

- `sentence` is the input text column.
- `sentiment` is a 3-class label and `topic` is a 4-class label.
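
A record in this format can be checked before training or evaluation. The sketch below assumes each record is a plain Python dict with integer labels (values loaded as NumPy integers would need converting with `int()` first); `validate_record` and `NUM_CLASSES` are hypothetical names, and the labels in the example call are chosen for illustration, not taken from the dataset.

```python
# Class counts per task, as documented in "Data Format".
# NUM_CLASSES and validate_record are hypothetical helpers.
NUM_CLASSES = {"sentiment": 3, "topic": 4}


def validate_record(record: dict) -> None:
    """Raise ValueError if a record does not match the documented format."""
    sentence = record.get("sentence")
    if not isinstance(sentence, str) or not sentence.strip():
        raise ValueError("`sentence` must be a non-empty string")
    for task, num_classes in NUM_CLASSES.items():
        label = record.get(task)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(label, bool) or not isinstance(label, int):
            raise ValueError(f"`{task}` must be an int, got {label!r}")
        if not 0 <= label < num_classes:
            raise ValueError(f"`{task}` must be in [0, {num_classes - 1}], got {label}")


# Illustrative labels, not taken from the dataset.
validate_record({"sentence": "slide giáo trình đầy đủ .", "sentiment": 2, "topic": 1})
```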
### Splits

- Train: `11426` samples
- Validation: `1583` samples
- Test: `3166` samples
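
As a quick consistency check, the three split sizes sum to just over 16,000 sentences, matching the corpus size quoted above, and form roughly a 70/10/20 train/validation/test split. This is plain arithmetic on the listed counts.

```python
# Split sizes as listed in this section.
splits = {"train": 11426, "validation": 1583, "test": 3166}

total = sum(splits.values())
print(f"total: {total}")
for name, n in splits.items():
    # Share of the full corpus, formatted as a percentage.
    print(f"{name}: {n} ({n / total:.1%})")

# total: 16175
# train: 11426 (70.6%)
# validation: 1583 (9.8%)
# test: 3166 (19.6%)
```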
## Checkpoint Metrics

- `loss`: `0.2894`
- `accuracy`: `0.9005`
## Usage

Load the model with `trust_remote_code=True` because this repository contains custom modeling code.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "NeoCyber/m-e5-small-uit-vsfc-uni"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code=True loads the repository's custom uni_vsfc model class.
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,
)
model.eval()  # disable dropout for inference

texts = ["slide giáo trình đầy đủ ."]
# max_length=256 matches the sequence length used for this model.
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)

# decode_predictions is defined in the repository's custom modeling code.
predictions = model.decode_predictions(outputs.logits_by_task)
print(predictions)
```
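
If you need confidence scores in addition to decoded labels, each task's logits can be turned into probabilities with a softmax. This is a minimal, dependency-free sketch that assumes you have converted one sentence's logits for a single task into a Python list (for example with `Tensor.tolist()`); `softmax` and `predict` are hypothetical helpers, and the logits in the example are made-up values, not real model output.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one task's logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: list[float]) -> tuple[int, float]:
    """Return (argmax class id, its probability) for one task."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Hypothetical sentiment logits for one sentence (not real model output).
print(predict([-1.2, 0.3, 2.5]))
```

With PyTorch tensors, `torch.softmax(logits, dim=-1)` computes the same probabilities for a whole batch at once.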
## Notes

- The repository includes custom `configuration_*.py` and `modeling_*.py` files required by the `transformers` AutoClasses.
- `outputs.logits_by_task` contains one tensor per task, and `outputs.logits` is the concatenation of those tensors.
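
A minimal sketch of slicing one row of `outputs.logits` back into per-task logits, assuming the heads are concatenated along the last dimension in the order listed under Tasks (`sentiment`, then `topic`). Both the order and the helper name `split_logits` are assumptions; check the repository's `modeling_*.py` before relying on them. Plain Python lists keep the sketch dependency-free, and the logit values are hypothetical.

```python
# Assumed task order and head widths: sentiment (3) then topic (4).
# Verify against the repository's modeling code.
TASK_WIDTHS = [("sentiment", 3), ("topic", 4)]


def split_logits(row: list[float]) -> dict[str, list[float]]:
    """Slice one row of concatenated logits into per-task logits."""
    expected = sum(width for _, width in TASK_WIDTHS)
    if len(row) != expected:
        raise ValueError(f"expected {expected} logits, got {len(row)}")
    out, start = {}, 0
    for task, width in TASK_WIDTHS:
        out[task] = row[start:start + width]
        start += width
    return out


row = [0.1, -0.4, 1.9, 2.2, 0.0, -1.0, 0.5]  # hypothetical values
print(split_logits(row))
# {'sentiment': [0.1, -0.4, 1.9], 'topic': [2.2, 0.0, -1.0, 0.5]}
```

Under the same ordering assumption, `torch.split(outputs.logits, [3, 4], dim=-1)` would perform this slicing on a whole batch of tensors.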