
m-e5-small-uit-vsfc-uni

Overview

Vietnamese multi-task text classification model for student feedback. The model jointly predicts sentiment and topic labels from a single sentence.

Model Details

  • Base model: intfloat/multilingual-e5-small
  • Architecture: uni_vsfc
  • Checkpoint source: uit-vsfc-uni-e5-small-best.pt
  • Maximum sequence length (training and inference): 256
  • Tasks: sentiment, topic

Label Schema

  • sentiment: 0 = negative, 1 = neutral, 2 = positive
  • topic: 0 = lecturer, 1 = training_program, 2 = facility, 3 = others
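The label schema above can be written as plain Python mappings for turning raw class indices into readable labels. This is an illustrative sketch, not code shipped with the repository; the names `ID2SENTIMENT`, `ID2TOPIC`, and `decode` are ours.

```python
# Illustrative id-to-label mappings, reproduced from the schema above.
ID2SENTIMENT = {0: "negative", 1: "neutral", 2: "positive"}
ID2TOPIC = {0: "lecturer", 1: "training_program", 2: "facility", 3: "others"}

def decode(sentiment_id: int, topic_id: int) -> dict:
    """Map raw class indices to human-readable labels."""
    return {
        "sentiment": ID2SENTIMENT[sentiment_id],
        "topic": ID2TOPIC[topic_id],
    }

print(decode(2, 0))  # {'sentiment': 'positive', 'topic': 'lecturer'}
```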

Task Heads

  • sentiment: 3 classes
  • topic: 4 classes

Dataset

  • Dataset: UIT-VSFC (Vietnamese Students' Feedback Corpus), which contains more than 16,000 human-annotated student feedback sentences with sentiment and topic labels.

Data Format

  • sentence is the input text column.
  • sentiment is a 3-class label and topic is a 4-class label.

Splits

  • Train: 11426 samples
  • Validation: 1583 samples
  • Test: 3166 samples

Checkpoint Metrics

  • loss: 0.2894
  • accuracy: 0.9005

Usage

Load the model with trust_remote_code=True because this repository contains custom modeling code.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "NeoCyber/m-e5-small-uit-vsfc-uni"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,  # required: the repository ships custom modeling code
)
model.eval()

texts = ["slide giáo trình đầy đủ ."]  # "the lecture slides and materials are complete"
inputs = tokenizer(texts, return_tensors="pt", truncation=True, max_length=256, padding=True)
with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)
predictions = model.decode_predictions(outputs.logits_by_task)
print(predictions)

Notes

  • The repository includes custom configuration_*.py and modeling_*.py files required by transformers AutoClasses.
  • outputs.logits_by_task contains one tensor per task, and outputs.logits is the concatenated tensor.
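As a sketch of the relationship described above, and assuming the concatenation follows the task order given on this card (sentiment first, then topic), the combined logits tensor can be split back into per-task tensors with `torch.split`:

```python
import torch

# Assumed head sizes from the model card: 3 sentiment classes + 4 topic classes.
TASK_SIZES = {"sentiment": 3, "topic": 4}

# Dummy concatenated logits for a batch of 2 sentences: shape (2, 3 + 4).
logits = torch.randn(2, sum(TASK_SIZES.values()))

# Split along the class dimension in the assumed task order (sentiment, topic).
sentiment_logits, topic_logits = torch.split(logits, list(TASK_SIZES.values()), dim=-1)

print(sentiment_logits.shape)  # torch.Size([2, 3])
print(topic_logits.shape)      # torch.Size([2, 4])
```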