metadata
language:
  - vi
license: mit
library_name: transformers
pipeline_tag: text-classification
base_model: intfloat/multilingual-e5-small
tags:
  - vietnamese
  - custom-code
  - transformers
  - multilingual-e5
  - uni_vsfc
  - uit-vsfc
  - education
  - multitask
  - text-classification

m-e5-small-uit-vsfc-uni

Overview

Vietnamese multi-task text classification model for student feedback. The model jointly predicts sentiment and topic labels from a single sentence.

Model Details

  • Base model: intfloat/multilingual-e5-small
  • Architecture: uni_vsfc
  • Checkpoint source: uit-vsfc-uni-e5-small-best.pt
  • Maximum sequence length used in the training and inference pipeline: 256 tokens
  • Tasks: sentiment, topic

Label Schema

  • sentiment: 0 = negative, 1 = neutral, 2 = positive
  • topic: 0 = lecturer, 1 = training_program, 2 = facility, 3 = others

Task Heads

  • sentiment: 3 classes
  • topic: 4 classes
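
The label schema and head sizes above can be written down as plain lookup tables. The following is a minimal sketch and not the repository's own code: the names ID2LABEL and decode_logits are hypothetical, the logits are made up, and the model's decode_predictions helper may return a different structure.

```python
import math

# Label ids as listed in the Label Schema section of this card.
ID2LABEL = {
    "sentiment": {0: "negative", 1: "neutral", 2: "positive"},
    "topic": {0: "lecturer", 1: "training_program", 2: "facility", 3: "others"},
}


def decode_logits(task, logits):
    """Map one row of raw logits for `task` to (label, softmax probability)."""
    labels = ID2LABEL[task]
    if len(logits) != len(labels):
        raise ValueError(f"{task} expects {len(labels)} logits, got {len(logits)}")
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best], exps[best] / sum(exps)


if __name__ == "__main__":
    # Made-up logits, for illustration only.
    print(decode_logits("sentiment", [-1.2, 0.3, 2.5]))
    print(decode_logits("topic", [0.1, 3.0, -0.5, 0.2]))
```

Keeping the two tables separate mirrors the two task heads: each head is decoded independently, so a sentence always receives exactly one sentiment label and one topic label.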

Dataset

  • Dataset: Vietnamese Students' Feedback Corpus (UIT-VSFC). It contains more than 16,000 human-annotated student feedback sentences with sentiment and topic labels.

Data Format

  • sentence is the input text column.
  • sentiment is a 3-class label and topic is a 4-class label.
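
As an illustration of this format, the sketch below checks that one example has the three expected columns with in-range labels. It assumes each row is a plain dict with integer labels; validate_row is a hypothetical helper, and the label values in the sample row are placeholders rather than annotations from the corpus.

```python
# Class counts from the Data Format section.
NUM_CLASSES = {"sentiment": 3, "topic": 4}


def validate_row(row):
    """Return a list of problems found in one example; empty means valid."""
    problems = []
    sentence = row.get("sentence")
    if not isinstance(sentence, str) or not sentence.strip():
        problems.append("sentence must be a non-empty string")
    for column, n in NUM_CLASSES.items():
        value = row.get(column)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or not 0 <= value < n:
            problems.append(f"{column} must be an integer in [0, {n - 1}]")
    return problems


if __name__ == "__main__":
    # Placeholder labels, chosen only to show the expected shape of a row.
    sample = {"sentence": "slide giáo trình đầy đủ .", "sentiment": 2, "topic": 1}
    print(validate_row(sample))
```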

Splits

  • Train: 11426 samples
  • Validation: 1583 samples
  • Test: 3166 samples
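
The split sizes can be cross-checked against the corpus size quoted in the Dataset section with a few lines of arithmetic. This is only a worked calculation on the numbers listed above; the variable names are illustrative.

```python
# Split sizes as reported in the Splits section.
SPLITS = {"train": 11426, "validation": 1583, "test": 3166}

total = sum(SPLITS.values())
shares = {name: count / total for name, count in SPLITS.items()}

if __name__ == "__main__":
    print(f"total: {total}")
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```

The three splits add up to 16,175 sentences, consistent with "more than 16,000", and correspond to roughly 70.6% train, 9.8% validation, and 19.6% test.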

Checkpoint Metrics

  • loss: 0.2894
  • accuracy: 0.9005

Usage

Load the model with trust_remote_code=True because this repository contains custom modeling code.

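import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "NeoCyber/m-e5-small-uit-vsfc-uni"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code=True lets transformers import the custom uni_vsfc classes from the repo.
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,
)

texts = ["slide giáo trình đầy đủ ."]
# Truncate to the 256-token sequence length listed under Model Details.
inputs = tokenizer(
    texts, return_tensors="pt", truncation=True, padding=True, max_length=256
)
with torch.no_grad():  # gradients are not needed at inference time
    outputs = model(**inputs)
# decode_predictions is provided by the repository's custom model class.
predictions = model.decode_predictions(outputs.logits_by_task)
print(predictions)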

Notes

  • The repository includes custom configuration_*.py and modeling_*.py files required by transformers AutoClasses.
  • outputs.logits_by_task contains one tensor per task, and outputs.logits is the concatenated tensor.
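
If only the concatenated outputs.logits tensor is available, the per-task logits can be recovered by slicing it with the head sizes. The sketch below shows the idea on a plain Python list; it assumes the columns are ordered sentiment first, then topic, which this card does not state, so check the custom modeling code before relying on it. split_concatenated is a hypothetical helper and the values are made up. On a real tensor, torch.split(outputs.logits, [3, 4], dim=-1) would perform the same slicing under the same ordering assumption.

```python
# Head sizes from the Task Heads section. The column order (sentiment first,
# then topic) is an assumption; confirm it against the custom modeling code.
HEAD_SIZES = {"sentiment": 3, "topic": 4}


def split_concatenated(row, head_sizes=HEAD_SIZES):
    """Slice one concatenated logits row into a dict of per-task logits."""
    expected = sum(head_sizes.values())
    if len(row) != expected:
        raise ValueError(f"expected {expected} logits, got {len(row)}")
    per_task, start = {}, 0
    for task, size in head_sizes.items():
        per_task[task] = row[start:start + size]
        start += size
    return per_task


if __name__ == "__main__":
    # Made-up values standing in for one row of outputs.logits.
    row = [-1.2, 0.3, 2.5, 0.1, 3.0, -0.5, 0.2]
    print(split_concatenated(row))
```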