
m-e5-small-uit-vsfc-uni

Overview

Vietnamese multi-task text classification model for student feedback. The model jointly predicts sentiment and topic labels from a single sentence.

Model Details

  • Base model: intfloat/multilingual-e5-small
  • Architecture: uni_vsfc
  • Checkpoint source: uit-vsfc-uni-e5-small-best.pt
  • Maximum sequence length (training and inference): 256
  • Tasks: sentiment, topic

Label Schema

  • sentiment: 0 = negative, 1 = neutral, 2 = positive
  • topic: 0 = lecturer, 1 = training_program, 2 = facility, 3 = others
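The label schema above can be written as plain Python mappings for turning raw class indices into readable labels. This is an illustrative sketch, not code shipped with the repository; the names `ID2SENTIMENT`, `ID2TOPIC`, and `decode` are ours.

```python
# Illustrative id-to-label mappings, reproduced from the schema above.
ID2SENTIMENT = {0: "negative", 1: "neutral", 2: "positive"}
ID2TOPIC = {0: "lecturer", 1: "training_program", 2: "facility", 3: "others"}

def decode(sentiment_id: int, topic_id: int) -> dict:
    """Map raw class indices to human-readable labels."""
    return {
        "sentiment": ID2SENTIMENT[sentiment_id],
        "topic": ID2TOPIC[topic_id],
    }

print(decode(2, 0))  # {'sentiment': 'positive', 'topic': 'lecturer'}
```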

Task Heads

  • sentiment: 3 classes
  • topic: 4 classes

Dataset

  • Dataset: UIT-VSFC (Vietnamese Students' Feedback Corpus), which contains more than 16,000 human-annotated student feedback sentences with sentiment and topic labels.

Data Format

  • sentence is the input text column.
  • sentiment is a 3-class label and topic is a 4-class label.

Splits

  • Train: 11426 samples
  • Validation: 1583 samples
  • Test: 3166 samples

Checkpoint Metrics

  • loss: 0.2894
  • accuracy: 0.9005

Usage

Load the model with trust_remote_code=True because this repository contains custom modeling code.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "NeoCyber/m-e5-small-uit-vsfc-uni"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,  # required: the repository ships custom modeling code
)
model.eval()

texts = ["slide giáo trình đầy đủ ."]  # "the lecture slides and materials are complete"
inputs = tokenizer(texts, return_tensors="pt", truncation=True, max_length=256, padding=True)
with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)
predictions = model.decode_predictions(outputs.logits_by_task)
print(predictions)

Notes

  • The repository includes custom configuration_*.py and modeling_*.py files required by transformers AutoClasses.
  • outputs.logits_by_task contains one tensor per task, and outputs.logits is the concatenated tensor.
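As a sketch of the relationship described above, and assuming the concatenation follows the task order given on this card (sentiment first, then topic), the combined logits tensor can be split back into per-task tensors with `torch.split`:

```python
import torch

# Assumed head sizes from the model card: 3 sentiment classes + 4 topic classes.
TASK_SIZES = {"sentiment": 3, "topic": 4}

# Dummy concatenated logits for a batch of 2 sentences: shape (2, 3 + 4).
logits = torch.randn(2, sum(TASK_SIZES.values()))

# Split along the class dimension in the assumed task order (sentiment, topic).
sentiment_logits, topic_logits = torch.split(logits, list(TASK_SIZES.values()), dim=-1)

print(sentiment_logits.shape)  # torch.Size([2, 3])
print(topic_logits.shape)      # torch.Size([2, 4])
```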