---
language:
- vi
license: mit
library_name: transformers
pipeline_tag: text-classification
base_model: intfloat/multilingual-e5-small
tags:
- vietnamese
- custom-code
- transformers
- multilingual-e5
- uni_vsfc
- uit-vsfc
- education
- multitask
- text-classification
---
# m-e5-small-uit-vsfc-uni
## Overview
Vietnamese multi-task text classification model for student feedback. The model jointly predicts sentiment and topic labels from a single sentence.
## Model Details
- Base model: `intfloat/multilingual-e5-small`
- Architecture: `uni_vsfc`
- Checkpoint source: `uit-vsfc-uni-e5-small-best.pt`
- Maximum sequence length (training and inference): `256` tokens
- Tasks: `sentiment`, `topic`
## Label Schema
- `sentiment`: `0 = negative`, `1 = neutral`, `2 = positive`
- `topic`: `0 = lecturer`, `1 = training_program`, `2 = facility`, `3 = others`
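The label ids above can be mapped to readable names with plain lookup tables. This is a minimal sketch: the names `SENTIMENT_LABELS`, `TOPIC_LABELS`, and `id_to_label` are hypothetical and not part of the repository's code. The model's own `decode_predictions` method remains the supported path.

```python
# Hypothetical lookup tables mirroring the label schema in this card.
SENTIMENT_LABELS = {0: "negative", 1: "neutral", 2: "positive"}
TOPIC_LABELS = {0: "lecturer", 1: "training_program", 2: "facility", 3: "others"}

LABELS_BY_TASK = {"sentiment": SENTIMENT_LABELS, "topic": TOPIC_LABELS}


def id_to_label(task: str, label_id: int) -> str:
    """Map an integer class id for `task` to its label name."""
    try:
        return LABELS_BY_TASK[task][label_id]
    except KeyError:
        raise ValueError(f"unknown task/label pair: {task!r}, {label_id!r}") from None


print(id_to_label("sentiment", 2))  # positive
print(id_to_label("topic", 1))      # training_program
```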
## Task Heads
- `sentiment`: `3` classes
- `topic`: `4` classes
## Dataset
- Dataset: `Vietnamese Students' Feedback Corpus (UIT-VSFC)`
Vietnamese Students' Feedback Corpus (UIT-VSFC) contains more than 16,000 human-annotated student feedback sentences with sentiment and topic labels.
### Data Format
- `sentence` is the input text column.
- `sentiment` is a 3-class label and `topic` is a 4-class label.
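A record in this format can be checked before training or evaluation. The sketch below assumes each example is a plain Python dict with the `sentence`, `sentiment`, and `topic` keys described above. The `validate_record` helper is hypothetical.

```python
from typing import Any, Mapping

NUM_SENTIMENT_CLASSES = 3  # 0 = negative, 1 = neutral, 2 = positive
NUM_TOPIC_CLASSES = 4      # 0 = lecturer, 1 = training_program, 2 = facility, 3 = others


def validate_record(record: Mapping[str, Any]) -> None:
    """Raise ValueError if a record does not match the UIT-VSFC column layout."""
    sentence = record.get("sentence")
    if not isinstance(sentence, str) or not sentence.strip():
        raise ValueError("`sentence` must be a non-empty string")
    for key, num_classes in (("sentiment", NUM_SENTIMENT_CLASSES), ("topic", NUM_TOPIC_CLASSES)):
        value = record.get(key)
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or not 0 <= value < num_classes:
            raise ValueError(f"`{key}` must be an int in [0, {num_classes - 1}], got {value!r}")


# The labels in this record are illustrative, not taken from the dataset.
validate_record({"sentence": "slide giáo trình đầy đủ .", "sentiment": 2, "topic": 1})
```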
### Splits
- Train: `11426` samples
- Validation: `1583` samples
- Test: `3166` samples
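The three splits add up to 16,175 sentences, which matches the "more than 16,000" figure above. The snippet below is a quick check of the total and of each split's share, computed only from the counts listed in this card.

```python
# Split sizes as reported in this card.
splits = {"train": 11426, "validation": 1583, "test": 3166}

total = sum(splits.values())
print(f"total: {total}")  # total: 16175
for name, size in splits.items():
    # Prints train ≈ 70.6%, validation ≈ 9.8%, test ≈ 19.6%.
    print(f"{name}: {size} ({size / total:.1%})")
```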
## Checkpoint Metrics
- `loss`: `0.2894`
- `accuracy`: `0.9005`
## Usage
Load the model with `trust_remote_code=True` because this repository contains custom modeling code.
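```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "NeoCyber/m-e5-small-uit-vsfc-uni"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code=True is required to load the custom multi-task architecture.
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,
)
model.eval()  # disable dropout for inference

texts = ["slide giáo trình đầy đủ ."]
# Truncate to the 256-token sequence length used by the training/inference pipeline.
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)

# decode_predictions is provided by the repository's custom modeling code.
predictions = model.decode_predictions(outputs.logits_by_task)
print(predictions)
```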
## Notes
- The repository includes custom `configuration_*.py` and `modeling_*.py` files required by `transformers` AutoClasses.
- `outputs.logits_by_task` contains one tensor per task, and `outputs.logits` is the concatenated tensor.
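If only the concatenated `outputs.logits` is at hand, it can be sliced back into per-task scores using the head sizes listed under Task Heads. This sketch rests on an assumption the card does not state: that the tensor is concatenated along the last dimension in the task order `sentiment, topic` (3 + 4 = 7 columns). Plain Python lists keep it runnable without PyTorch, and `split_logits` and `argmax` are hypothetical helpers. In practice, `outputs.logits_by_task` with `model.decode_predictions` is the safer route.

```python
from typing import Dict, List, Sequence

# Assumed head order and sizes; verify against the model's config before relying on this.
TASK_SIZES = [("sentiment", 3), ("topic", 4)]


def split_logits(row: Sequence[float]) -> Dict[str, List[float]]:
    """Split one concatenated logits row into per-task slices."""
    expected = sum(size for _, size in TASK_SIZES)
    if len(row) != expected:
        raise ValueError(f"expected {expected} logits, got {len(row)}")
    out: Dict[str, List[float]] = {}
    start = 0
    for task, size in TASK_SIZES:
        out[task] = list(row[start:start + size])
        start += size
    return out


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up logits for a single sentence, for illustration only.
row = [-1.2, 0.3, 2.5, 0.1, 1.8, -0.4, -0.9]
per_task = split_logits(row)
print({task: argmax(scores) for task, scores in per_task.items()})
# {'sentiment': 2, 'topic': 1}
```

With PyTorch, `torch.split(outputs.logits, [3, 4], dim=-1)` performs the same slicing on a batch, under the same ordering assumption.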