metadata
language:
  - vi
license: mit
library_name: transformers
pipeline_tag: text-classification
base_model: intfloat/multilingual-e5-small
tags:
  - vietnamese
  - custom-code
  - transformers
  - multilingual-e5
  - absa
  - hosrev
  - healthcare
  - text-classification
  - aspect-based-sentiment-analysis

m-e5-small-hosrev

Overview

Vietnamese aspect-category sentiment classification model for hospital reviews. The model predicts sentiment for 13 aspect categories covering facilities, medical staff, and overall experience.

Model Details

  • Base model: intfloat/multilingual-e5-small
  • Architecture: absa
  • Checkpoint source: hosrev-e5-small-best.pt
  • Maximum sequence length (training and inference): 256 tokens
  • Number of aspect categories: 13

Label Schema

  • 0: aspect not mentioned
  • 1: positive
  • 2: negative
  • 3: neutral

Aspect Categories

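  • Cơ sở vật chất#Chất lượng (Facilities#Quality)
  • Cơ sở vật chất#Khác (Facilities#Other)
  • Cơ sở vật chất#Không gian (Facilities#Space)
  • Cơ sở vật chất#Vệ sinh (Facilities#Hygiene)
  • Nhân viên y tế#Chất lượng (Medical staff#Quality)
  • Nhân viên y tế#Khác (Medical staff#Other)
  • Nhân viên y tế#Thái độ (Medical staff#Attitude)
  • Trải nghiệm chung#Chất lượng (Overall experience#Quality)
  • Trải nghiệm chung#Giá (Overall experience#Price)
  • Trải nghiệm chung#Khác (Overall experience#Other)
  • Trải nghiệm chung#Không gian (Overall experience#Space)
  • Trải nghiệm chung#Thái độ (Overall experience#Attitude)
  • Trải nghiệm chung#Vệ sinh (Overall experience#Hygiene)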
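
The category names follow an entity#attribute pattern, so they can be split and grouped programmatically. The sketch below is illustrative only: `ASPECTS` and `group_by_entity` are hypothetical names, not part of this repository, and the list order simply mirrors the list above, which may not match the order of the model's output dimension.

```python
from collections import defaultdict

# Aspect categories as listed in this card, in "entity#attribute" form.
# Assumption: this order mirrors the card, not necessarily the model's
# internal aspect order.
ASPECTS = [
    "Cơ sở vật chất#Chất lượng",
    "Cơ sở vật chất#Khác",
    "Cơ sở vật chất#Không gian",
    "Cơ sở vật chất#Vệ sinh",
    "Nhân viên y tế#Chất lượng",
    "Nhân viên y tế#Khác",
    "Nhân viên y tế#Thái độ",
    "Trải nghiệm chung#Chất lượng",
    "Trải nghiệm chung#Giá",
    "Trải nghiệm chung#Khác",
    "Trải nghiệm chung#Không gian",
    "Trải nghiệm chung#Thái độ",
    "Trải nghiệm chung#Vệ sinh",
]


def group_by_entity(aspects):
    """Group attribute names under their entity (the part before '#')."""
    groups = defaultdict(list)
    for aspect in aspects:
        entity, attribute = aspect.split("#", 1)
        groups[entity].append(attribute)
    return dict(groups)


groups = group_by_entity(ASPECTS)
for entity, attributes in groups.items():
    print(f"{entity}: {len(attributes)} attributes")
```

Grouping this way shows that the 13 categories are spread unevenly across the three entities, which can be useful when reporting per-entity results.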

Dataset

  • Dataset: HosRev

HosRev is a Vietnamese hospital review dataset for Aspect-Category Sentiment Analysis (ACSA). It contains reviews of hospitals in Ho Chi Minh City, each annotated with aspect-category sentiment labels.

Data Format

  • Review is the input text column.
  • Each remaining column is one aspect-category label encoded as 0/1/2/3.
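
As a rough illustration of this layout, the sketch below parses one row into the review text and a dictionary of aspect labels. It assumes the data is stored as CSV with a header row, which this card does not confirm. The sample row, its labels, the two-column excerpt, and the `read_rows` helper are all hypothetical.

```python
import csv
import io

# Label ids as defined in the Label Schema section.
ID2LABEL = {0: "not mentioned", 1: "positive", 2: "negative", 3: "neutral"}

# Hypothetical excerpt with only two aspect columns; the real data
# has 13 aspect columns. CSV storage is an assumption.
SAMPLE_CSV = (
    "Review,Nhân viên y tế#Thái độ,Trải nghiệm chung#Giá\n"
    "Bác sĩ nhiệt tình và giải thích rất dễ hiểu.,1,0\n"
)


def read_rows(handle):
    """Yield (review_text, {aspect: label_id}) pairs from a CSV handle."""
    for row in csv.DictReader(handle):
        text = row.pop("Review")
        yield text, {aspect: int(value) for aspect, value in row.items()}


rows = list(read_rows(io.StringIO(SAMPLE_CSV)))
text, labels = rows[0]

# Keep only the aspects that the review actually mentions (label != 0).
mentioned = {a: ID2LABEL[v] for a, v in labels.items() if v != 0}
print(text)
print(mentioned)
```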

Splits

  • Train: 4566 samples
  • Validation: 978 samples
  • Test: 979 samples
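
For reference, these counts work out to roughly a 70/15/15 split. The short check below only redoes that arithmetic from the numbers listed above.

```python
# Split sizes as reported in this card.
splits = {"train": 4566, "validation": 978, "test": 979}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} samples ({shares[name]:.1%})")
print(f"total: {total} samples")
```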

Checkpoint Metrics

  • loss: 0.2834
  • accuracy: 0.8970

Usage

Load the model with trust_remote_code=True because this repository contains custom modeling code.

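import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "NeoCyber/m-e5-small-hosrev"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code=True allows transformers to run the custom modeling code in this repo.
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,
)

# "The doctor was attentive and explained things very clearly."
texts = ["Bác sĩ nhiệt tình và giải thích rất dễ hiểu."]
# Truncate to the sequence length of 256 listed under Model Details.
inputs = tokenizer(
    texts, return_tensors="pt", truncation=True, padding=True, max_length=256
)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
# decode_predictions is provided by the repository's custom model class.
predictions = model.decode_predictions(outputs.logits)
print(predictions)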

Notes

  • The repository includes custom configuration_*.py and modeling_*.py files, which the transformers Auto classes need in order to load the model.
  • outputs.logits has shape [batch_size, num_aspects, 4] and model.decode_predictions(...) maps logits back to aspect-level labels.
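
To make the logits layout concrete, the sketch below shows one plausible way to turn a [batch_size, num_aspects, 4] array into aspect-level labels by taking the argmax over the last dimension. This is an assumption about what decoding involves, not the repository's actual `decode_predictions` implementation, whose output format is not documented here. The `decode` and `argmax` helpers and the toy values are hypothetical, and plain nested lists stand in for tensors so the example has no dependencies. With a PyTorch tensor, `logits.argmax(dim=-1).tolist()` would yield the same nested label ids.

```python
# Label ids as defined in the Label Schema section.
ID2LABEL = {0: "not mentioned", 1: "positive", 2: "negative", 3: "neutral"}


def argmax(values):
    """Return the index of the largest value in a flat list."""
    return max(range(len(values)), key=values.__getitem__)


def decode(logits, aspect_names):
    """Map nested logits [batch][num_aspects][4] to one dict per sample.

    Aspects predicted as 0 ("not mentioned") are left out of the result.
    """
    results = []
    for sample in logits:
        if len(sample) != len(aspect_names):
            raise ValueError("number of aspects does not match the logits")
        decoded = {}
        for name, scores in zip(aspect_names, sample):
            label_id = argmax(scores)
            if label_id != 0:
                decoded[name] = ID2LABEL[label_id]
        results.append(decoded)
    return results


# Toy input: a batch of one review and two aspects (made-up values).
toy_logits = [[[0.1, 2.3, -0.5, 0.0], [3.0, 0.2, 0.1, -1.0]]]
toy_aspects = ["Nhân viên y tế#Thái độ", "Trải nghiệm chung#Giá"]

decoded = decode(toy_logits, toy_aspects)
print(decoded)
```

Dropping the "not mentioned" class keeps the output focused on aspects the review actually discusses. A caller that needs all 13 labels could keep every entry instead.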