---
language:
- vi
license: mit
library_name: transformers
pipeline_tag: text-classification
base_model: intfloat/multilingual-e5-small
tags:
- vietnamese
- custom-code
- transformers
- multilingual-e5
- absa
- hosrev
- healthcare
- text-classification
- aspect-based-sentiment-analysis
---
# m-e5-small-hosrev
## Overview
A Vietnamese aspect-category sentiment analysis (ACSA) model for hospital reviews. For each of 13 aspect categories spanning facilities, medical staff, and overall experience, it predicts whether the aspect is mentioned and, if so, its sentiment.
## Model Details
- Base model: `intfloat/multilingual-e5-small`
- Architecture: `absa`
- Checkpoint source: `hosrev-e5-small-best.pt`
- Maximum sequence length (training and inference): `256` tokens
- Number of aspect categories: `13`
## Label Schema
- `0`: aspect not mentioned
- `1`: positive
- `2`: negative
- `3`: neutral
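
For downstream code, the schema above can be captured as a lookup table. The following is a minimal sketch; the names `ID2LABEL` and `label_name` and the English label strings are illustrative and not part of the repository.

```python
# Label ids as listed in the Label Schema; the string names are illustrative.
ID2LABEL = {
    0: "not_mentioned",
    1: "positive",
    2: "negative",
    3: "neutral",
}


def label_name(label_id: int) -> str:
    """Return a readable name for a label id, raising on unknown ids."""
    try:
        return ID2LABEL[label_id]
    except KeyError:
        raise ValueError(f"unknown label id: {label_id}") from None


print(label_name(2))
```

Note that `2` is negative and `3` is neutral, which differs from the positive/neutral/negative ordering used by some other sentiment datasets.
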
## Aspect Categories
- `Cơ sở vật chất#Chất lượng`
- `Cơ sở vật chất#Khác`
- `Cơ sở vật chất#Không gian`
- `Cơ sở vật chất#Vệ sinh`
- `Nhân viên y tế#Chất lượng`
- `Nhân viên y tế#Khác`
- `Nhân viên y tế#Thái độ`
- `Trải nghiệm chung#Chất lượng`
- `Trải nghiệm chung#Giá`
- `Trải nghiệm chung#Khác`
- `Trải nghiệm chung#Không gian`
- `Trải nghiệm chung#Thái độ`
- `Trải nghiệm chung#Vệ sinh`
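
Each category name joins an entity and an attribute with `#`. A minimal sketch of splitting the names and grouping attributes by entity follows. `ASPECTS`, `split_aspect`, and `attributes_by_entity` are hypothetical names, and the list order is copied from this card, which does not state whether it matches the model's output order. English glosses are given in comments.

```python
from collections import defaultdict

# Copied from the Aspect Categories list; order is an assumption.
ASPECTS = [
    "Cơ sở vật chất#Chất lượng",     # Facilities#Quality
    "Cơ sở vật chất#Khác",           # Facilities#Other
    "Cơ sở vật chất#Không gian",     # Facilities#Space
    "Cơ sở vật chất#Vệ sinh",        # Facilities#Hygiene
    "Nhân viên y tế#Chất lượng",     # Medical staff#Quality
    "Nhân viên y tế#Khác",           # Medical staff#Other
    "Nhân viên y tế#Thái độ",        # Medical staff#Attitude
    "Trải nghiệm chung#Chất lượng",  # Overall experience#Quality
    "Trải nghiệm chung#Giá",         # Overall experience#Price
    "Trải nghiệm chung#Khác",        # Overall experience#Other
    "Trải nghiệm chung#Không gian",  # Overall experience#Space
    "Trải nghiệm chung#Thái độ",     # Overall experience#Attitude
    "Trải nghiệm chung#Vệ sinh",     # Overall experience#Hygiene
]


def split_aspect(name: str) -> tuple[str, str]:
    """Split 'entity#attribute' into its two parts."""
    entity, attribute = name.split("#", 1)
    return entity, attribute


def attributes_by_entity(aspects):
    """Group attribute names under their entity, preserving list order."""
    grouped = defaultdict(list)
    for name in aspects:
        entity, attribute = split_aspect(name)
        grouped[entity].append(attribute)
    return dict(grouped)


for entity, attributes in attributes_by_entity(ASPECTS).items():
    print(entity, len(attributes))
```
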
## Dataset
- Dataset: `HosRev`
HosRev is a Vietnamese hospital review dataset for Aspect-Category Sentiment Analysis (ACSA). The data contains reviews of hospitals in Ho Chi Minh City with aspect-category sentiment labels.
### Data Format
- `Review` is the input text column.
- Each remaining column holds the label for one aspect category, encoded as `0`, `1`, `2`, or `3` (see Label Schema).
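
As an illustration of this layout, the sketch below parses one row into a review text and a per-aspect label mapping. It assumes the data is stored as CSV, which this card does not state. The sample row is a hypothetical two-aspect excerpt (the real data has 13 aspect columns), and `read_rows` is an illustrative helper name.

```python
import csv
import io

# Hypothetical excerpt: CSV storage and these label values are assumptions.
SAMPLE = (
    "Review,Nhân viên y tế#Thái độ,Cơ sở vật chất#Vệ sinh\n"
    "Bác sĩ nhiệt tình và giải thích rất dễ hiểu.,1,0\n"
)


def read_rows(handle):
    """Yield (review_text, {aspect: label_id}) pairs from a CSV handle."""
    for row in csv.DictReader(handle):
        text = row.pop("Review")
        # Every column other than Review is an aspect label in 0..3.
        labels = {aspect: int(value) for aspect, value in row.items()}
        yield text, labels


rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[0])
```
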
### Splits
- Train: `4566` samples
- Validation: `978` samples
- Test: `979` samples
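
The three counts sum to 6523 reviews, which is roughly a 70/15/15 split. The short sketch below reproduces that arithmetic; the variable names are illustrative.

```python
# Split sizes as listed above.
SPLITS = {"train": 4566, "validation": 978, "test": 979}

total = sum(SPLITS.values())
shares = {name: count / total for name, count in SPLITS.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```
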
## Checkpoint Metrics
- `loss`: `0.2834`
- `accuracy`: `0.8970`
## Usage
Load the model with `trust_remote_code=True` because this repository contains custom modeling code.
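```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "NeoCyber/m-e5-small-hosrev"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code=True is required to load the repository's custom model class.
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,
)

# "The doctor was enthusiastic and explained things very clearly."
texts = ["Bác sĩ nhiệt tình và giải thích rất dễ hiểu."]
# Truncate to the sequence length of 256 listed under Model Details.
inputs = tokenizer(
    texts, return_tensors="pt", truncation=True, padding=True, max_length=256
)

# Inference only, so skip gradient tracking.
with torch.no_grad():
    outputs = model(**inputs)

# decode_predictions comes from the repository's custom modeling code.
predictions = model.decode_predictions(outputs.logits)
print(predictions)
```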
## Notes
- The repository includes custom `configuration_*.py` and `modeling_*.py` files required by `transformers` AutoClasses.
- `outputs.logits` has shape `[batch_size, num_aspects, 4]`, with one score per label in the Label Schema for each aspect. `model.decode_predictions(...)` maps the logits back to aspect-level labels.
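
To make the logits shape concrete, here is a minimal sketch of how scores shaped `[batch_size, num_aspects, 4]` could be reduced to one label id per aspect by picking the highest-scoring class. It assumes argmax decoding, which this card does not confirm, so the repository's actual `decode_predictions` may behave differently. `decode_argmax` is a hypothetical helper, the logits are made-up numbers, and plain lists are used to keep the example dependency-free.

```python
def decode_argmax(logits):
    """Map nested lists shaped [batch][aspect][4] to label ids shaped [batch][aspect]."""
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in sample]
        for sample in logits
    ]


# Made-up logits for one review and two aspects (the real model emits 13).
fake_logits = [[
    [0.1, 2.3, -0.5, 0.0],  # highest score at index 1 (positive)
    [1.9, 0.2, 0.1, -1.0],  # highest score at index 0 (aspect not mentioned)
]]

print(decode_argmax(fake_logits))  # [[1, 0]]
```

With a PyTorch tensor, the same reduction is `logits.argmax(dim=-1)`.
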