---
language:
- vi
license: mit
library_name: transformers
pipeline_tag: text-classification
base_model: intfloat/multilingual-e5-small
tags:
- vietnamese
- custom-code
- transformers
- multilingual-e5
- absa
- hosrev
- healthcare
- text-classification
- aspect-based-sentiment-analysis
---
# m-e5-small-hosrev

## Overview

A Vietnamese aspect-category sentiment classification model for hospital reviews. For each review, the model predicts a sentiment label for each of 13 aspect categories covering facilities, medical staff, and overall experience.
## Model Details

- Base model: `intfloat/multilingual-e5-small`
- Architecture: `absa`
- Checkpoint source: `hosrev-e5-small-best.pt`
- Sequence length used in the training and inference pipeline: `256`
- Number of aspect categories: `13`
## Label Schema

- `0`: aspect not mentioned
- `1`: positive
- `2`: negative
- `3`: neutral
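As a minimal sketch, the schema above can be written as a lookup table for turning predicted label ids into readable names. The names `ID2LABEL` and `label_name` are illustrative assumptions and are not part of the repository's code.

```python
# Illustrative mapping of the label schema above. ID2LABEL and label_name
# are assumptions for this sketch, not part of the model repository.
ID2LABEL = {
    0: "not_mentioned",
    1: "positive",
    2: "negative",
    3: "neutral",
}


def label_name(label_id: int) -> str:
    """Return the readable name for a label id, raising on unknown ids."""
    if label_id not in ID2LABEL:
        raise ValueError(f"unknown label id: {label_id}")
    return ID2LABEL[label_id]


print([label_name(i) for i in (1, 0, 2)])
```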
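## Aspect Categories

- `Cơ sở vật chất#Chất lượng` (Facilities, Quality)
- `Cơ sở vật chất#Khác` (Facilities, Other)
- `Cơ sở vật chất#Không gian` (Facilities, Space)
- `Cơ sở vật chất#Vệ sinh` (Facilities, Hygiene)
- `Nhân viên y tế#Chất lượng` (Medical staff, Quality)
- `Nhân viên y tế#Khác` (Medical staff, Other)
- `Nhân viên y tế#Thái độ` (Medical staff, Attitude)
- `Trải nghiệm chung#Chất lượng` (Overall experience, Quality)
- `Trải nghiệm chung#Giá` (Overall experience, Price)
- `Trải nghiệm chung#Khác` (Overall experience, Other)
- `Trải nghiệm chung#Không gian` (Overall experience, Space)
- `Trải nghiệm chung#Thái độ` (Overall experience, Attitude)
- `Trải nghiệm chung#Vệ sinh` (Overall experience, Hygiene)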
## Dataset

- Dataset: `HosRev`

HosRev is a Vietnamese hospital review dataset for Aspect-Category Sentiment Analysis (ACSA). It consists of reviews of hospitals in Ho Chi Minh City, annotated with aspect-category sentiment labels.

### Data Format

- `Review` is the input text column.
- Each remaining column holds the label for one aspect category, encoded as `0`, `1`, `2`, or `3` (see Label Schema).
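The layout described above can be sketched with the standard `csv` module. This is only an illustration: the CSV storage format, the helper name `parse_rows`, and the sample row with its labels are assumptions made up for the example, not taken from HosRev, and only two of the 13 aspect columns are shown.

```python
import csv
import io

# Made-up sample in the described layout: a `Review` column followed by one
# column per aspect category. The CSV format and the label values are
# assumptions for illustration; the real data has 13 label columns.
SAMPLE = """Review,Nhân viên y tế#Thái độ,Cơ sở vật chất#Vệ sinh
Bác sĩ nhiệt tình và giải thích rất dễ hiểu.,1,0
"""


def parse_rows(text: str) -> list[tuple[str, dict[str, int]]]:
    """Split each row into the review text and an {aspect: label id} dict."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        review = row.pop("Review")
        labels = {aspect: int(value) for aspect, value in row.items()}
        rows.append((review, labels))
    return rows


for review, labels in parse_rows(SAMPLE):
    # Keep only the aspects that are actually mentioned (label id != 0).
    mentioned = {aspect: v for aspect, v in labels.items() if v != 0}
    print(review, mentioned)
```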
### Splits

- Train: `4566` samples
- Validation: `978` samples
- Test: `979` samples
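For reference, the split sizes listed above can be turned into a total and approximate percentage shares with a few lines of arithmetic. The variable names are illustrative.

```python
# Split sizes as listed above.
splits = {"train": 4566, "validation": 978, "test": 979}

total = sum(splits.values())
# Share of each split in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total, shares)
```

Under this rounding the splits come out at roughly 70% / 15% / 15% of 6523 samples.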
## Checkpoint Metrics

- `loss`: `0.2834`
- `accuracy`: `0.8970`
## Usage

Load the model with `trust_remote_code=True` because this repository contains custom modeling code.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "NeoCyber/m-e5-small-hosrev"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,  # required: the repo ships custom modeling code
)
model.eval()  # disable dropout for inference

# English: "The doctor was attentive and explained things very clearly."
texts = ["Bác sĩ nhiệt tình và giải thích rất dễ hiểu."]
# max_length=256 mirrors the sequence length listed under Model Details.
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=256)

with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

# decode_predictions is a custom method provided by this repository's model class.
predictions = model.decode_predictions(outputs.logits)
print(predictions)
```
## Notes

- The repository includes custom `configuration_*.py` and `modeling_*.py` files, which the `transformers` Auto classes need in order to load the model.
- `outputs.logits` has shape `[batch_size, num_aspects, 4]`, with the last dimension corresponding to the four labels in the label schema. `model.decode_predictions(...)` maps the logits back to aspect-level labels.
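To illustrate the shape convention, here is a dependency-free sketch of one plausible decoding: take the argmax over the last dimension (size 4) for every aspect. This is an assumption about what such decoding could look like, not the repository's `decode_predictions` implementation. The helper `decode_argmax`, the two-aspect subset, and the score values are all hypothetical.

```python
# Hypothetical sketch: argmax decoding over nested lists shaped
# [batch_size, num_aspects, 4]. This is NOT the repository's
# decode_predictions; it only illustrates the shape convention.
ASPECTS = ["Nhân viên y tế#Thái độ", "Cơ sở vật chất#Vệ sinh"]  # subset for brevity


def decode_argmax(logits, aspects):
    """Map each [num_aspects, 4] score matrix to an {aspect: label id} dict."""
    decoded = []
    for sample in logits:
        if len(sample) != len(aspects):
            raise ValueError("number of aspects does not match the logits")
        decoded.append({
            # Index of the highest score is taken as the predicted label id.
            aspect: max(range(len(scores)), key=scores.__getitem__)
            for aspect, scores in zip(aspects, sample)
        })
    return decoded


# One made-up sample with two aspects and four scores per aspect.
fake_logits = [[[0.1, 2.3, -0.5, 0.0], [1.9, 0.2, 0.1, -1.0]]]
print(decode_argmax(fake_logits, ASPECTS))
```

With a PyTorch tensor, the same per-aspect argmax would be `logits.argmax(dim=-1)`, which yields a `[batch_size, num_aspects]` tensor of label ids.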