AnnyNguyen committed on
Commit 56cf0d4 · verified · 1 Parent(s): b002cc0

Upload README.md with huggingface_hub

  1. README.md +230 -0
README.md ADDED
@@ -0,0 +1,230 @@
+ ---
+ license: apache-2.0
+ base_model: mbert
+ tags:
+ - vietnamese
+ - aspect-based-sentiment-analysis
+ - VLSP-ABSA
+ datasets:
+ - visolex/VLSP2018-ABSA-Hotel
+ metrics:
+ - accuracy
+ - macro-f1
+ model-index:
+ - name: mbert-absa-hotel
+   results:
+   - task:
+       type: text-classification
+       name: Aspect-based Sentiment Analysis
+     dataset:
+       name: VLSP2018-ABSA-Hotel
+       type: VLSP-ABSA
+     metrics:
+     - type: accuracy
+       value: 0.9524
+     - type: macro-f1
+       value: 0.5098
+     - type: macro_precision
+       value: 0.7107
+     - type: macro_recall
+       value: 0.4263
+ ---
+
+ # mbert-absa-hotel: Aspect-based Sentiment Analysis for Vietnamese Reviews
+
+ This model is a fine-tuned version of [mbert](https://huggingface.co/mbert)
+ on the **VLSP2018-ABSA-Hotel** dataset for aspect-based sentiment analysis in Vietnamese reviews.
+
+ ## Model Details
+
+ * **Base Model**: mbert
+ * **Description**: mBERT for Vietnamese ABSA
+ * **Dataset**: VLSP2018-ABSA-Hotel
+ * **Fine-tuning Framework**: HuggingFace Transformers
+ * **Task**: Aspect-based Sentiment Classification (3 classes)
+
47
+
48
+ * Batch size: `32`
49
+ * Learning rate: `3e-5`
50
+ * Epochs: `100`
51
+ * Max sequence length: `256`
52
+ * Weight decay: `0.01`
53
+ * Warmup steps: `500`
54
+ * Optimizer: AdamW
55
+
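The hyperparameters above give a peak learning rate and a warmup length but do not name the scheduler. As a minimal sketch, assuming linear warmup followed by linear decay (the default scheduler type in the Transformers `Trainer`, though the card does not confirm it was used), the learning rate at a given step could be computed as follows. `total_steps` is a hypothetical value; the card does not state the number of training steps.

```python
def linear_warmup_lr(step, base_lr=3e-5, warmup_steps=500, total_steps=10_000):
    """Learning rate at `step` under an ASSUMED linear warmup + linear decay.

    base_lr and warmup_steps come from the hyperparameter list;
    total_steps is a hypothetical placeholder.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 5_000, 10_000):
    print(step, linear_warmup_lr(step))
```

Under this assumption the rate reaches `3e-5` exactly at step 500 and falls to zero at the final step.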
+ ## Dataset
+
+ The model was trained on the **VLSP2018-ABSA-Hotel** dataset for aspect-based sentiment analysis.
+
+ ### Sentiment Labels:
+
+ * **0 - Negative** (Tiêu cực): Negative opinions
+ * **1 - Neutral** (Trung lập): Neutral, objective opinions
+ * **2 - Positive** (Tích cực): Positive opinions
+
+ ### Aspect Categories:
+
+ The model is trained to predict sentiment for the following aspects:
+
+ - **FACILITIES#CLEANLINESS**
+ - **FACILITIES#COMFORT**
+ - **FACILITIES#DESIGN&FEATURES**
+ - **FACILITIES#GENERAL**
+ - **FACILITIES#MISCELLANEOUS**
+ - **FACILITIES#PRICES**
+ - **FACILITIES#QUALITY**
+ - **FOOD&DRINKS#MISCELLANEOUS**
+ - **FOOD&DRINKS#PRICES**
+ - **FOOD&DRINKS#QUALITY**
+ - **FOOD&DRINKS#STYLE&OPTIONS**
+ - **HOTEL#CLEANLINESS**
+ - **HOTEL#COMFORT**
+ - **HOTEL#DESIGN&FEATURES**
+ - **HOTEL#GENERAL**
+ - **HOTEL#MISCELLANEOUS**
+ - **HOTEL#PRICES**
+ - **HOTEL#QUALITY**
+ - **LOCATION#GENERAL**
+ - **ROOMS#CLEANLINESS**
+ - **ROOMS#COMFORT**
+ - **ROOMS#DESIGN&FEATURES**
+ - **ROOMS#GENERAL**
+ - **ROOMS#MISCELLANEOUS**
+ - **ROOMS#PRICES**
+ - **ROOMS#QUALITY**
+ - **ROOM_AMENITIES#CLEANLINESS**
+ - **ROOM_AMENITIES#COMFORT**
+ - **ROOM_AMENITIES#DESIGN&FEATURES**
+ - **ROOM_AMENITIES#GENERAL**
+ - **ROOM_AMENITIES#MISCELLANEOUS**
+ - **ROOM_AMENITIES#PRICES**
+ - **ROOM_AMENITIES#QUALITY**
+ - **SERVICE#GENERAL**
+
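Each aspect label above joins two parts with `#`, which appears to follow the entity#attribute convention common in ABSA datasets. The sketch below shows one way to split labels and group them for reporting; the helper names `split_aspect` and `group_by_entity` are hypothetical and not part of the model or the ViSoLex Toolkit.

```python
from collections import defaultdict


def split_aspect(label):
    """Split 'ENTITY#ATTRIBUTE' into its two parts (hypothetical helper)."""
    entity, attribute = label.split("#", 1)
    return entity, attribute


def group_by_entity(labels):
    """Map each entity to the list of attributes seen for it."""
    groups = defaultdict(list)
    for label in labels:
        entity, attribute = split_aspect(label)
        groups[entity].append(attribute)
    return dict(groups)


# A small subset of the aspect labels listed in the card.
sample = ["HOTEL#CLEANLINESS", "HOTEL#PRICES", "LOCATION#GENERAL", "SERVICE#GENERAL"]
grouped = group_by_entity(sample)
print(grouped)
# {'HOTEL': ['CLEANLINESS', 'PRICES'], 'LOCATION': ['GENERAL'], 'SERVICE': ['GENERAL']}
```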
+ ## Evaluation Results
+
+ The model was evaluated on the test set with the following metrics:
+
+ * **Accuracy**: `0.9524`
+ * **Macro-F1**: `0.5098`
+ * **Weighted-F1**: `0.7670`
+ * **Macro-Precision**: `0.7107`
+ * **Macro-Recall**: `0.4263`
+
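Accuracy and macro-averaged scores can differ widely because macro averaging gives every class equal weight regardless of how often it occurs. The sketch below computes both on toy labels (invented for illustration, not model outputs) to show one way such a gap can arise. How the card's metrics were aggregated across aspects is not stated, so this only illustrates the definitions.

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy data: class 3 dominates and the classifier always predicts it.
y_true = [3] * 8 + [0, 2]
y_pred = [3] * 10
accuracy, macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")
```

In this toy case accuracy is 0.8 while macro-F1 is about 0.30, because the two rare classes each contribute an F1 of zero to the average.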
+ ## Usage Example
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModel
+
+ # Load model and tokenizer
+ repo = "visolex/mbert-absa-hotel"
+ tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
+ model = AutoModel.from_pretrained(repo, trust_remote_code=True)
+ model.eval()
+
+ # Aspect labels for VLSP2018-ABSA-Hotel
+ aspect_labels = [
+     "FACILITIES#CLEANLINESS",
+     "FACILITIES#COMFORT",
+     "FACILITIES#DESIGN&FEATURES",
+     "FACILITIES#GENERAL",
+     "FACILITIES#MISCELLANEOUS",
+     "FACILITIES#PRICES",
+     "FACILITIES#QUALITY",
+     "FOOD&DRINKS#MISCELLANEOUS",
+     "FOOD&DRINKS#PRICES",
+     "FOOD&DRINKS#QUALITY",
+     "FOOD&DRINKS#STYLE&OPTIONS",
+     "HOTEL#CLEANLINESS",
+     "HOTEL#COMFORT",
+     "HOTEL#DESIGN&FEATURES",
+     "HOTEL#GENERAL",
+     "HOTEL#MISCELLANEOUS",
+     "HOTEL#PRICES",
+     "HOTEL#QUALITY",
+     "LOCATION#GENERAL",
+     "ROOMS#CLEANLINESS",
+     "ROOMS#COMFORT",
+     "ROOMS#DESIGN&FEATURES",
+     "ROOMS#GENERAL",
+     "ROOMS#MISCELLANEOUS",
+     "ROOMS#PRICES",
+     "ROOMS#QUALITY",
+     "ROOM_AMENITIES#CLEANLINESS",
+     "ROOM_AMENITIES#COMFORT",
+     "ROOM_AMENITIES#DESIGN&FEATURES",
+     "ROOM_AMENITIES#GENERAL",
+     "ROOM_AMENITIES#MISCELLANEOUS",
+     "ROOM_AMENITIES#PRICES",
+     "ROOM_AMENITIES#QUALITY",
+     "SERVICE#GENERAL",
+ ]
+
+ # Sentiment labels (list index = predicted class id).
+ # NOTE: this order differs from the "Sentiment Labels" list above
+ # (0 = negative, 1 = neutral, 2 = positive); check it against the
+ # training configuration before relying on the decoded labels.
+ sentiment_labels = ["POSITIVE", "NEGATIVE", "NEUTRAL"]
+
+ # Example review text
+ text = "Khách sạn rất sạch sẽ, phòng ốc thoải mái nhưng giá hơi cao."
+
+ # Tokenize
+ inputs = tokenizer(
+     text,
+     return_tensors="pt",
+     padding=True,
+     truncation=True,
+     max_length=256,
+ )
+ # Remove token_type_ids if the tokenizer produced them
+ inputs.pop("token_type_ids", None)
+
+ # Predict
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Get logits: shape [1, num_aspects, num_sentiments + 1]
+ logits = outputs.logits.squeeze(0)  # [num_aspects, num_sentiments + 1]
+ probs = torch.softmax(logits, dim=-1)
+
+ # Predict for each aspect
+ none_id = probs.size(-1) - 1  # Index of the "none" class (aspect not mentioned)
+ results = []
+
+ for i, aspect in enumerate(aspect_labels):
+     prob_i = probs[i]
+     pred_id = int(prob_i.argmax().item())
+
+     # Keep the aspect only if a sentiment class wins with enough confidence
+     if pred_id != none_id and pred_id < len(sentiment_labels):
+         score = prob_i[pred_id].item()
+         if score >= 0.5:  # confidence threshold
+             results.append((aspect, sentiment_labels[pred_id].lower()))
+
+ print(f"Text: {text}")
+ print(f"Predicted aspects: {results}")
+ # Output format: a list of (aspect, sentiment) tuples, for example
+ # [('<ASPECT>', 'positive'), ('<ASPECT>', 'positive'), ('<ASPECT>', 'negative')]
+ ```
+
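The decoding step in the usage example (softmax per aspect, skip the last "none" class, apply a 0.5 threshold) can be checked without downloading the model. The sketch below reimplements that logic in plain Python on hand-written logits. The `decode` helper and the toy logits are hypothetical, and the sentiment order simply mirrors the list used in the usage example.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, aspect_labels, sentiment_labels, threshold=0.5):
    """Turn per-aspect logits into (aspect, sentiment) pairs.

    The last column of each row is treated as the "none" class,
    as in the usage example.
    """
    results = []
    for aspect, row in zip(aspect_labels, logits):
        probs = softmax(row)
        none_id = len(probs) - 1
        pred_id = max(range(len(probs)), key=probs.__getitem__)
        if pred_id != none_id and probs[pred_id] >= threshold:
            results.append((aspect, sentiment_labels[pred_id].lower()))
    return results


aspects = ["HOTEL#CLEANLINESS", "ROOMS#COMFORT", "ROOMS#PRICES"]
sentiments = ["POSITIVE", "NEGATIVE", "NEUTRAL"]
toy_logits = [
    [4.0, 0.0, 0.0, 0.0],  # confident sentiment class 0 -> kept
    [0.0, 0.0, 0.0, 4.0],  # "none" wins -> aspect skipped
    [1.0, 0.9, 0.8, 0.7],  # best class below threshold -> skipped
]
decoded = decode(toy_logits, aspects, sentiments)
print(decoded)
# [('HOTEL#CLEANLINESS', 'positive')]
```

Only the first aspect survives: the second is assigned to "none", and the third has a top probability of roughly 0.29, below the threshold.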
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{visolex_absa_mbert_absa_hotel,
+   title={mBERT for Vietnamese ABSA for Vietnamese Aspect-based Sentiment Analysis},
+   author={ViSoLex Team},
+   year={2025},
+   url={https://huggingface.co/visolex/mbert-absa-hotel}
+ }
+ ```
+
+ ## License
+
+ This model is released under the Apache-2.0 license.
+
+ ## Acknowledgments
+
+ * Base model: [mbert](https://huggingface.co/mbert)
+ * Dataset: VLSP2018-ABSA-Hotel
+ * ViSoLex Toolkit
+
+ ---