AnnyNguyen committed on
Commit d5d7013 · verified · 1 Parent(s): 73f6b14

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +88 -0
README.md ADDED
@@ -0,0 +1,88 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

---
license: apache-2.0
datasets:
- visolex/ViSFD
language:
- vi
base_model:
- vinai/phobert-base
pipeline_tag: text-classification
---
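
Fine-tuned from `vinai/phobert-base` on `visolex/ViSFD` for joint aspect detection and sentiment classification with a shared output head: for each of the 10 smartphone aspects, the model predicts one of three sentiments or an extra "none" class meaning the aspect is not mentioned.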
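
The joint formulation can be pictured as one classification per aspect. Below is a minimal sketch, assuming (as the decoding in the usage example suggests) that the last of the S+1 classes means "aspect not mentioned". `encode_targets` is a hypothetical helper for illustration, not the authors' training code:

```python
# Label lists as given in the usage example; their order is assumed to match the model head.
ASPECTS = ["BATTERY", "CAMERA", "DESIGN", "FEATURES", "GENERAL",
           "PERFORMANCE", "PRICE", "SCREEN", "SERandACC", "STORAGE"]
SENTIMENTS = ["POSITIVE", "NEGATIVE", "NEUTRAL"]
NONE_ID = len(SENTIMENTS)  # assumed: the extra last class means "aspect not mentioned"


def encode_targets(pairs: list[tuple[str, str]]) -> list[int]:
    """Hypothetical helper: map gold (aspect, sentiment) pairs to one class index per aspect."""
    targets = [NONE_ID] * len(ASPECTS)  # every aspect starts as "none"
    for aspect, sentiment in pairs:
        targets[ASPECTS.index(aspect)] = SENTIMENTS.index(sentiment)
    return targets


print(encode_targets([("BATTERY", "NEGATIVE"), ("PERFORMANCE", "NEGATIVE")]))
# → [1, 3, 3, 3, 3, 1, 3, 3, 3, 3]
```

Under this assumption, a single cross-entropy loss over the last dimension of an `[A, S+1]` logit tensor trains aspect detection (none vs. any sentiment) and sentiment classification at the same time.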

**Model Details**

* **Base Model:** vinai/phobert-base
* **Dataset:** visolex/ViSFD
* **Fine-tuning framework:** Hugging Face Transformers

**Hyperparameters**

* Batch size: 32
* Learning rate: 3e-5
* Epochs: 100 (maximum; training may stop earlier due to early stopping)
* Max sequence length: 256
* Early stopping patience: 5
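
The hyperparameters pair a 100-epoch budget with early stopping. The sketch below shows how such a rule typically behaves, assuming patience counts consecutive epochs without a new best validation loss (the card does not say which metric is monitored). `train_with_early_stopping` and the loss values are hypothetical:

```python
def train_with_early_stopping(
    val_losses: list[float], max_epochs: int = 100, patience: int = 5
) -> tuple[int, int]:
    """Return (epochs_run, best_epoch) for a sequence of per-epoch validation losses.

    Hypothetical illustration: training stops once `patience` epochs in a row fail
    to improve on the best loss seen so far, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epochs_run, best_epoch


# Made-up validation losses: the last improvement happens at epoch 4
losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61]
print(train_with_early_stopping(losses))  # → (9, 4)
```

In Hugging Face Transformers the same behaviour is usually obtained with `EarlyStoppingCallback(early_stopping_patience=5)` together with `load_best_model_at_end=True` in `TrainingArguments`; the card does not state whether the authors used it.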

**Usage**

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Aspect and sentiment label lists (order must match the model's output head)
aspect_labels = [
    "BATTERY", "CAMERA", "DESIGN", "FEATURES", "GENERAL",
    "PERFORMANCE", "PRICE", "SCREEN", "SERandACC", "STORAGE",
]
sentiment_labels = ["POSITIVE", "NEGATIVE", "NEUTRAL"]

# 1) Load the tokenizer and model; trust_remote_code=True is required so that
#    AutoModel resolves to the custom TransformerForABSA class
repo = "visolex/phobert-absa-smartphone"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(repo, trust_remote_code=True)
model.eval()


def predict_absa_multi(
    text: str,
    aspect_labels: list[str],
    sentiment_labels: list[str],
    threshold: float = 0.5,
) -> list[tuple[str, str]]:
    """Return (aspect, sentiment) pairs whose predicted sentiment clears `threshold`."""
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=256,
    )
    # PhoBERT (RoBERTa-style) does not use token type IDs
    inputs.pop("token_type_ids", None)

    with torch.no_grad():
        out = model(**inputs)

    # out.logits has shape [1, A, S+1]: one row per aspect, S sentiments + "none"
    logits = out.logits.squeeze(0)          # [A, S+1]
    probs = torch.softmax(logits, dim=-1)   # [A, S+1]

    num_s = len(sentiment_labels)
    none_id = probs.size(-1) - 1            # index of the "none" class (last column)
    results = []

    for i, asp in enumerate(aspect_labels):
        prob_i = probs[i]
        pred_id = int(prob_i.argmax().item())

        # Keep the aspect only if a real sentiment wins with enough confidence
        if pred_id != none_id and pred_id < num_s:
            score = prob_i[pred_id].item()
            if score >= threshold:
                results.append((asp, sentiment_labels[pred_id].lower()))

    return results


# "Bought it just a week ago; the 4,000 battery is awful, the touchscreen is a bit
#  laggy, and SIM detection is faulty."
text = "mới mua được một tuần pin bốn nghìn mà quá tệ cảm ứng hơi đơ nhận sim bị lỗi."
preds = predict_absa_multi(text, aspect_labels, sentiment_labels, threshold=0.2)
print(preds)
# ➔ [('BATTERY', 'negative'), ('PERFORMANCE', 'negative'), ...]
```
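
The decoding step in `predict_absa_multi` can be checked without downloading the model. The sketch below reimplements the same per-aspect rule (softmax over the S+1 classes, argmax, drop the "none" class, then apply the threshold) in plain Python on made-up logits; `decode_absa` and the logit values are hypothetical:

```python
import math


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one aspect's S+1 logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode_absa(
    logits: list[list[float]],
    aspect_labels: list[str],
    sentiment_labels: list[str],
    threshold: float = 0.5,
) -> list[tuple[str, str]]:
    """Same rule as predict_absa_multi, applied to an [A][S+1] nested list of logits."""
    none_id = len(sentiment_labels)  # last column is the "none" class
    results = []
    for aspect, row in zip(aspect_labels, logits):
        probs = softmax(row)
        pred_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if pred_id != none_id and probs[pred_id] >= threshold:
            results.append((aspect, sentiment_labels[pred_id].lower()))
    return results


aspects = ["BATTERY", "CAMERA", "PERFORMANCE"]
sentiments = ["POSITIVE", "NEGATIVE", "NEUTRAL"]
fake_logits = [
    [0.1, 3.0, 0.2, 0.5],  # BATTERY: NEGATIVE clearly wins (p ≈ 0.83)
    [0.0, 0.0, 0.0, 4.0],  # CAMERA: "none" wins, so the aspect is dropped
    [0.9, 1.2, 1.0, 0.8],  # PERFORMANCE: NEGATIVE wins, but only with p ≈ 0.31
]
print(decode_absa(fake_logits, aspects, sentiments, threshold=0.5))
# → [('BATTERY', 'negative')]
print(decode_absa(fake_logits, aspects, sentiments, threshold=0.2))
# → [('BATTERY', 'negative'), ('PERFORMANCE', 'negative')]
```

With four competing classes per aspect, a winning sentiment can easily fall below 0.5, so lowering `threshold` (as the example call does with 0.2) keeps such low-confidence aspects in the output.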