---
license: mit
language: en
pipeline_tag: text-classification
tags:
- text-classification
- multi-label
- emotion-classification
- ensemble
- deberta
- roberta
---

# Ensemble Model for Multi-Label Emotion Classification

This repository hosts the ensemble system that took first place in a multi-label emotion classification task.
The system combines two strong models, **DeBERTa-v3-Large** and **RoBERTa-Large**, both trained with LLRD (Layer-wise Learning Rate Decay) and Focal Loss.
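
As a rough illustration of the two training techniques named above, the sketch below computes layer-wise decayed learning rates and a per-label focal loss in plain Python. The training configuration is not given here, so every value is an assumption chosen only for the example: `base_lr=2e-5`, `decay=0.9`, `gamma=2.0`, and the helper names `layerwise_learning_rates` and `binary_focal_loss` are hypothetical. The 24-layer count matches the large variants of both encoders.

```python
import math


def layerwise_learning_rates(num_layers, base_lr, decay):
    """Return one learning rate per encoder layer (index 0 = lowest layer).

    The top layer keeps base_lr; each layer below it is scaled by `decay`.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_focal_loss(logits, targets, gamma=2.0):
    """Mean focal loss over independent labels (multi-label setting).

    Each label is treated as its own binary problem. With gamma=0 this
    reduces to ordinary binary cross-entropy.
    """
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, targets):
        p = _sigmoid(z)
        p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
        total += -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))
    return total / len(logits)


# Assumed values, for illustration only.
rates = layerwise_learning_rates(num_layers=24, base_lr=2e-5, decay=0.9)
print(f"lowest layer lr = {rates[0]:.2e}, top layer lr = {rates[-1]:.2e}")

logits = [3.0, -2.0, 0.1]
targets = [1, 0, 1]
print("focal loss:", binary_focal_loss(logits, targets))
print("plain BCE: ", binary_focal_loss(logits, targets, gamma=0.0))
```

The focal term `(1 - p_t) ** gamma` shrinks the contribution of labels the model already predicts confidently, which is why it is often used when most labels are negative for any given text.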

## Ensemble Components
- **`deberta_model`**: The fine-tuned `microsoft/deberta-v3-large` model.
- **`roberta_model`**: The fine-tuned `roberta-large` model.
- **`best_thresholds.json`**: An array of 14 optimal threshold values, one per label, applied to the averaged probabilities of the two models.
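
To make the role of the thresholds concrete, here is a small worked example on made-up numbers. It averages two probability vectors, keeps every label whose average exceeds its own threshold, and falls back to the single highest-scoring label when none passes. The function name `ensemble_decide`, the three-label subset, the probabilities, and the thresholds are all hypothetical; the real values live in `best_thresholds.json`.

```python
def ensemble_decide(probs_a, probs_b, thresholds, labels):
    """Average two probability vectors and apply per-label thresholds."""
    avg = [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]
    chosen = [lab for lab, p, t in zip(labels, avg, thresholds) if p > t]
    if not chosen:
        # No label passed its threshold: return the single most likely one.
        best = max(range(len(avg)), key=avg.__getitem__)
        chosen = [labels[best]]
    return chosen, avg


# Made-up inputs for three of the labels.
labels = ["anger", "joy", "love"]
thresholds = [0.5, 0.6, 0.5]

chosen, avg = ensemble_decide([0.2, 0.9, 0.4], [0.1, 0.7, 0.5], thresholds, labels)
print(chosen, avg)

fallback, _ = ensemble_decide([0.1, 0.2, 0.3], [0.1, 0.2, 0.3], thresholds, labels)
print(fallback)
```

Because each label has its own cut-off, a label can be selected with a lower averaged probability than another label that is rejected.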

## How to Use

The following example loads all components and runs a prediction with the ensemble:

```python
import requests
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# -- Repository information --
REPO_ID = "Trentz/emotion-classification-ensemble"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# -- Label Mapping --
LABELS = ['amusement', 'anger', 'annoyance', 'caring', 'confusion', 'disappointment', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'joy', 'love', 'sadness']

class EmotionEnsemble:
    def __init__(self, repo_id, device="cpu"):
        self.device = device
        print("Loading all model components...")

        # Load DeBERTa
        self.deberta_tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="deberta_model")
        self.deberta_model = AutoModelForSequenceClassification.from_pretrained(repo_id, subfolder="deberta_model").to(self.device).eval()

        # Load RoBERTa
        self.roberta_tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="roberta_model")
        self.roberta_model = AutoModelForSequenceClassification.from_pretrained(repo_id, subfolder="roberta_model").to(self.device).eval()

        # Load the per-label thresholds from the repository
        thresholds_url = f"https://huggingface.co/{repo_id}/resolve/main/best_thresholds.json"
        response = requests.get(thresholds_url, timeout=30)
        response.raise_for_status()  # fail early if the file could not be fetched
        self.thresholds = torch.tensor(response.json(), device=self.device)

        print("All components loaded successfully.")

    def predict(self, text: str):
        with torch.no_grad():
            # DeBERTa prediction
            deberta_inputs = self.deberta_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(self.device)
            deberta_probs = torch.sigmoid(self.deberta_model(**deberta_inputs).logits).squeeze()

            # RoBERTa prediction
            roberta_inputs = self.roberta_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(self.device)
            roberta_probs = torch.sigmoid(self.roberta_model(**roberta_inputs).logits).squeeze()

            # Average the probabilities
            avg_probs = (deberta_probs + roberta_probs) / 2.0

            # Apply the thresholds; if no label passes, fall back to the single most likely label ("best guess")
            preds = (avg_probs > self.thresholds).int()
            if preds.sum() == 0:
                best_guess_idx = torch.argmax(avg_probs).item()
                final_labels = [LABELS[best_guess_idx]]
            else:
                final_labels = [LABELS[i] for i, pred in enumerate(preds) if pred == 1]
            return { "text": text, "predicted_emotions": final_labels, "scores": avg_probs.cpu().tolist() }

# -- Usage example --
# Initialize the ensemble model
ensemble_model = EmotionEnsemble(REPO_ID, device=DEVICE)

# Predict on a text
example_text = "This is amazing! Thank you so much for everything, I really love it."
result = ensemble_model.predict(example_text)
print(result)
# The output is expected to include: 'amusement', 'excitement', 'joy', 'love', 'gratitude'

example_text_2 = "I can't believe you would do that. It's so annoying and disappointing."
result_2 = ensemble_model.predict(example_text_2)
print(result_2)
# The output is expected to include: 'annoyance', 'disappointment', 'anger'