Trentz/emotion-classification-ensemble

+---
+license: mit
+language: en
+pipeline_tag: text-classification
+tags:
+- text-classification
+- multi-label
+- emotion-classification
+- ensemble
+- deberta
+- roberta
+---
+# Rank 1: Ensemble Model for Multi-Label Emotion Classification
+This repository contains the *ensemble* model system that took first place in a multi-label emotion classification task.
+The system combines two strong models, **DeBERTa-v3-Large** and **RoBERTa-Large**, both trained with LLRD (Layer-wise Learning Rate Decay) and Focal Loss.
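The two training techniques named above can be illustrated with a minimal, dependency-free sketch. It is not the training code used for this repository: the function names `llrd_learning_rates` and `binary_focal_loss`, the base learning rate, the decay factor, and `gamma` are all assumptions chosen for illustration. The layer count of 24 matches the large variants of both encoders.

```python
import math

def llrd_learning_rates(num_layers, base_lr, decay):
    """Return one learning rate per encoder layer, lowest layer first.

    The top layer keeps base_lr; each layer below it is scaled by `decay` once more.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

def binary_focal_loss(logits, targets, gamma=2.0):
    """Mean focal loss over independent sigmoid outputs (multi-label setting)."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))       # sigmoid probability of the label
        p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
        # (1 - p_t) ** gamma down-weights examples the model already gets right
        total += -((1.0 - p_t) ** gamma) * math.log(max(p_t, 1e-12))
    return total / len(logits)

# Hypothetical hyperparameters, not taken from this repository
rates = llrd_learning_rates(num_layers=24, base_lr=2e-5, decay=0.9)
print("top layer lr:", rates[-1], "bottom layer lr:", rates[0])

# A confidently correct prediction contributes far less loss than a confidently wrong one
print(binary_focal_loss([4.0], [1]) < binary_focal_loss([-4.0], [1]))
```

With `gamma=0` the loss above reduces to ordinary binary cross-entropy, which is a quick way to sanity-check an implementation.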
+## Ensemble Components
+- **`deberta_model`**: The fine-tuned `microsoft/deberta-v3-large` model.
+- **`roberta_model`**: The fine-tuned `roberta-large` model.
+- **`best_thresholds.json`**: An array of 14 optimal *threshold* values, one per label, applied to the average of the two models' probabilities.
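The repository does not say how the per-label thresholds were chosen. One common approach, sketched here purely as an assumption, is a grid search per label that keeps the threshold with the best F1 score on validation data. The helper names `f1_score` and `best_threshold` and the toy validation numbers are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for one binary label; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(probs, y_true, grid):
    """Return the grid value that maximizes F1 for one label (first best wins on ties)."""
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        preds = [1 if p > t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t

# Made-up averaged probabilities and gold labels for a single emotion label
val_probs = [0.10, 0.35, 0.40, 0.80, 0.90]
val_true = [0, 0, 1, 1, 1]
grid = [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95

t = best_threshold(val_probs, val_true, grid)
print("chosen threshold:", t)
```

Repeating this search once per label would yield an array of 14 values in the same shape as `best_thresholds.json`.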
+## How to Use
+The following example loads all components and runs predictions with the *ensemble*:
+```python
+import requests
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+# -- Repository information --
+REPO_ID = "Trentz/emotion-classification-ensemble"
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+# -- Label mapping --
+LABELS = ['amusement', 'anger', 'annoyance', 'caring', 'confusion', 'disappointment', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'joy', 'love', 'sadness']
+class EmotionEnsemble:
+    def __init__(self, repo_id, device="cpu"):
+        self.device = device
+        print("Loading all model components...")
+        # Load DeBERTa
+        self.deberta_tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="deberta_model")
+        self.deberta_model = AutoModelForSequenceClassification.from_pretrained(repo_id, subfolder="deberta_model").to(self.device).eval()
+        # Load RoBERTa
+        self.roberta_tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="roberta_model")
+        self.roberta_model = AutoModelForSequenceClassification.from_pretrained(repo_id, subfolder="roberta_model").to(self.device).eval()
+        # Load the per-label thresholds from the repository
+        thresholds_url = f"https://huggingface.co/{repo_id}/resolve/main/best_thresholds.json"
+        response = requests.get(thresholds_url, timeout=30)
+        response.raise_for_status()  # fail early if the file could not be downloaded
+        self.thresholds = torch.tensor(response.json(), device=self.device)
+        print("All components loaded successfully.")
+    def predict(self, text: str):
+        with torch.no_grad():
+            # DeBERTa prediction
+            deberta_inputs = self.deberta_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(self.device)
+            deberta_probs = torch.sigmoid(self.deberta_model(**deberta_inputs).logits).squeeze(0)
+            # RoBERTa prediction
+            roberta_inputs = self.roberta_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(self.device)
+            roberta_probs = torch.sigmoid(self.roberta_model(**roberta_inputs).logits).squeeze(0)
+            # Average the probabilities of the two models
+            avg_probs = (deberta_probs + roberta_probs) / 2.0
+            # Apply the thresholds; if no label passes, fall back to the highest-scoring one ("best guess")
+            preds = (avg_probs > self.thresholds).int()
+            if preds.sum() == 0:
+                best_guess_idx = torch.argmax(avg_probs).item()
+                final_labels = [LABELS[best_guess_idx]]
+            else:
+                final_labels = [LABELS[i] for i, pred in enumerate(preds.tolist()) if pred == 1]
+            return { "text": text, "predicted_emotions": final_labels, "scores": avg_probs.cpu().tolist() }
+# -- Example usage --
+# Initialize the ensemble model
+ensemble_model = EmotionEnsemble(REPO_ID, device=DEVICE)
+# Predict on a text
+example_text = "This is amazing! Thank you so much for everything, I really love it."
+result = ensemble_model.predict(example_text)
+print(result)
+# The output is expected to include: 'amusement', 'excitement', 'joy', 'love', 'gratitude'
+example_text_2 = "I can't believe you would do that. It's so annoying and disappointing."
+result_2 = ensemble_model.predict(example_text_2)
+print(result_2)
+# The output is expected to include: 'annoyance', 'disappointment', 'anger'