emotion-classification-ensemble / README.md

Update README.md

71f174e verified 8 months ago

4.29 kB


	---
	license: mit
	language: en
	pipeline_tag: text-classification
	tags:
	- text-classification
	- multi-label
	- emotion-classification
	- ensemble
	- deberta
	- roberta
	---

	# Ensemble Model untuk Klasifikasi Emosi Multi-Label

	Ini adalah repositori untuk sistem model ensemble yang meraih peringkat pertama dalam tugas klasifikasi emosi multi-label.
	Sistem ini menggabungkan dua model kuat, DeBERTa-v3-Large dan RoBERTa-Large, yang dilatih dengan teknik LLRD (Layer-wise Learning Rate Decay) dan Focal Loss.

	## Komponen Ensemble
	- `deberta_model`: Model `microsoft/deberta-v3-large` yang telah di-fine-tune.
	- `roberta_model`: Model `roberta-large` yang telah di-fine-tune.
	- `best_thresholds.json`: Array berisi 14 nilai threshold optimal untuk setiap label, yang digunakan pada hasil rata-rata probabilitas kedua model.

	## Cara Menggunakan

	Berikut adalah contoh kode untuk memuat semua komponen dan melakukan prediksi dengan ensemble ini:

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	from scipy.special import expit as sigmoid
	import json
	import requests
	import numpy as np

	# -- Informasi Repositori --
	REPO_ID = "Trentz/emotion-classification-ensemble"
	DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

	# -- Label Mapping --
	LABELS = ['amusement', 'anger', 'annoyance', 'caring', 'confusion', 'disappointment', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'joy', 'love', 'sadness']

	class EmotionEnsemble:
	def __init__(self, repo_id, device="cpu"):
	self.device = device
	print("Memuat semua komponen model...")

	# Muat DeBERTa
	self.deberta_tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="deberta_model")
	self.deberta_model = AutoModelForSequenceClassification.from_pretrained(repo_id, subfolder="deberta_model").to(self.device).eval()

	# Muat RoBERTa
	self.roberta_tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="roberta_model")
	self.roberta_model = AutoModelForSequenceClassification.from_pretrained(repo_id, subfolder="roberta_model").to(self.device).eval()

	# Muat thresholds
	thresholds_url = f"[https://huggingface.co/](https://huggingface.co/)Trentz/emotion-classification-ensemble/resolve/main/best_thresholds.json"
	response = requests.get(thresholds_url)
	self.thresholds = torch.tensor(response.json(), device=self.device)

	print("Semua komponen berhasil dimuat.")

	def predict(self, text: str):
	with torch.no_grad():
	# Prediksi DeBERTa
	deberta_inputs = self.deberta_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(self.device)
	deberta_probs = torch.sigmoid(self.deberta_model(**deberta_inputs).logits).squeeze()

	# Prediksi RoBERTa
	roberta_inputs = self.roberta_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(self.device)
	roberta_probs = torch.sigmoid(self.roberta_model(**roberta_inputs).logits).squeeze()

	# Rata-ratakan probabilitas
	avg_probs = (deberta_probs + roberta_probs) / 2.0

	# Terapkan threshold & logika "Best Guess"
	preds = (avg_probs > self.thresholds).int()
	if preds.sum() == 0:
	best_guess_idx = torch.argmax(avg_probs).item()
	final_labels = [LABELS[best_guess_idx]]
	else:
	final_labels = [LABELS[i] for i, pred in enumerate(preds) if pred == 1]

	return { "text": text, "predicted_emotions": final_labels, "scores": avg_probs.cpu().tolist() }

	# -- Contoh Penggunaan --
	# Inisialisasi model ensemble
	ensemble_model = EmotionEnsemble(REPO_ID, device=DEVICE)

	# Prediksi teks
	example_text = "This is amazing! Thank you so much for everything, I really love it."
	result = ensemble_model.predict(example_text)
	print(result)
	# Diharapkan output mengandung: 'amusement', 'excitement', 'joy', 'love', 'gratitude'

	example_text_2 = "I can't believe you would do that. It's so annoying and disappointing."
	result_2 = ensemble_model.predict(example_text_2)
	print(result_2)
	# Diharapkan output mengandung: 'annoyance', 'disappointment', 'anger'