---
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- facebook/bart-base
- google-bert/bert-base-uncased
- EleutherAI/gpt-neo-2.7B
pipeline_tag: text-classification
license: apache-2.0
---
# 📝 Model Card: ensemble-majority-voting-imdb

## 🔍 Introduction
The `wakaflocka17/ensemble-majority-voting-imdb` model is a majority-voting ensemble of three sentiment classifiers fine-tuned on the IMDb dataset: `bert-imdb-finetuned`, `bart-imdb-finetuned`, and `gptneo-imdb-finetuned`. Each model votes on the sentiment label and the ensemble returns the label with the most votes, improving overall accuracy over the individual models.
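
Because only the final labels are aggregated, the voting rule itself is tiny. Below is a minimal sketch of the rule in isolation; the labels are placeholders, and the full pipeline is shown in the usage section:
```python
from collections import Counter

# Hypothetical votes from the three models for a single review
votes = ["POSITIVE", "NEGATIVE", "POSITIVE"]

# Keep the most common label; with three voters and two classes
# there is always a strict majority, so no tie-breaking is needed
majority_label = Counter(votes).most_common(1)[0][0]
print(majority_label)  # POSITIVE
```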

## 📊 Evaluation Metrics

| Metric    | Value   |
|-----------|---------|
| Accuracy  | 0.93296 |
| Precision | 0.9559  |
| Recall    | 0.9078  |
| F1-score  | 0.9312  |

## ⚙️ Training Parameters

| Parameter          | Values                                           |
|--------------------|--------------------------------------------------|
| Models in ensemble | `bert_base_uncased`, `bart_base`, `gpt_neo_2_7b` |
| Repo for ensemble  | `models/ensemble_majority_voting`                |
| Batch size (eval)  | 64                                               |
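
As an illustration of the evaluation batch size above, `transformers` pipelines accept a list of texts together with a `batch_size` argument. A minimal sketch, using one of the ensemble members on placeholder inputs:
```python
from transformers import pipeline

# Sketch only: batched inference with the eval batch size from the table
clf = pipeline("text-classification", model="wakaflocka17/bert-imdb-finetuned")
reviews = ["A masterpiece.", "Dull and overlong."]  # placeholder inputs
print(clf(reviews, batch_size=64))
```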

## 🚀 Example usage in Colab

#### Installing dependencies
```bash
!pip install --upgrade transformers huggingface_hub
```
#### (Optional) Authentication for private models
```python
from huggingface_hub import login

login(token="hf_yourhftoken")  # replace with your Hugging Face access token
```
#### Loading models and creating the ensemble pipeline
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
from collections import Counter

# List of fine-tuned model repo IDs
model_ids = [
    "wakaflocka17/bert-imdb-finetuned",
    "wakaflocka17/bart-imdb-finetuned",
    "wakaflocka17/gptneo-imdb-finetuned"
]
```
#### Load pipelines
```python
pipelines = []
for repo_id in model_ids:
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}
    # top_k=1 replaces the deprecated return_all_scores=False; truncation=True
    # keeps long IMDb reviews within each model's maximum input length
    pipelines.append(TextClassificationPipeline(model=model, tokenizer=tokenizer, top_k=1, truncation=True))
```
#### Ensemble prediction function
```python
def ensemble_predict(text):
    votes = []
    # Collect each model's vote along with its name
    for model_id, pipe in zip(model_ids, pipelines):
        label = pipe(text)[0]['label']
        votes.append({
            "model": model_id,  # or model_id.split("/")[-1] for just the short name
            "label": label
        })
    # Determine the majority label; with three voters and two classes a tie cannot occur
    majority_label = Counter([v["label"] for v in votes]).most_common(1)[0][0]
    return {
        "ensemble_label": majority_label,
        "individual_votes": votes
    }
```
#### Inference on a text example
```python
testo = "This movie was absolutely fantastic—wonderful performances and a gripping story!"
result = ensemble_predict(testo)
print(result)
# Example output:
# {
#     'ensemble_label': 'POSITIVE',
#     'individual_votes': [
#         {'model': 'wakaflocka17/bert-imdb-finetuned', 'label': 'POSITIVE'},
#         {'model': 'wakaflocka17/bart-imdb-finetuned', 'label': 'NEGATIVE'},
#         {'model': 'wakaflocka17/gptneo-imdb-finetuned', 'label': 'POSITIVE'}
#     ]
# }
```
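
For reference, the reported metrics could be recomputed on the IMDb test split along the following lines. This is only a sketch, not the original evaluation script: it assumes the `datasets` and `scikit-learn` packages, reuses the `ensemble_predict` function defined above, and scores one review at a time, so it is slow.
```python
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Assumption: stanfordnlp/imdb encodes labels as 0 = negative, 1 = positive
test_set = load_dataset("stanfordnlp/imdb", split="test")

y_true, y_pred = [], []
for example in test_set:
    y_true.append(example["label"])
    prediction = ensemble_predict(example["text"])
    y_pred.append(1 if prediction["ensemble_label"] == "POSITIVE" else 0)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```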
## 📖 How to cite
If you use this model in your work, you can cite it as:
```bibtex
@misc{Sentiment-Project,
  author       = {Francesco Congiu},
  title        = {Sentiment Analysis with Pretrained, Fine-tuned and Ensemble Transformer Models},
  howpublished = {\url{https://github.com/wakaflocka17/DLA_LLMSANALYSIS}},
  year         = {2025}
}
```
## 🔗 Reference Repository
> The full project structure and example scripts can be found at:
> https://github.com/wakaflocka17/DLA_LLMSANALYSIS/tree/main