---
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- facebook/bart-base
- google-bert/bert-base-uncased
- EleutherAI/gpt-neo-2.7B
pipeline_tag: text-classification
license: apache-2.0
---
# 📝 Model Card: ensemble-majority-voting-imdb
## 🔍 Introduction
The `wakaflocka17/ensemble-majority-voting-imdb` model is a majority-voting ensemble of three sentiment classifiers fine-tuned on the IMDb dataset (`bert-imdb-finetuned`, `bart-imdb-finetuned`, `gptneo-imdb-finetuned`). Each model votes on the sentiment label and the ensemble returns the label with the most votes, improving accuracy over any single member model.
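The voting scheme itself reduces to counting labels. A minimal, self-contained sketch (the `majority_vote` helper is illustrative, not part of the released code; the full pipeline is shown below):

```python
from collections import Counter

def majority_vote(labels):
    """Return the label that receives the most votes."""
    return Counter(labels).most_common(1)[0][0]

# With three models and two labels, a majority always exists:
print(majority_vote(["POSITIVE", "NEGATIVE", "POSITIVE"]))  # POSITIVE
```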
## 📊 Evaluation Metrics
| Metric | Value |
|-----------|---------|
| Accuracy | 0.93296 |
| Precision | 0.9559 |
| Recall | 0.9078 |
| F1-score | 0.9312 |
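For reference, these metrics follow the standard binary-classification definitions. A self-contained sketch on toy labels (not the actual IMDb predictions), assuming the positive class is encoded as `1`:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```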
## ⚙️ Training Parameters
| Parameter | Values |
|-----------------------|--------------------------------------------------|
| Models in ensemble | `bert_base_uncased`, `bart_base`, `gpt_neo_2_7b` |
| Repo for ensemble | `models/ensemble_majority_voting` |
| Batch size (eval) | 64 |
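Evaluation feeds the test texts through each pipeline in batches of 64. A hypothetical chunking helper to illustrate the batching (`chunked` is not from the project code; `transformers` pipelines also accept a `batch_size` argument directly):

```python
def chunked(items, batch_size=64):
    """Yield successive fixed-size batches from a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"review {i}" for i in range(130)]
print([len(batch) for batch in chunked(texts)])  # [64, 64, 2]
```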
## 🚀 Example of use in Colab
#### Installing dependencies
```bash
!pip install --upgrade transformers huggingface_hub
```
#### (Optional) Authentication for private models
```python
from huggingface_hub import login
login(token="hf_yourhftoken")
```
#### Loading models and creating ensemble pipeline
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
from collections import Counter
# List of fine-tuned model repo IDs
model_ids = [
    "wakaflocka17/bert-imdb-finetuned",
    "wakaflocka17/bart-imdb-finetuned",
    "wakaflocka17/gptneo-imdb-finetuned"
]
```
#### Load pipelines
```python
pipelines = []
for repo_id in model_ids:
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}
    # return_all_scores is deprecated; by default the pipeline returns the top label per input
    pipelines.append(TextClassificationPipeline(model=model, tokenizer=tokenizer))
```
#### Ensemble prediction function
```python
def ensemble_predict(text):
    votes = []
    # Collect each model's vote along with its name
    for model_id, pipe in zip(model_ids, pipelines):
        label = pipe(text)[0]['label']
        votes.append({
            "model": model_id,  # or model_id.split("/")[-1] for just the short name
            "label": label
        })
    # Determine the majority label
    majority_label = Counter([v["label"] for v in votes]).most_common(1)[0][0]
    return {
        "ensemble_label": majority_label,
        "individual_votes": votes
    }
```
#### Inference on a text example
```python
testo = "This movie was absolutely fantastic—wonderful performances and a gripping story!"
result = ensemble_predict(testo)
print(result)
# Example output:
# {
#     'ensemble_label': 'POSITIVE',
#     'individual_votes': [
#         {'model': 'wakaflocka17/bert-imdb-finetuned', 'label': 'POSITIVE'},
#         {'model': 'wakaflocka17/bart-imdb-finetuned', 'label': 'NEGATIVE'},
#         {'model': 'wakaflocka17/gptneo-imdb-finetuned', 'label': 'POSITIVE'}
#     ]
# }
```
## 📖 How to cite
If you use this model in your work, you can cite it as:
```bibtex
@misc{Sentiment-Project,
  author       = {Francesco Congiu},
  title        = {Sentiment Analysis with Pretrained, Fine-tuned and Ensemble Transformer Models},
  howpublished = {\url{https://github.com/wakaflocka17/DLA_LLMSANALYSIS}},
  year         = {2025}
}
```
## 🔗 Reference Repository
> The complete project structure and example scripts are available at:
> https://github.com/wakaflocka17/DLA_LLMSANALYSIS/tree/main