wakaflocka17/ensemble-majority-voting-imdb

Text Classification

English

wakaflocka17 committed on May 14, 2025

Commit 194ad92 (verified) · 1 Parent(s): c169ea8

Update README.md

README.md +116 -16

@@ -1,16 +1,116 @@
----
-datasets:
-- stanfordnlp/imdb
-language:
-- en
-metrics:
-- accuracy
-- precision
-- recall
-- f1
-base_model:
-- facebook/bart-base
-- google-bert/bert-base-uncased
-- EleutherAI/gpt-neo-2.7B
-pipeline_tag: text-classification
----

+---
+datasets:
+- stanfordnlp/imdb
+language:
+- en
+metrics:
+- accuracy
+- precision
+- recall
+- f1
+base_model:
+- facebook/bart-base
+- google-bert/bert-base-uncased
+- EleutherAI/gpt-neo-2.7B
+pipeline_tag: text-classification
+license: apache-2.0
+---
+# 📝 Model Card: ensemble-majority-voting-imdb
+## 🔍 Introduction
+The `wakaflocka17/ensemble-majority-voting-imdb` model is a majority-voting ensemble of three sentiment classifiers fine-tuned on the IMDb dataset (`bert-imdb-finetuned`, `bart-imdb-finetuned`, `gptneo-imdb-finetuned`). Each model votes on the sentiment label, and the ensemble returns the label with the most votes. The goal is to improve accuracy over any single member model.
+## 📊 Evaluation Metrics
+| Metric    | Value   |
+|-----------|---------|
+| Accuracy  | 0.93296 |
+| Precision | 0.9559  |
+| Recall    | 0.9078  |
+| F1-score  | 0.9312  |
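As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The snippet below is an illustrative sketch only; the variable names are not taken from the repository.

```python
# Values copied from the evaluation table above.
precision = 0.9559
recall = 0.9078

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9312"
```

The recomputed value agrees with the reported F1-score. Assuming `POSITIVE` is treated as the positive class, precision being higher than recall would suggest the ensemble misses positive reviews more often than it mislabels negative ones.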
+## ⚙️ Training Parameters
+| Parameter             | Values                                           |
+|-----------------------|--------------------------------------------------|
+| Models in ensemble    | `bert_base_uncased`, `bart_base`, `gpt_neo_2_7b` |
+| Repo for ensemble     | `models/ensemble_majority_voting`                |
+| Batch size (eval)     | 64                                               |
+## 🚀 Example of use in Colab
+#### Installing dependencies
+```bash
+!pip install --upgrade transformers huggingface_hub
+```
+#### (Optional) Authentication for private models
+```python
+from huggingface_hub import login
+login(token="hf_yourhftoken")
+```
+#### Loading models and creating ensemble pipeline
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
+from collections import Counter
+# List of fine-tuned model repo IDs
+model_ids = [
+    "wakaflocka17/bert-imdb-finetuned",
+    "wakaflocka17/bart-imdb-finetuned",
+    "wakaflocka17/gptneo-imdb-finetuned"
+]
+```
+#### Load pipelines
+```python
+pipelines = []
+for repo_id in model_ids:
+    tokenizer = AutoTokenizer.from_pretrained(repo_id)
+    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
+    model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}
+    pipelines.append(TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=False))
+```
+#### Ensemble prediction function
+```python
+def ensemble_predict(text):
+    votes = []
+    # Collect each model's vote along with its name
+    for model_id, pipe in zip(model_ids, pipelines):
+        label = pipe(text)[0]['label']
+        votes.append({
+            "model": model_id,   # or model_id.split("/")[-1] for just the short name
+            "label": label
+        })
+    # Determine majority label
+    majority_label = Counter([v["label"] for v in votes]).most_common(1)[0][0]
+    return {
+        "ensemble_label": majority_label,
+        "individual_votes": votes
+    }
+```
+#### Inference on a text example
+```python
+testo = "This movie was absolutely fantastic—wonderful performances and a gripping story!"
+result = ensemble_predict(testo)
+print(result)
+# Example output:
+# {
+#   'ensemble_label': 'POSITIVE',
+#   'individual_votes': [
+#       {'model': 'wakaflocka17/bert-imdb-finetuned', 'label': 'POSITIVE'},
+#       {'model': 'wakaflocka17/bart-imdb-finetuned', 'label': 'NEGATIVE'},
+#       {'model': 'wakaflocka17/gptneo-imdb-finetuned', 'label': 'POSITIVE'}
+#   ]
+# }
+```
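The voting logic can be checked without downloading any model by substituting stub callables for the pipelines. The following is a minimal sketch under that assumption: `make_stub` and `majority_vote` are hypothetical helpers introduced here for illustration, and the stubs only mimic the `[{'label': ...}]` return shape that `ensemble_predict` reads.

```python
from collections import Counter


def make_stub(label):
    """Hypothetical stand-in for a pipeline: always returns the given label."""
    def stub(text):
        # Mimics the list-of-dicts shape read by ensemble_predict above.
        return [{"label": label, "score": 1.0}]
    return stub


def majority_vote(text, pipes):
    """Return the most common label among the pipelines' predictions."""
    labels = [pipe(text)[0]["label"] for pipe in pipes]
    return Counter(labels).most_common(1)[0][0]


# Two stubs vote POSITIVE, one votes NEGATIVE.
stubs = [make_stub("POSITIVE"), make_stub("NEGATIVE"), make_stub("POSITIVE")]
result = majority_vote("any review text", stubs)
print(result)  # prints "POSITIVE"
```

With three voters and two labels, a strict majority always exists. With an even number of models, `Counter.most_common` would resolve a tie in favour of the label seen first, which may not be the desired policy.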
+## 📖 How to cite
+If you use this model in your work, you can cite it as:
+```bibtex
+@misc{Sentiment-Project,
+  author       = {Francesco Congiu},
+  title        = {Sentiment Analysis with Pretrained, Fine-tuned and Ensemble Transformer Models},
+  howpublished = {\url{https://github.com/wakaflocka17/DLA_LLMSANALYSIS}},
+  year         = {2025}
+}
+```
+## 🔗 Reference Repository
+> The full file structure and example scripts are available at:
+> https://github.com/wakaflocka17/DLA_LLMSANALYSIS/tree/main