---
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- facebook/bart-base
- google-bert/bert-base-uncased
- EleutherAI/gpt-neo-2.7B
pipeline_tag: text-classification
license: apache-2.0
---
# 📝 Model Card: ensemble-majority-voting-imdb

## 🔍 Introduction
The `wakaflocka17/ensemble-majority-voting-imdb` model is a majority-voting ensemble of three sentiment classifiers fine-tuned on the IMDb dataset: `bert-imdb-finetuned`, `bart-imdb-finetuned`, and `gptneo-imdb-finetuned`. Each model votes on the sentiment label and the ensemble returns the label with the most votes, improving overall accuracy over the individual models.
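
Because only the final labels are aggregated, the voting rule itself is tiny. Below is a minimal sketch of the rule in isolation; the labels are placeholders, and the full pipeline is shown in the usage section:
```python
from collections import Counter

# Hypothetical votes from the three models for a single review
votes = ["POSITIVE", "NEGATIVE", "POSITIVE"]

# Keep the most common label; with three voters and two classes
# there is always a strict majority, so no tie-breaking is needed
majority_label = Counter(votes).most_common(1)[0][0]
print(majority_label)  # POSITIVE
```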

## 📊 Evaluation Metrics

| Metric    | Value   |
|-----------|---------|
| Accuracy  | 0.93296 |
| Precision | 0.9559  |
| Recall    | 0.9078  |
| F1-score  | 0.9312  |

## ⚙️ Training Parameters

| Parameter          | Values                                           |
|--------------------|--------------------------------------------------|
| Models in ensemble | `bert_base_uncased`, `bart_base`, `gpt_neo_2_7b` |
| Repo for ensemble  | `models/ensemble_majority_voting`                |
| Batch size (eval)  | 64                                               |
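
As an illustration of the evaluation batch size above, `transformers` pipelines accept a list of texts together with a `batch_size` argument. A minimal sketch, using one of the ensemble members on placeholder inputs:
```python
from transformers import pipeline

# Sketch only: batched inference with the eval batch size from the table
clf = pipeline("text-classification", model="wakaflocka17/bert-imdb-finetuned")
reviews = ["A masterpiece.", "Dull and overlong."]  # placeholder inputs
print(clf(reviews, batch_size=64))
```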

## 🚀 Example usage in Colab

#### Installing dependencies
```bash
!pip install --upgrade transformers huggingface_hub
```
#### (Optional) Authentication for private models
```python
from huggingface_hub import login

login(token="hf_yourhftoken")  # replace with your Hugging Face access token
```
#### Loading models and creating the ensemble pipeline
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
from collections import Counter

# List of fine-tuned model repo IDs
model_ids = [
    "wakaflocka17/bert-imdb-finetuned",
    "wakaflocka17/bart-imdb-finetuned",
    "wakaflocka17/gptneo-imdb-finetuned"
]
```
#### Load pipelines
```python
pipelines = []
for repo_id in model_ids:
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}
    # top_k=1 replaces the deprecated return_all_scores=False; truncation=True
    # keeps long IMDb reviews within each model's maximum input length
    pipelines.append(TextClassificationPipeline(model=model, tokenizer=tokenizer, top_k=1, truncation=True))
```
#### Ensemble prediction function
```python
def ensemble_predict(text):
    votes = []
    # Collect each model's vote along with its name
    for model_id, pipe in zip(model_ids, pipelines):
        label = pipe(text)[0]['label']
        votes.append({
            "model": model_id,  # or model_id.split("/")[-1] for just the short name
            "label": label
        })
    # Determine the majority label; with three voters and two classes a tie cannot occur
    majority_label = Counter([v["label"] for v in votes]).most_common(1)[0][0]
    return {
        "ensemble_label": majority_label,
        "individual_votes": votes
    }
```
#### Inference on a text example
```python
testo = "This movie was absolutely fantastic—wonderful performances and a gripping story!"
result = ensemble_predict(testo)
print(result)
# Example output:
# {
#     'ensemble_label': 'POSITIVE',
#     'individual_votes': [
#         {'model': 'wakaflocka17/bert-imdb-finetuned', 'label': 'POSITIVE'},
#         {'model': 'wakaflocka17/bart-imdb-finetuned', 'label': 'NEGATIVE'},
#         {'model': 'wakaflocka17/gptneo-imdb-finetuned', 'label': 'POSITIVE'}
#     ]
# }
```
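
For reference, the reported metrics could be recomputed on the IMDb test split along the following lines. This is only a sketch, not the original evaluation script: it assumes the `datasets` and `scikit-learn` packages, reuses the `ensemble_predict` function defined above, and scores one review at a time, so it is slow.
```python
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Assumption: stanfordnlp/imdb encodes labels as 0 = negative, 1 = positive
test_set = load_dataset("stanfordnlp/imdb", split="test")

y_true, y_pred = [], []
for example in test_set:
    y_true.append(example["label"])
    prediction = ensemble_predict(example["text"])
    y_pred.append(1 if prediction["ensemble_label"] == "POSITIVE" else 0)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```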
## 📖 How to cite
If you use this model in your work, you can cite it as:
```bibtex
@misc{Sentiment-Project,
  author       = {Francesco Congiu},
  title        = {Sentiment Analysis with Pretrained, Fine-tuned and Ensemble Transformer Models},
  howpublished = {\url{https://github.com/wakaflocka17/DLA_LLMSANALYSIS}},
  year         = {2025}
}
```
## 🔗 Reference Repository
> The full project structure and example scripts can be found at:
> https://github.com/wakaflocka17/DLA_LLMSANALYSIS/tree/main