metadata
tags:
  - text-classification
  - transformers
  - distilbert
datasets:
  - rlogh/superhero-texts
metrics:
  - accuracy
  - f1
license: apache-2.0

DistilBERT Fine-tuned on Superhero Texts

Model Summary

  • Task: Binary text classification
  • Classes: DC vs Marvel
  • Base Model: distilbert-base-uncased
  • Training Setup: 3 epochs, batch size 16, learning rate 2e-5
  • Evaluation Metrics: Accuracy, precision, recall, F1 score
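
The training setup above could be expressed roughly as follows. This is a sketch, not the notebook's actual code: the use of Hugging Face `TrainingArguments`, the `output_dir` value, and the helper names are assumptions; only the base model name and the three hyperparameter values come from this card.

```python
import math

# Hyperparameters stated in this model card.
HYPERPARAMS = {
    "base_model": "distilbert-base-uncased",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
}


def total_optimizer_steps(num_train_examples: int) -> int:
    """Optimizer steps on a single GPU, assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(
        num_train_examples / HYPERPARAMS["per_device_train_batch_size"]
    )
    return steps_per_epoch * HYPERPARAMS["num_train_epochs"]


def build_training_args(output_dir: str = "distilbert-superhero"):
    """Build TrainingArguments (assumes `transformers` and `torch` are installed).

    `output_dir` is a hypothetical path chosen for illustration.
    """
    from transformers import TrainingArguments  # imported lazily on purpose

    return TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=HYPERPARAMS["num_train_epochs"],
        per_device_train_batch_size=HYPERPARAMS["per_device_train_batch_size"],
        learning_rate=HYPERPARAMS["learning_rate"],
    )
```

For a hypothetical training set of 1,000 examples, `total_optimizer_steps(1000)` gives 63 steps per epoch and 189 steps in total.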

Dataset

  • Source: rlogh/superhero-texts

Preprocessing

  • Tokenization with DistilBERT tokenizer
  • Max sequence length: 256
  • Labels encoded: DC = 0, Marvel = 1
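
The preprocessing steps above might look like the sketch below. It assumes the raw labels are the strings "DC" and "Marvel" and that padding is left to a data collator at batch time; neither detail is confirmed by this card, and the function names are illustrative.

```python
# Label encoding and maximum length stated in this card.
LABEL2ID = {"DC": 0, "Marvel": 1}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}
MAX_LENGTH = 256


def encode_labels(universes):
    """Map universe names to integer class ids."""
    return [LABEL2ID[name] for name in universes]


def tokenize_texts(texts):
    """Tokenize with the DistilBERT tokenizer, truncating to MAX_LENGTH tokens.

    Assumes `transformers` is installed and the checkpoint can be downloaded.
    """
    from transformers import AutoTokenizer  # imported lazily on purpose

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    return tokenizer(texts, truncation=True, max_length=MAX_LENGTH)
```

Passing `ID2LABEL` and `LABEL2ID` to the model config when loading the classifier keeps predictions readable as class names instead of raw indices.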

Results

  • Accuracy and F1 are reported on the test set
  • Confusion matrix included in notebook
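
For reference, the metrics named in this card can be derived from a 2x2 confusion matrix as sketched below. Treating Marvel (label 1) as the positive class is an assumption; the notebook may use a different averaging scheme, and the function name is illustrative.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute the confusion matrix, accuracy, precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

The function takes two equal-length lists of integer labels, for example the test-set labels and the argmax of the model's logits.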

Error Analysis

The model occasionally misclassifies texts about superheroes with ambiguous or overlapping traits (e.g., DC and Marvel characters that closely resemble each other).
This suggests the model may rely heavily on explicit universe keywords in the text.
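
One way to probe the keyword hypothesis is to mask universe-specific terms and check how often predictions change. The sketch below is not part of the notebook: the keyword list is hypothetical, and `predict_fn` stands in for any function that maps a list of texts to a list of predicted labels.

```python
import re

# Hypothetical keyword list; this card does not say which terms the model relies on.
UNIVERSE_KEYWORDS = ("DC", "Marvel", "Gotham", "Avengers")


def mask_keywords(text, keywords=UNIVERSE_KEYWORDS, mask_token="[MASK]"):
    """Replace whole-word, case-insensitive keyword matches with mask_token."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\b", flags=re.IGNORECASE)
    return pattern.sub(mask_token, text)


def flip_rate(texts, predict_fn):
    """Fraction of texts whose predicted label changes after masking."""
    if not texts:
        return 0.0
    original = predict_fn(texts)
    masked = predict_fn([mask_keywords(t) for t in texts])
    changed = sum(1 for a, b in zip(original, masked) if a != b)
    return changed / len(texts)
```

A high flip rate on correctly classified examples would be consistent with reliance on explicit keywords; a low one would point to other cues.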

Intended Use

  • Educational purposes only
  • Demonstration of fine-tuning transformers on text classification
  • Not suitable for production deployment

License

Apache-2.0

Hardware/Compute

  • Trained on 1 GPU (Colab environment)
  • Training time: ~10 minutes