metadata
library_name: transformers
license: apache-2.0
base_model: distilbert-base-uncased
language:
  - en
tags:
  - text-classification
  - sequence-classification
  - youtube
  - music-genres
  - 7-class
  - distilbert
  - generated_from_trainer
datasets:
  - custom-youtube-music-genres
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: text
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: YouTube Music Genre Comments (custom)
          type: custom
          split: validation
        metrics:
          - type: accuracy
            value: 1
          - type: f1
            value: 1
          - type: precision
            value: 1
          - type: recall
            value: 1

text

A DistilBERT-based 7-class text classifier fine-tuned to predict the music genre associated with a YouTube comment.
Inputs are raw comment strings; outputs are one of seven genre labels.

Base model: distilbert-base-uncased

Results (evaluation set)

  • Loss: 0.0675
  • Accuracy: 1.0
  • F1: 1.0
  • Precision: 1.0
  • Recall: 1.0

Training curves (from Trainer logs)

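| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Precision | Recall |
|---------------|-------|------|-----------------|----------|--------|-----------|--------|
| 1.2677        | 1.0   | 84   | 1.0653          | 0.9107   | 0.9097 | 0.9147    | 0.9107 |
| 0.4341        | 2.0   | 168  | 0.3179          | 0.9821   | 0.9820 | 0.9829    | 0.9821 |
| 0.0963        | 3.0   | 252  | 0.0865          | 1.0      | 1.0    | 1.0       | 1.0    |
| 0.0568        | 4.0   | 336  | 0.0427          | 1.0      | 1.0    | 1.0       | 1.0    |
| 0.0414        | 5.0   | 420  | 0.0356          | 1.0      | 1.0    | 1.0       | 1.0    |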
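
In every row of the table above, Recall equals Accuracy, which is what support-weighted averaging produces. Whether the Trainer run actually used weighted averaging is an assumption; the card does not say. The sketch below shows how such metrics could be computed in plain Python. The function name `weighted_metrics` and the toy labels are illustrative and are not taken from the training code.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    support = Counter(y_true)          # number of true examples per label
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n             # each label weighted by its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only, not the model's real predictions.
y_true = ["pop", "pop", "jazz", "rock", "rock", "rock"]
y_pred = ["pop", "jazz", "jazz", "rock", "rock", "pop"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

With weighted averaging, recall reduces to the fraction of correct predictions, so it always matches accuracy. Macro averaging would generally give a different recall on imbalanced data.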

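Note: Perfect scores more often point to an easy task, a small validation set, or data leakage (for example, duplicate comments shared between the training and validation splits) than to a flawless model. Confirm them on a separate held-out test set and, where possible, on external data.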
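
One quick way to probe for leakage is to look for evaluation comments that also occur in the training data after light normalization. The following is a minimal sketch under that assumption; the helper names and the sample comments are hypothetical, and the real splits would need to be loaded in their place.

```python
import re

def normalize(text):
    """Lowercase and collapse punctuation/whitespace so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def find_overlap(train_texts, eval_texts):
    """Return evaluation texts whose normalized form also appears in training."""
    train_keys = {normalize(t) for t in train_texts}
    return [t for t in eval_texts if normalize(t) in train_keys]

# Hypothetical comments standing in for the real splits.
train_texts = ["This chorus is SO catchy!", "that bass drop though"]
eval_texts = ["this chorus is so catchy", "love the sax solo"]

leaked = find_overlap(train_texts, eval_texts)
print(f"{len(leaked)} of {len(eval_texts)} evaluation comments also appear in training")
```

An exact-match check like this only catches duplicates and near-identical strings. Comments from the same video or thread split across both sets would need a grouping key to detect.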

Model description

  • Architecture: DistilBERT encoder with a linear classification head
  • Task: Multi-class text classification (7 genres)
  • Input: A single YouTube comment (str)
  • Output: Predicted genre label + scores

Labels

Classical, rock, metal, electronic, R&B, pop, jazz
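
To illustrate how a classification head's raw logits become a "label + scores" output, here is a small sketch using a softmax over seven values. The index-to-label order is an assumption copied from the list above; the authoritative mapping is `model.config.id2label`. The logits are made-up numbers.

```python
import math

# Assumed index order; check model.config.id2label for the real mapping.
ID2LABEL = ["Classical", "rock", "metal", "electronic", "R&B", "pop", "jazz"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return the top label and a label-to-probability mapping."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], dict(zip(ID2LABEL, probs))

logits = [0.1, 0.3, -1.2, 0.0, 0.4, 3.1, -0.5]  # made-up values
label, scores = decode(logits)
print(label, round(scores[label], 3))
```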

Intended uses & limitations

Intended uses

  • Exploratory analysis of audience/genre engagement on music videos
  • Routing comments to genre-specific moderation or analytics queues
  • Downstream features (e.g., per-genre dashboards)
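
As a rough sketch of the routing use case, predictions could be grouped into per-genre queues, with low-confidence ones sent to a fallback queue. The queue name `needs_review`, the `min_score` threshold of 0.6, and the sample predictions are all hypothetical. The dictionaries mimic the `label`/`score` shape that a text-classification pipeline returns.

```python
from collections import defaultdict

REVIEW_QUEUE = "needs_review"  # hypothetical fallback queue name

def route(comments, predictions, min_score=0.6):
    """Group comments by predicted label; low-confidence ones go to review."""
    queues = defaultdict(list)
    for comment, pred in zip(comments, predictions):
        queue = pred["label"] if pred["score"] >= min_score else REVIEW_QUEUE
        queues[queue].append(comment)
    return dict(queues)

# Made-up comments and predictions for illustration.
comments = ["that riff is heavy", "nice track", "smooth sax solo"]
predictions = [
    {"label": "metal", "score": 0.97},
    {"label": "pop", "score": 0.41},
    {"label": "jazz", "score": 0.88},
]

queues = route(comments, predictions)
print(queues)
```

The threshold would need tuning on real data. A fallback queue is one way to handle ambiguous or mixed-genre comments.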

Limitations

  • Trained on YouTube comments; may not generalize to other platforms/domains
  • Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified
  • Not designed for toxicity, sentiment, or demographic inference

Ethical considerations

  • Comments can contain personal data; ensure collection complies with platform ToS and privacy laws
  • Avoid using predictions to profile individuals

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

repo_id = "scottymcgee/text-classifier"  # update if different
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

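# By default the pipeline returns only the top label and its score.
# return_all_scores is deprecated; pass top_k=None to get scores for all seven labels.
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)
print(pipe("this chorus is so catchy, reminds me of late 90s production"))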