scottymcgee

metadata

library_name: transformers
license: apache-2.0
base_model: distilbert-base-uncased
language:
  - en
tags:
  - text-classification
  - sequence-classification
  - youtube
  - music-genres
  - 7-class
  - distilbert
  - generated_from_trainer
datasets:
  - custom-youtube-music-genres
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: text
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: YouTube Music Genre Comments (custom)
          type: custom
          split: validation
        metrics:
          - type: accuracy
            value: 1
          - type: f1
            value: 1
          - type: precision
            value: 1
          - type: recall
            value: 1

text

A DistilBERT-based 7-class text classifier fine-tuned to predict the music genre associated with a YouTube comment.
Inputs are raw comment strings; outputs are one of seven genre labels.

Base model: distilbert-base-uncased

Results (evaluation set)

Loss: 0.0675
Accuracy: 1.0
F1: 1.0
Precision: 1.0
Recall: 1.0
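
The card does not say how precision, recall, and F1 are averaged across the seven classes. In the training log in the next section, the recall column equals the accuracy column at every epoch while F1 differs slightly, which is what support-weighted averaging produces. The sketch below assumes weighted averaging; `weighted_metrics` is a hypothetical helper written for illustration, not the training script's code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per label
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))

    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / actual if actual else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = actual / n  # each class counts in proportion to its support
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for illustration only.
y_true = ["pop", "pop", "rock", "jazz"]
y_pred = ["pop", "rock", "rock", "jazz"]
print(weighted_metrics(y_true, y_pred))
```

With weighted averaging, recall always equals accuracy, so the two columns carry the same information; precision and F1 are the ones that can diverge.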

Training curves (from `Trainer` logs)

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1	Precision	Recall
1.2677	1.0	84	1.0653	0.9107	0.9097	0.9147	0.9107
0.4341	2.0	168	0.3179	0.9821	0.9820	0.9829	0.9821
0.0963	3.0	252	0.0865	1.0	1.0	1.0	1.0
0.0568	4.0	336	0.0427	1.0	1.0	1.0	1.0
0.0414	5.0	420	0.0356	1.0	1.0	1.0	1.0

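Note: Perfect scores from epoch 3 onward may indicate an easy task, a small evaluation set, or data leakage between the training and evaluation splits (for example, duplicated comments). Validate on a separate held-out set and, ideally, on external data before relying on these numbers.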
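
One quick way to probe for leakage is to look for evaluation comments that also occur in the training split after light normalization. The following is a minimal sketch under that assumption; `normalize`, `find_overlap`, and the sample comments are hypothetical and not part of the original training code.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace so near-identical comments match."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def find_overlap(train_texts, eval_texts):
    """Return evaluation comments whose normalized form also appears in the training set."""
    seen = {normalize(t) for t in train_texts}
    return [t for t in eval_texts if normalize(t) in seen]


# Hypothetical data for illustration only.
train = ["That guitar solo is INSANE!!", "smooth sax, love it"]
val = ["that guitar solo is insane", "the drop at 1:20 though"]

leaked = find_overlap(train, val)
print(f"{len(leaked)} of {len(val)} evaluation comments also appear in training")
```

This only catches exact matches after normalization. Comments from the same video or the same author can still leak genre cues, so splitting by video or channel may be a stricter check.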

Model description

Architecture: DistilBERT encoder with a linear classification head
Task: Multi-class text classification (7 genres)
Input: A single YouTube comment (str)
Output: Predicted genre label + scores

Labels

Classical, rock, metal, electronic, R&B, pop, jazz
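
To illustrate how the classification head's output becomes a "label + scores" result, the sketch below applies a softmax to seven raw logits and picks the highest-scoring class. The index order in `ID2LABEL` is an assumption made for illustration; the actual mapping should be read from `model.config.id2label`. The logits are made-up numbers.

```python
import math

# Assumed index order for illustration; the real mapping lives in model.config.id2label.
ID2LABEL = {0: "Classical", 1: "rock", 2: "metal", 3: "electronic", 4: "R&B", 5: "pop", 6: "jazz"}


def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the top label and a per-label score dictionary."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    scores = {ID2LABEL[i]: p for i, p in enumerate(probs)}
    return ID2LABEL[best], scores


# Hypothetical logits for one comment.
label, scores = decode([0.1, 0.2, 0.0, 0.3, 0.1, 2.5, 0.4])
print(label, round(scores[label], 3))
```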

Intended uses & limitations

Intended uses

Exploratory analysis of audience/genre engagement on music videos
Routing comments to genre-specific moderation or analytics queues
Downstream features (e.g., per-genre dashboards)
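
As a sketch of the routing use case, the snippet below groups comments into per-genre queues and sends low-confidence predictions to a separate review queue. The `route` helper, the `"review"` queue name, and the 0.8 threshold are all hypothetical choices for illustration; the prediction dictionaries mirror the `{"label": ..., "score": ...}` shape returned by a text-classification pipeline.

```python
from collections import defaultdict


def route(predictions, threshold=0.8):
    """Group comments by predicted genre; low-confidence ones go to a 'review' queue."""
    queues = defaultdict(list)
    for comment, pred in predictions:
        queue = pred["label"] if pred["score"] >= threshold else "review"
        queues[queue].append(comment)
    return dict(queues)


# Hypothetical predictions for illustration only.
predictions = [
    ("that riff melts faces", {"label": "metal", "score": 0.97}),
    ("love the sax on this one", {"label": "jazz", "score": 0.91}),
    ("nice song", {"label": "pop", "score": 0.42}),
]
queues = route(predictions)
print(queues)
```

Given the perfect validation scores reported above, a threshold chosen on that set may be poorly calibrated, so it is worth tuning on fresh data.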

Limitations

Trained on YouTube comments; may not generalize to other platforms/domains
Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified
Not designed for toxicity, sentiment, or demographic inference

Ethical considerations

Comments can contain personal data; ensure collection complies with platform ToS and privacy laws
Avoid using predictions to profile individuals

How to use

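from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

repo_id = "scottymcgee/text-classifier"  # update if different
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# The default already returns only the top label and its score.
# `return_all_scores` is deprecated in recent transformers releases; use `top_k` to request more labels.
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)
result = pipe("this chorus is so catchy, reminds me of late 90s production")
print(result)  # a list containing one {"label": ..., "score": ...} dict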