metadata
library_name: transformers
license: apache-2.0
base_model: distilbert-base-uncased
language:
- en
tags:
- text-classification
- sequence-classification
- youtube
- music-genres
- 7-class
- distilbert
- generated_from_trainer
datasets:
- custom-youtube-music-genres
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: text
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: YouTube Music Genre Comments (custom)
      type: custom
      split: validation
    metrics:
    - type: accuracy
      value: 1
    - type: f1
      value: 1
    - type: precision
      value: 1
    - type: recall
      value: 1
text
A DistilBERT-based 7-class text classifier fine-tuned to predict the music genre associated with a YouTube comment.
Inputs are raw comment strings; outputs are one of seven genre labels.
Base model: distilbert-base-uncased
Results (evaluation set)
- Loss: 0.0675
- Accuracy: 1.0
- F1: 1.0
- Precision: 1.0
- Recall: 1.0
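The card does not state how F1, precision, and recall are averaged across the seven classes. In the training log, Recall equals Accuracy at every epoch, which is consistent with support-weighted averaging (weighted recall always equals accuracy). The following is a minimal sketch under that assumption; the function name `weighted_metrics` and the toy labels are illustrative and are not taken from the training code.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(correct.values()) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy labels for illustration only (not the model's real predictions).
y_true = ["rock", "rock", "jazz", "pop", "pop", "pop"]
y_pred = ["rock", "jazz", "jazz", "pop", "pop", "rock"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

If the original run used macro or micro averaging instead, the per-class loop stays the same and only the weighting changes.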
Training curves (from Trainer logs)
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|---|
| 1.2677 | 1.0 | 84 | 1.0653 | 0.9107 | 0.9097 | 0.9147 | 0.9107 |
| 0.4341 | 2.0 | 168 | 0.3179 | 0.9821 | 0.9820 | 0.9829 | 0.9821 |
| 0.0963 | 3.0 | 252 | 0.0865 | 1.0 | 1.0 | 1.0 | 1.0 |
| 0.0568 | 4.0 | 336 | 0.0427 | 1.0 | 1.0 | 1.0 | 1.0 |
| 0.0414 | 5.0 | 420 | 0.0356 | 1.0 | 1.0 | 1.0 | 1.0 |
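Note: Perfect scores on the validation split do not by themselves demonstrate generalization. They may reflect an easy task, a small or unrepresentative validation set, or data leakage (for example, the same or near-identical comments appearing in both the training and validation data). Validate on a separate held-out test set and, ideally, on external data before relying on these numbers.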
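One quick sanity check for leakage is to look for validation comments that also occur in the training data after light normalization. The sketch below assumes the two splits are available as plain lists of strings; the helper names and the example comments are hypothetical, and near-duplicate detection beyond case and whitespace is out of scope.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())

def find_overlap(train_texts, val_texts):
    """Return validation comments whose normalized form also appears in training."""
    seen = {normalize(t) for t in train_texts}
    return [t for t in val_texts if normalize(t) in seen]

# Made-up comments for illustration only.
train_comments = ["This riff is insane!", "Smooth sax solo"]
val_comments = ["this riff  is INSANE!", "The drop at 1:20 is unreal"]

leaked = find_overlap(train_comments, val_comments)
print(f"{len(leaked)} of {len(val_comments)} validation comments also appear in training")
```

A non-zero overlap does not prove the metrics are inflated, but it is a strong hint that the split should be rebuilt before the scores are trusted.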
Model description
- Architecture: DistilBERT encoder with a linear classification head
- Task: Multi-class text classification (7 genres)
- Input: A single YouTube comment (str)
- Output: Predicted genre label + scores
Labels
Classical, rock, metal, electronic, R&B, pop, jazz
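To illustrate how a linear classification head turns into a "label + scores" output, the sketch below applies a softmax to seven logits and picks the most probable genre. The label order used here is an assumption made for illustration; the actual index-to-label mapping is stored in the model's `config.id2label` and should be read from there. The logits are made up.

```python
import math

# Hypothetical label order; the real mapping lives in model.config.id2label.
LABELS = ["Classical", "rock", "metal", "electronic", "R&B", "pop", "jazz"]

def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the top label and a label -> probability mapping."""
    probs = softmax(logits)
    scores = dict(zip(LABELS, probs))
    return max(scores, key=scores.get), scores

# Made-up logits for illustration only.
label, scores = predict([0.1, 0.3, -1.2, 0.0, 0.4, 3.5, -0.5])
print(label, round(scores[label], 3))
```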
Intended uses & limitations
Intended uses
- Exploratory analysis of audience/genre engagement on music videos
- Routing comments to genre-specific moderation or analytics queues
- Downstream features (e.g., per-genre dashboards)
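As a rough sketch of the routing use case, predictions can be mapped to per-genre queues, with low-confidence predictions sent to a fallback queue. The input dictionaries follow the `label`/`score` shape returned by the Transformers text-classification pipeline; the queue names and the 0.6 threshold are hypothetical and would need tuning on real data.

```python
def route(prediction, threshold=0.6):
    """Pick a queue name from a {"label": ..., "score": ...} prediction."""
    if prediction["score"] < threshold:
        # Low confidence: send to a human or a generic queue instead of guessing.
        return "manual-review"
    return f"queue-{prediction['label'].lower()}"

# Made-up predictions for illustration only.
predictions = [
    {"label": "jazz", "score": 0.97},
    {"label": "pop", "score": 0.41},
]
queues = [route(p) for p in predictions]
print(queues)
```

Using a fallback queue is one way to limit the impact of ambiguous or mixed-genre comments, which a single-label classifier cannot represent.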
Limitations
- Trained on YouTube comments; may not generalize to other platforms/domains
- Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified
- Not designed for toxicity, sentiment, or demographic inference
Ethical considerations
- Comments can contain personal data; ensure collection complies with platform ToS and privacy laws
- Avoid using predictions to profile individuals
How to use
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

repo_id = "scottymcgee/text-classifier"  # update if different

# Load the fine-tuned tokenizer and classifier from the Hub
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# By default the pipeline returns only the top-scoring label,
# so the deprecated return_all_scores=False argument is not needed
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)
print(pipe("this chorus is so catchy, reminds me of late 90s production"))