---
library_name: transformers
license: apache-2.0
base_model: distilbert-base-uncased
language:
- en
tags:
- text-classification
- sequence-classification
- youtube
- music-genres
- 7-class
- distilbert
- generated_from_trainer
datasets:
- custom-youtube-music-genres
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: text
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: YouTube Music Genre Comments (custom)
      type: custom
      split: validation
    metrics:
    - type: accuracy
      value: 1.0
    - type: f1
      value: 1.0
    - type: precision
      value: 1.0
    - type: recall
      value: 1.0
---
# text
A DistilBERT-based **7-class text classifier** fine-tuned to predict the **music genre** associated with a YouTube comment.
Inputs are raw comment strings; outputs are one of seven genre labels.
> Base model: [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
## Results (evaluation set)
- **Loss:** 0.0675
- **Accuracy:** 1.0
- **F1:** 1.0
- **Precision:** 1.0
- **Recall:** 1.0
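
The card does not say how F1, precision, and recall were averaged across the seven classes. In the training log table, Recall equals Accuracy in every row, which is what support-weighted averaging produces, so the sketch below assumes weighted averaging. The function name and the toy labels are hypothetical, and this is a plain-Python illustration rather than the evaluation code actually used.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 (assumed averaging)."""
    total = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    precision = recall = f1 = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / n
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n / total  # each class contributes in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical toy labels, not taken from the real dataset
y_true = ["rock", "rock", "jazz", "pop"]
y_pred = ["rock", "jazz", "jazz", "pop"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

With weighted averaging, recall always matches accuracy, so the two columns carry the same information; per-class scores are more informative when classes are imbalanced.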
### Training curves (from `Trainer` logs)
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 1.2677 | 1.0 | 84 | 1.0653 | 0.9107 | 0.9097 | 0.9147 | 0.9107 |
| 0.4341 | 2.0 | 168 | 0.3179 | 0.9821 | 0.9820 | 0.9829 | 0.9821 |
| 0.0963 | 3.0 | 252 | 0.0865 | 1.0 | 1.0 | 1.0 | 1.0 |
| 0.0568 | 4.0 | 336 | 0.0427 | 1.0 | 1.0 | 1.0 | 1.0 |
| 0.0414 | 5.0 | 420 | 0.0356 | 1.0 | 1.0 | 1.0 | 1.0 |
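> **Note:** Perfect validation scores may indicate an easy task, a small or unrepresentative evaluation set, or data leakage between the training and evaluation splits. Confirm the results on a separate held-out set and/or external data before relying on them.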
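
One quick way to probe for leakage is to look for evaluation comments that also appear in the training data. The following is a minimal sketch, assuming both splits are available as lists of strings; `normalize`, `find_overlap`, and the sample comments are hypothetical, and exact-match checking will miss near-duplicates.

```python
import re

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())

def find_overlap(train_texts, eval_texts):
    """Return eval texts whose normalized form also appears in the training texts."""
    seen = {normalize(t) for t in train_texts}
    return [t for t in eval_texts if normalize(t) in seen]

# Hypothetical comments standing in for the real splits
train_texts = ["This riff is insane", "so smooth, love the sax"]
eval_texts = ["this  riff is INSANE", "that drop though"]
leaked = find_overlap(train_texts, eval_texts)
print(f"{len(leaked)} of {len(eval_texts)} eval comments also appear in train")
```

A non-trivial overlap would suggest the reported scores overstate how well the model generalizes.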
## Model description
- **Architecture:** DistilBERT encoder with a linear classification head
- **Task:** Multi-class text classification (7 genres)
- **Input:** A single YouTube comment (`str`)
- **Output:** Predicted genre label + scores
### Labels
- Classical
- rock
- metal
- electronic
- R&B
- pop
- jazz
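
To illustrate how the classification head's output becomes a "label + scores" prediction, here is a minimal sketch that applies a softmax to seven logits and picks the highest-scoring label. The label order and the logit values are assumptions made for illustration; the actual index-to-label mapping is stored in the model's `config.id2label`.

```python
import math

# Assumed order for illustration only; check model.config.id2label for the real mapping
LABELS = ["Classical", "rock", "metal", "electronic", "R&B", "pop", "jazz"]

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, labels=LABELS):
    """Return the top label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}

# Hypothetical logits for a single comment
logits = [0.1, 0.3, -1.2, 0.0, 0.4, 3.5, -0.5]
prediction = decode(logits)
print(prediction)
```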
## Intended uses & limitations
**Intended uses**
- Exploratory analysis of audience/genre engagement on music videos
- Routing comments to genre-specific moderation or analytics queues
- Downstream features (e.g., per-genre dashboards)
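
The routing use case could look roughly like the sketch below, which groups comments into per-genre queues and sends low-confidence predictions to a review queue. The `route` helper, the `0.8` threshold, and the `needs_review` queue name are hypothetical choices; the prediction dicts mimic the `label`/`score` shape returned by a text-classification pipeline.

```python
from collections import defaultdict

def route(predictions, min_score=0.8):
    """Group comment texts by predicted genre; low-confidence ones go to review."""
    queues = defaultdict(list)
    for text, pred in predictions:
        queue = pred["label"] if pred["score"] >= min_score else "needs_review"
        queues[queue].append(text)
    return dict(queues)

# Hypothetical (comment, prediction) pairs
predictions = [
    ("that guitar solo!!", {"label": "rock", "score": 0.97}),
    ("nice", {"label": "pop", "score": 0.41}),
]
queues = route(predictions)
print(queues)
```

A review queue is one way to handle the ambiguous or mixed-genre comments that the model may misclassify.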
**Limitations**
- Trained on YouTube comments; may not generalize to other platforms/domains
- Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified
- Not designed for toxicity, sentiment, or demographic inference
**Ethical considerations**
- Comments can contain personal data; ensure collection complies with platform ToS and privacy laws
- Avoid using predictions to profile individuals
## How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

repo_id = "scottymcgee/text-classifier"  # update if different

# Load the fine-tuned tokenizer and classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# By default the pipeline returns only the top label and its score
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)
print(pipe("this chorus is so catchy, reminds me of late 90s production"))
```