---
library_name: transformers
license: apache-2.0
base_model: distilbert-base-uncased
language:
- en
tags:
- text-classification
- sequence-classification
- youtube
- music-genres
- 7-class
- distilbert
- generated_from_trainer
datasets:
- custom-youtube-music-genres
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: text
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: YouTube Music Genre Comments (custom)
      type: custom
      split: validation
    metrics:
    - type: accuracy
      value: 1.0
    - type: f1
      value: 1.0
    - type: precision
      value: 1.0
    - type: recall
      value: 1.0
---

# text

A DistilBERT-based **7-class text classifier** fine-tuned to predict the **music genre** associated with a YouTube comment.  
Inputs are raw comment strings; outputs are one of seven genre labels.

> Base model: [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)

## Results (evaluation set)

- **Loss:** 0.0675  
- **Accuracy:** 1.0  
- **F1:** 1.0  
- **Precision:** 1.0  
- **Recall:** 1.0  

### Training curves (from `Trainer` logs)

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Precision | Recall |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 1.2677        | 1.0   | 84   | 1.0653          | 0.9107   | 0.9097 | 0.9147    | 0.9107 |
| 0.4341        | 2.0   | 168  | 0.3179          | 0.9821   | 0.9820 | 0.9829    | 0.9821 |
| 0.0963        | 3.0   | 252  | 0.0865          | 1.0      | 1.0    | 1.0       | 1.0    |
| 0.0568        | 4.0   | 336  | 0.0427          | 1.0      | 1.0    | 1.0       | 1.0    |
| 0.0414        | 5.0   | 420  | 0.0356          | 1.0      | 1.0    | 1.0       | 1.0    |

> **Note:** Perfect scores may indicate an easy task, a small validation set, or data leakage between the training and validation splits. Validate on a separate held-out test set and/or external data before relying on these numbers.
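
The table reports recall values identical to accuracy at every epoch, which is what support-weighted averaging produces. The sketch below shows how such metrics could be computed in plain Python. It assumes weighted averaging (the card does not state which averaging was used), and the function name `weighted_metrics` and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per label
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / actual if actual else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = actual / total  # weight each class by its share of true labels
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration; these are not the model's data.
y_true = ["rock", "rock", "jazz", "pop", "pop", "pop"]
y_pred = ["rock", "jazz", "jazz", "pop", "pop", "rock"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

With weighted averaging, recall always equals accuracy, so the Recall column adds no information beyond the Accuracy column; macro-averaged metrics would be more sensitive to weak per-class performance.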

## Model description

- **Architecture:** DistilBERT encoder with a linear classification head  
- **Task:** Multi-class text classification (7 genres)  
- **Input:** A single YouTube comment (`str`)  
- **Output:** Predicted genre label + scores

### Labels

- `Classical`
- `rock`
- `metal`
- `electronic`
- `R&B`
- `pop`
- `jazz`
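
As a rough illustration of how the classification head's seven logits become a "label + scores" output, the sketch below builds an index-to-label mapping and applies a softmax. The label order is taken from the list above and is an assumption; the actual `id2label` mapping lives in the model's `config.json` and should be checked there. The logits are made-up values.

```python
import math

# Assumed order, copied from the label list; verify against the model config.
LABELS = ["Classical", "rock", "metal", "electronic", "R&B", "pop", "jazz"]
id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the top label and its probability for one row of logits."""
    scores = softmax(logits)
    best = max(range(len(scores)), key=scores.__getitem__)
    return {"label": id2label[best], "score": scores[best]}


example_logits = [0.1, 0.3, -1.2, 0.0, -0.5, 2.4, 0.2]  # made-up values
prediction = decode(example_logits)
print(prediction["label"])
```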


## Intended uses & limitations

**Intended uses**
- Exploratory analysis of audience/genre engagement on music videos  
- Routing comments to genre-specific moderation or analytics queues  
- Downstream features (e.g., per-genre dashboards)

**Limitations**
- Trained on YouTube comments; may not generalize to other platforms/domains  
- Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified  
- Not designed for toxicity, sentiment, or demographic inference
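
One way to combine the routing use case with the misclassification caveat is to send low-confidence predictions to a manual review queue instead of a genre queue. The sketch below is a minimal illustration: the function `route_comment`, the queue names, and the `0.8` threshold are all hypothetical and would need tuning on real data.

```python
def route_comment(prediction, threshold=0.8):
    """Pick a queue name for one prediction dict with 'label' and 'score' keys."""
    if prediction["score"] < threshold:
        # Ambiguous or mixed-genre comments tend to get low scores.
        return "review"
    return f"genre:{prediction['label']}"


# Hand-written predictions in the shape the pipeline returns.
predictions = [
    {"label": "jazz", "score": 0.97},
    {"label": "pop", "score": 0.41},
]
queues = [route_comment(p) for p in predictions]
print(queues)
```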

**Ethical considerations**
- Comments can contain personal data; ensure collection complies with platform ToS and privacy laws  
- Avoid using predictions to profile individuals

## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

repo_id = "scottymcgee/text-classifier"  # update if different
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# By default the pipeline returns only the top-scoring label for each input.
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)
result = pipe("this chorus is so catchy, reminds me of late 90s production")
print(result)  # a list with one {"label": ..., "score": ...} dict