scottymcgee committed on
Commit ff8ee0e · verified · 1 Parent(s): 9ac1083

Update README.md

Files changed (1):
  1. README.md +82 -39

README.md CHANGED
@@ -2,8 +2,18 @@
 library_name: transformers
 license: apache-2.0
 base_model: distilbert-base-uncased
 tags:
 - generated_from_trainer
 metrics:
 - accuracy
 - f1
@@ -11,48 +21,41 @@ metrics:
 - recall
 model-index:
 - name: text
-  results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # text
 
-This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.0675
-- Accuracy: 1.0
-- F1: 1.0
-- Precision: 1.0
-- Recall: 1.0
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 5
-
-### Training results
-
 | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
@@ -62,10 +65,50 @@ The following hyperparameters were used during training:
 | 0.0568 | 4.0 | 336 | 0.0427 | 1.0 | 1.0 | 1.0 | 1.0 |
 | 0.0414 | 5.0 | 420 | 0.0356 | 1.0 | 1.0 | 1.0 | 1.0 |
 
-### Framework versions
-
-- Transformers 4.56.1
-- Pytorch 2.8.0+cu126
-- Datasets 4.0.0
-- Tokenizers 0.22.0
 
 library_name: transformers
 license: apache-2.0
 base_model: distilbert-base-uncased
+language:
+- en
 tags:
+- text-classification
+- sequence-classification
+- youtube
+- music-genres
+- 7-class
+- distilbert
 - generated_from_trainer
+datasets:
+- custom-youtube-music-genres
 metrics:
 - accuracy
 - f1
 - recall
 model-index:
 - name: text
+  results:
+  - task:
+      type: text-classification
+      name: Text Classification
+    dataset:
+      name: YouTube Music Genre Comments (custom)
+      type: custom
+      split: validation
+    metrics:
+    - type: accuracy
+      value: 1.0
+    - type: f1
+      value: 1.0
+    - type: precision
+      value: 1.0
+    - type: recall
+      value: 1.0
 ---
 
 # text
 
+A DistilBERT-based **7-class text classifier** fine-tuned to predict the **music genre** associated with a YouTube comment.
+Inputs are raw comment strings; outputs are one of seven genre labels.
+
+> Base model: [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
+
+## Results (evaluation set)
+
+- **Loss:** 0.0675
+- **Accuracy:** 1.0
+- **F1:** 1.0
+- **Precision:** 1.0
+- **Recall:** 1.0
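
As a sanity check, the four scores above can be recomputed from saved predictions in plain Python. The sketch below assumes macro averaging over the seven classes; the card does not state which averaging the `Trainer`'s metric function actually used.

```python
def macro_scores(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision/recall/F1 for a multi-class task."""
    per_p, per_r, per_f1 = [], [], []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        p_ = tp / (tp + fp) if tp + fp else 0.0
        r_ = tp / (tp + fn) if tp + fn else 0.0
        per_p.append(p_)
        per_r.append(r_)
        # Macro-F1 averages the per-class F1s, not the F1 of the macro averages
        per_f1.append(2 * p_ * r_ / (p_ + r_) if p_ + r_ else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(per_p) / n,
        "recall": sum(per_r) / n,
        "f1": sum(per_f1) / n,
    }
```

With perfect predictions, all four values come out as 1.0, matching the table above.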
 
+### Training curves (from `Trainer` logs)
+
 | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
 | 0.0568 | 4.0 | 336 | 0.0427 | 1.0 | 1.0 | 1.0 | 1.0 |
 | 0.0414 | 5.0 | 420 | 0.0356 | 1.0 | 1.0 | 1.0 | 1.0 |
 
+> **Note:** Perfect scores often indicate an easy task, a small or unrepresentative evaluation set, or data leakage between the train and validation splits. Validate on a genuinely held-out set and/or external data.
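
The earlier revision of this card (removed above) recorded the training hyperparameters. As a rough sketch, they map onto a `TrainingArguments` setup like the following; the `output_dir` and everything not listed in the old card are placeholders, not documented values.

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed in the previous card revision.
training_args = TrainingArguments(
    output_dir="text",                 # placeholder, not from the card
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch_fused",         # AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=5,
)
```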
+
+## Model description
+
+- **Architecture:** DistilBERT encoder with a linear classification head
+- **Task:** Multi-class text classification (7 genres)
+- **Input:** A single YouTube comment (`str`)
+- **Output:** Predicted genre label + scores
+
+### Labels
+
+- Classical
+- rock
+- metal
+- electronic
+- R&B
+- pop
+- jazz
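
If the seven labels above are wired into the checkpoint, decoding a predicted class index is a dictionary lookup. The index order below is an assumption for illustration; the authoritative mapping lives in the checkpoint's `config.json` (`id2label` / `label2id`).

```python
# Assumed index order; verify against the checkpoint's config.json (id2label).
id2label = {
    0: "Classical",
    1: "rock",
    2: "metal",
    3: "electronic",
    4: "R&B",
    5: "pop",
    6: "jazz",
}
label2id = {label: idx for idx, label in id2label.items()}

def decode(pred_index: int) -> str:
    """Map a predicted class index to its genre string."""
    return id2label[pred_index]
```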
+
+## Intended uses & limitations
+
+**Intended uses**
+- Exploratory analysis of audience/genre engagement on music videos
+- Routing comments to genre-specific moderation or analytics queues
+- Downstream features (e.g., per-genre dashboards)
+
+**Limitations**
+- Trained on YouTube comments; may not generalize to other platforms/domains
+- Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified
+- Not designed for toxicity, sentiment, or demographic inference
+
+**Ethical considerations**
+- Comments can contain personal data; ensure collection complies with platform ToS and privacy laws
+- Avoid using predictions to profile individuals
+## How to use
+
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
+
+repo_id = "scottymcgee/text-classifier"  # update if different
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+model = AutoModelForSequenceClassification.from_pretrained(repo_id)
+
+# top_k=1 returns only the best label; pass top_k=None to get scores for all 7 classes
+# (return_all_scores is deprecated in recent transformers releases)
+pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, top_k=1)
+pipe("this chorus is so catchy, reminds me of late 90s production")
+```
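
Outside the pipeline, a prediction is just an argmax over the model's 7 logits, and the conversion to probabilities is plain arithmetic. The logit values and label order below are illustrative assumptions, not outputs of this checkpoint.

```python
import math

# Hypothetical logits for one comment (one score per class) and an assumed label order
logits = [0.3, 4.2, 1.1, -0.5, 0.2, 2.0, -1.3]
labels = ["Classical", "rock", "metal", "electronic", "R&B", "pop", "jazz"]

# Numerically stable softmax: subtract the max logit before exponentiating
m = max(logits)
exps = [math.exp(x - m) for x in logits]
probs = [e / sum(exps) for e in exps]

# The predicted genre is the class with the highest probability
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best], round(probs[best], 3))
```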