scottymcgee committed on
Commit ff8ee0e · verified · 1 Parent(s): 9ac1083

Update README.md

Files changed (1):
  1. README.md +82 -39

README.md CHANGED
@@ -2,8 +2,18 @@
 library_name: transformers
 license: apache-2.0
 base_model: distilbert-base-uncased
 tags:
 - generated_from_trainer
 metrics:
 - accuracy
 - f1
@@ -11,48 +21,41 @@ metrics:
 - recall
 model-index:
 - name: text
-  results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # text
 
-This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.0675
-- Accuracy: 1.0
-- F1: 1.0
-- Precision: 1.0
-- Recall: 1.0
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 5
-
-### Training results
-
 | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
@@ -62,10 +65,50 @@ The following hyperparameters were used during training:
 | 0.0568 | 4.0 | 336 | 0.0427 | 1.0 | 1.0 | 1.0 | 1.0 |
 | 0.0414 | 5.0 | 420 | 0.0356 | 1.0 | 1.0 | 1.0 | 1.0 |
 
-### Framework versions
-
-- Transformers 4.56.1
-- Pytorch 2.8.0+cu126
-- Datasets 4.0.0
-- Tokenizers 0.22.0
 
 library_name: transformers
 license: apache-2.0
 base_model: distilbert-base-uncased
+language:
+- en
 tags:
+- text-classification
+- sequence-classification
+- youtube
+- music-genres
+- 7-class
+- distilbert
 - generated_from_trainer
+datasets:
+- custom-youtube-music-genres
 metrics:
 - accuracy
 - f1
 - recall
 model-index:
 - name: text
+  results:
+  - task:
+      type: text-classification
+      name: Text Classification
+    dataset:
+      name: YouTube Music Genre Comments (custom)
+      type: custom
+      split: validation
+    metrics:
+    - type: accuracy
+      value: 1.0
+    - type: f1
+      value: 1.0
+    - type: precision
+      value: 1.0
+    - type: recall
+      value: 1.0
 ---
 
 # text
 
+A DistilBERT-based **7-class text classifier** fine-tuned to predict the **music genre** associated with a YouTube comment.
+Inputs are raw comment strings; outputs are one of seven genre labels.
+
+> Base model: [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
+
+## Results (evaluation set)
+
+- **Loss:** 0.0675
+- **Accuracy:** 1.0
+- **F1:** 1.0
+- **Precision:** 1.0
+- **Recall:** 1.0
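
As a sanity check, the four scores above can be recomputed from saved predictions in plain Python. The sketch below assumes macro averaging over the seven classes; the card does not state which averaging the `Trainer`'s metric function actually used.

```python
def macro_scores(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision/recall/F1 for a multi-class task."""
    per_p, per_r, per_f1 = [], [], []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        p_ = tp / (tp + fp) if tp + fp else 0.0
        r_ = tp / (tp + fn) if tp + fn else 0.0
        per_p.append(p_)
        per_r.append(r_)
        # Macro-F1 averages the per-class F1s, not the F1 of the macro averages
        per_f1.append(2 * p_ * r_ / (p_ + r_) if p_ + r_ else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(per_p) / n,
        "recall": sum(per_r) / n,
        "f1": sum(per_f1) / n,
    }
```

With perfect predictions, all four values come out as 1.0, matching the table above.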
 
+### Training curves (from `Trainer` logs)
+
 | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
 | 0.0568 | 4.0 | 336 | 0.0427 | 1.0 | 1.0 | 1.0 | 1.0 |
 | 0.0414 | 5.0 | 420 | 0.0356 | 1.0 | 1.0 | 1.0 | 1.0 |
 
+> **Note:** Perfect scores often indicate an easy task, a small or unrepresentative evaluation set, or data leakage between the train and validation splits. Validate on a genuinely held-out set and/or external data.
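
The earlier revision of this card (removed above) recorded the training hyperparameters. As a rough sketch, they map onto a `TrainingArguments` setup like the following; the `output_dir` and everything not listed in the old card are placeholders, not documented values.

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed in the previous card revision.
training_args = TrainingArguments(
    output_dir="text",                 # placeholder, not from the card
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch_fused",         # AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=5,
)
```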
+
+## Model description
+
+- **Architecture:** DistilBERT encoder with a linear classification head
+- **Task:** Multi-class text classification (7 genres)
+- **Input:** A single YouTube comment (`str`)
+- **Output:** Predicted genre label + scores
+
+### Labels
+
+- Classical
+- rock
+- metal
+- electronic
+- R&B
+- pop
+- jazz
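
If the seven labels above are wired into the checkpoint, decoding a predicted class index is a dictionary lookup. The index order below is an assumption for illustration; the authoritative mapping lives in the checkpoint's `config.json` (`id2label` / `label2id`).

```python
# Assumed index order; verify against the checkpoint's config.json (id2label).
id2label = {
    0: "Classical",
    1: "rock",
    2: "metal",
    3: "electronic",
    4: "R&B",
    5: "pop",
    6: "jazz",
}
label2id = {label: idx for idx, label in id2label.items()}

def decode(pred_index: int) -> str:
    """Map a predicted class index to its genre string."""
    return id2label[pred_index]
```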
+
+## Intended uses & limitations
+
+**Intended uses**
+- Exploratory analysis of audience/genre engagement on music videos
+- Routing comments to genre-specific moderation or analytics queues
+- Downstream features (e.g., per-genre dashboards)
+
+**Limitations**
+- Trained on YouTube comments; may not generalize to other platforms/domains
+- Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified
+- Not designed for toxicity, sentiment, or demographic inference
+
+**Ethical considerations**
+- Comments can contain personal data; ensure collection complies with platform ToS and privacy laws
+- Avoid using predictions to profile individuals
+## How to use
+
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
+
+repo_id = "scottymcgee/text-classifier"  # update if different
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+model = AutoModelForSequenceClassification.from_pretrained(repo_id)
+
+# top_k=1 returns only the best label; pass top_k=None to get scores for all 7 classes
+# (return_all_scores is deprecated in recent transformers releases)
+pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, top_k=1)
+pipe("this chorus is so catchy, reminds me of late 90s production")
+```
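
Outside the pipeline, a prediction is just an argmax over the model's 7 logits, and the conversion to probabilities is plain arithmetic. The logit values and label order below are illustrative assumptions, not outputs of this checkpoint.

```python
import math

# Hypothetical logits for one comment (one score per class) and an assumed label order
logits = [0.3, 4.2, 1.1, -0.5, 0.2, 2.0, -1.3]
labels = ["Classical", "rock", "metal", "electronic", "R&B", "pop", "jazz"]

# Numerically stable softmax: subtract the max logit before exponentiating
m = max(logits)
exps = [math.exp(x - m) for x in logits]
probs = [e / sum(exps) for e in exps]

# The predicted genre is the class with the highest probability
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best], round(probs[best], 3))
```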