---
library_name: transformers
license: apache-2.0
base_model: distilbert-base-uncased
language:
- en
tags:
- text-classification
- sequence-classification
- youtube
- music-genres
- 7-class
- distilbert
- generated_from_trainer
datasets:
- custom-youtube-music-genres
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: text
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: YouTube Music Genre Comments (custom)
      type: custom
      split: validation
    metrics:
    - type: accuracy
      value: 1.0
    - type: f1
      value: 1.0
    - type: precision
      value: 1.0
    - type: recall
      value: 1.0
---

# text

A DistilBERT-based **7-class text classifier** fine-tuned to predict the **music genre** associated with a YouTube comment. Inputs are raw comment strings; the output is one of seven genre labels.

> Base model: [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)

## Results (evaluation set)

- **Loss:** 0.0675
- **Accuracy:** 1.0
- **F1:** 1.0
- **Precision:** 1.0
- **Recall:** 1.0

### Training curves (from `Trainer` logs)

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Precision | Recall |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 1.2677        | 1.0   | 84   | 1.0653          | 0.9107   | 0.9097 | 0.9147    | 0.9107 |
| 0.4341        | 2.0   | 168  | 0.3179          | 0.9821   | 0.9820 | 0.9829    | 0.9821 |
| 0.0963        | 3.0   | 252  | 0.0865          | 1.0      | 1.0    | 1.0       | 1.0    |
| 0.0568        | 4.0   | 336  | 0.0427          | 1.0      | 1.0    | 1.0       | 1.0    |
| 0.0414        | 5.0   | 420  | 0.0356          | 1.0      | 1.0    | 1.0       | 1.0    |

> **Note:** Perfect scores may indicate an easy task, a small evaluation set, or data leakage between splits. Validate on a held-out set and/or external data before relying on these numbers.

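
One way to probe the leakage possibility raised in the note is to look for comments that occur in both the training and evaluation splits. The following is a minimal sketch, assuming the splits are available as plain lists of strings. The helper names `normalize` and `find_overlap` and the toy data are hypothetical illustrations, not part of the original training code.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace so near-identical comments compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def find_overlap(train_texts, eval_texts):
    """Return evaluation texts whose normalized form also appears in the training texts."""
    seen = {normalize(t) for t in train_texts}
    return [t for t in eval_texts if normalize(t) in seen]


# Hypothetical toy data standing in for the real train/validation splits.
train_texts = ["This riff is insane!", "such a smooth sax solo"]
eval_texts = ["this riff is   INSANE!", "that drop hits hard"]

leaked = find_overlap(train_texts, eval_texts)
leak_rate = len(leaked) / len(eval_texts)
print(f"{len(leaked)} of {len(eval_texts)} evaluation comments also appear in training ({leak_rate:.0%})")
```

A non-zero overlap does not prove the reported scores are inflated, but it is a cheap signal that the splits deserve a closer look. Exact matching after normalization will miss paraphrases and near-duplicates, so treat a clean result as weak evidence only.
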
## Model description

- **Architecture:** DistilBERT encoder with a linear classification head
- **Task:** Multi-class text classification (7 genres)
- **Input:** A single YouTube comment (`str`)
- **Output:** Predicted genre label + scores

### Labels

- Classical
- rock
- metal
- electronic
- R&B
- pop
- jazz

## Intended uses & limitations

**Intended uses**

- Exploratory analysis of audience/genre engagement on music videos
- Routing comments to genre-specific moderation or analytics queues
- Downstream features (e.g., per-genre dashboards)

**Limitations**

- Trained on YouTube comments; may not generalize to other platforms/domains
- Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified
- Not designed for toxicity, sentiment, or demographic inference

**Ethical considerations**

- Comments can contain personal data; ensure collection complies with platform ToS and privacy laws
- Avoid using predictions to profile individuals

## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

repo_id = "scottymcgee/text-classifier"  # update if different
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# return_all_scores=False keeps only the top-scoring label for each input
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=False)
pipe("this chorus is so catchy, reminds me of late 90s production")