|
|
--- |
|
|
library_name: transformers |
|
|
tags: |
|
|
- bert |
|
|
- youtube |
|
|
- classification |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
--- |
|
|
|
|
|
# Model Card for BERT YouTube Content Classifier
|
|
This is a fine-tuned BERT model that classifies YouTube channel content into categories such as Education, Technology, and Finance.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
This is a fine-tuned BERT-based classification model designed to categorize **YouTube video metadata** (specifically titles and optional descriptions) into one of 20 categories, including:
|
|
|
|
|
* **Education** |
|
|
* **Technology** |
|
|
* **Motivation** |
|
|
* **Entertainment** |
|
|
* **Gaming** |
|
|
|
|
|
The model is based on the `bert-base-uncased` architecture from the [Hugging Face Transformers](https://huggingface.co/transformers/) library and was fine-tuned using a labeled dataset of YouTube content. It is optimized for short text classification, making it ideal for content analytics, recommendation systems, and media monitoring tools focused on YouTube. |
|
|
|
|
|
--- |
|
|
|
|
|
### Highlights |
|
|
|
|
|
* 🧠 **Model type:** BERT (Transformer-based) |
|
|
* 🔠 **Input:** Raw text (title + optional description) |
|
|
* 🎯 **Task:** Multi-class classification |
|
|
* 🏷️ **Classes:** 20 categories, such as Gaming, Technology, and Finance
|
|
* 📦 **Pretrained Base:** `bert-base-uncased` |
|
|
* 💡 **Use Case:** YouTube video categorization, content recommendation, channel analysis |
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
|
|
|
|
|
- **Developed by:** Jayesh Mehta |
|
|
- **Funded by [optional]:** [More Information Needed] |
|
|
- **Shared by [optional]:** [More Information Needed] |
|
|
- **Model type:** BERT-based sequence classification model |
|
|
- **Language(s) (NLP):** English |
|
|
- **License:** Apache 2.0 |
|
|
- **Finetuned from model [optional]:** [More Information Needed] |
|
|
|
|
|
### Model Sources [optional] |
|
|
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
|
|
- **Repository:** [More Information Needed] |
|
|
- **Paper [optional]:** [More Information Needed] |
|
|
- **Demo [optional]:** [More Information Needed] |
|
|
|
|
|
## Uses |
|
|
|
|
|
This model can be used directly to classify YouTube video titles and descriptions into predefined categories such as Education, Technology, Motivation, Entertainment, and Gaming.

Example use cases:

- Automatically tagging videos in content moderation systems
- Enabling smart filtering and recommendations
- Analyzing the category distribution of YouTube channels
|
|
|
|
|
### Direct Use

```python
from transformers import BertTokenizer, BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("JaySenpai/bert-youtube-model")
tokenizer = BertTokenizer.from_pretrained("JaySenpai/bert-youtube-model")

inputs = tokenizer("This video is about personal productivity hacks", return_tensors="pt")
outputs = model(**inputs)
predicted = outputs.logits.argmax(dim=1).item()
```
|
|
|
|
|
### Downstream Use [optional] |
|
|
|
|
|
This model can be integrated into larger systems, such as:

- Content management systems
- YouTube channel analytics tools
- Personalized recommendation engines
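For integrations like these, inference is typically run over many titles at once. Below is a minimal batching sketch, assuming the repository id from the Direct Use example and standard Transformers/PyTorch APIs; the helper names (`predict_labels`, `classify_batch`) are illustrative, not part of the released model.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

def predict_labels(logits: torch.Tensor) -> list:
    """Map a batch of logits to predicted class indices."""
    return logits.argmax(dim=-1).tolist()

def classify_batch(texts, tokenizer, model, batch_size=32):
    """Classify a list of titles/descriptions in mini-batches."""
    preds = []
    model.eval()
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = texts[i:i + batch_size]
            inputs = tokenizer(batch, padding=True, truncation=True,
                               max_length=128, return_tensors="pt")
            preds.extend(predict_labels(model(**inputs).logits))
    return preds

if __name__ == "__main__":
    # Repository id taken from the Direct Use example.
    tok = BertTokenizer.from_pretrained("JaySenpai/bert-youtube-model")
    mdl = BertForSequenceClassification.from_pretrained("JaySenpai/bert-youtube-model")
    print(classify_batch(["Minecraft speedrun world record",
                          "How to budget your first salary"], tok, mdl))
```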
|
|
|
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
- The model is not suitable for long-form text or transcript-level classification.
- It should not be used to classify non-YouTube content or languages other than English.
- Avoid using it in sensitive decision-making scenarios (e.g., legal or medical).
|
|
|
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
Like most models trained on public or scraped data:

- The model may carry biases from the underlying data (e.g., overrepresentation of certain video types).
- It may misclassify mixed-genre or ambiguous titles (e.g., "Top 10 Gaming Laptops for Students").
- It is sensitive to text length and clarity; very short or vague titles may reduce accuracy.
|
|
|
|
|
|
|
|
### Recommendations |
|
|
|
|
|
- Use the model as an assistive tool, not a final decision-maker.
- Evaluate its performance on your specific data before deploying.
- Consider adding user feedback or manual review in production systems.
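One practical way to route predictions to manual review is a softmax confidence threshold. This is a sketch, not part of the released model: the threshold value is assumed and should be tuned on your own validation data, and the helper names are illustrative.

```python
import torch

# Assumed cutoff; tune on your own validation data.
REVIEW_THRESHOLD = 0.6

def prediction_with_confidence(logits: torch.Tensor):
    """Return (predicted class index, softmax confidence) for one example's logits."""
    probs = torch.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    return idx.item(), conf.item()

def needs_manual_review(logits: torch.Tensor, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag predictions whose top-class probability falls below the threshold."""
    _, conf = prediction_with_confidence(logits)
    return conf < threshold
```

Predictions flagged by `needs_manual_review` can be queued for a human pass instead of being auto-tagged.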
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
```python
from transformers import BertTokenizer, BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("JaySenpai/bert-model")
tokenizer = BertTokenizer.from_pretrained("JaySenpai/bert-model")

text = "10 Tips to Grow Your YouTube Channel"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
prediction = outputs.logits.argmax(dim=1).item()

# Partial mapping shown for illustration; the model has 20 categories.
labels = {0: "Education", 1: "Comedy and Humour", 2: "Gaming", 3: "Technology", 4: "Motivation"}
print("Predicted label:", labels.get(prediction, str(prediction)))
```
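The hardcoded dictionary above covers only 5 of the 20 categories. If the checkpoint's config was saved with an `id2label` mapping (a standard Transformers config field, though it is not confirmed for this checkpoint), the full mapping can be read from the model instead; `label_for` is an illustrative helper.

```python
from transformers import BertForSequenceClassification

def label_for(pred_id: int, id2label: dict) -> str:
    """Resolve a predicted class index to its label name, if the mapping has it."""
    return id2label.get(pred_id, f"unknown({pred_id})")

if __name__ == "__main__":
    model = BertForSequenceClassification.from_pretrained("JaySenpai/bert-model")
    # id2label is meaningful only if it was saved in the checkpoint's config.
    print(label_for(0, model.config.id2label))
```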
|
|
|
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
The model was fine-tuned using a labeled dataset of YouTube titles and descriptions, mapped to 20 categories:

- Education
- Travel
- Cooking
- Gaming
- Music
- Health and Fitness
- Finance
- Technology
- Vlogging
- Beauty & Fashion
- Digital Marketing
- Movies/Series Reviews
- Comedy and Humour
- Podcast
- Youtube or Instagram Grow Tips
- Online Income
- ASMR
- Business and Marketing
- News
- Motivation
|
|
|
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
|
|
|
#### Preprocessing [optional] |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
|
|
- **Base model:** `bert-base-uncased`
- **Epochs:** 4
- **Batch size:** 16
- **Learning rate:** 2e-5
- **Optimizer:** AdamW
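The hyperparameters above can be wired into the Transformers `Trainer` as sketched below. This is an assumed reconstruction, not the actual training script: the dataset objects and output directory name are placeholders, and `Trainer` uses AdamW by default, matching the stated optimizer.

```python
from transformers import (BertForSequenceClassification, Trainer,
                          TrainingArguments)

NUM_LABELS = 20  # category count from the Training Data section
HPARAMS = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
}

def build_trainer(train_dataset, eval_dataset, output_dir="bert-youtube-out"):
    """Assemble a Trainer with the hyperparameters listed above."""
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=NUM_LABELS)
    # AdamW is the Trainer's default optimizer, matching the regime above.
    args = TrainingArguments(output_dir=output_dir, **HPARAMS)
    return Trainer(model=model, args=args,
                   train_dataset=train_dataset, eval_dataset=eval_dataset)
```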
|
|
|
|
|
#### Speeds, Sizes, Times [optional] |
|
|
|
|
|
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
|
|
#### Testing Data |
|
|
|
|
|
The model was evaluated on a held-out validation set of manually labeled YouTube titles and descriptions. |
|
|
|
|
|
|
|
|
|
|
|
#### Factors |
|
|
|
|
|
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Metrics |
|
|
|
|
|
- **Accuracy:** ~97%
- **F1-score (macro):** ~0.95
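Metrics of this kind are commonly computed with scikit-learn; a minimal sketch follows. The labels in the example are toy values for illustration only, not the actual validation set.

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for integer class labels."""
    return accuracy_score(y_true, y_pred), f1_score(y_true, y_pred, average="macro")

# Toy labels for illustration only; not the actual validation set.
acc, macro_f1 = evaluate([0, 1, 2, 2], [0, 1, 2, 1])
```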
|
|
|
|
|
### Results |
|
|
|
|
|
The model performed well on clear-cut categories like "Gaming" and "Technology" but showed confusion between "Motivation" and "Education" in edge cases. |
|
|
|
|
|
|
|
|
|
|
|
#### Summary |
|
|
|
|
|
|
|
|
|
|
|
## Model Examination [optional] |
|
|
|
|
|
<!-- Relevant interpretability work for the model goes here --> |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> |
|
|
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
|
|
- **Hardware Type:** [More Information Needed] |
|
|
- **Hours used:** [More Information Needed] |
|
|
- **Cloud Provider:** [More Information Needed] |
|
|
- **Compute Region:** [More Information Needed] |
|
|
- **Carbon Emitted:** [More Information Needed] |
|
|
|
|
|
## Technical Specifications [optional] |
|
|
|
|
|
### Model Architecture and Objective |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
### Compute Infrastructure |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Hardware |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Software |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## Citation [optional] |
|
|
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
|
|
**BibTeX:** |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
**APA:** |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## Glossary [optional] |
|
|
|
|
|
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## More Information [optional] |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## Model Card Authors [optional] |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
Author: Jayesh Mehta (JaySenpai)

Hugging Face: @JaySenpai
|
|
|
|
|
|