Training dataset: dstefa/New_York_Times_Topics
This model is a fine-tuned version of roberta-base on the NYT News dataset, which contains 256,000 news titles from articles published from 2000 to the present (https://www.kaggle.com/datasets/aryansingh0909/nyt-articles-21m-2000-present). On a test set of 51,200 titles it reaches an accuracy of 0.91; the full classification report is given below.
The training data is labeled with the following classes:
| class | Description |
|---|---|
| 0 | Sports |
| 1 | Arts, Culture, and Entertainment |
| 2 | Business and Finance |
| 3 | Health and Wellness |
| 4 | Lifestyle and Fashion |
| 5 | Science and Technology |
| 6 | Politics |
| 7 | Crime |
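The class table maps integer IDs to topic names. A minimal sketch of that mapping as Python dictionaries, in the `id2label` / `label2id` form that Transformers model configs use, is shown below. The names `ID2LABEL` and `LABEL2ID` are hypothetical; the checkpoint's own `model.config.id2label` is the authoritative source for the label names the pipeline returns.

```python
# Hypothetical mapping mirroring the class table in this card.
# The checkpoint's own model.config.id2label is the authoritative source.
ID2LABEL = {
    0: "Sports",
    1: "Arts, Culture, and Entertainment",
    2: "Business and Finance",
    3: "Health and Wellness",
    4: "Lifestyle and Fashion",
    5: "Science and Technology",
    6: "Politics",
    7: "Crime",
}

# Inverse mapping: topic name -> integer class ID.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

print(ID2LABEL[6])        # Politics
print(LABEL2ID["Crime"])  # 7
```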
The model was trained for 5 epochs, with the following per-epoch results:
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|---|
| 0.3192 | 1.0 | 20480 | 0.4078 | 0.8865 | 0.8859 | 0.8892 | 0.8865 |
| 0.2863 | 2.0 | 40960 | 0.4271 | 0.8972 | 0.8970 | 0.8982 | 0.8972 |
| 0.1979 | 3.0 | 61440 | 0.3797 | 0.9094 | 0.9092 | 0.9098 | 0.9094 |
| 0.1239 | 4.0 | 81920 | 0.3981 | 0.9117 | 0.9113 | 0.9114 | 0.9117 |
| 0.1472 | 5.0 | 102400 | 0.4033 | 0.9137 | 0.9135 | 0.9134 | 0.9137 |
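In every row the Recall column equals Accuracy. That is what one would expect if the reported precision, recall, and F1 are support-weighted averages over classes (an assumption; the averaging method is not stated here): weighting each class's recall TP_c / n_c by its share n_c / N gives (sum of TP_c) / N, which is accuracy. A small pure-Python check on made-up toy labels (illustrative only, not data from this model):

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class support."""
    n = len(y_true)
    total = 0.0
    for cls, n_c in Counter(y_true).items():  # n_c = true examples of class cls
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        total += (n_c / n) * (tp / n_c)  # class weight * class recall
    return total


# Toy labels, illustrative only.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
print(acc, rec, math.isclose(acc, rec))
```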
Classification report on the test set:

| Class | precision | recall | f1 | support |
|---|---|---|---|---|
| Sports | 0.97 | 0.98 | 0.97 | 6400 |
| Arts, Culture, and Entertainment | 0.94 | 0.95 | 0.94 | 6400 |
| Business and Finance | 0.85 | 0.84 | 0.84 | 6400 |
| Health and Wellness | 0.90 | 0.93 | 0.91 | 6400 |
| Lifestyle and Fashion | 0.95 | 0.95 | 0.95 | 6400 |
| Science and Technology | 0.89 | 0.83 | 0.86 | 6400 |
| Politics | 0.93 | 0.88 | 0.90 | 6400 |
| Crime | 0.85 | 0.93 | 0.89 | 6400 |
| accuracy | | | 0.91 | 51200 |
| macro avg | 0.91 | 0.91 | 0.91 | 51200 |
| weighted avg | 0.91 | 0.91 | 0.91 | 51200 |
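Because every class has the same support (6,400 titles), the weighted average equals the macro average, i.e. the plain mean of the per-class scores. A quick arithmetic check using the rounded per-class values from the report; since the inputs are rounded, the means (0.910, 0.91125, and 0.9075 for precision, recall, and F1) can differ slightly from averages computed on the unrounded scores, but all are consistent with the reported 0.91:

```python
import math

# Per-class (precision, recall, f1) as printed (rounded) in the report above.
REPORT = {
    "Sports": (0.97, 0.98, 0.97),
    "Arts, Culture, and Entertainment": (0.94, 0.95, 0.94),
    "Business and Finance": (0.85, 0.84, 0.84),
    "Health and Wellness": (0.90, 0.93, 0.91),
    "Lifestyle and Fashion": (0.95, 0.95, 0.95),
    "Science and Technology": (0.89, 0.83, 0.86),
    "Politics": (0.93, 0.88, 0.90),
    "Crime": (0.85, 0.93, 0.89),
}
# Every class has 6,400 test titles.
SUPPORT = {name: 6400 for name in REPORT}


def macro_avg(report):
    """Unweighted mean of each metric over classes."""
    rows = list(report.values())
    return tuple(sum(r[i] for r in rows) / len(rows) for i in range(3))


def weighted_avg(report, support):
    """Mean of each metric weighted by class support."""
    total = sum(support.values())
    return tuple(sum(report[c][i] * support[c] for c in report) / total for i in range(3))


macro = macro_avg(REPORT)
weighted = weighted_avg(REPORT, SUPPORT)
print("macro   :", macro)
print("weighted:", weighted)
# With equal support the two averages coincide (up to float rounding).
print(all(math.isclose(m, w) for m, w in zip(macro, weighted)))
```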
Example usage:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

MODEL_ID = "dstefa/roberta-base_topic_classification_nyt_news"

# Load the fine-tuned tokenizer and classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

# Wrap them in a text-classification pipeline that returns a label name and a score
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Kederis proclaims innocence Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his innocence and vowing."
print(pipe(text))
# [{'label': 'Sports', 'score': 0.9989326596260071}]
```
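For a single-label classifier like this one, the pipeline's output is, by default, the softmax probability of the highest-scoring class, with the class ID translated to a name through the model's `id2label` mapping. The sketch below reproduces that post-processing step in plain Python so it can be read without downloading the model. The logits are made up for illustration, `softmax` and `top_label` are hypothetical helpers, and `ID2LABEL` is a hypothetical copy of the class table (in practice, use `model.config.id2label`).

```python
import math

# Hypothetical copy of the class table; in practice use model.config.id2label.
ID2LABEL = {0: "Sports", 1: "Arts, Culture, and Entertainment",
            2: "Business and Finance", 3: "Health and Wellness",
            4: "Lifestyle and Fashion", 5: "Science and Technology",
            6: "Politics", 7: "Crime"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label=ID2LABEL):
    """Return a pipeline-style {'label', 'score'} dict for the most probable class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Made-up logits for one title (8 classes); not real model output.
example_logits = [6.1, -0.4, 0.2, -1.0, -0.8, 0.5, -0.2, 0.9]
print(top_label(example_logits))
```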
Base model: FacebookAI/roberta-base