roberta-base_topic_classification_nyt_news

This model is roberta-base fine-tuned on the NYT News dataset, which contains 256,000 news titles from articles published from 2000 to the present (https://www.kaggle.com/datasets/aryansingh0909/nyt-articles-21m-2000-present). It achieves the following results on the test set of 51,200 cases:

Accuracy: 0.91
F1: 0.91
Precision: 0.91
Recall: 0.91

Training data

The training data was labeled with the following classes:

class	Description
0	Sports
1	Arts, Culture, and Entertainment
2	Business and Finance
3	Health and Wellness
4	Lifestyle and Fashion
5	Science and Technology
6	Politics
7	Crime
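
The class table above can be written as a pair of lookup dictionaries. This is a minimal sketch: the names `ID2LABEL` and `LABEL2ID` are illustrative, and the mapping is copied from the table rather than read from the model's configuration, which may store the labels differently.

```python
# Class ids and descriptions copied from the table above.
# ID2LABEL / LABEL2ID are illustrative names, not part of the model card.
ID2LABEL = {
    0: "Sports",
    1: "Arts, Culture, and Entertainment",
    2: "Business and Finance",
    3: "Health and Wellness",
    4: "Lifestyle and Fashion",
    5: "Science and Technology",
    6: "Politics",
    7: "Crime",
}

# Reverse lookup from description to class id.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

print(ID2LABEL[6])
print(LABEL2ID["Crime"])
```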

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 5
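
To make the scheduler settings concrete, the sketch below computes the learning rate at a given step. It assumes the linear scheduler ramps up from zero over the 500 warmup steps and then decays linearly to zero, and it takes the total of 102,400 steps from the last row of the training results table. The function name `linear_lr` is hypothetical.

```python
PEAK_LR = 5e-05        # learning_rate
WARMUP_STEPS = 500     # lr_scheduler_warmup_steps
TOTAL_STEPS = 102_400  # assumption: final step in the training results table


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup then linear decay to zero."""
    if step < WARMUP_STEPS:
        # Warmup: scale linearly from 0 up to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay: scale linearly from the peak rate down to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 250, 500, 51_450, 102_400):
    print(step, linear_lr(step))
```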

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1	Precision	Recall
0.3192	1.0	20480	0.4078	0.8865	0.8859	0.8892	0.8865
0.2863	2.0	40960	0.4271	0.8972	0.8970	0.8982	0.8972
0.1979	3.0	61440	0.3797	0.9094	0.9092	0.9098	0.9094
0.1239	4.0	81920	0.3981	0.9117	0.9113	0.9114	0.9117
0.1472	5.0	102400	0.4033	0.9137	0.9135	0.9134	0.9137
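
The step counts in this table also imply the size of the training split. The arithmetic below is a sketch that assumes one optimizer step per batch of 8 (no gradient accumulation, a single device, no dropped batches). The card does not state how the remaining titles were used, so the last figure is only a derived number.

```python
TRAIN_BATCH_SIZE = 8    # train_batch_size
STEPS_PER_EPOCH = 20_480  # step count after epoch 1.0
TOTAL_TITLES = 256_000  # dataset size stated in the introduction
TEST_CASES = 51_200     # test set size stated in the introduction

# Assumption: every step consumes exactly one full batch.
train_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

# Titles that are in neither the training split nor the test set.
remaining = TOTAL_TITLES - TEST_CASES - train_examples

print(train_examples)
print(remaining)
```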

Model performance

class	precision	recall	f1	support
Sports	0.97	0.98	0.97	6400
Arts, Culture, and Entertainment	0.94	0.95	0.94	6400
Business and Finance	0.85	0.84	0.84	6400
Health and Wellness	0.90	0.93	0.91	6400
Lifestyle and Fashion	0.95	0.95	0.95	6400
Science and Technology	0.89	0.83	0.86	6400
Politics	0.93	0.88	0.90	6400
Crime	0.85	0.93	0.89	6400

accuracy			0.91	51200
macro avg	0.91	0.91	0.91	51200
weighted avg	0.91	0.91	0.91	51200
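
The macro averages can be checked against the per-class rows. The sketch below recomputes them from the rounded values in the table; because every class has the same support (6400), the weighted average coincides with the macro average. The variable names are illustrative.

```python
# (precision, recall, f1) per class, copied from the table above.
PER_CLASS = {
    "Sports": (0.97, 0.98, 0.97),
    "Arts, Culture, and Entertainment": (0.94, 0.95, 0.94),
    "Business and Finance": (0.85, 0.84, 0.84),
    "Health and Wellness": (0.90, 0.93, 0.91),
    "Lifestyle and Fashion": (0.95, 0.95, 0.95),
    "Science and Technology": (0.89, 0.83, 0.86),
    "Politics": (0.93, 0.88, 0.90),
    "Crime": (0.85, 0.93, 0.89),
}

# Macro average: unweighted mean of each metric over the classes.
macro = {}
for i, metric in enumerate(("precision", "recall", "f1")):
    values = [scores[i] for scores in PER_CLASS.values()]
    macro[metric] = round(sum(values) / len(values), 2)

print(macro)
```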

How to use roberta-base_topic_classification_nyt_news with Hugging Face

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
model_id = "dstefa/roberta-base_topic_classification_nyt_news"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Wrap both in a text-classification pipeline.
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Classify a single news title; the result is a list with one label/score dict.
text = "Kederis proclaims innocence Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his innocence and vowing."
pipe(text)

[{'label': 'Sports', 'score': 0.9989326596260071}]
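
The `score` in the output above is a probability for the predicted class. The sketch below shows how such a score is typically derived from raw logits, assuming the pipeline applies a softmax over the eight classes (the usual behaviour for single-label classification). The logit values are invented for illustration and do not come from the model.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the eight classes, in class-id order (0 = Sports).
example_logits = [8.1, 0.3, -0.5, -1.2, 0.1, -0.8, 0.6, 0.9]

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)

print(best, probs[best])
```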

Framework versions

Transformers 4.32.1
Pytorch 2.1.0+cu121
Datasets 2.12.0
Tokenizers 0.13.2
