license: apache-2.0
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: FakevsRealNews
    results: []

Coding challenge

The challenge was to build a fake news classifier using the Hugging Face Transformers library.

The final model is a fine-tuned version of distilbert-base-uncased on the fake-and-real-news dataset. The dataset is available at https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset.

It achieves the following results on the evaluation set:

Loss: 0.0000
Accuracy: 1.0
F1: 1.0
Precision: 1.0
Recall: 1.0
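
As a rough illustration of how the four classification metrics above relate to one another, here is a minimal pure-Python sketch. It assumes a binary setup with label 1 as the positive class; the card does not state which class was treated as positive or which averaging mode was used, so `binary_metrics` and its `positive` parameter are hypothetical names chosen for this example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: `positive` marks the positive class; the model card
    does not say which class played that role.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the dataset.
    print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```

When every prediction matches its label, all four values equal 1.0, which is the pattern reported above.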

Model description

A fine-tuned DistilBERT model (distilbert-base-uncased) that classifies a news story as fake or real.

Training and evaluation data

The dataset was split into training, development, and test sets in an 80/10/10 ratio.
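
One way such a split could be produced is sketched below. The card does not say whether the data was shuffled or stratified before splitting, so the shuffle and the reuse of seed 42 are assumptions, and `split_80_10_10` is a hypothetical helper name.

```python
import random


def split_80_10_10(items, seed=42):
    """Shuffle a copy of `items` and cut it into train/dev/test parts.

    Assumption: a plain seeded shuffle; the original split procedure
    is not described in the model card.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = len(shuffled) * 8 // 10
    n_dev = len(shuffled) // 10

    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains, roughly 10%
    return train, dev, test


if __name__ == "__main__":
    train, dev, test = split_80_10_10(range(1000))
    print(len(train), len(dev), len(test))
```

Fixing the seed keeps the three parts identical across runs, so evaluation numbers stay comparable.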

Training procedure

The title and text of each news story were concatenated to form a single datapoint. A model was then fine-tuned to perform single-label classification on each datapoint. The final prediction is the class with the highest probability.
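
The two steps described here, building the input and picking the most probable class, can be sketched without any model code. The single-space separator and the label order `("fake", "real")` are assumptions, since the card states neither; `build_input`, `softmax`, and `predict_label` are hypothetical helper names.

```python
import math


def build_input(title, text, sep=" "):
    """Join title and body into one string (separator is an assumption)."""
    return f"{title}{sep}{text}"


def softmax(logits):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=("fake", "real")):
    """Return the label with the highest probability and that probability.

    Assumption: index 0 means "fake" and index 1 means "real".
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    example = build_input("Some headline", "Body of the story.")
    print(example)
    # The logits below are made up; a real run would take them from the model.
    print(predict_label([0.3, 2.1]))
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits directly would give the same class.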

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 3
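
To make the learning-rate settings concrete, the sketch below reproduces the general shape of a linear schedule with warmup: the rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 at the last step. The total of 5868 steps is taken from the final row of the training results table; `linear_lr` is a hypothetical helper written for illustration, not the library's own function.

```python
def linear_lr(step, peak_lr=5e-5, warmup_steps=500, total_steps=5868):
    """Learning rate at `step` under linear warmup followed by linear decay.

    Defaults follow the listed hyperparameters; total_steps comes from
    the last row of the training results table.
    """
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 250, 500, 1956, 3912, 5868):
        print(step, linear_lr(step))
```

With these values, the peak rate of 5e-05 is reached at step 500, roughly a quarter of the way through the first epoch.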

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1	Precision	Recall
0.0503	1.0	1956	0.0025	0.9995	0.9995	0.9995	0.9995
0.001	2.0	3912	0.0001	1.0	1.0	1.0	1.0
0.0007	3.0	5868	0.0000	1.0	1.0	1.0	1.0

Framework versions

Transformers 4.18.0
PyTorch 1.10.0+cu111
Datasets 2.1.0
Tokenizers 0.12.1