# distilbert-no-aug-ag-news
This model is a fine-tuned version of [distilbert/distilbert-base-multilingual-cased](https://huggingface.co/distilbert/distilbert-base-multilingual-cased) on the [fancyzhx/ag_news](https://huggingface.co/datasets/fancyzhx/ag_news) dataset. It achieves the following results on the evaluation set:
- Loss: 0.4701
- F1: 0.8622
## Model description
More information needed
## Intended uses & limitations
More information needed
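
A minimal inference sketch, assuming the model is published on the Hub under this repository name (the model ID below is a placeholder) and used for four-way AG News topic classification:

```python
from transformers import pipeline

# Placeholder repo ID; replace with the actual Hub path of this model.
classifier = pipeline(
    "text-classification",
    model="distilbert-no-aug-ag-news",
)

# AG News has four topics: World, Sports, Business, Sci/Tech.
print(classifier("Stocks rallied after the central bank held rates steady."))
```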
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
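
A sketch of how these hyperparameters map onto `TrainingArguments`; the model and dataset setup are assumed, `output_dir` is a placeholder, and `optim="adamw_torch_fused"` corresponds to the fused AdamW entry above:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="distilbert-no-aug-ag-news",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",   # AdamW with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=10,
)
```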
### Training results
| Training Loss | Epoch | Step | Validation Loss | F1 |
|---|---|---|---|---|
| No log | 1.0 | 25 | 1.0288 | 0.7771 |
| No log | 2.0 | 50 | 0.5710 | 0.8375 |
| No log | 3.0 | 75 | 0.4612 | 0.8534 |
| No log | 4.0 | 100 | 0.4320 | 0.8623 |
| No log | 5.0 | 125 | 0.4473 | 0.8562 |
| No log | 6.0 | 150 | 0.4336 | 0.8621 |
| No log | 7.0 | 175 | 0.4517 | 0.8618 |
| No log | 8.0 | 200 | 0.4614 | 0.8629 |
| No log | 9.0 | 225 | 0.4669 | 0.8627 |
| No log | 10.0 | 250 | 0.4701 | 0.8622 |
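
The F1 column was presumably produced by a `compute_metrics` callback passed to the `Trainer`; a minimal sketch using the `evaluate` library, where the averaging strategy is an assumption (the card does not state how F1 is aggregated over the four classes):

```python
import numpy as np
import evaluate

f1_metric = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # "weighted" averaging is an assumption; the card does not specify it.
    return f1_metric.compute(
        predictions=predictions, references=labels, average="weighted"
    )
```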
### Framework versions
- Transformers 4.57.1
- Pytorch 2.9.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1