blip-image-captioning-base-blip2

This model is a fine-tuned version of Salesforce/blip-image-captioning-base; the training dataset is not specified. At the end of training (step 1600, epoch 50) it achieves the following results on the evaluation set:

Loss: 0.4501
WER (word error rate): 0.8353
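
For readers unfamiliar with the metric, the sketch below shows the standard definition of word error rate: the word-level edit distance between a generated caption and its reference, divided by the number of reference words. The card does not say which implementation produced the reported value, so this is an illustration of the definition rather than the evaluation code that was used. The caption pair is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical caption pair, not taken from the model's evaluation data.
example_wer = word_error_rate(
    "a dog runs on the beach",
    "a dog is running on a beach",
)
print(example_wer)
```

Under this definition, a WER of 0.8353 corresponds to roughly 84 word edits per 100 reference words.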

Model description

More information needed

Intended uses & limitations

More information needed
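
Until this section is filled in, a minimal captioning sketch may help. It assumes the repository id WafaaFraih/blip-image-captioning-base-blip2 shown in the model tree, and it assumes the processor of the base checkpoint is compatible with the fine-tuned weights; the card confirms neither the intended use nor the preprocessing. The function name `caption_image` and its parameters are illustrative.

```python
MODEL_ID = "WafaaFraih/blip-image-captioning-base-blip2"
BASE_ID = "Salesforce/blip-image-captioning-base"


def caption_image(image_path: str, max_new_tokens: int = 30) -> str:
    """Generate one caption for the image stored at image_path."""
    # Imported here so the heavy dependencies load only when the function runs.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Assumption: the base checkpoint's processor matches the fine-tuned model.
    processor = BlipProcessor.from_pretrained(BASE_ID)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Calling `caption_image("photo.jpg")` with a local image path (a placeholder here) downloads both checkpoints on first use and returns the caption as a string.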

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 50
mixed_precision_training: Native AMP
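
The hyperparameters above fit together arithmetically, and the short sketch below works through that. It assumes a single training device (which is what 2 × 8 = 16 implies) and zero warmup steps, since the card lists no warmup setting. The figure of 250 micro-batches per epoch is an inference from the epoch column of the training results, not something the card states.

```python
per_device_batch = 2   # train_batch_size
grad_accum_steps = 8   # gradient_accumulation_steps
base_lr = 5e-05        # learning_rate
num_epochs = 50        # num_epochs
total_steps = 1600     # last optimizer step in the training results

# Examples consumed per optimizer update (total_train_batch_size).
effective_batch = per_device_batch * grad_accum_steps

# Optimizer updates per epoch, from the reported totals.
updates_per_epoch = total_steps // num_epochs


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Linear schedule: optional warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def epoch_at(step: int, micro_batches_per_epoch: int = 250) -> float:
    """Fractional epoch at a given optimizer step (250 is an inferred value)."""
    full_epochs, extra_updates = divmod(step, updates_per_epoch)
    return full_epochs + extra_updates * grad_accum_steps / micro_batches_per_epoch


print(effective_batch, updates_per_epoch)
print(linear_lr(0), linear_lr(800), linear_lr(1600))
print(epoch_at(50), epoch_at(100))
```

With these assumptions the learning rate falls from 5e-05 at step 0 to half that at step 800 and to zero at step 1600, and the computed epochs for steps 50 and 100 agree with the 1.576 and 3.128 reported in the training results.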

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
1.1988	1.576	50	0.3600	0.8457
0.2346	3.128	100	0.3105	0.8388
0.1382	4.704	150	0.3111	0.8431
0.0779	6.256	200	0.3312	0.8388
0.0429	7.832	250	0.3430	0.8397
0.0248	9.384	300	0.3507	0.8448
0.0169	10.96	350	0.3602	0.8267
0.0113	12.512	400	0.3684	0.8448
0.0087	14.064	450	0.3737	0.8414
0.0059	15.64	500	0.3814	0.8422
0.0049	17.192	550	0.3762	0.8284
0.0036	18.768	600	0.3785	0.8388
0.0026	20.32	650	0.3805	0.8422
0.0023	21.896	700	0.3892	0.8414
0.0019	23.448	750	0.3901	0.8414
0.0016	25.0	800	0.3903	0.8371
0.0012	26.576	850	0.3999	0.8431
0.0009	28.128	900	0.4078	0.8457
0.0008	29.704	950	0.4049	0.8414
0.0008	31.256	1000	0.4063	0.8345
0.0005	32.832	1050	0.4133	0.8362
0.0004	34.384	1100	0.4173	0.8353
0.0003	35.96	1150	0.4238	0.8405
0.0003	37.512	1200	0.4254	0.8388
0.0002	39.064	1250	0.4263	0.8293
0.0001	40.64	1300	0.4326	0.8293
0.0001	42.192	1350	0.4376	0.8371
0.0001	43.768	1400	0.4391	0.8302
0.0	45.32	1450	0.4450	0.8388
0.0001	46.896	1500	0.4464	0.8328
0.0	48.448	1550	0.4488	0.8353
0.0	50.0	1600	0.4501	0.8353
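
The table can be read more easily by picking out its best rows. The sketch below copies the step, validation loss, and WER columns and reports where each metric is lowest. Validation loss is lowest at step 100 and rises afterwards while training loss falls toward zero, a pattern usually read as overfitting, so an earlier checkpoint might generalize better than the final one; the card does not say which checkpoint was published.

```python
# (step, validation_loss, wer) copied from the training results table.
rows = [
    (50, 0.3600, 0.8457), (100, 0.3105, 0.8388), (150, 0.3111, 0.8431),
    (200, 0.3312, 0.8388), (250, 0.3430, 0.8397), (300, 0.3507, 0.8448),
    (350, 0.3602, 0.8267), (400, 0.3684, 0.8448), (450, 0.3737, 0.8414),
    (500, 0.3814, 0.8422), (550, 0.3762, 0.8284), (600, 0.3785, 0.8388),
    (650, 0.3805, 0.8422), (700, 0.3892, 0.8414), (750, 0.3901, 0.8414),
    (800, 0.3903, 0.8371), (850, 0.3999, 0.8431), (900, 0.4078, 0.8457),
    (950, 0.4049, 0.8414), (1000, 0.4063, 0.8345), (1050, 0.4133, 0.8362),
    (1100, 0.4173, 0.8353), (1150, 0.4238, 0.8405), (1200, 0.4254, 0.8388),
    (1250, 0.4263, 0.8293), (1300, 0.4326, 0.8293), (1350, 0.4376, 0.8371),
    (1400, 0.4391, 0.8302), (1450, 0.4450, 0.8388), (1500, 0.4464, 0.8328),
    (1550, 0.4488, 0.8353), (1600, 0.4501, 0.8353),
]

best_loss_row = min(rows, key=lambda row: row[1])  # lowest validation loss
best_wer_row = min(rows, key=lambda row: row[2])   # lowest WER
final_row = rows[-1]                               # end of training

print("lowest validation loss:", best_loss_row)
print("lowest WER:", best_wer_row)
print("final:", final_row)
```

WER stays between 0.8267 and 0.8457 across all evaluations, so the differences between checkpoints on this metric are small.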

Framework versions

Transformers 4.52.4
Pytorch 2.6.0+cu124
Datasets 3.6.0
Tokenizers 0.21.2

Safetensors

Model size: 0.2B params
Tensor type: F32

Model tree for WafaaFraih/blip-image-captioning-base-blip2

Base model: Salesforce/blip-image-captioning-base
This model: fine-tuned from the base model