This is a finetune of Gemma-3-1b-it using the Erudite-V2 dataset.

The Erudite-V2 dataset is designed to improve Gemma 3 1B's performance on benchmarks such as MMLU and HumanEval.
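
To try the model, here is a minimal generation sketch, assuming the standard transformers text-generation API and a transformers version with Gemma 3 support:

```python
# Minimal generation sketch; the chat template is taken from the tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Stormtrooperaim/Erudite-V2-1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "State the Pythagorean theorem in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```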

• Model training loss (Weights & Biases plot not reproduced here):

  Run history: over one epoch, train/loss fell steadily from its initial value, train/grad_norm fluctuated around a roughly constant level, and train/learning_rate warmed up briefly before decaying to zero as train/epoch and train/global_step advanced.

Run summary:

| Metric | Value |
| --- | --- |
| total_flos | 1.092006641664e+18 |
| train/epoch | 1 |
| train/global_step | 3907 |
| train/grad_norm | 0.21461 |
| train/learning_rate | 0.0 |
| train/loss | 0.7482 |
| train_loss | 0.81085 |
| train_runtime | 15509.2958 s |
| train_samples_per_second | 16.119 |
| train_steps_per_second | 0.252 |
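
As a quick consistency check, the throughput figures above line up with the step count and runtime; the effective per-step batch size below is inferred from them, not logged on this card:

```python
# Back-of-the-envelope check of the run summary above. The effective
# batch size is an inference from logged throughput, not a reported value.
steps = 3907
runtime_s = 15509.2958
samples_per_s = 16.119

steps_per_s = steps / runtime_s                # ~0.252, matches train_steps_per_second
effective_batch = samples_per_s / steps_per_s  # ~64 samples per optimizer step
print(f"{steps_per_s:.3f} steps/s, ~{effective_batch:.0f} samples/step")
```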

Format: Safetensors · Model size: 1.0B params · Tensor type: BF16
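The advertised size and tensor type can be verified after loading. A minimal sketch, assuming a transformers version with Gemma 3 support:

```python
# Load in BF16 and count parameters to confirm the card's metadata.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Stormtrooperaim/Erudite-V2-1b", torch_dtype=torch.bfloat16
)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B params, dtype={next(model.parameters()).dtype}")
```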

Model tree for Stormtrooperaim/Erudite-V2-1b

• Fine-tuned from Gemma-3-1b-it
• Merges: 3 models
• Quantizations: 3 models
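
Beyond the pre-quantized repos listed above, the model can also be quantized on the fly. A minimal sketch assuming bitsandbytes is installed; the 4-bit settings are illustrative, not taken from the listed quantizations:

```python
# Hypothetical on-the-fly 4-bit load via bitsandbytes; the models under
# "Quantizations" above are separate, pre-quantized checkpoints.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Stormtrooperaim/Erudite-V2-1b",
    quantization_config=bnb_config,
    device_map="auto",
)
```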

Dataset used to train Stormtrooperaim/Erudite-V2-1b: the Erudite-V2 dataset.
