Trained on 12,312,444,928 tokens from the kjj0/fineweb100B-gpt2 dataset.
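For scale, a quick back-of-envelope sketch (assuming the "7.6B" in the model name is the parameter count; the `chinchilla_optimal` name below is illustrative, not from any library):

```python
# Rough tokens-per-parameter ratio for this training run.
# Assumption: 7.6e9 parameters, taken from the model name.
tokens = 12_312_444_928
params = 7.6e9

ratio = tokens / params
print(f"{ratio:.2f} tokens per parameter")  # ~1.62

# For reference, the Chinchilla heuristic suggests roughly
# ~20 tokens per parameter for compute-optimal training, so
# this checkpoint is trained well short of that ratio.
chinchilla_optimal = 20 * params
print(f"Chinchilla-style budget would be ~{chinchilla_optimal:.2e} tokens")
```

At roughly 1.6 tokens per parameter, the eval numbers below should be read as those of an early, under-trained base model.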

```shell
$ lm_eval --model hf \
    --model_args pretrained=michaelbzhu/test-7.6B-base,trust_remote_code=True \
    --tasks mmlu_college_medicine,hellaswag,lambada_openai,arc_easy,winogrande,arc_challenge,openbookqa \
    --device cuda:0 \
    --batch_size 16
```

|     Tasks      |Version|Filter|n-shot|  Metric  |   | Value |   |Stderr|
|----------------|------:|------|-----:|----------|---|------:|---|-----:|
|arc_challenge   |      1|none  |     0|acc       |↑  | 0.2295|±  |0.0123|
|                |       |none  |     0|acc_norm  |↑  | 0.2628|±  |0.0129|
|arc_easy        |      1|none  |     0|acc       |↑  | 0.5358|±  |0.0102|
|                |       |none  |     0|acc_norm  |↑  | 0.4663|±  |0.0102|
|hellaswag       |      1|none  |     0|acc       |↑  | 0.3788|±  |0.0048|
|                |       |none  |     0|acc_norm  |↑  | 0.4801|±  |0.0050|
|lambada_openai  |      1|none  |     0|acc       |↑  | 0.4527|±  |0.0069|
|                |       |none  |     0|perplexity|↓  |14.3601|±  |0.4468|
|college_medicine|      1|none  |     0|acc       |↑  | 0.2254|±  |0.0319|
|openbookqa      |      1|none  |     0|acc       |↑  | 0.1920|±  |0.0176|
|                |       |none  |     0|acc_norm  |↑  | 0.3020|±  |0.0206|
|winogrande      |      1|none  |     0|acc       |↑  | 0.5107|±  |0.0140|
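The harness reports a standard error for each metric. A minimal sketch of turning those `acc ± stderr` pairs into approximate 95% confidence intervals with the normal approximation (the `ci95` helper is hypothetical, not part of lm-eval):

```python
# Accuracy and stderr values copied from the results table above.
results = {
    "arc_challenge":    (0.2295, 0.0123),
    "arc_easy":         (0.5358, 0.0102),
    "hellaswag":        (0.3788, 0.0048),
    "lambada_openai":   (0.4527, 0.0069),
    "college_medicine": (0.2254, 0.0319),
    "openbookqa":       (0.1920, 0.0176),
    "winogrande":       (0.5107, 0.0140),
}

def ci95(acc, stderr):
    """Approximate 95% CI: acc +/- 1.96 * stderr (normal approximation)."""
    half = 1.96 * stderr
    return (round(acc - half, 4), round(acc + half, 4))

for task, (acc, se) in results.items():
    lo, hi = ci95(acc, se)
    print(f"{task:>16}: {acc:.4f}  95% CI [{lo:.4f}, {hi:.4f}]")
```

Note, for example, that the winogrande interval (about [0.4833, 0.5381]) overlaps 0.5, so that result is not clearly above chance for a binary task.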
Model size: 8B params · Tensor type: F32 · Format: Safetensors

Dataset used to train michaelbzhu/test-7.6B-base: kjj0/fineweb100B-gpt2