michaelbzhu/test-3.2B-base

custom-mbz-test

model_hub_mixin

pytorch_model_hub_mixin

michaelbzhu committed on Sep 7, 2025

Commit 9dd587a · verified · 1 Parent(s): b07a847

Update README.md

Files changed (1)

README.md +24 -0

@@ -43,4 +43,28 @@ for _ in range(20):
     next_token = torch.multinomial(torch.softmax(logits, dim=-1), 1).unsqueeze(0)
     input_ids = torch.cat([input_ids, next_token], dim=1)
 print(tokenizer.decode(input_ids[0]))
+```
+Eval:
+```
+$ lm_eval --model hf \
+    --model_args pretrained=michaelbzhu/test-3.2B-base,trust_remote_code=True \
+    --tasks mmlu_college_medicine,hellaswag,lambada_openai,arc_easy,winogrande,arc_challenge,openbookqa \
+    --device cuda:0 \
+    --batch_size 16
+|     Tasks      |Version|Filter|n-shot|  Metric  |   | Value |   |Stderr|
+|----------------|------:|------|-----:|----------|---|------:|---|-----:|
+|arc_challenge   |      1|none  |     0|acc       |↑  | 0.2363|±  |0.0124|
+|                |       |none  |     0|acc_norm  |↑  | 0.2637|±  |0.0129|
+|arc_easy        |      1|none  |     0|acc       |↑  | 0.5758|±  |0.0101|
+|                |       |none  |     0|acc_norm  |↑  | 0.4996|±  |0.0103|
+|hellaswag       |      1|none  |     0|acc       |↑  | 0.3827|±  |0.0049|
+|                |       |none  |     0|acc_norm  |↑  | 0.4846|±  |0.0050|
+|lambada_openai  |      1|none  |     0|acc       |↑  | 0.4238|±  |0.0069|
+|                |       |none  |     0|perplexity|↓  |14.7850|±  |0.4335|
+|college_medicine|      1|none  |     0|acc       |↑  | 0.2370|±  |0.0324|
+|openbookqa      |      1|none  |     0|acc       |↑  | 0.2180|±  |0.0185|
+|                |       |none  |     0|acc_norm  |↑  | 0.3180|±  |0.0208|
+|winogrande      |      1|none  |     0|acc       |↑  | 0.5367|±  |0.0140|
 ```