bnolton/AP_Stat_Inference_Helper

bnolton committed on Dec 3, 2025

Commit ed18a12 · verified · 1 Parent(s): 5599454

Update README.md

Files changed (1)

README.md +3 -1

@@ -46,8 +46,9 @@ The finetuned model fixes these errors and gives answer in a format much more su
 This may be due to the specific nature of the object of the model itself (inference problems for the AP test) verses the broad nature of the benchmark (all of high school statistics in a non AP setting).
 So, while the benchmark scores do not indicate success, the model does perform better in real world scenarios indicating the finetuning was a success.
 The model was compared to Llama-3.2-3B-Instruct and Mistral7B-Instruct-v0.2 and show superior metrics on the mmlu_high_school_statistics and minerva_math while having a comparable race metric.
-| Model                    | mmlu_high_school_statistics | minerva_math | race | bert: precision | bert: recall | bert: f1 |
+Similar BERT scores were also calculated across all models.
+| Model                    | mmlu_high_school_statistics | minerva_math | race | BERT: precision | BERT: recall | BERT: f1 |
 |--------------------------|-----------------------------|--------------|------|-----------------|--------------|----------|
 | AP_Stat_Inference_Helper | 0.72                        | 0.45         | 0.32 | 0.75            | 0.85         | 0.80     |
 | Qwen3-4B-Instruct-2507   | 0.72                        | 0.45         | 0.32 | 0.75            | 0.85         | 0.80     |
@@ -63,6 +64,7 @@ model = AutoModel.from_pretrained("bnolton/AP_Stat_Inference_Helper", dtype="aut
 ```
 The intended use of this model is geared at AP Statistics students wanting to ensure they understanding the topic of inference.
+The student will enter their question into the model and the model will output a response in the AP style format.
 ## Prompt Format
 pipe = pipeline(