Commit ed18a12 by bnolton (verified)
Parent: 5599454

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -46,8 +46,9 @@ The finetuned model fixes these errors and gives answer in a format much more su
 This may be due to the specific nature of the object of the model itself (inference problems for the AP test) verses the broad nature of the benchmark (all of high school statistics in a non AP setting).
 So, while the benchmark scores do not indicate success, the model does perform better in real world scenarios indicating the finetuning was a success.
 The model was compared to Llama-3.2-3B-Instruct and Mistral7B-Instruct-v0.2 and show superior metrics on the mmlu_high_school_statistics and minerva_math while having a comparable race metric.
+Similar BERT scores were also calculated across all models.
 
-| Model | mmlu_high_school_statistics | minerva_math | race | bert: precision | bert: recall | bert: f1 |
+| Model | mmlu_high_school_statistics | minerva_math | race | BERT: precision | BERT: recall | BERT: f1 |
 |--------------------------|-----------------------------|--------------|------|-----------------|--------------|----------|
 | AP_Stat_Inference_Helper | 0.72 | 0.45 | 0.32 | 0.75 | 0.85 | 0.80 |
 | Qwen3-4B-Instruct-2507 | 0.72 | 0.45 | 0.32 | 0.75 | 0.85 | 0.80 |
@@ -63,6 +64,7 @@ model = AutoModel.from_pretrained("bnolton/AP_Stat_Inference_Helper", dtype="aut
 ```
 
 The intended use of this model is geared at AP Statistics students wanting to ensure they understanding the topic of inference.
+The student will enter their question into the model and the model will output a response in the AP style format.
 
 ## Prompt Format
 pipe = pipeline(
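You can sanity-check the BERT columns in the README table against each other. This assumes they are BERTScore values; the README says only "BERT scores". Under that assumption, the F1 column should sit close to the harmonic mean of the precision and recall columns. The Python sketch below computes that harmonic mean for the reported precision (0.75) and recall (0.85). The helper name `harmonic_f1` is hypothetical. Treat this as a consistency check only. BERTScore computes F1 per candidate/reference pair, so an F1 averaged over a test set need not exactly equal the harmonic mean of the averaged precision and recall.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """Hypothetical helper: harmonic mean of precision and recall.

    BERTScore defines F1 this way for each candidate/reference pair;
    applying it to averaged P and R is only an approximation.
    """
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the AP_Stat_Inference_Helper row of the table.
f1 = harmonic_f1(0.75, 0.85)
print(round(f1, 2))
```

Here 2 × 0.75 × 0.85 / (0.75 + 0.85) ≈ 0.797, which rounds to the reported 0.80. The three BERT columns are therefore mutually consistent.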