Update README.md
This may be due to the narrow focus of the model itself (inference problems for the AP exam) versus the broad scope of the benchmark (all of high school statistics in a non-AP setting).
So, while the benchmark scores do not indicate success, the model does perform better in real-world scenarios, indicating that the fine-tuning was successful.
The model was compared to Llama-3.2-3B-Instruct and Mistral-7B-Instruct-v0.2 and shows superior scores on mmlu_high_school_statistics and minerva_math, with a comparable score on race.
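The three benchmark names match task names in EleutherAI's lm-evaluation-harness, so the comparison can likely be re-run with it. The sketch below assumes the harness produced these scores (the card does not say so) and that the baseline models live at the Hugging Face repo IDs listed; only `bnolton/AP_Stat_Inference_Helper` comes from this card, and the other IDs are assumptions.

```python
"""Sketch: re-run the README's benchmarks with lm-evaluation-harness (assumed tool)."""

# Task names as written in the README; they match lm-evaluation-harness task names.
TASKS = ["mmlu_high_school_statistics", "minerva_math", "race"]

# Hugging Face repo IDs. Only the first appears in this card; the rest are assumptions.
MODELS = {
    "AP_Stat_Inference_Helper": "bnolton/AP_Stat_Inference_Helper",
    "Qwen3-4B-Instruct-2507": "Qwen/Qwen3-4B-Instruct-2507",
    "Llama-3.2-3B-Instruct": "meta-llama/Llama-3.2-3B-Instruct",
    "Mistral-7B-Instruct-v0.2": "mistralai/Mistral-7B-Instruct-v0.2",
}


def model_args(repo_id: str) -> str:
    """Build the model_args string that the harness's "hf" backend parses."""
    return f"pretrained={repo_id}"


def run_benchmarks(repo_id: str) -> dict:
    """Evaluate one model on TASKS and return the harness's per-task metrics."""
    # Imported lazily so this file loads without `pip install lm-eval`.
    import lm_eval

    out = lm_eval.simple_evaluate(
        model="hf",
        model_args=model_args(repo_id),
        tasks=TASKS,
        batch_size="auto",
    )
    return out["results"]
```

For example, `{name: run_benchmarks(repo) for name, repo in MODELS.items()}` scores every model. Harness version and few-shot defaults can shift the numbers, so a rerun may not match the table exactly.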
BERTScore precision, recall, and F1 were also calculated for all models, with similar results.

| Model                    | mmlu_high_school_statistics | minerva_math | race | BERTScore precision | BERTScore recall | BERTScore F1 |
|--------------------------|-----------------------------|--------------|------|---------------------|------------------|--------------|
| AP_Stat_Inference_Helper | 0.72                        | 0.45         | 0.32 | 0.75                | 0.85             | 0.80         |
| Qwen3-4B-Instruct-2507   | 0.72                        | 0.45         | 0.32 | 0.75                | 0.85             | 0.80         |
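BERTScore compares a model response with a reference answer by matching token embeddings. The sketch below shows only the scoring step, assuming the standard greedy-matching definition without idf weighting or baseline rescaling. The 2-D vectors are made-up stand-ins for real contextual embeddings, and the card does not say which tool produced its numbers (the `bert_score` package's `score(cands, refs, lang="en")` is one common option).

```python
"""Toy illustration of BERTScore's greedy matching (no idf, no baseline rescaling)."""
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def bertscore(candidate: list[Vector], reference: list[Vector]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one candidate/reference pair of token embeddings."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # The candidate covers only the first of the two reference "tokens".
    p, r, f1 = bertscore([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
    print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=1.00 R=0.50 F1=0.67
```

With the table's precision of 0.75 and recall of 0.85, the harmonic mean is about 0.797, which rounds to the reported 0.80. Because BERTScore is usually averaged per example, a corpus-level F1 need not equal this value exactly.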
```

The model is intended for AP Statistics students who want to check their understanding of statistical inference.
A student enters a question into the model, and the model responds in AP-style format.

## Prompt Format
pipe = pipeline(