---
library_name: transformers
license: apache-2.0
base_model: unsloth/Qwen3-0.6B-Base
tags:
- unsloth
- generated_from_trainer
model-index:
- name: Qwen3-0.6B-MNLP_M2_mcqa_model
  results: []
datasets:
- andresnowak/MNLP_MCQA_dataset
- andresnowak/MNLP_M2_mcqa_dataset
---

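# Qwen3-0.6B-MNLP_M2_mcqa_model

This model is a fine-tuned version of [unsloth/Qwen3-0.6B-Base](https://huggingface.co/unsloth/Qwen3-0.6B-Base) for multiple-choice question answering (MCQA), trained on the datasets listed in this card's metadata (`andresnowak/MNLP_MCQA_dataset` and `andresnowak/MNLP_M2_mcqa_dataset`).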

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Training used the training splits of the following datasets:

- MedMCQA
- MMLU
- SciQ
- AI2 ARC
- Math QA
- ScienceQA
- OpenBookQA

## Training procedure

Only questions with exactly four answer choices were kept for training. For each example, the whole prompt (the question together with its choices) is passed through the model in a single forward pass, and only the logits at the last position are used. The cross-entropy loss is computed on those last-position logits over the four answer options only: instead of covering the whole vocabulary, it covers just the tokens for the letters A, B, C, and D.
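
The restricted loss described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the training code used for this model: the function name `restricted_cross_entropy`, the toy 8-token vocabulary, and the token ids standing in for the letters are all hypothetical.

```python
import math


def restricted_cross_entropy(last_logits, option_token_ids, correct_option):
    """Cross-entropy over the answer-letter tokens only.

    last_logits: logits over the vocabulary at the final prompt position.
    option_token_ids: token ids of the letters A, B, C, D (hypothetical here).
    correct_option: index 0-3 of the correct letter.
    """
    # Keep only the logits of the four letter tokens; the rest of the
    # vocabulary never enters the softmax.
    option_logits = [last_logits[i] for i in option_token_ids]
    # Numerically stable log-sum-exp over the four option logits.
    m = max(option_logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in option_logits))
    # Negative log-probability of the correct letter.
    return log_z - option_logits[correct_option]


# Toy vocabulary of 8 tokens; ids 2, 3, 4, 5 stand in for "A", "B", "C", "D".
logits = [5.0, -1.0, 0.5, 2.0, 0.1, -0.3, 9.0, 0.0]
loss = restricted_cross_entropy(logits, [2, 3, 4, 5], correct_option=1)
print(round(loss, 4))
```

Because the softmax runs over four logits only, large logits on unrelated tokens (such as id 6 in the toy example) do not affect the loss. In a PyTorch training loop the same idea would typically be expressed by indexing the last-position logits with the four letter token ids and passing the result to `torch.nn.functional.cross_entropy`.
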

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 64
- optimizer: Use OptimizerNames.ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.04
- num_epochs: 2
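
As a rough illustration of how these values relate, the sketch below derives the total batch size and evaluates a linear schedule with warmup. It assumes a single device (consistent with 2 × 32 = 64), and `total_steps = 1000` is a hypothetical figure, since the card does not state the number of optimizer steps. The function `linear_schedule_lr` is an illustrative helper that mirrors the usual shape of a linear warmup-and-decay schedule; the exact rounding of the warmup length is an assumption.

```python
import math

train_batch_size = 2
gradient_accumulation_steps = 32
num_devices = 1  # assumption: single device, consistent with 2 * 32 = 64

# Number of examples contributing to each optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_schedule_lr(step, total_steps, base_lr=1e-5, warmup_ratio=0.04):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


total_steps = 1000  # hypothetical; not given in this card
for step in (0, 20, 40, 520, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```

With these assumed numbers the warmup lasts 40 steps, the peak rate of 1e-05 is reached at step 40, and the rate falls linearly to zero by the final step.
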

## Evaluation Results

The model was evaluated on a suite of multiple-choice question answering (MCQA) benchmarks, using the validation and test sets of each one. NLP4Education consists only of the approximately 1,000 questions and answers that were given to us.

**Important Note on MCQA Evals Benchmark:**

**The performance on these benchmarks is as follows**:

### First evaluation: The tests were done with this prompt (type 5):
```
This question assesses challenging STEM problems as found on graduate standardized tests. Carefully evaluate the options and select the correct answer.

---
[Insert Question Here]
---
[Insert Choices Here, e.g.:
A. Option 1
B. Option 2
C. Option 3
D. Option 4]
---

Your response should include the letter and the exact text of the correct choice.
Example: B. Entropy increases.
Answer:
```

The answer choices were scored in the form ` [Letter]. [Text answer]`.

| Benchmark          | Accuracy (Acc) | Normalized Accuracy (Acc Norm) |
| :----------------- | :------------- | :----------------------------- |
| ARC Challenge      | 66.28%         | 64.92%                         |
| ARC Easy           | 84.22%         | 81.33%                         |
| GPQA               | 38.84%         | 36.61%                         |
| Math QA            | 25.03%         | 24.67%                         |
| MCQA Evals         | 43.51%         | 40.91%                         |
| MMLU               | 52.17%         | 52.17%                         |
| MMLU Pro           | 16.45%         | 15.04%                         |
| MuSR               | 53.17%         | 52.25%                         |
| NLP4Education      | 44.45%         | 42.65%                         |
| **Overall**        | **47.12%**     | **45.62%**                     |
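
The Overall row in the table above matches the unweighted mean of its nine benchmark rows, which the short check below reproduces. The helper name `macro_average` is illustrative; the numbers are copied from the table.

```python
# Per-benchmark scores from the type 5 table, in row order.
acc = [66.28, 84.22, 38.84, 25.03, 43.51, 52.17, 16.45, 53.17, 44.45]
acc_norm = [64.92, 81.33, 36.61, 24.67, 40.91, 52.17, 15.04, 52.25, 42.65]


def macro_average(scores):
    """Unweighted mean: every benchmark counts equally, regardless of its size."""
    return sum(scores) / len(scores)


print(f"{macro_average(acc):.2f}")       # 47.12
print(f"{macro_average(acc_norm):.2f}")  # 45.62
```

Since the average is unweighted, small benchmarks such as GPQA influence the Overall figure as much as large ones such as MMLU.
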

### Second evaluation: (type 0)
```
The following are multiple choice questions (with answers) about knowledge and skills in advanced master-level STEM courses.

---
*[Insert Question Here]*
---
*[Insert Choices Here, e.g.:*
*A. Option 1*
*B. Option 2*
*C. Option 3*
*D. Option 4]*
---
Answer:
```
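
For readers who want to reproduce this prompt format, the template above can be filled in with a small helper like the one below. This is a sketch only: `build_type0_prompt` is a hypothetical name, and the exact whitespace between the parts is an assumption based on the template as displayed in this card.

```python
def build_type0_prompt(question, choices):
    """Fill the type 0 template with one question and its four choices."""
    header = (
        "The following are multiple choice questions (with answers) about "
        "knowledge and skills in advanced master-level STEM courses."
    )
    letters = "ABCD"
    if len(choices) != len(letters):
        # The model was trained on four-choice questions only.
        raise ValueError("expected exactly four choices")
    choice_lines = "\n".join(
        f"{letter}. {text}" for letter, text in zip(letters, choices)
    )
    return f"{header}\n\n---\n{question}\n---\n{choice_lines}\n---\nAnswer:"


prompt = build_type0_prompt("What is 2 + 2?", ["3", "4", "5", "6"])
print(prompt)
```

The prompt ends at `Answer:` so that the continuation being scored begins immediately after it.
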

The answer choices were scored in the form ` [Letter]. [Text answer]`.

| Benchmark          | Accuracy (Acc) | Normalized Accuracy (Acc Norm) |
| :----------------- | :------------- | :----------------------------- |
| ARC Challenge      | 69.95%         | 65.33%                         |
| ARC Easy           | 84.45%         | 78.51%                         |
| GPQA               | 31.92%         | 28.57%                         |
| Math QA            | 27.02%         | 26.88%                         |
| MCQA Evals         | 43.90%         | 35.32%                         |
| MMLU               | 52.17%         | 52.17%                         |
| MMLU Pro           | 15.04%         | 13.27%                         |
| MuSR               | 53.17%         | 52.25%                         |
| NLP4Education      | 49.14%         | 42.85%                         |
| **Overall**        | **47.42%**     | **43.91%**                     |

### Third evaluation: (type 2)
```

This is part of an assessment on graduate-level science, technology, engineering, and mathematics (STEM) concepts. Each question is multiple-choice and requires a single correct answer.

---
*[Insert Question Here]*
---
*[Insert Choices Here, e.g.:*
*A. Option 1*
*B. Option 2*
*C. Option 3*
*D. Option 4]*
---
For grading purposes, respond with: [LETTER]. [VERBATIM TEXT]
Example: D. Planck constant
Your Response:
```

The answer choices were scored in the form ` [Letter]. [Text answer]`.

| Benchmark          | Accuracy (Acc) | Normalized Accuracy (Acc Norm) |
| :----------------- | :------------- | :----------------------------- |
| ARC Challenge      | 55.34%         | 55.34%                         |
| ARC Easy           | 74.00%         | 74.00%                         |
| GPQA               | 29.69%         | 29.69%                         |
| Math QA            | 22.35%         | 22.35%                         |
| MCQA Evals         | 37.92%         | 37.92%                         |
| MMLU               | 52.14%         | 52.14%                         |
| MMLU Pro           | 12.98%         | 12.98%                         |
| MuSR               | 53.04%         | 53.04%                         |
| NLP4Education      | 36.36%         | 36.36%                         |
| **Overall**        | **41.53%**     | **41.53%**                     |

### Fourth evaluation: (type 0)
```
The following are multiple choice questions (with answers) about knowledge and skills in advanced master-level STEM courses.

---
*[Insert Question Here]*
---
*[Insert Choices Here, e.g.:*
*A. Option 1*
*B. Option 2*
*C. Option 3*
*D. Option 4]*
---
Answer:
```

The answer choices were scored in the form ` [Letter]`.

| Benchmark          | Accuracy (Acc) | Normalized Accuracy (Acc Norm) |
| :----------------- | :------------- | :----------------------------- |
| ARC Challenge      | 70.63%         | 70.63%                         |
| ARC Easy           | 85.13%         | 85.13%                         |
| GPQA               | 25.45%         | 25.45%                         |
| Math QA            | 27.35%         | 27.35%                         |
| MCQA Evals         | 45.97%         | 45.97%                         |
| MMLU               | 52.14%         | 52.14%                         |
| MMLU Pro           | 14.97%         | 14.97%                         |
| MuSR               | 53.04%         | 53.04%                         |
| NLP4Education      | 50.86%         | 50.86%                         |
| **Overall**        | **47.28%**     | **47.28%**                     |
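
This card does not define the two metric columns. The sketch below assumes the convention common in evaluation harnesses: accuracy picks the choice with the highest log-likelihood, while normalized accuracy first divides each log-likelihood by the length of the choice text. The function `pick_choice` and all log-likelihood values are hypothetical and are not taken from the evaluation code behind these tables.

```python
def pick_choice(choices, loglikelihoods, normalize=False):
    """Return the index of the predicted choice.

    normalize=False: highest raw log-likelihood (accuracy-style).
    normalize=True: highest log-likelihood per character (normalized-accuracy-style).
    """
    scores = [
        ll / len(text) if normalize else ll
        for text, ll in zip(choices, loglikelihoods)
    ]
    return max(range(len(choices)), key=scores.__getitem__)


# Choices of different lengths: the two rules can disagree.
long_choices = [" A. Yes", " B. It depends on the boundary conditions"]
long_lls = [-6.0, -12.0]  # hypothetical log-likelihoods
print(pick_choice(long_choices, long_lls), pick_choice(long_choices, long_lls, normalize=True))

# Letter-only choices all have the same length, so both rules agree.
letter_choices = [" A", " B", " C", " D"]
letter_lls = [-2.0, -1.0, -3.0, -4.0]  # hypothetical log-likelihoods
print(pick_choice(letter_choices, letter_lls), pick_choice(letter_choices, letter_lls, normalize=True))
```

Under this assumption, dividing by a shared length cannot change which choice scores highest, which would be consistent with the letter-only table above reporting identical values in both columns.
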

### Framework versions

- Transformers 4.51.3
- PyTorch 2.5.1+cu121
- Datasets 3.6.0
- Tokenizers 0.21.0