youssefbelghmi committed on
Commit 4eb2fec · verified · 1 Parent(s): ccd6e16

Update README.md

Files changed (1):
  1. README.md +72 -20
README.md CHANGED
@@ -10,41 +10,90 @@ tags:
 licence: license
 ---

- # Model Card for MNLP_M3_dpo_mcqa_model

- This model is a fine-tuned version of [tocico28/MNLP_M3_dpo_model](https://huggingface.co/tocico28/MNLP_M3_dpo_model) on the [youssefbelghmi/MNLP_M3_mcqa_dataset](https://huggingface.co/datasets/youssefbelghmi/MNLP_M3_mcqa_dataset) dataset.
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="youssefbelghmi/MNLP_M3_dpo_mcqa_model", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

- ## Training procedure
-
- This model was trained with SFT.

- ### Framework versions
  - TRL: 0.17.0
  - Transformers: 4.53.0.dev0
  - Pytorch: 2.7.0
  - Datasets: 3.2.0
  - Tokenizers: 0.21.0
-
  ## Citations
-
-
-
  Cite TRL as:

  ```bibtex
@@ -56,4 +105,7 @@ Cite TRL as:
  publisher = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
  }
- ```
  licence: license
  ---

+ # MNLP M3 MCQA Model (Qwen3-0.6B fine-tuned)

+ This model is a fine-tuned version of [tocico28/MNLP_M3_dpo_model](https://huggingface.co/tocico28/MNLP_M3_dpo_model) on the [youssefbelghmi/MNLP_M3_mcqa_dataset](https://huggingface.co/datasets/youssefbelghmi/MNLP_M3_mcqa_dataset) dataset, a large-scale collection of multiple-choice questions for training and evaluating models in **STEM** domains (science, technology, engineering, mathematics, and medicine).

+ The [tocico28/MNLP_M3_dpo_model](https://huggingface.co/tocico28/MNLP_M3_dpo_model) model is itself a fine-tuned version of **Qwen/Qwen3-0.6B-Base**, trained on preference-labeled STEM response pairs collected through a collaborative classroom annotation effort.

+ It was trained using [TRL](https://github.com/huggingface/trl) as part of the final milestone of the **CS-552: Modern NLP** course at EPFL (Spring 2025).

+ ## Task
+ **Multiple-Choice Question Answering (MCQA):** Given a question and four answer options (A–D), the model must complete the prompt with the correct option letter only (`A`, `B`, `C`, or `D`). It was trained with rationales during supervision but outputs only the letter at inference time, making it compatible with evaluation frameworks such as LightEval.
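Because the model is expected to reply with a bare option letter, evaluation code typically needs to map a raw generation back to one of `A`–`D`. A minimal sketch of such a normalizer (the `extract_choice` helper is illustrative, not part of the model's tooling):

```python
import re
from typing import Optional

def extract_choice(generated_text: str) -> Optional[str]:
    """Return the first standalone option letter (A-D) in a generation,
    or None if no recognizable choice is present."""
    match = re.search(r"\b([A-D])\b", generated_text.strip())
    return match.group(1) if match else None

# The model is trained to answer with a single letter such as " B";
# the word-boundary regex also tolerates slightly noisier outputs.
print(extract_choice(" B"))          # B
print(extract_choice("Answer: C."))  # C
print(extract_choice("not sure"))    # None
```

Exact-match harnesses can then compare the extracted letter directly against the gold answer.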

+ ## Training Dataset

+ - **Dataset:** [`youssefbelghmi/MNLP_M3_mcqa_dataset`](https://huggingface.co/datasets/youssefbelghmi/MNLP_M3_mcqa_dataset)
+ - ~30,000 questions from SciQ, OpenBookQA, MathQA, ARC, and MedMCQA
+ - Each sample includes:
+   - the question,
+   - four answer choices (A–D),
+   - the correct answer as a letter,
+   - a short explanation (`support`) to guide learning.

+ ## Training Setup

+ - **Base model:** `Qwen/Qwen3-0.6B-Base`, fine-tuned here starting from the DPO checkpoint [tocico28/MNLP_M3_dpo_model](https://huggingface.co/tocico28/MNLP_M3_dpo_model).
+ - **Method:** Supervised Fine-Tuning (SFT) with `trl` and its `SFTTrainer`.
+ - **Tokenizer:** `AutoTokenizer` (with `eos_token` used for padding).
+
+ ## Training Prompt Format
+
+ During fine-tuning, each training example is converted into a prompt-completion pair. The prompt includes both the question and an explanation to guide the model’s reasoning:
+
+ ```text
+ The following is a multiple-choice question (with answers) about knowledge and skills in advanced master's-level STEM fields.
+ You will be provided with an explanation to help you understand the correct answer.
+ Select the correct answer by replying with the option letter (A, B, C, or D) only.
+ Question: <question_text>
+ A. <option_A>
+ B. <option_B>
+ C. <option_C>
+ D. <option_D>
+ Explanation: <support_text>
+ Answer:
+ ```
+ The completion is a single token: `" A"`, `" B"`, `" C"`, or `" D"` (note the leading space), corresponding to the correct answer.
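The template above can also be reproduced programmatically when preparing data. A sketch under the card's stated format (the `build_prompt` helper and its argument names are illustrative, not part of the released code):

```python
def build_prompt(question: str, options: list, support: str) -> str:
    """Format one MCQA sample with the prompt template from this card."""
    assert len(options) == 4, "the format expects exactly four options (A-D)"
    lettered = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", options))
    return (
        "The following is a multiple-choice question (with answers) about "
        "knowledge and skills in advanced master's-level STEM fields.\n"
        "You will be provided with an explanation to help you understand the correct answer.\n"
        "Select the correct answer by replying with the option letter (A, B, C, or D) only.\n"
        f"Question: {question}\n"
        f"{lettered}\n"
        f"Explanation: {support}\n"
        "Answer:"
    )

# During training, the completion paired with this prompt would be a
# leading space plus the gold letter, e.g. " B" for this sample.
prompt = build_prompt(
    "What is 2 + 2?",
    ["3", "4", "5", "6"],
    "Adding two and two yields four.",
)
print(prompt)
```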

+ ## Training Hyperparameters
+
+ The following hyperparameters were used during training:
+
+ - learning_rate: 2e-5
+ - num_train_epochs: 1
+ - per_device_train_batch_size: 4
+ - per_device_eval_batch_size: 4
+ - gradient_accumulation_steps: 4
+ - gradient_checkpointing: true
+ - eval_strategy: steps
+ - eval_steps: 100
+ - logging_steps: 100
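One consequence of these settings: with gradient accumulation, the effective batch size per optimizer step is larger than the per-device value. A quick check of the numbers above (single-GPU assumption; the GPU count is not stated on the card):

```python
# Effective batch size implied by the hyperparameters above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 1  # assumption: not stated on the model card

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 16 examples per optimizer step
```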

+ ## Training Results
+
+ | Epoch | Training Loss | Validation Loss |
+ |------:|--------------:|----------------:|
+ | 0.08 | 0.3461 | 0.2748 |
+ | 0.15 | 0.2881 | 0.2666 |
+ | 0.23 | 0.2938 | 0.2661 |
+ | 0.31 | 0.2741 | 0.2600 |
+ | 0.38 | 0.2684 | 0.2570 |
+ | 0.46 | 0.2603 | 0.2539 |
+ | 0.54 | 0.2635 | 0.2441 |
+ | 0.61 | 0.2555 | 0.2457 |
+ | 0.69 | 0.2459 | 0.2414 |
+ | 0.77 | 0.2383 | 0.2353 |
+ | 0.84 | 0.2266 | 0.2337 |
+ | 0.92 | 0.2112 | 0.2338 |
+ | 0.99 | 0.2110 | 0.2335 |
+
+ - **Final training loss:** 0.2110
+ - **Final validation accuracy:** ~92.0%
+
+ ### Framework versions
+
  - TRL: 0.17.0
  - Transformers: 4.53.0.dev0
  - Pytorch: 2.7.0
  - Datasets: 3.2.0
  - Tokenizers: 0.21.0

  ## Citations

  Cite TRL as:

  ```bibtex
  publisher = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
  }
+ ```

+ ## Author
+
+ Developed by [**Youssef Belghmi**](https://huggingface.co/youssefbelghmi)
+ CS-552: Modern NLP – EPFL, Spring 2025