|
|
--- |
|
|
library_name: transformers |
|
|
base_model: |
|
|
- Qwen/Qwen3-0.6B-Base |
|
|
--- |
|
|
|
|
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/670b7242705db29c00451666/tgCqFZJAKtl-rw-7csI-b.png" width="500" height="300"> |
|
|
|
|
|
|
|
|
# EdNa: Educational Nimble Assistant (MCQA Model) |
|
|
|
|
|
This is the official Hugging Face model card for the Multiple-Choice Question Answering (MCQA) version of EdNa (Educational Nimble Assistant), an AI tutor specialized for STEM subjects. |
|
|
|
|
|
This model was developed by Lysandre Costes, Hassen Aissa, Levin Hertrich, and Yassine Turki.
|
|
|
|
|
GitHub link: https://github.com/HassenAissa/EdNA
|
|
|
|
|
## Model Description |
|
|
|
|
|
EdNa is an AI tutor fine-tuned to excel at answering multiple-choice questions in STEM fields. It is designed to provide accurate and consistently formatted answers, making it a reliable tool for educational applications. |
|
|
|
|
|
This model is the result of a two-stage training pipeline built upon the `Qwen/Qwen3-0.6B-Base` base model:
|
|
|
|
|
1. **Supervised Fine-Tuning (SFT):** The base model was first fine-tuned on a rich mixture of STEM-focused datasets (mathematics, abstract algebra, coding) and general instruction-following datasets. This SFT stage built a strong foundation in scientific topics and conversational structure, preventing catastrophic forgetting. |
|
|
|
|
|
2. **Reinforcement Learning with Verifiable Reward (RLVR):** To master the MCQA format, the SFT model was further trained using RLVR. This stage employed a specific reward scheme to shape the model's behavior: |
|
|
* `+1.0` reward for generating the correct answer. |
|
|
* `-1.0` penalty for generating an incorrect answer. |
|
|
* `+0.5` reward for adhering to the required output format (i.e., outputting only the answer letter).
|
|
|
|
|
This process pushes the model to not only identify the correct solution but also to present it in a clean, predictable format, making it "nimble" and easy to integrate into downstream applications. |
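The reward scheme above can be sketched as a simple scoring function. This is an illustrative reconstruction, not the actual training code; the function name, the choice set, and the way the three reward terms combine are assumptions:

```python
# Illustrative sketch of the RLVR reward scheme described above.
# How the format bonus combines with the correctness term is an assumption.
def mcqa_reward(generation: str, correct_letter: str,
                choices=("A", "B", "C", "D")) -> float:
    text = generation.strip()
    reward = 0.0
    # +0.5 format bonus: the output is exactly one option letter
    if text in choices:
        reward += 0.5
    # Take the first option letter that appears as the model's pick
    picked = next((ch for ch in text if ch in choices), None)
    # +1.0 for the correct answer, -1.0 otherwise
    reward += 1.0 if picked == correct_letter else -1.0
    return reward
```

Under this sketch, a bare correct letter earns the maximum reward (1.5), while a verbose but correct answer forfeits the format bonus.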
|
|
|
|
|
## Intended Uses & Limitations |
|
|
|
|
|
### Intended Use |
|
|
|
|
|
EdNa is primarily intended as an educational tool for STEM students. Its main use case is zero-shot Multiple-Choice Question Answering. It can be integrated into applications like: |
|
|
|
|
|
* AI-powered tutoring platforms |
|
|
* Interactive study aids |
|
|
* Automated quiz generators and checkers |
|
|
|
|
|
The model is trained to receive a question and a set of multiple-choice options and output only the letter corresponding to the correct answer. |
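A small helper (hypothetical, not part of the released code) can assemble a prompt in this question/options/answer layout:

```python
# Hypothetical helper that assembles a prompt in the expected
# question / options / "Answer:" layout.
def build_mcqa_prompt(question: str, options: list[str]) -> str:
    letters = "ABCD"
    option_lines = "\n".join(
        f"{letters[i]}) {opt}" for i, opt in enumerate(options)
    )
    return f"Question: {question}\nOptions:\n{option_lines}\nAnswer:"
```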
|
|
|
|
|
### Limitations and Bias |
|
|
|
|
|
* **Language:** EdNa is trained exclusively on English data and will not perform well in other languages. |
|
|
* **Domain:** The model is highly specialized for STEM subjects. Using it for non-STEM topics may lead to a higher rate of hallucinations and incorrect answers. |
|
|
* **Potential for Misuse:** Like any educational tool, EdNa could be misused for academic dishonesty (e.g., cheating on exams). We recommend its use as a learning aid rather than an answer key. |
|
|
* **Knowledge Cutoff:** The model's knowledge is static and based on its training data. It is not aware of information or developments beyond its training date. |
|
|
|
|
|
## How to Get Started |
|
|
|
|
|
You can use the `transformers` library to easily run EdNa. Since the model is trained to provide a concise answer, the generation parameters should be set accordingly. |
|
|
|
|
|
```python |
|
|
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
model_id = "HAissa/EdNA" |
|
|
|
|
|
# Load the model and tokenizer |
|
|
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
|
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
|
|
|
|
# --- Example 1: Math Question --- |
|
|
question = "What is the derivative of x^2 with respect to x?" |
|
|
options = "A) 2x\nB) x\nC) x^2\nD) 2" |
|
|
|
|
|
prompt = f"Question: {question}\nOptions:\n{options}\nAnswer:" |
|
|
|
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
|
|
|
# Generate the answer |
|
|
# EdNa is trained to be concise, so a low max_new_tokens is sufficient. |
|
|
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False)
|
|
answer_text = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
|
|
|
|
# The model is trained to output just the answer letter immediately after "Answer:".
|
|
# We can parse it like this: |
|
|
final_answer = answer_text.split("Answer:")[1].strip().split('\n')[0] |
|
|
|
|
|
print(f"Question: {question}") |
|
|
print(f"Final Answer: {final_answer}") |
|
|
# Expected Output: A |
|
|
|
|
|
# --- Example 2: Science Question --- |
|
|
question = "Which of the following is a noble gas?" |
|
|
options = "A) Oxygen\nB) Nitrogen\nC) Argon\nD) Carbon Dioxide" |
|
|
|
|
|
prompt = f"Question: {question}\nOptions:\n{options}\nAnswer:" |
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False)
|
|
answer_text = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
|
final_answer = answer_text.split("Answer:")[1].strip().split('\n')[0] |
|
|
|
|
|
print(f"Question: {question}") |
|
|
print(f"Final Answer: {final_answer}") |
|
|
# Expected Output: C |
|
|
``` |
|
|
|
|
|
## Evaluation Results |
|
|
|
|
|
EdNa's two-stage training process results in significant performance gains over the base model, particularly on reasoning-intensive tasks. The Output Correctness (OC) metric measures the percentage of questions for which the model generates exactly the correct option letter in a zero-shot setting.
|
|
|
|
|
The table below shows the clear progression in performance from the base model, through the SFT stage, to the final RLVR-tuned EdNa model. |
|
|
|
|
|
| Model | SciQ (OC) | MMLU (OC) | AquaRat (OC) | MMLU PRO (Likelihood) | |
|
|
|-------------------|-----------|-----------|--------------|-----------------------| |
|
|
| Qwen 0.6B Base | 18.9% | 4.4% | 2.5% | 19.0% | |
|
|
| Qwen SFT | 77.0% | 34.9% | 19.5% | 20.0% | |
|
|
| EdNa (SFT+RLVR) | 84.0% | 42.4% | 34.1% | 22.7% | |
|
|
|
|
|
The results highlight: |
|
|
* **Effectiveness of RLVR:** The reinforcement learning stage dramatically improves performance on all benchmarks, especially on the math reasoning dataset AquaRat (from 19.5% to 34.1%). |
|
|
* **Reliable Formatting:** The training method teaches the model to answer MCQs correctly and in the proper format, boosting the Output Correctness metric significantly over the base model. |
|
|
* **Strong Generalization:** The model shows improved reasoning capabilities on the challenging MMLU-PRO benchmark. |
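As a rough illustration of how the OC metric could be computed, assuming predictions have already been parsed down to bare option letters (the function name is illustrative, not from the evaluation code):

```python
# Illustrative Output Correctness (OC): the fraction of questions
# where the parsed prediction exactly matches the reference letter.
def output_correctness(predictions, references):
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)
```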
|
|
|
|
|
## Training Data |
|
|
|
|
|
EdNa was trained on a diverse corpus of data to ensure robust STEM and instruction-following capabilities. |
|
|
|
|
|
### SFT Stage |
|
|
A mixture of datasets including: |
|
|
* Math, abstract algebra, and coding subsets from Tulu3 SFT. |
|
|
* Math questions from various Stack Exchange sites (stackmathqa2024). |
|
|
* General STEM MCQ training splits and instruction-following datasets. |
|
|
* A Chain-of-Thought (CoT) dataset to improve reasoning. |
|
|
|
|
|
### RLVR Stage |
|
|
Utilized the MCQ datasets listed above, with rewards based on the correctness of the answer and format. |