|
|
--- |
|
|
library_name: transformers |
|
|
base_model: |
|
|
- Qwen/Qwen3-0.6B-Base |
|
|
--- |
|
|
|
|
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/670b7242705db29c00451666/tgCqFZJAKtl-rw-7csI-b.png" width="500" height="300"> |
|
|
|
|
|
|
|
|
# EdNa: Educational Nimble Assistant (MCQA Model) |
|
|
|
|
|
This is the official Hugging Face model card for the Multiple-Choice Question Answering (MCQA) version of EdNa (Educational Nimble Assistant), an AI tutor specialized for STEM subjects. |
|
|
|
|
|
This model was developed by Lysandre Costes, Hassen Aissa, Levin Hertrich, and Yassine Turki.
|
|
|
|
|
GitHub link: https://github.com/HassenAissa/EdNA
|
|
|
|
|
## Model Description |
|
|
|
|
|
EdNa is an AI tutor fine-tuned to excel at answering multiple-choice questions in STEM fields. It is designed to provide accurate and consistently formatted answers, making it a reliable tool for educational applications. |
|
|
|
|
|
This model is the result of a two-stage training pipeline built upon the `Qwen/Qwen3-0.6B-Base` base model:
|
|
|
|
|
1. **Supervised Fine-Tuning (SFT):** The base model was first fine-tuned on a rich mixture of STEM-focused datasets (mathematics, abstract algebra, coding) and general instruction-following datasets. This SFT stage built a strong foundation in scientific topics and conversational structure, preventing catastrophic forgetting. |
|
|
|
|
|
2. **Reinforcement Learning with Verifiable Reward (RLVR):** To master the MCQA format, the SFT model was further trained using RLVR. This stage employed a specific reward scheme to shape the model's behavior: |
|
|
* `+1.0` reward for generating the correct answer. |
|
|
* `-1.0` penalty for generating an incorrect answer. |
|
|
* `+0.5` reward for adhering to the required output format (i.e., outputting only the answer letter).
|
|
|
|
|
This process pushes the model to not only identify the correct solution but also to present it in a clean, predictable format, making it "nimble" and easy to integrate into downstream applications. |
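The reward scheme above can be sketched as a simple scoring function. This is an illustrative reconstruction, not the actual training code; the function name, the choice set, and the way the three reward terms combine are assumptions:

```python
# Illustrative sketch of the RLVR reward scheme described above.
# How the format bonus combines with the correctness term is an assumption.
def mcqa_reward(generation: str, correct_letter: str,
                choices=("A", "B", "C", "D")) -> float:
    text = generation.strip()
    reward = 0.0
    # +0.5 format bonus: the output is exactly one option letter
    if text in choices:
        reward += 0.5
    # Take the first option letter that appears as the model's pick
    picked = next((ch for ch in text if ch in choices), None)
    # +1.0 for the correct answer, -1.0 otherwise
    reward += 1.0 if picked == correct_letter else -1.0
    return reward
```

Under this sketch, a bare correct letter earns the maximum reward (1.5), while a verbose but correct answer forfeits the format bonus.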
|
|
|
|
|
## Intended Uses & Limitations |
|
|
|
|
|
### Intended Use |
|
|
|
|
|
EdNa is primarily intended as an educational tool for STEM students. Its main use case is zero-shot Multiple-Choice Question Answering. It can be integrated into applications like: |
|
|
|
|
|
* AI-powered tutoring platforms |
|
|
* Interactive study aids |
|
|
* Automated quiz generators and checkers |
|
|
|
|
|
The model is trained to receive a question and a set of multiple-choice options and output only the letter corresponding to the correct answer. |
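A small helper (hypothetical, not part of the released code) can assemble a prompt in this question/options/answer layout:

```python
# Hypothetical helper that assembles a prompt in the expected
# question / options / "Answer:" layout.
def build_mcqa_prompt(question: str, options: list[str]) -> str:
    letters = "ABCD"
    option_lines = "\n".join(
        f"{letters[i]}) {opt}" for i, opt in enumerate(options)
    )
    return f"Question: {question}\nOptions:\n{option_lines}\nAnswer:"
```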
|
|
|
|
|
### Limitations and Bias |
|
|
|
|
|
* **Language:** EdNa is trained exclusively on English data and will not perform well in other languages. |
|
|
* **Domain:** The model is highly specialized for STEM subjects. Using it for non-STEM topics may lead to a higher rate of hallucinations and incorrect answers. |
|
|
* **Potential for Misuse:** Like any educational tool, EdNa could be misused for academic dishonesty (e.g., cheating on exams). We recommend its use as a learning aid rather than an answer key. |
|
|
* **Knowledge Cutoff:** The model's knowledge is static and based on its training data. It is not aware of information or developments beyond its training date. |
|
|
|
|
|
## How to Get Started |
|
|
|
|
|
You can use the `transformers` library to easily run EdNa. Since the model is trained to provide a concise answer, the generation parameters should be set accordingly. |
|
|
|
|
|
```python |
|
|
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
model_id = "HAissa/EdNA" |
|
|
|
|
|
# Load the model and tokenizer |
|
|
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
|
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
|
|
|
|
# --- Example 1: Math Question --- |
|
|
question = "What is the derivative of x^2 with respect to x?" |
|
|
options = "A) 2x\nB) x\nC) x^2\nD) 2" |
|
|
|
|
|
prompt = f"Question: {question}\nOptions:\n{options}\nAnswer:" |
|
|
|
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
|
|
|
# Generate the answer |
|
|
# EdNa is trained to be concise, so a low max_new_tokens is sufficient. |
|
|
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False)
|
|
answer_text = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
|
|
|
|
# The model is trained to output just the answer letter immediately after "Answer:".
|
|
# We can parse it like this: |
|
|
final_answer = answer_text.split("Answer:")[1].strip().split('\n')[0] |
|
|
|
|
|
print(f"Question: {question}") |
|
|
print(f"Final Answer: {final_answer}") |
|
|
# Expected Output: A |
|
|
|
|
|
# --- Example 2: Science Question --- |
|
|
question = "Which of the following is a noble gas?" |
|
|
options = "A) Oxygen\nB) Nitrogen\nC) Argon\nD) Carbon Dioxide" |
|
|
|
|
|
prompt = f"Question: {question}\nOptions:\n{options}\nAnswer:" |
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False)
|
|
answer_text = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
|
final_answer = answer_text.split("Answer:")[1].strip().split('\n')[0] |
|
|
|
|
|
print(f"Question: {question}") |
|
|
print(f"Final Answer: {final_answer}") |
|
|
# Expected Output: C |
|
|
``` |
|
|
|
|
|
## Evaluation Results |
|
|
|
|
|
EdNa's two-stage training process results in significant performance gains over the base model, particularly on reasoning-intensive tasks. The Output Correctness (OC) metric measures the percentage of questions for which the model generates exactly the correct option letter in a zero-shot setting.
|
|
|
|
|
The table below shows the clear progression in performance from the base model, through the SFT stage, to the final RLVR-tuned EdNa model. |
|
|
|
|
|
| Model | SciQ (OC) | MMLU (OC) | AquaRat (OC) | MMLU PRO (Likelihood) | |
|
|
|-------------------|-----------|-----------|--------------|-----------------------| |
|
|
| Qwen 0.6B Base | 18.9% | 4.4% | 2.5% | 19.0% | |
|
|
| Qwen SFT | 77.0% | 34.9% | 19.5% | 20.0% | |
|
|
| EdNa (SFT+RLVR) | 84.0% | 42.4% | 34.1% | 22.7% | |
|
|
|
|
|
The results highlight: |
|
|
* **Effectiveness of RLVR:** The reinforcement learning stage dramatically improves performance on all benchmarks, especially on the math reasoning dataset AquaRat (from 19.5% to 34.1%). |
|
|
* **Reliable Formatting:** The training method teaches the model to answer MCQs correctly and in the proper format, boosting the Output Correctness metric significantly over the base model. |
|
|
* **Strong Generalization:** The model shows improved reasoning capabilities on the challenging MMLU-PRO benchmark. |
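As a rough illustration of how the OC metric could be computed, assuming predictions have already been parsed down to bare option letters (the function name is illustrative, not from the evaluation code):

```python
# Illustrative Output Correctness (OC): the fraction of questions
# where the parsed prediction exactly matches the reference letter.
def output_correctness(predictions, references):
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)
```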
|
|
|
|
|
## Training Data |
|
|
|
|
|
EdNa was trained on a diverse corpus of data to ensure robust STEM and instruction-following capabilities. |
|
|
|
|
|
### SFT Stage |
|
|
A mixture of datasets including: |
|
|
* Math, abstract algebra, and coding subsets from Tulu3 SFT. |
|
|
* Math questions from various Stack Exchange sites (stackmathqa2024). |
|
|
* General STEM MCQ training splits and instruction-following datasets. |
|
|
* A Chain-of-Thought (CoT) dataset to improve reasoning. |
|
|
|
|
|
### RLVR Stage |
|
|
Utilized the MCQ datasets listed above, with rewards based on the correctness of the answer and format. |