---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---

# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized

A **quantized (8-bit)**, **LoRA-finetuned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)**, particularly in **STEM and general knowledge** domains.

This model represents the final **Direct Preference Optimization (DPO)** stage of the *ShAIkespear* project, fine-tuned on both public MCQA datasets and EPFL preference-annotated data, then quantized to 8-bit for efficient inference and deployment.

---

## Model Details

* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned and 8-bit quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2

### Model Sources

* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*

---

## Uses

### Direct Use

* Lightweight, low-memory MCQA reasoning for STEM and general knowledge domains.
* Educational tutoring or automated evaluation assistants following structured prompts.
* Deployment on GPUs with limited VRAM (8-bit quantization reduces memory from ~11 GB to ~3 GB).

### Out-of-Scope Use

* Critical decision-making (medical, legal, financial).
* Long-form reasoning or open-ended creative writing.
* Any application that violates academic integrity or the confidentiality of test materials.

---

## Bias, Risks, and Limitations

* **Quantization trade-off:** Slight accuracy loss relative to the full-precision base model.
* **STEM reasoning:** Difficult multi-step math/science questions may still yield near-random performance (~25% accuracy).
* **Alignment drift:** DPO may slightly overfit to stylistic preferences or verbosity.

### Recommendations

* Use structured prompts (`### Question → ### Explanation → ### Answer`) for best results; a prompt-building sketch follows this list.
* Include human oversight for evaluation or teaching uses.
* Avoid deployment where model-generated answers have direct consequences.

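
As a minimal sketch of the structured prompt above (the helper name and the lettered-option formatting are illustrative assumptions, not part of the released code):

```python
def build_mcqa_prompt(question: str, options: list[str], hint: str = "") -> str:
    """Format an MCQA item into the ### Question / ### Explanation / ### Answer
    structure the model was tuned on. Hypothetical helper; adapt to your data."""
    lettered = "\n".join(f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options))
    return (
        f"### Question: {question}\n{lettered}\n"
        f"### Explanation: {hint}\n"
        f"### Answer:"
    )
```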

---

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load in 8-bit to fit on GPUs with limited VRAM.
bnb_cfg = BitsAndBytesConfig(load_in_8bit=True)
tok = AutoTokenizer.from_pretrained("ShAIkespear/Phi-2_DPO_M3_Quantized", use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    "ShAIkespear/Phi-2_DPO_M3_Quantized",
    device_map="auto",
    quantization_config=bnb_cfg,
)

# Use the structured MCQA prompt format the model was fine-tuned on.
prompt = (
    "### Question: What planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with a reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```
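
Given this prompt, the completion should contain the short answer (here, Mars). Keeping `max_new_tokens` small suits MCQA, since the structured format is meant to elicit a brief answer rather than free-form text.
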

---

## Training Details

### Training Data

* **SFT stage:** Mixed MCQA sets (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K, and EPFL-curated questions).
* **DPO stage:** Human preference pairs (EPFL exams + HelpSteer-style pairs).
* **Preprocessing:** Examples filtered to ≤512 tokens and converted to a unified MCQA schema; a length-filter sketch follows this list.
* **Split:** 50% train, 25% overfit test, 10% comparison, 15% quantization validation.

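
A minimal sketch of the length filter described above (the function and field names are assumptions; the report only specifies the ≤512-token cutoff):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/phi-2")

def within_budget(example: dict, max_tokens: int = 512) -> bool:
    """Keep only MCQA items whose formatted prompt fits the 512-token budget."""
    text = (
        f"### Question: {example['question']}\n"
        f"### Explanation: {example.get('explanation', '')}\n"
        f"### Answer: {example['answer']}"
    )
    return len(tok(text)["input_ids"]) <= max_tokens
```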

### Training Procedure

* **Pipeline:** SFT → DPO → 8-bit quantization.
* **LoRA:** rank = 16, α = 16, dropout = 0.05.
* **Batch size:** 4 (SFT), 1 (DPO).
* **Learning rates:** 1e-5 (public data), 1e-4 (EPFL data).
* **Scheduler:** Cosine with warmup.
* **Frameworks:** Hugging Face Transformers + TRL + PEFT + BitsAndBytes (see the configuration sketch below).

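
The hyperparameters above map onto PEFT/TRL configs roughly as follows; `target_modules`, `output_dir`, and the warmup ratio are assumptions, since the report does not pin them down:

```python
from peft import LoraConfig
from trl import DPOConfig

# LoRA settings as reported; target_modules lists typical Phi-2 projection
# layers and is an assumption, not confirmed by the report.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)

# DPO-stage settings; 1e-4 is the EPFL-data rate (1e-5 for public data).
dpo_args = DPOConfig(
    output_dir="phi2-dpo-m3",        # placeholder path
    per_device_train_batch_size=1,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # assumed warmup fraction
)
```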

---

## Evaluation Summary

* **Configuration:** “Balanced-then-DPO” (M3) achieved the best overall performance.
* **Accuracy:** ≈0.61 on MMLU (balanced set); lower on STEM tasks (~0.25). A sketch of one way to score MCQA generations follows this list.
* **Memory:** Reduced to ~3 GB with minor quality loss.
* **Outcome:** Best trade-off between efficiency and alignment across the ShAIkespear models.

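
The report does not detail the scoring harness; one plausible way to score MCQA generations against gold labels (the helpers below are illustrative assumptions):

```python
def extract_choice(generation: str, letters: str = "ABCD") -> str:
    """Return the first choice letter after the final '### Answer:' marker."""
    tail = generation.rsplit("### Answer:", 1)[-1]
    return next((ch for ch in tail if ch in letters), "")

def mcqa_accuracy(generations: list[str], gold: list[str]) -> float:
    """Exact-match accuracy over extracted choice letters."""
    hits = sum(extract_choice(g) == t for g, t in zip(generations, gold))
    return hits / len(gold)
```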

---

## Technical Specifications

* **Architecture:** Phi-2 (2.78 B parameters), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** Post-training 8-bit (BitsAndBytes); a footprint check is sketched below.
* **Precision:** 8-bit integer weights with bitsandbytes' mixed-precision (LLM.int8-style) matmuls.
* **Software:** Hugging Face Transformers, TRL, PEFT, BitsAndBytes.

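
To check the quoted ~3 GB footprint, with `model` loaded as in the getting-started snippet (exact numbers vary by library version and hardware):

```python
# get_memory_footprint() is a standard transformers PreTrainedModel method.
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```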

---

## Glossary

* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation for efficient fine-tuning
* **Quantization:** Reducing model precision for faster, memory-efficient inference