Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized
A quantized (8-bit), LoRA-finetuned variant of microsoft/phi-2 specialized for multiple-choice question answering (MCQA), particularly in STEM and general knowledge domains.
This model represents the final Direct Preference Optimization (DPO) stage of the ShAIkespear project, fine-tuned on both public MCQA datasets and EPFL preference-annotated data, then quantized to 8-bit for efficient inference and deployment.
Model Details
- Developed by: ShAIkespear team
- Shared by: ShAIkespear team
- Model type: Causal LM (Phi-2) with LoRA adapters; DPO-aligned and 8-bit quantized
- Languages: English
- License: MIT
- Finetuned from: microsoft/phi-2
Model Sources
- Repository: 2.8B-Phi-2-LLM-QA
- Report: “ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”
Uses
Direct Use
- Lightweight, low-memory MCQA reasoning for STEM and general knowledge domains.
- Educational tutoring or automated evaluation assistants following structured prompts.
- Deployment on GPUs with limited VRAM (8-bit quantization reduces memory from ~11 GB → ~3 GB).
Out-of-Scope Use
- Critical decision-making (medical, legal, financial).
- Long-form reasoning or open-ended creative writing.
- Any application violating academic integrity or confidentiality of test materials.
Bias, Risks, and Limitations
- Quantization trade-off: Slight loss in accuracy compared to full-precision base model.
- STEM reasoning: Difficult multi-step math/science questions may still yield near-chance performance (~25 % accuracy, i.e., random guessing on four-option questions).
- Alignment drift: DPO may slightly overfit stylistic preferences or verbosity.
Recommendations
- Use structured prompts (`### Question` → `### Explanation` → `### Answer`) for best results.
- Include human oversight for evaluation or teaching uses.
- Avoid deployment where model-generated answers have direct consequences.
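The recommended prompt structure can be assembled with a small helper. `build_prompt` below is a hypothetical convenience function for illustration, not part of the released repository:

```python
def build_prompt(question: str, explanation: str = "") -> str:
    """Assemble the ### Question / ### Explanation / ### Answer scaffold."""
    return (
        f"### Question: {question}\n"
        f"### Explanation: {explanation}\n"
        f"### Answer:"
    )

print(build_prompt("What planet is known as the Red Planet?",
                   "Identify the planet with a reddish appearance."))
```

The trailing `### Answer:` marker is left open so the model completes it with the selected choice.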
How to Get Started
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the model in 8-bit to keep VRAM usage around ~3 GB
bnb_cfg = BitsAndBytesConfig(load_in_8bit=True)
tok = AutoTokenizer.from_pretrained("ShAIkespear/Phi-2_DPO_M3_Quantized", use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    "ShAIkespear/Phi-2_DPO_M3_Quantized",
    device_map="auto",
    quantization_config=bnb_cfg,
)

# The model expects the structured Question/Explanation/Answer prompt format
prompt = (
    "### Question: What planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with a reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```
Training Details
Training Data
- SFT stage: Mixed MCQA sets — MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K, and EPFL-curated questions.
- DPO stage: Human preference pairs (EPFL exams + HelpSteer-style pairs).
- Preprocessing: Examples filtered to ≤512 tokens and mapped to a unified MCQA schema.
- Split: 50 % train, 25 % overfit test, 10 % comparison, 15 % quantization validation.
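The ≤512-token filter might look like the sketch below. A whitespace split stands in for the real Phi-2 tokenizer, and the field names (`question`, `choices`) are assumptions, not confirmed by the repository:

```python
MAX_TOKENS = 512  # filtering threshold from the preprocessing step

def approx_token_count(text: str) -> int:
    # Whitespace split as a cheap stand-in for the real tokenizer (assumption)
    return len(text.split())

def filter_examples(examples):
    """Keep only MCQA examples whose question + choices fit the token budget."""
    kept = []
    for ex in examples:
        full = ex["question"] + " " + " ".join(ex["choices"])
        if approx_token_count(full) <= MAX_TOKENS:
            kept.append(ex)
    return kept

data = [
    {"question": "2 + 2 = ?", "choices": ["3", "4", "5", "6"]},
    {"question": "word " * 600, "choices": ["a", "b"]},  # too long, dropped
]
print(len(filter_examples(data)))  # → 1
```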
Training Procedure
- Pipeline: SFT → DPO → 8-bit quantization.
- LoRA: rank = 16, α = 16, dropout = 0.05.
- Batch size: 4 (SFT), 1 (DPO).
- Learning rates: 1e-5 (public), 1e-4 (EPFL).
- Scheduler: Cosine with warmup.
- Frameworks: Hugging Face Transformers + TRL + PEFT + BitsAndBytes.
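With PEFT, the LoRA hyperparameters above might be declared as follows. The target modules are an assumption (the card does not list them); rank, alpha, and dropout match the training procedure:

```python
from peft import LoraConfig

# LoRA hyperparameters from the training procedure above
lora_cfg = LoraConfig(
    r=16,              # rank = 16
    lora_alpha=16,     # alpha = 16
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # target modules are an assumption; the card does not specify them
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)
```

The adapters would then be attached to the base model with `peft.get_peft_model(base_model, lora_cfg)` before SFT and DPO training.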
Evaluation Summary
- Configuration: “Balanced-then-DPO” (M3) achieved best overall performance.
- Accuracy: ≈ 0.61 on MMLU (balanced set); STEM tasks lower (~0.25).
- Memory: Reduced to ~3 GB with minor quality loss.
- Outcome: Best trade-off between efficiency and alignment across ShAIkespear models.
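MCQA accuracy on a benchmark like MMLU reduces to exact match over extracted answer letters. A minimal scorer is sketched below; the answer-extraction regex assumes the model echoes the `### Answer:` marker, which may not hold for every generation:

```python
import re

def extract_choice(generation: str):
    """Pull the first A-D letter after '### Answer:' (assumed output format)."""
    m = re.search(r"### Answer:\s*([A-D])", generation)
    return m.group(1) if m else None

def accuracy(generations, gold_letters):
    """Fraction of generations whose extracted letter matches the gold label."""
    correct = sum(
        extract_choice(g) == gold for g, gold in zip(generations, gold_letters)
    )
    return correct / len(gold_letters)

gens = ["### Answer: B", "### Answer: C because...", "no answer given"]
print(accuracy(gens, ["B", "A", "D"]))  # 1 of 3 correct
```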
Technical Specifications
- Architecture: Phi-2 (2.78 B parameters), decoder-only transformer.
- Objective: SFT next-token prediction + DPO preference alignment.
- Quantization: Post-training 8-bit (BitsAndBytes).
- Precision: 8-bit integer with dynamic quantization layers.
- Software: Hugging Face Transformers, TRL, PEFT, BitsAndBytes.
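The ~11 GB → ~3 GB figure quoted under Direct Use is consistent with back-of-envelope arithmetic for 32-bit versus 8-bit weights, with the remainder attributable to activations and non-quantized layers:

```python
params = 2.78e9  # Phi-2 parameter count

fp32_gb = params * 4 / 1e9  # 4 bytes per 32-bit weight: ~11.1 GB
int8_gb = params * 1 / 1e9  # 1 byte per 8-bit weight:   ~2.8 GB
print(f"fp32 weights: ~{fp32_gb:.1f} GB, int8 weights: ~{int8_gb:.1f} GB")
```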
Glossary
- MCQA: Multiple-Choice Question Answering
- SFT: Supervised Finetuning
- DPO: Direct Preference Optimization
- LoRA: Low-Rank Adaptation for efficient fine-tuning
- Quantization: Reducing model precision for faster, memory-efficient inference