---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---

# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized

A **quantized (8-bit)**, **LoRA-finetuned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)**, particularly in **STEM and general knowledge** domains.

This model represents the final **Direct Preference Optimization (DPO)** stage of the *ShAIkespear* project: fine-tuned on both public MCQA datasets and EPFL preference-annotated data, then quantized to 8-bit for efficient inference and deployment.

---

## Model Details

* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned and 8-bit quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2

### Model Sources

* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*

---

## Uses

### Direct Use

* Lightweight, low-memory MCQA reasoning for STEM and general knowledge domains.
* Educational tutoring or automated evaluation assistants that follow structured prompts.
* Deployment on GPUs with limited VRAM (8-bit quantization reduces memory from ~11 GB to ~3 GB).

### Out-of-Scope Use

* Critical decision-making (medical, legal, financial).
* Long-form reasoning or open-ended creative writing.
* Any application violating academic integrity or the confidentiality of test materials.

---

## Bias, Risks, and Limitations

* **Quantization trade-off:** Slight loss in accuracy compared to the full-precision base model.
* **STEM reasoning:** Difficult multi-step math/science questions may still yield near-random performance (~25% accuracy, i.e., chance level on four-option MCQA).
* **Alignment drift:** DPO may slightly overfit stylistic preferences or verbosity.

### Recommendations

* Use structured prompts (`### Question → ### Explanation → ### Answer`) for best results.
* Include human oversight for evaluation or teaching uses.
* Avoid deployment where model-generated answers have direct consequences.

---

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the weights in 8-bit via bitsandbytes.
bnb_cfg = BitsAndBytesConfig(load_in_8bit=True)

tok = AutoTokenizer.from_pretrained("ShAIkespear/Phi-2_DPO_M3_Quantized", use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    "ShAIkespear/Phi-2_DPO_M3_Quantized",
    device_map="auto",
    quantization_config=bnb_cfg,
)

# The model expects the structured Question/Explanation/Answer prompt format.
prompt = (
    "### Question: What planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with a reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

## Training Details

### Training Data

* **SFT stage:** Mixed MCQA sets (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) plus EPFL-curated questions.
* **DPO stage:** Human preference pairs (EPFL exams + HelpSteer-style pairs).
* **Preprocessing:** Filtered to ≤512 tokens; unified MCQA schema.
* **Split:** 50% train, 25% overfit test, 10% comparison, 15% quantization validation.

### Training Procedure

* **Pipeline:** SFT → DPO → 8-bit quantization.
* **LoRA:** rank = 16, α = 16, dropout = 0.05.
* **Batch size:** 4 (SFT), 1 (DPO).
* **Learning rates:** 1e-5 (public data), 1e-4 (EPFL data).
* **Scheduler:** Cosine with warmup.
* **Frameworks:** Hugging Face Transformers + TRL + PEFT + BitsAndBytes.
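
For orientation, the snippet below sketches how the DPO stage above could be wired together with TRL and PEFT, using the hyperparameters listed in this card. It is a minimal sketch, not the project's actual training script (see the repository for that): the dataset path, warmup ratio, and output directory are placeholders, and it assumes a recent TRL release in which `DPOTrainer` accepts a `DPOConfig` and a `processing_class`.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# In the real pipeline this would be the SFT checkpoint, not the raw base model.
model_name = "microsoft/phi-2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA hyperparameters from this card: rank 16, alpha 16, dropout 0.05.
peft_cfg = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# Preference data needs "prompt"/"chosen"/"rejected" columns; the file name is a placeholder.
pairs = load_dataset("json", data_files="preference_pairs.jsonl")["train"]

args = DPOConfig(
    output_dir="phi2-dpo-m3",       # placeholder
    per_device_train_batch_size=1,  # DPO batch size from this card
    learning_rate=1e-4,             # EPFL-stage learning rate from this card
    lr_scheduler_type="cosine",     # cosine schedule with warmup
    warmup_ratio=0.1,               # assumed warmup fraction (not specified in the card)
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=pairs,
    processing_class=tok,
    peft_config=peft_cfg,  # TRL trains LoRA adapters; the frozen base serves as the reference
)
trainer.train()
```

After DPO, the adapters would presumably be merged into the base weights before post-training 8-bit quantization, e.g. by loading the merged checkpoint with `BitsAndBytesConfig(load_in_8bit=True)` as in the "How to Get Started" snippet.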
---

## Evaluation Summary

* **Configuration:** “Balanced-then-DPO” (M3) achieved the best overall performance.
* **Accuracy:** ≈0.61 on MMLU (balanced set); lower on STEM tasks (~0.25).
* **Memory:** Reduced to ~3 GB with minor quality loss.
* **Outcome:** Best trade-off between efficiency and alignment across ShAIkespear models.

---

## Technical Specifications

* **Architecture:** Phi-2 (2.78 B parameters), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** Post-training 8-bit (BitsAndBytes).
* **Precision:** 8-bit integer with dynamic quantization layers.
* **Software:** Hugging Face Transformers, TRL, PEFT, BitsAndBytes.

---

## Glossary

* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation for efficient fine-tuning
* **Quantization:** Reducing model precision for faster, memory-efficient inference
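
---

As a complement to the evaluation summary, the snippet below sketches one way MCQA accuracy figures like those above could be computed: generate a completion after the `### Answer:` marker and compare the first choice letter against the gold label. This is an illustrative scorer under assumed conventions (regex-based letter extraction, a user-supplied `generate_fn`), not the project's actual evaluation harness.

```python
import re

def extract_choice(generation: str) -> str | None:
    """Return the first standalone choice letter (A-D) in a model completion."""
    m = re.search(r"\b([A-D])\b", generation)
    return m.group(1) if m else None

def mcqa_accuracy(examples, generate_fn) -> float:
    """Score items of the form {"prompt": <structured prompt>, "gold": "A".."D"}.

    `generate_fn` maps a prompt string to the model's text completion, e.g. a
    thin wrapper around the `model.generate` call in "How to Get Started".
    """
    correct = sum(
        extract_choice(generate_fn(ex["prompt"])) == ex["gold"] for ex in examples
    )
    return correct / len(examples)
```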