---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---

# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized_Alt

A **4-bit (NF4)**, **LoRA-finetuned**, **DPO-aligned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)** in **STEM and general knowledge**. This **Alt** checkpoint is the memory-efficient counterpart to the unquantized M3 Base Alt model: it follows the same SFT → DPO training, with **post-training 4-bit quantization** applied afterwards for fast, low-VRAM inference.

---

## Model Details

* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned; **4-bit NF4** quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2

### Model Sources

* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*

---

## Uses

### Direct Use

* MCQA inference for STEM & general knowledge (MMLU/ScienceQA style).
* Educational assistants and lightweight evaluation tools on **low-VRAM GPUs**.

### Out-of-Scope Use

* Safety-critical domains (medical/legal/financial) without human oversight.
* Long-form creative writing or tasks far from MCQA.
* Any misuse involving exam integrity or confidential assessments.

---

## Bias, Risks, and Limitations

* **Quantization trade-offs:** Small accuracy drop vs. full precision; larger memory savings than 8-bit.
* **STEM difficulty:** Multi-step reasoning can remain challenging.
* **Alignment bias:** DPO style preferences may influence verbosity and formatting.

### Recommendations

* Use the structured prompt format:

  ```
  ### Question ...
  ### Explanation ...
  ### Answer:
  ```

* Keep a human in the loop for teaching and grading.
* Prefer the **M3 Base Alt** (full precision) for further fine-tuning; use this **4-bit Alt** for deployment.
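The structured prompt above can be assembled programmatically before tokenization. A minimal sketch, where `build_mcqa_prompt` is a hypothetical helper (not part of the released repository); the example question mirrors the one used in the quickstart below:

```python
def build_mcqa_prompt(question: str, explanation_hint: str) -> str:
    # Assemble the "### Question / ### Explanation / ### Answer:" format
    # this checkpoint was trained on; generation continues after "### Answer:".
    return (
        f"### Question: {question}\n"
        f"### Explanation: {explanation_hint}\n"
        "### Answer:"
    )

prompt = build_mcqa_prompt(
    "Which planet is known as the Red Planet?",
    "Identify the planet with the reddish appearance.",
)
print(prompt)
```

Keeping the trailing `### Answer:` header without a newline after it matters: the model is expected to complete the answer immediately after that marker.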
---

## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ShAIkespear/Phi-2_DPO_M3_Quantized_Alt"

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,    # often improves stability
    bnb_4bit_compute_dtype="bfloat16"  # or "float16" depending on your GPU
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_cfg
)

prompt = (
    "### Question: Which planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with the reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

## Training Details

### Data (SFT → DPO)

* **SFT:** Mixed MCQA (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) + EPFL MCQA; unified schema; ≤512 tokens; per-dataset caps.
* **DPO:** EPFL preference pairs + public preference data (chosen vs. rejected responses).

### Procedure & Hyperparameters

* **Pipeline:** SFT → DPO → **4-bit (NF4) quantization**.
* **LoRA:** rank=16, α=16, dropout=0.05.
* **Batch sizes:** 4 (SFT), 1 (DPO).
* **LR:** 1e-5 (public), 1e-4 (EPFL); cosine schedule with warmup.
* **Frameworks:** HF Transformers, TRL, PEFT (LoRA), bitsandbytes.

---

## Evaluation Summary

* **Configuration:** Balanced-then-DPO (**M3 Alt**).
* **Efficiency:** Fits comfortably on mid-range GPUs thanks to **4-bit** weights; faster and lighter than 8-bit with a modest accuracy trade-off vs. full precision.
* **Use case:** Best when **VRAM is tight** and you want DPO-aligned behavior with structured MCQA prompts.

---

## Technical Specifications

* **Architecture:** Phi-2 (~2.78B params), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** **4-bit NF4** (bitsandbytes) with optional double quantization; compute in bf16/fp16.
* **Precision:** Quantized 4-bit runtime.

---

## Glossary

* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation
* **NF4:** NormalFloat-4 quantization format (bitsandbytes) for 4-bit weight quantization
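The "low-VRAM" claim can be sanity-checked with a back-of-envelope weight-memory estimate from the ~2.78B parameter count stated above. This is a rough sketch: it counts weights only (0.5 bytes/param for NF4), ignoring quantization constants, activations, and the KV cache, so the real footprint is somewhat higher:

```python
PARAMS = 2.78e9  # approximate Phi-2 parameter count from the specs above

def weight_gb(bytes_per_param: float) -> float:
    # Weight storage only, in GiB; runtime overheads are not included.
    return PARAMS * bytes_per_param / 1024**3

fp16_gb = weight_gb(2.0)  # full/half-precision baseline
int8_gb = weight_gb(1.0)  # 8-bit quantization
nf4_gb = weight_gb(0.5)   # 4-bit NF4 (this checkpoint)

print(f"fp16 ≈ {fp16_gb:.1f} GiB, int8 ≈ {int8_gb:.1f} GiB, nf4 ≈ {nf4_gb:.1f} GiB")
```

The roughly 4x reduction versus fp16 is what lets this checkpoint fit on mid-range GPUs.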
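The DPO objective defined above minimizes the negative log-sigmoid of the scaled difference between the policy's and the reference model's chosen-vs-rejected log-probability margins. A minimal numeric sketch; `beta` and the sequence log-probabilities are illustrative values, not taken from this model's training run:

```python
import math

def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    logits = beta * ((policy_chosen_lp - policy_rejected_lp)
                     - (ref_chosen_lp - ref_rejected_lp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# The policy prefers the chosen answer by a wider margin than the reference
# does, so the loss dips below log(2) ≈ 0.693 (the value at zero margin gap).
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1)
print(round(loss, 4))  # ≈ 0.5981
```

In practice TRL computes this from per-token log-probs over whole sequences; the sketch only shows the scalar loss shape.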