---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---
# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized
A **quantized (8-bit)**, **LoRA-finetuned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)**, particularly in **STEM and general knowledge** domains.
This model represents the final **Direct Preference Optimization (DPO)** stage of the *ShAIkespear* project, fine-tuned on both public MCQA datasets and EPFL preference-annotated data, then quantized to 8-bit for efficient inference and deployment.
---
## Model Details
* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned and 8-bit quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2
### Model Sources
* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*
---
## Uses
### Direct Use
* Lightweight, low-memory MCQA reasoning for STEM and general knowledge domains.
* Educational tutoring or automated evaluation assistants following structured prompts.
* Deployment on GPUs with limited VRAM (8-bit quantization reduces the memory footprint from ~11 GB to ~3 GB).
### Out-of-Scope Use
* Critical decision-making (medical, legal, financial).
* Long-form reasoning or open-ended creative writing.
* Any application violating academic integrity or confidentiality of test materials.
---
## Bias, Risks, and Limitations
* **Quantization trade-off:** Slight loss in accuracy compared to full-precision base model.
* **STEM reasoning:** Difficult multi-step math/science questions may still yield near-random performance (~25% accuracy, roughly chance level for four-choice questions).
* **Alignment drift:** DPO may slightly overfit stylistic preferences or verbosity.
### Recommendations
* Use structured prompts (`### Question → ### Explanation → ### Answer`) for best results.
* Include human oversight for evaluation or teaching uses.
* Avoid deployment where model-generated answers have direct consequences.
---
## How to Get Started
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit loading via BitsAndBytes keeps the memory footprint around 3 GB.
bnb_cfg = BitsAndBytesConfig(load_in_8bit=True)
tok = AutoTokenizer.from_pretrained("ShAIkespear/Phi-2_DPO_M3_Quantized", use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    "ShAIkespear/Phi-2_DPO_M3_Quantized", device_map="auto", quantization_config=bnb_cfg
)

# Follow the structured prompt format used during fine-tuning.
prompt = (
    "### Question: What planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with a reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```
---
## Training Details
### Training Data
* **SFT stage:** Mixed MCQA sets — MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K, and EPFL-curated questions.
* **DPO stage:** Human preference pairs (EPFL exams + HelpSteer-style pairs).
* **Preprocessing:** Examples filtered to ≤512 tokens and mapped to a unified MCQA schema (sketched below).
* **Split:** 50% train, 25% overfit test, 10% comparison, 15% quantization validation.
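For illustration, here is a minimal sketch of that preprocessing step. The record fields (`question`, `choices`, `answer`) and the `unify_example` helper are hypothetical placeholders; the project's actual schema and code may differ:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/phi-2")
MAX_TOKENS = 512  # filtering threshold stated in the training setup

def unify_example(ex):
    """Map a source-specific MCQA record to a unified prompt (hypothetical fields)."""
    lettered = [f"{chr(65 + i)}. {c}" for i, c in enumerate(ex["choices"])]
    prompt = f"### Question: {ex['question']}\n" + "\n".join(lettered) + "\n### Answer:"
    return {"prompt": prompt, "answer": ex["answer"]}

def keep(ex):
    """Drop examples whose unified prompt exceeds the 512-token budget."""
    return len(tok(unify_example(ex)["prompt"])["input_ids"]) <= MAX_TOKENS
```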
### Training Procedure
* **Pipeline:** SFT → DPO → 8-bit quantization.
* **LoRA:** rank = 16, α = 16, dropout = 0.05.
* **Batch size:** 4 (SFT), 1 (DPO).
* **Learning rates:** 1e-5 (public), 1e-4 (EPFL).
* **Scheduler:** Cosine with warmup.
* **Frameworks:** Hugging Face Transformers + TRL + PEFT + BitsAndBytes (DPO-stage wiring sketched below).
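A minimal sketch of how the DPO stage could be wired up with the hyperparameters listed above. Exact `DPOTrainer`/`DPOConfig` keyword names vary across TRL versions, `preference_pairs` is a placeholder dataset, and the warmup ratio is an assumption not stated in this card:

```python
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tok = AutoTokenizer.from_pretrained("microsoft/phi-2")

# LoRA adapter configuration from the list above.
lora_cfg = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# DPO hyperparameters from the list above (1e-4 shown, as used on the EPFL data).
args = DPOConfig(
    output_dir="phi2-dpo",
    per_device_train_batch_size=1,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,  # assumption: warmup fraction not given in the card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=preference_pairs,  # placeholder: (prompt, chosen, rejected) pairs
    processing_class=tok,
    peft_config=lora_cfg,  # with a PEFT config, the base model serves as the reference
)
trainer.train()
```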
---
## Evaluation Summary
* **Configuration:** “Balanced-then-DPO” (M3) achieved best overall performance.
* **Accuracy:** ≈0.61 on MMLU (balanced set); hard STEM tasks remain near chance (~0.25); see the scoring sketch below.
* **Memory:** Reduced to ~3 GB with minor quality loss.
* **Outcome:** Best trade-off between efficiency and alignment across ShAIkespear models.
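For illustration, a minimal sketch of how MCQA accuracy could be scored against the structured prompt format. The letter-extraction heuristic is an assumption, not the project's actual evaluation harness:

```python
import re

def extract_choice(generation: str):
    """Heuristic: first standalone letter A-D after the '### Answer:' marker."""
    tail = generation.split("### Answer:")[-1]
    match = re.search(r"\b([A-D])\b", tail)
    return match.group(1) if match else None

def mcqa_accuracy(generations, gold):
    """Fraction of generations whose extracted letter matches the gold label."""
    hits = sum(extract_choice(g) == t for g, t in zip(generations, gold))
    return hits / len(gold)

# Example: two model outputs scored against gold labels.
outs = ["### Question: ...\n### Answer: B", "### Question: ...\n### Answer: A is correct"]
print(mcqa_accuracy(outs, ["B", "C"]))  # -> 0.5
```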
---
## Technical Specifications
* **Architecture:** Phi-2 (2.78 B parameters), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** Post-training 8-bit (BitsAndBytes).
* **Precision:** 8-bit integer weights via BitsAndBytes LLM.int8(), with mixed-precision handling of activation outliers.
* **Software:** Hugging Face Transformers, TRL, PEFT, BitsAndBytes.
---
## Glossary
* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation for efficient fine-tuning
* **Quantization:** Reducing model precision for faster, memory-efficient inference