---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---
# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized_Alt
A **4-bit (NF4)**, **LoRA-finetuned**, **DPO-aligned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)** in **STEM and general knowledge**.
This **Alt** checkpoint is the memory-efficient counterpart to the unquantized M3 Base Alt model: same SFT → DPO training, then **post-training 4-bit quantization** for fast, low-VRAM inference.
---
## Model Details
* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned; **4-bit NF4** quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2
### Model Sources
* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*
---
## Uses
### Direct Use
* MCQA inference for STEM & general knowledge (MMLU/ScienceQA style).
* Educational assistants and lightweight evaluation tools on **low-VRAM GPUs**.
### Out-of-Scope Use
* Safety-critical domains (medical/legal/financial) without human oversight.
* Long-form creative writing or tasks far from MCQA.
* Any misuse involving exam integrity or confidential assessments.
---
## Bias, Risks, and Limitations
* **Quantization trade-offs:** Small accuracy drop relative to full precision, in exchange for larger memory savings than 8-bit quantization.
* **STEM difficulty:** Multi-step reasoning can remain challenging.
* **Alignment bias:** DPO style preferences may influence verbosity/format.
### Recommendations
* Use the structured prompt format (a prompt-builder sketch follows this list):
```
### Question ...
### Explanation ...
### Answer:
```
* Keep a human in the loop for teaching/grading.
* Prefer the **M3 Base Alt** (full precision) for further fine-tuning; use this **4-bit Alt** for deployment.
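A minimal prompt-builder sketch for that scaffold, assuming lettered options are folded into the question text (the function name and the `A.`/`B.` labels are illustrative, not taken from the released code):

```python
def build_mcqa_prompt(question: str, options: list[str], explanation: str = "") -> str:
    """Assemble the Question / Explanation / Answer scaffold shown above."""
    lettered = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return (
        f"### Question: {question}\n{lettered}\n"
        f"### Explanation: {explanation}\n"
        f"### Answer:"
    )

# Example:
# build_mcqa_prompt("Which planet is known as the Red Planet?",
#                   ["Venus", "Mars", "Jupiter", "Saturn"])
```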
---
## How to Get Started
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ShAIkespear/Phi-2_DPO_M3_Quantized_Alt"

# 4-bit NF4 quantization config (bitsandbytes)
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,    # often improves stability
    bnb_4bit_compute_dtype="bfloat16"  # or "float16", depending on your GPU
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=bnb_cfg
)

# Structured MCQA prompt: Question / Explanation / Answer
prompt = (
    "### Question: Which planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with the reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```
---
## Training Details
### Data (SFT → DPO)
* **SFT:** Mixed MCQA (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) + EPFL MCQA; unified schema (sketched after this list); ≤512 tokens; per-dataset caps.
* **DPO:** EPFL preference pairs + public preference data (chosen vs. rejected responses).
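A rough sketch of that unification step; the field names, the cap value, and the exact prompt scaffold here are assumptions rather than the repository's actual schema:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/phi-2")

def to_unified_example(record: dict) -> dict:
    # Hypothetical source fields: question / options / answer letter / rationale.
    options = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(record["options"]))
    prompt = (
        f"### Question: {record['question']}\n{options}\n"
        f"### Explanation: {record.get('rationale', '')}\n"
        f"### Answer:"
    )
    return {"prompt": prompt, "completion": f" {record['answer']}"}

def cap_and_filter(examples: list[dict], cap: int = 5_000, max_tokens: int = 512) -> list[dict]:
    # Per-dataset cap (placeholder value) plus the <=512-token length filter from the card.
    kept = []
    for ex in examples[:cap]:
        if len(tok(ex["prompt"] + ex["completion"]).input_ids) <= max_tokens:
            kept.append(ex)
    return kept
```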
### Procedure & Hyperparameters
* **Pipeline:** SFT → DPO → **4-bit (NF4) quantization**.
* **LoRA:** rank=16, α=16, dropout=0.05 (see the configuration sketch after this list).
* **Batch sizes:** 4 (SFT), 1 (DPO).
* **LR:** 1e-5 (public data), 1e-4 (EPFL data); cosine schedule with warmup.
* **Frameworks:** HF Transformers, TRL, PEFT (LoRA), bitsandbytes.
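A configuration sketch under stated assumptions: the LoRA values match the card, but `DPOConfig`/`DPOTrainer` argument names vary across TRL versions, `model`, `tok`, and `train_ds` are placeholders (an SFT checkpoint, its tokenizer, and a preference dataset with `prompt`/`chosen`/`rejected` columns), and the warmup fraction is not reported:

```python
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

lora_cfg = LoraConfig(
    r=16,               # rank reported above
    lora_alpha=16,      # alpha reported above
    lora_dropout=0.05,  # dropout reported above
    task_type="CAUSAL_LM",
)

dpo_args = DPOConfig(
    output_dir="phi2-dpo",
    per_device_train_batch_size=1,  # DPO batch size reported above
    learning_rate=1e-5,             # public-data LR reported above
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # assumption: warmup fraction not reported
)

trainer = DPOTrainer(
    model=model,                # placeholder: the SFT checkpoint
    args=dpo_args,
    train_dataset=train_ds,     # placeholder preference dataset
    processing_class=tok,       # `tokenizer=tok` in older TRL releases
    peft_config=lora_cfg,       # trains LoRA adapters rather than full weights
)
trainer.train()
```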
---
## Evaluation Summary
* **Configuration:** Balanced-then-DPO (**M3 Alt**).
* **Efficiency:** Fits comfortably on mid-range GPUs thanks to **4-bit** weights; faster/lighter than 8-bit with a modest accuracy trade-off vs. full precision.
* **Use case:** Best when **VRAM is tight** and you want DPO-aligned behavior with structured MCQA prompts (a simple answer-scoring sketch follows).
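One simple way to score generations against gold labels, as a sketch; the report's actual evaluation harness may differ:

```python
import re

def extract_choice(generated: str) -> str | None:
    """Return the first option letter found after the last '### Answer:' marker."""
    tail = generated.rsplit("### Answer:", 1)[-1]
    match = re.search(r"\b([A-E])\b", tail)
    return match.group(1) if match else None

def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Accuracy over (generated_text, gold_letter) pairs."""
    hits = sum(extract_choice(gen) == gold for gen, gold in pairs)
    return hits / len(pairs)
```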
---
## Technical Specifications
* **Architecture:** Phi-2 (~2.78B params), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** **4-bit NF4** (bitsandbytes) with optional double quantization; compute in bf16/fp16.
* **Precision:** Quantized 4-bit runtime.
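As a back-of-envelope check on the footprint: ~2.78B parameters at 4 bits per weight is roughly 1.4 GB of weight storage, before quantization constants, the KV cache, and activations; this is the arithmetic behind the low-VRAM claim above.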
---
## Glossary
* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation
* **NF4:** NormalFloat-4 quantization format (bnb) for 4-bit weight quantization |