---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---

# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized_Alt

A **4-bit (NF4)**, **LoRA-finetuned**, **DPO-aligned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)** in **STEM and general knowledge**.

This **Alt** checkpoint is the memory-efficient counterpart to the unquantized M3 Base Alt model: it went through the same SFT → DPO training, followed by **post-training 4-bit quantization** for fast, low-VRAM inference.

---

## Model Details

* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned; **4-bit NF4** quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2

### Model Sources

* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*

---

## Uses

### Direct Use

* MCQA inference for STEM & general knowledge (MMLU/ScienceQA style).
* Educational assistants and lightweight evaluation tools on **low-VRAM GPUs**.

### Out-of-Scope Use

* Safety-critical domains (medical/legal/financial) without human oversight.
* Long-form creative writing or tasks far from MCQA.
* Any use that undermines exam integrity or exposes confidential assessments.

---

## Bias, Risks, and Limitations

* **Quantization trade-offs:** a small accuracy drop vs. full precision, in exchange for larger memory savings than 8-bit.
* **STEM difficulty:** multi-step reasoning can remain challenging.
* **Alignment bias:** preferences learned during DPO may influence verbosity and answer format.

### Recommendations

* Use the structured prompt format (a prompt-building sketch follows this list):

```
### Question ...
### Explanation ...
### Answer:
```

* Keep a human in the loop for teaching and grading.
* For further fine-tuning, prefer the full-precision **M3 Base Alt**; use this **4-bit Alt** for deployment.
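
For concreteness, here is a minimal prompt-building sketch; the `format_mcqa_prompt` helper and its argument names are hypothetical illustrations, not part of the released code:

```python
# Hypothetical helper: renders an MCQA item into the structured prompt
# this model expects. Names and fields are illustrative only.
def format_mcqa_prompt(question: str, choices: list[str], hint: str = "") -> str:
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
    return (
        f"### Question: {question}\n{options}\n"
        f"### Explanation: {hint}\n"
        f"### Answer:"
    )

print(format_mcqa_prompt(
    "Which planet is known as the Red Planet?",
    ["Venus", "Mars", "Jupiter", "Saturn"],
    "Identify the planet with the reddish appearance.",
))
```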

---

## How to Get Started

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ShAIkespear/Phi-2_DPO_M3_Quantized_Alt"

# 4-bit NF4 quantization config matching how the checkpoint was quantized.
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,         # often improves stability
    bnb_4bit_compute_dtype=torch.bfloat16,  # or torch.float16, depending on your GPU
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=bnb_cfg
)

prompt = (
    "### Question: Which planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with the reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```
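
The decoded string echoes the prompt. A minimal way to keep only the completion, continuing from the snippet above:

```python
# Keep only what the model produced after the final "### Answer:" marker.
text = tok.decode(out[0], skip_special_tokens=True)
answer = text.rsplit("### Answer:", 1)[-1].strip()
print(answer)
```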

---

## Training Details

### Data (SFT → DPO)

* **SFT:** mixed MCQA (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) plus EPFL MCQA, mapped to a unified schema, capped at ≤512 tokens, with per-dataset caps (see the sketch after this list).
* **DPO:** EPFL preference pairs plus public preference data (chosen vs. rejected responses).
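
As a rough illustration of what the unified schema could look like (the field names, the `rationale` key, and the cap handling are assumptions; the real preprocessing lives in the repository):

```python
# Hypothetical normalization of one source example into a shared MCQA schema.
# Field names are illustrative; the training repo defines the real ones.
MAX_TOKENS = 512  # per-example length cap applied during SFT

def to_unified(example: dict) -> dict:
    prompt = (
        f"### Question: {example['question']}\n"
        f"### Explanation: {example.get('rationale', '')}\n"
        f"### Answer:"
    )
    # Examples whose tokenized prompt+answer exceed MAX_TOKENS would be
    # truncated or dropped here, with per-dataset caps enforced upstream.
    return {"prompt": prompt, "completion": f" {example['answer']}"}
```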

### Procedure & Hyperparameters

* **Pipeline:** SFT → DPO → **4-bit (NF4) quantization**.
* **LoRA:** rank=16, α=16, dropout=0.05.
* **Batch sizes:** 4 (SFT), 1 (DPO).
* **LR:** 1e-5 (public data), 1e-4 (EPFL data); cosine schedule with warmup.
* **Frameworks:** HF Transformers, TRL, PEFT (LoRA), bitsandbytes. A configuration sketch follows this list.
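
A minimal sketch of how these hyperparameters could map onto PEFT/TRL objects, assuming a recent TRL release (argument names vary across versions, and model/dataset loading is elided):

```python
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# LoRA settings reported above: rank 16, alpha 16, dropout 0.05.
peft_cfg = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# DPO stage: batch size 1, cosine schedule with warmup.
# 1e-5 is the public-data learning rate; EPFL runs used 1e-4.
dpo_args = DPOConfig(
    output_dir="phi2-dpo",
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,  # assumption: the exact warmup setting is not stated in this card
)

trainer = DPOTrainer(
    model=model,                 # the SFT checkpoint, loaded in full precision
    args=dpo_args,
    train_dataset=pref_dataset,  # pairs of chosen vs. rejected responses
    processing_class=tok,        # `tokenizer=` in older TRL releases
    peft_config=peft_cfg,
)
trainer.train()
```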

---

## Evaluation Summary

* **Configuration:** Balanced-then-DPO (**M3 Alt**).
* **Efficiency:** fits comfortably on mid-range GPUs thanks to **4-bit** weights; faster and lighter than 8-bit, with a modest accuracy trade-off vs. full precision.
* **Use case:** best when **VRAM is tight** and you want DPO-aligned behavior with structured MCQA prompts.

---

## Technical Specifications

* **Architecture:** Phi-2 (~2.78B params), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** **4-bit NF4** (bitsandbytes) with optional double quantization; compute in bf16/fp16.
* **Precision:** 4-bit quantized weights at runtime.

---

## Glossary

* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation
* **NF4:** NormalFloat4, the bitsandbytes format for 4-bit weight quantization