---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---
# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized_Alt
A **4-bit (NF4)**, **LoRA-finetuned**, **DPO-aligned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)** in **STEM and general knowledge**.
This **Alt** checkpoint is the memory-efficient counterpart to the unquantized M3 Base Alt model: it follows the same SFT → DPO training pipeline and then applies **post-training 4-bit quantization** for fast, low-VRAM inference.
---
## Model Details
* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned; **4-bit NF4** quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2
### Model Sources
* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*
---
## Uses
### Direct Use
* MCQA inference for STEM & general knowledge (MMLU/ScienceQA style).
* Educational assistants and lightweight evaluation tools on **low-VRAM GPUs**.
### Out-of-Scope Use
* Safety-critical domains (medical/legal/financial) without human oversight.
* Long-form creative writing or tasks far from MCQA.
* Any misuse involving exam integrity or confidential assessments.
---
## Bias, Risks, and Limitations
* **Quantization trade-offs:** Small accuracy drop vs. full precision, with larger memory savings than 8-bit.
* **STEM difficulty:** Multi-step reasoning can remain challenging.
* **Alignment bias:** DPO style preferences may influence verbosity/format.
### Recommendations
* Use the structured prompt format (a small prompt-builder sketch follows this list):
```
### Question ...
### Explanation ...
### Answer:
```
* Keep a human in the loop for teaching/grading.
* Prefer the **M3 Base Alt** (full precision) for further fine-tuning; use this **4-bit Alt** for deployment.
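
A minimal prompt-builder sketch for the format above; the helper name `build_prompt` is hypothetical and not part of the released code:

```python
def build_prompt(question: str, explanation: str = "") -> str:
    """Assemble the '### Question / ### Explanation / ### Answer:' prompt format."""
    return (
        f"### Question: {question}\n"
        f"### Explanation: {explanation}\n"
        f"### Answer:"
    )

prompt = build_prompt(
    "Which planet is known as the Red Planet?",
    explanation="Identify the planet with the reddish appearance.",
)
```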
---
## How to Get Started
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ShAIkespear/Phi-2_DPO_M3_Quantized_Alt"

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,    # often improves stability
    bnb_4bit_compute_dtype="bfloat16"  # or "float16" depending on your GPU
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=bnb_cfg
)

prompt = "### Question: Which planet is known as the Red Planet?\n### Explanation: Identify the planet with the reddish appearance.\n### Answer:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```
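
To read off just the generated answer, you can decode only the tokens produced after the prompt. This post-processing step is a suggestion, not part of the card's pipeline:

```python
# Decode only the newly generated tokens (everything after the prompt).
gen_tokens = out[0][inputs["input_ids"].shape[-1]:]
completion = tok.decode(gen_tokens, skip_special_tokens=True).strip()
# Keep just the first line, which should hold the selected answer.
answer = completion.splitlines()[0] if completion else completion
print(answer)
```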
---
## Training Details
### Data (SFT → DPO)
* **SFT:** Mixed MCQA (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) + EPFL MCQA; unified schema; ≤512 tokens; per-dataset caps.
* **DPO:** EPFL preference pairs + public preference data (chosen vs. rejected responses).
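
For reference, TRL's `DPOTrainer` consumes preference data as prompt / chosen / rejected fields. The record below illustrates that layout only; it is not a row from the actual training set:

```python
# Illustrative preference record in the prompt/chosen/rejected layout used by TRL's DPOTrainer.
preference_example = {
    "prompt": "### Question: Which planet is known as the Red Planet?\n### Explanation:",
    "chosen": " Mars appears red due to iron oxide on its surface.\n### Answer: Mars",
    "rejected": " Jupiter is the largest planet.\n### Answer: Jupiter",
}
```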
### Procedure & Hyperparameters
* **Pipeline:** SFT → DPO → **4-bit (NF4) quantization**.
* **LoRA:** rank=16, α=16, dropout=0.05.
* **Batch sizes:** 4 (SFT), 1 (DPO).
* **LR:** 1e-5 (public), 1e-4 (EPFL); cosine schedule w/ warmup.
* **Frameworks:** HF Transformers, TRL, PEFT (LoRA), bitsandbytes.
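
As a rough illustration of how the hyperparameters above map onto PEFT and TRL, here is a condensed DPO training sketch. It assumes a recent TRL release (`DPOConfig`/`DPOTrainer`, tokenizer passed as `processing_class`) and a tiny placeholder dataset; the warmup ratio is illustrative, and the actual scripts live in the linked repository:

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Tiny placeholder dataset in the prompt/chosen/rejected layout described under Data.
train_ds = Dataset.from_list([{
    "prompt": "### Question: 2 + 2 = ?\n### Answer:",
    "chosen": " 4",
    "rejected": " 5",
}])

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tok = AutoTokenizer.from_pretrained("microsoft/phi-2")

# LoRA settings from the list above: rank 16, alpha 16, dropout 0.05.
lora_cfg = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# DPO stage: batch size 1, cosine schedule with warmup; LR 1e-5 for public data
# (1e-4 for EPFL data per the card). The warmup ratio here is an assumption.
dpo_args = DPOConfig(
    output_dir="phi2-dpo",
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)

trainer = DPOTrainer(
    model=model,
    args=dpo_args,
    train_dataset=train_ds,
    processing_class=tok,
    peft_config=lora_cfg,
)
trainer.train()
```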
---
## Evaluation Summary
* **Configuration:** Balanced-then-DPO (**M3 Alt**).
* **Efficiency:** Fits comfortably on mid-range GPUs thanks to **4-bit** weights; faster/lighter than 8-bit with a modest accuracy trade-off vs. full precision.
* **Use case:** Best when **VRAM is tight** and you want DPO-aligned behavior with structured MCQA prompts.
---
## Technical Specifications
* **Architecture:** Phi-2 (~2.78B params), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** **4-bit NF4** (bitsandbytes) with optional double quantization; compute in bf16/fp16.
* **Precision:** Quantized 4-bit runtime.
---
## Glossary
* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation
* **NF4:** NormalFloat-4 quantization format (bnb) for 4-bit weight quantization