---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---
# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized_Alt
A **4-bit (NF4)**, **LoRA-finetuned**, **DPO-aligned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)** in **STEM and general knowledge**.
This **Alt** checkpoint is the memory-efficient counterpart to the unquantized M3 Base Alt model: same SFT → DPO training, then **post-training 4-bit quantization** for fast, low-VRAM inference.
---
## Model Details
* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned; **4-bit NF4** quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2
### Model Sources
* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*
---
## Uses
### Direct Use
* MCQA inference for STEM & general knowledge (MMLU/ScienceQA style).
* Educational assistants and lightweight evaluation tools on **low-VRAM GPUs**.
### Out-of-Scope Use
* Safety-critical domains (medical/legal/financial) without human oversight.
* Long-form creative writing or tasks far from MCQA.
* Any misuse involving exam integrity or confidential assessments.
---
## Bias, Risks, and Limitations
* **Quantization trade-offs:** Small accuracy drop relative to full precision, in exchange for larger memory savings than 8-bit quantization.
* **STEM difficulty:** Multi-step reasoning can remain challenging.
* **Alignment bias:** DPO style preferences may influence verbosity/format.
### Recommendations
* Use the structured prompt format (a prompt-builder sketch follows this list):
```
### Question ...
### Explanation ...
### Answer:
```
* Keep a human in the loop for teaching/grading.
* Prefer the **M3 Base Alt** (full precision) for further fine-tuning; use this **4-bit Alt** for deployment.
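A minimal prompt-builder sketch for that scaffold, assuming lettered options are folded into the question text (the function name and the `A.`/`B.` labels are illustrative, not taken from the released code):

```python
def build_mcqa_prompt(question: str, options: list[str], explanation: str = "") -> str:
    """Assemble the Question / Explanation / Answer scaffold shown above."""
    lettered = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return (
        f"### Question: {question}\n{lettered}\n"
        f"### Explanation: {explanation}\n"
        f"### Answer:"
    )

# Example:
# build_mcqa_prompt("Which planet is known as the Red Planet?",
#                   ["Venus", "Mars", "Jupiter", "Saturn"])
```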
---
## How to Get Started
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ShAIkespear/Phi-2_DPO_M3_Quantized_Alt"

# 4-bit NF4 quantization config (bitsandbytes)
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,    # often improves stability
    bnb_4bit_compute_dtype="bfloat16"  # or "float16", depending on your GPU
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=bnb_cfg
)

# Structured MCQA prompt: Question / Explanation / Answer
prompt = (
    "### Question: Which planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with the reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```
---
## Training Details
### Data (SFT → DPO)
* **SFT:** Mixed MCQA (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) + EPFL MCQA; unified schema (sketched after this list); ≤512 tokens; per-dataset caps.
* **DPO:** EPFL preference pairs + public preference data (chosen vs. rejected responses).
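A rough sketch of that unification step; the field names, the cap value, and the exact prompt scaffold here are assumptions rather than the repository's actual schema:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/phi-2")

def to_unified_example(record: dict) -> dict:
    # Hypothetical source fields: question / options / answer letter / rationale.
    options = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(record["options"]))
    prompt = (
        f"### Question: {record['question']}\n{options}\n"
        f"### Explanation: {record.get('rationale', '')}\n"
        f"### Answer:"
    )
    return {"prompt": prompt, "completion": f" {record['answer']}"}

def cap_and_filter(examples: list[dict], cap: int = 5_000, max_tokens: int = 512) -> list[dict]:
    # Per-dataset cap (placeholder value) plus the <=512-token length filter from the card.
    kept = []
    for ex in examples[:cap]:
        if len(tok(ex["prompt"] + ex["completion"]).input_ids) <= max_tokens:
            kept.append(ex)
    return kept
```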
### Procedure & Hyperparameters
* **Pipeline:** SFT → DPO → **4-bit (NF4) quantization**.
* **LoRA:** rank=16, α=16, dropout=0.05 (see the configuration sketch after this list).
* **Batch sizes:** 4 (SFT), 1 (DPO).
* **LR:** 1e-5 (public data), 1e-4 (EPFL data); cosine schedule with warmup.
* **Frameworks:** HF Transformers, TRL, PEFT (LoRA), bitsandbytes.
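A configuration sketch under stated assumptions: the LoRA values match the card, but `DPOConfig`/`DPOTrainer` argument names vary across TRL versions, `model`, `tok`, and `train_ds` are placeholders (an SFT checkpoint, its tokenizer, and a preference dataset with `prompt`/`chosen`/`rejected` columns), and the warmup fraction is not reported:

```python
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

lora_cfg = LoraConfig(
    r=16,               # rank reported above
    lora_alpha=16,      # alpha reported above
    lora_dropout=0.05,  # dropout reported above
    task_type="CAUSAL_LM",
)

dpo_args = DPOConfig(
    output_dir="phi2-dpo",
    per_device_train_batch_size=1,  # DPO batch size reported above
    learning_rate=1e-5,             # public-data LR reported above
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # assumption: warmup fraction not reported
)

trainer = DPOTrainer(
    model=model,                # placeholder: the SFT checkpoint
    args=dpo_args,
    train_dataset=train_ds,     # placeholder preference dataset
    processing_class=tok,       # `tokenizer=tok` in older TRL releases
    peft_config=lora_cfg,       # trains LoRA adapters rather than full weights
)
trainer.train()
```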
---
## Evaluation Summary
* **Configuration:** Balanced-then-DPO (**M3 Alt**).
* **Efficiency:** Fits comfortably on mid-range GPUs thanks to **4-bit** weights; faster/lighter than 8-bit with a modest accuracy trade-off vs. full precision.
* **Use case:** Best when **VRAM is tight** and you want DPO-aligned behavior with structured MCQA prompts (a simple answer-scoring sketch follows).
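One simple way to score generations against gold labels, as a sketch; the report's actual evaluation harness may differ:

```python
import re

def extract_choice(generated: str) -> str | None:
    """Return the first option letter found after the last '### Answer:' marker."""
    tail = generated.rsplit("### Answer:", 1)[-1]
    match = re.search(r"\b([A-E])\b", tail)
    return match.group(1) if match else None

def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Accuracy over (generated_text, gold_letter) pairs."""
    hits = sum(extract_choice(gen) == gold for gen, gold in pairs)
    return hits / len(pairs)
```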
---
## Technical Specifications
* **Architecture:** Phi-2 (~2.78B params), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** **4-bit NF4** (bitsandbytes) with optional double quantization; compute in bf16/fp16.
* **Precision:** Quantized 4-bit runtime.
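As a back-of-envelope check on the footprint: ~2.78B parameters at 4 bits per weight is roughly 1.4 GB of weight storage, before quantization constants, the KV cache, and activations; this is the arithmetic behind the low-VRAM claim above.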
---
## Glossary
* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation
* **NF4:** NormalFloat-4 quantization format (bnb) for 4-bit weight quantization |