---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---

# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized_Alt

A **4-bit (NF4)**, **LoRA-finetuned**, **DPO-aligned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)** in **STEM and general knowledge**. This **Alt** checkpoint is the memory-efficient counterpart to the unquantized M3 Base Alt model: it follows the same SFT → DPO training, with **post-training 4-bit quantization** applied afterwards for fast, low-VRAM inference.

---

## Model Details

* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned; **4-bit NF4** quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2

### Model Sources

* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*

---

## Uses

### Direct Use

* MCQA inference for STEM & general knowledge (MMLU/ScienceQA style).
* Educational assistants and lightweight evaluation tools on **low-VRAM GPUs**.

### Out-of-Scope Use

* Safety-critical domains (medical/legal/financial) without human oversight.
* Long-form creative writing or tasks far from MCQA.
* Any misuse involving exam integrity or confidential assessments.

---

## Bias, Risks, and Limitations

* **Quantization trade-offs:** Small accuracy drop vs. full precision; larger memory savings than 8-bit.
* **STEM difficulty:** Multi-step reasoning can remain challenging.
* **Alignment bias:** DPO style preferences may influence verbosity and formatting.

### Recommendations

* Use the structured prompt format:

  ```
  ### Question ...
  ### Explanation ...
  ### Answer:
  ```

* Keep a human in the loop for teaching and grading.
* Prefer the **M3 Base Alt** (full precision) for further fine-tuning; use this **4-bit Alt** for deployment.
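The structured prompt above can be assembled programmatically before tokenization. A minimal sketch, where `build_mcqa_prompt` is a hypothetical helper (not part of the released repository); the example question mirrors the one used in the quickstart below:

```python
def build_mcqa_prompt(question: str, explanation_hint: str) -> str:
    # Assemble the "### Question / ### Explanation / ### Answer:" format
    # this checkpoint was trained on; generation continues after "### Answer:".
    return (
        f"### Question: {question}\n"
        f"### Explanation: {explanation_hint}\n"
        "### Answer:"
    )

prompt = build_mcqa_prompt(
    "Which planet is known as the Red Planet?",
    "Identify the planet with the reddish appearance.",
)
print(prompt)
```

Keeping the trailing `### Answer:` header without a newline after it matters: the model is expected to complete the answer immediately after that marker.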
---

## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ShAIkespear/Phi-2_DPO_M3_Quantized_Alt"

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,    # often improves stability
    bnb_4bit_compute_dtype="bfloat16"  # or "float16" depending on your GPU
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_cfg
)

prompt = (
    "### Question: Which planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with the reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

## Training Details

### Data (SFT → DPO)

* **SFT:** Mixed MCQA (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) + EPFL MCQA; unified schema; ≤512 tokens; per-dataset caps.
* **DPO:** EPFL preference pairs + public preference data (chosen vs. rejected responses).

### Procedure & Hyperparameters

* **Pipeline:** SFT → DPO → **4-bit (NF4) quantization**.
* **LoRA:** rank=16, α=16, dropout=0.05.
* **Batch sizes:** 4 (SFT), 1 (DPO).
* **LR:** 1e-5 (public), 1e-4 (EPFL); cosine schedule with warmup.
* **Frameworks:** HF Transformers, TRL, PEFT (LoRA), bitsandbytes.

---

## Evaluation Summary

* **Configuration:** Balanced-then-DPO (**M3 Alt**).
* **Efficiency:** Fits comfortably on mid-range GPUs thanks to **4-bit** weights; faster and lighter than 8-bit with a modest accuracy trade-off vs. full precision.
* **Use case:** Best when **VRAM is tight** and you want DPO-aligned behavior with structured MCQA prompts.

---

## Technical Specifications

* **Architecture:** Phi-2 (~2.78B params), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** **4-bit NF4** (bitsandbytes) with optional double quantization; compute in bf16/fp16.
* **Precision:** Quantized 4-bit runtime.

---

## Glossary

* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation
* **NF4:** NormalFloat-4 quantization format (bitsandbytes) for 4-bit weight quantization
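The "low-VRAM" claim can be sanity-checked with a back-of-envelope weight-memory estimate from the ~2.78B parameter count stated above. This is a rough sketch: it counts weights only (0.5 bytes/param for NF4), ignoring quantization constants, activations, and the KV cache, so the real footprint is somewhat higher:

```python
PARAMS = 2.78e9  # approximate Phi-2 parameter count from the specs above

def weight_gb(bytes_per_param: float) -> float:
    # Weight storage only, in GiB; runtime overheads are not included.
    return PARAMS * bytes_per_param / 1024**3

fp16_gb = weight_gb(2.0)  # full/half-precision baseline
int8_gb = weight_gb(1.0)  # 8-bit quantization
nf4_gb = weight_gb(0.5)   # 4-bit NF4 (this checkpoint)

print(f"fp16 ≈ {fp16_gb:.1f} GiB, int8 ≈ {int8_gb:.1f} GiB, nf4 ≈ {nf4_gb:.1f} GiB")
```

The roughly 4x reduction versus fp16 is what lets this checkpoint fit on mid-range GPUs.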
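The DPO objective defined above minimizes the negative log-sigmoid of the scaled difference between the policy's and the reference model's chosen-vs-rejected log-probability margins. A minimal numeric sketch; `beta` and the sequence log-probabilities are illustrative values, not taken from this model's training run:

```python
import math

def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    logits = beta * ((policy_chosen_lp - policy_rejected_lp)
                     - (ref_chosen_lp - ref_rejected_lp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# The policy prefers the chosen answer by a wider margin than the reference
# does, so the loss dips below log(2) ≈ 0.693 (the value at zero margin gap).
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1)
print(round(loss, 4))  # ≈ 0.5981
```

In practice TRL computes this from per-token log-probs over whole sequences; the sketch only shows the scalar loss shape.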