---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---

# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized

A **quantized (8-bit)**, **LoRA-finetuned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)**, particularly in **STEM and general knowledge** domains.

This model represents the final **Direct Preference Optimization (DPO)** stage of the *ShAIkespear* project: fine-tuned on both public MCQA datasets and EPFL preference-annotated data, then quantized to 8-bit for efficient inference and deployment.

---

## Model Details

* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned and 8-bit quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2

### Model Sources

* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*

---

## Uses

### Direct Use

* Lightweight, low-memory MCQA reasoning for STEM and general knowledge domains.
* Educational tutoring or automated evaluation assistants that follow structured prompts.
* Deployment on GPUs with limited VRAM (8-bit quantization reduces memory from ~11 GB to ~3 GB).

### Out-of-Scope Use

* Critical decision-making (medical, legal, financial).
* Long-form reasoning or open-ended creative writing.
* Any application violating academic integrity or the confidentiality of test materials.

---

## Bias, Risks, and Limitations

* **Quantization trade-off:** Slight loss in accuracy compared to the full-precision base model.
* **STEM reasoning:** Difficult multi-step math/science questions may still yield near-random performance (~25% accuracy, i.e., chance level on four-option MCQA).
* **Alignment drift:** DPO may slightly overfit stylistic preferences or verbosity.

### Recommendations

* Use structured prompts (`### Question → ### Explanation → ### Answer`) for best results.
* Include human oversight for evaluation or teaching uses.
* Avoid deployment where model-generated answers have direct consequences.

---

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the weights in 8-bit via bitsandbytes.
bnb_cfg = BitsAndBytesConfig(load_in_8bit=True)

tok = AutoTokenizer.from_pretrained("ShAIkespear/Phi-2_DPO_M3_Quantized", use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    "ShAIkespear/Phi-2_DPO_M3_Quantized",
    device_map="auto",
    quantization_config=bnb_cfg,
)

# The model expects the structured Question/Explanation/Answer prompt format.
prompt = (
    "### Question: What planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with a reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

## Training Details

### Training Data

* **SFT stage:** Mixed MCQA sets (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) plus EPFL-curated questions.
* **DPO stage:** Human preference pairs (EPFL exams + HelpSteer-style pairs).
* **Preprocessing:** Filtered to ≤512 tokens; unified MCQA schema.
* **Split:** 50% train, 25% overfit test, 10% comparison, 15% quantization validation.

### Training Procedure

* **Pipeline:** SFT → DPO → 8-bit quantization.
* **LoRA:** rank = 16, α = 16, dropout = 0.05.
* **Batch size:** 4 (SFT), 1 (DPO).
* **Learning rates:** 1e-5 (public data), 1e-4 (EPFL data).
* **Scheduler:** Cosine with warmup.
* **Frameworks:** Hugging Face Transformers + TRL + PEFT + BitsAndBytes.
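
For orientation, the snippet below sketches how the DPO stage above could be wired together with TRL and PEFT, using the hyperparameters listed in this card. It is a minimal sketch, not the project's actual training script (see the repository for that): the dataset path, warmup ratio, and output directory are placeholders, and it assumes a recent TRL release in which `DPOTrainer` accepts a `DPOConfig` and a `processing_class`.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# In the real pipeline this would be the SFT checkpoint, not the raw base model.
model_name = "microsoft/phi-2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA hyperparameters from this card: rank 16, alpha 16, dropout 0.05.
peft_cfg = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# Preference data needs "prompt"/"chosen"/"rejected" columns; the file name is a placeholder.
pairs = load_dataset("json", data_files="preference_pairs.jsonl")["train"]

args = DPOConfig(
    output_dir="phi2-dpo-m3",       # placeholder
    per_device_train_batch_size=1,  # DPO batch size from this card
    learning_rate=1e-4,             # EPFL-stage learning rate from this card
    lr_scheduler_type="cosine",     # cosine schedule with warmup
    warmup_ratio=0.1,               # assumed warmup fraction (not specified in the card)
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=pairs,
    processing_class=tok,
    peft_config=peft_cfg,  # TRL trains LoRA adapters; the frozen base serves as the reference
)
trainer.train()
```

After DPO, the adapters would presumably be merged into the base weights before post-training 8-bit quantization, e.g. by loading the merged checkpoint with `BitsAndBytesConfig(load_in_8bit=True)` as in the "How to Get Started" snippet.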
---

## Evaluation Summary

* **Configuration:** “Balanced-then-DPO” (M3) achieved the best overall performance.
* **Accuracy:** ≈0.61 on MMLU (balanced set); lower on STEM tasks (~0.25).
* **Memory:** Reduced to ~3 GB with minor quality loss.
* **Outcome:** Best trade-off between efficiency and alignment across ShAIkespear models.

---

## Technical Specifications

* **Architecture:** Phi-2 (2.78 B parameters), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** Post-training 8-bit (BitsAndBytes).
* **Precision:** 8-bit integer with dynamic quantization layers.
* **Software:** Hugging Face Transformers, TRL, PEFT, BitsAndBytes.

---

## Glossary

* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation for efficient fine-tuning
* **Quantization:** Reducing model precision for faster, memory-efficient inference
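
---

As a complement to the evaluation summary, the snippet below sketches one way MCQA accuracy figures like those above could be computed: generate a completion after the `### Answer:` marker and compare the first choice letter against the gold label. This is an illustrative scorer under assumed conventions (regex-based letter extraction, a user-supplied `generate_fn`), not the project's actual evaluation harness.

```python
import re

def extract_choice(generation: str) -> str | None:
    """Return the first standalone choice letter (A-D) in a model completion."""
    m = re.search(r"\b([A-D])\b", generation)
    return m.group(1) if m else None

def mcqa_accuracy(examples, generate_fn) -> float:
    """Score items of the form {"prompt": <structured prompt>, "gold": "A".."D"}.

    `generate_fn` maps a prompt string to the model's text completion, e.g. a
    thin wrapper around the `model.generate` call in "How to Get Started".
    """
    correct = sum(
        extract_choice(generate_fn(ex["prompt"])) == ex["gold"] for ex in examples
    )
    return correct / len(examples)
```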