Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized
A quantized (8-bit), LoRA-finetuned variant of microsoft/phi-2 specialized for multiple-choice question answering (MCQA), particularly in STEM and general knowledge domains.
This model represents the final Direct Preference Optimization (DPO) stage of the ShAIkespear project, fine-tuned on both public MCQA datasets and EPFL preference-annotated data, then quantized to 8-bit for efficient inference and deployment.
Model Details
- Developed by: ShAIkespear team
- Shared by: ShAIkespear team
- Model type: Causal LM (Phi-2) with LoRA adapters; DPO-aligned and 8-bit quantized
- Languages: English
- License: MIT
- Finetuned from: microsoft/phi-2
Model Sources
- Repository: 2.8B-Phi-2-LLM-QA
- Report: “ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”
Uses
Direct Use
- Lightweight, low-memory MCQA reasoning for STEM and general knowledge domains.
- Educational tutoring or automated evaluation assistants following structured prompts.
- Deployment on GPUs with limited VRAM (8-bit quantization reduces memory from ~11 GB → ~3 GB).
Out-of-Scope Use
- Critical decision-making (medical, legal, financial).
- Long-form reasoning or open-ended creative writing.
- Any application violating academic integrity or confidentiality of test materials.
Bias, Risks, and Limitations
- Quantization trade-off: Slight loss in accuracy compared to full-precision base model.
- STEM reasoning: Difficult multi-step math/science questions may still yield near-chance performance (~25 % accuracy, i.e., random guessing on four-option questions).
- Alignment drift: DPO may slightly overfit stylistic preferences or verbosity.
Recommendations
- Use structured prompts (`### Question` → `### Explanation` → `### Answer`) for best results.
- Include human oversight for evaluation or teaching uses.
- Avoid deployment where model-generated answers have direct consequences.
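The recommended prompt structure can be assembled with a small helper. `build_prompt` below is a hypothetical convenience function for illustration, not part of the released repository:

```python
def build_prompt(question: str, explanation: str = "") -> str:
    """Assemble the ### Question / ### Explanation / ### Answer scaffold."""
    return (
        f"### Question: {question}\n"
        f"### Explanation: {explanation}\n"
        f"### Answer:"
    )

print(build_prompt("What planet is known as the Red Planet?",
                   "Identify the planet with a reddish appearance."))
```

The trailing `### Answer:` marker is left open so the model completes it with the selected choice.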
How to Get Started
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the model in 8-bit to keep VRAM usage around ~3 GB
bnb_cfg = BitsAndBytesConfig(load_in_8bit=True)
tok = AutoTokenizer.from_pretrained("ShAIkespear/Phi-2_DPO_M3_Quantized", use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    "ShAIkespear/Phi-2_DPO_M3_Quantized",
    device_map="auto",
    quantization_config=bnb_cfg,
)

# The model expects the structured Question/Explanation/Answer prompt format
prompt = (
    "### Question: What planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with a reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```
Training Details
Training Data
- SFT stage: Mixed MCQA sets — MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K, and EPFL-curated questions.
- DPO stage: Human preference pairs (EPFL exams + HelpSteer-style pairs).
- Preprocessing: Examples filtered to ≤512 tokens and mapped to a unified MCQA schema.
- Split: 50 % train, 25 % overfit test, 10 % comparison, 15 % quantization validation.
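The ≤512-token filter might look like the sketch below. A whitespace split stands in for the real Phi-2 tokenizer, and the field names (`question`, `choices`) are assumptions, not confirmed by the repository:

```python
MAX_TOKENS = 512  # filtering threshold from the preprocessing step

def approx_token_count(text: str) -> int:
    # Whitespace split as a cheap stand-in for the real tokenizer (assumption)
    return len(text.split())

def filter_examples(examples):
    """Keep only MCQA examples whose question + choices fit the token budget."""
    kept = []
    for ex in examples:
        full = ex["question"] + " " + " ".join(ex["choices"])
        if approx_token_count(full) <= MAX_TOKENS:
            kept.append(ex)
    return kept

data = [
    {"question": "2 + 2 = ?", "choices": ["3", "4", "5", "6"]},
    {"question": "word " * 600, "choices": ["a", "b"]},  # too long, dropped
]
print(len(filter_examples(data)))  # → 1
```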
Training Procedure
- Pipeline: SFT → DPO → 8-bit quantization.
- LoRA: rank = 16, α = 16, dropout = 0.05.
- Batch size: 4 (SFT), 1 (DPO).
- Learning rates: 1e-5 (public), 1e-4 (EPFL).
- Scheduler: Cosine with warmup.
- Frameworks: Hugging Face Transformers + TRL + PEFT + BitsAndBytes.
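With PEFT, the LoRA hyperparameters above might be declared as follows. The target modules are an assumption (the card does not list them); rank, alpha, and dropout match the training procedure:

```python
from peft import LoraConfig

# LoRA hyperparameters from the training procedure above
lora_cfg = LoraConfig(
    r=16,              # rank = 16
    lora_alpha=16,     # alpha = 16
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # target modules are an assumption; the card does not specify them
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)
```

The adapters would then be attached to the base model with `peft.get_peft_model(base_model, lora_cfg)` before SFT and DPO training.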
Evaluation Summary
- Configuration: “Balanced-then-DPO” (M3) achieved best overall performance.
- Accuracy: ≈ 0.61 on MMLU (balanced set); STEM tasks lower (~0.25).
- Memory: Reduced to ~3 GB with minor quality loss.
- Outcome: Best trade-off between efficiency and alignment across ShAIkespear models.
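MCQA accuracy on a benchmark like MMLU reduces to exact match over extracted answer letters. A minimal scorer is sketched below; the answer-extraction regex assumes the model echoes the `### Answer:` marker, which may not hold for every generation:

```python
import re

def extract_choice(generation: str):
    """Pull the first A-D letter after '### Answer:' (assumed output format)."""
    m = re.search(r"### Answer:\s*([A-D])", generation)
    return m.group(1) if m else None

def accuracy(generations, gold_letters):
    """Fraction of generations whose extracted letter matches the gold label."""
    correct = sum(
        extract_choice(g) == gold for g, gold in zip(generations, gold_letters)
    )
    return correct / len(gold_letters)

gens = ["### Answer: B", "### Answer: C because...", "no answer given"]
print(accuracy(gens, ["B", "A", "D"]))  # 1 of 3 correct
```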
Technical Specifications
- Architecture: Phi-2 (2.78 B parameters), decoder-only transformer.
- Objective: SFT next-token prediction + DPO preference alignment.
- Quantization: Post-training 8-bit (BitsAndBytes).
- Precision: 8-bit integer with dynamic quantization layers.
- Software: Hugging Face Transformers, TRL, PEFT, BitsAndBytes.
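The ~11 GB → ~3 GB figure quoted under Direct Use is consistent with back-of-envelope arithmetic for 32-bit versus 8-bit weights, with the remainder attributable to activations and non-quantized layers:

```python
params = 2.78e9  # Phi-2 parameter count

fp32_gb = params * 4 / 1e9  # 4 bytes per 32-bit weight: ~11.1 GB
int8_gb = params * 1 / 1e9  # 1 byte per 8-bit weight:   ~2.8 GB
print(f"fp32 weights: ~{fp32_gb:.1f} GB, int8 weights: ~{int8_gb:.1f} GB")
```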
Glossary
- MCQA: Multiple-Choice Question Answering
- SFT: Supervised Finetuning
- DPO: Direct Preference Optimization
- LoRA: Low-Rank Adaptation for efficient fine-tuning
- Quantization: Reducing model precision for faster, memory-efficient inference