Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized

A quantized (8-bit), LoRA-finetuned variant of microsoft/phi-2 specialized for multiple-choice question answering (MCQA), particularly in STEM and general knowledge domains. This model represents the final Direct Preference Optimization (DPO) stage of the ShAIkespear project, fine-tuned on both public MCQA datasets and EPFL preference-annotated data, then quantized to 8-bit for efficient inference and deployment.


Model Details

  • Developed by: ShAIkespear team
  • Shared by: ShAIkespear team
  • Model type: Causal LM (Phi-2) with LoRA adapters; DPO-aligned and 8-bit quantized
  • Languages: English
  • License: MIT
  • Finetuned from: microsoft/phi-2

Model Sources

  • Repository: 2.8B-Phi-2-LLM-QA
  • Report: “ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”

Uses

Direct Use

  • Lightweight, low-memory MCQA reasoning for STEM and general knowledge domains.
  • Educational tutoring or automated evaluation assistants following structured prompts.
  • Deployment on GPUs with limited VRAM (8-bit quantization reduces the memory footprint from ~11 GB to ~3 GB).

Out-of-Scope Use

  • Critical decision-making (medical, legal, financial).
  • Long-form reasoning or open-ended creative writing.
  • Any application violating academic integrity or confidentiality of test materials.

Bias, Risks, and Limitations

  • Quantization trade-off: Slight loss in accuracy compared to full-precision base model.
  • STEM reasoning: Difficult multi-step math/science questions may still yield near-random performance (~25% accuracy, i.e., chance level on four-option questions).
  • Alignment drift: DPO may slightly overfit stylistic preferences or verbosity.

Recommendations

  • Use structured prompts (### Question → ### Explanation → ### Answer) for best results; a formatting sketch follows this list.
  • Include human oversight for evaluation or teaching uses.
  • Avoid deployment where model-generated answers have direct consequences.
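
As a minimal sketch of that prompt structure, the helper below assembles the three sections; the function name and the A–D choice formatting are illustrative, not part of the released model.

def format_mcqa_prompt(question: str, choices: list[str], explanation: str = "") -> str:
    # Label the options A, B, C, ... and lay out the three "###" sections.
    lettered = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
    return (
        f"### Question: {question}\n{lettered}\n"
        f"### Explanation: {explanation}\n"
        f"### Answer:"
    )

print(format_mcqa_prompt(
    "What planet is known as the Red Planet?",
    ["Venus", "Mars", "Jupiter", "Saturn"],
    explanation="Identify the planet with a reddish appearance.",
))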

How to Get Started

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the checkpoint in 8-bit via bitsandbytes to cut VRAM use (~11 GB -> ~3 GB).
bnb_cfg = BitsAndBytesConfig(load_in_8bit=True)
tok = AutoTokenizer.from_pretrained("ShAIkespear/Phi-2_DPO_M3_Quantized", use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    "ShAIkespear/Phi-2_DPO_M3_Quantized", device_map="auto", quantization_config=bnb_cfg
)

# Use the structured MCQA prompt format the model was trained on.
prompt = (
    "### Question: What planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with a reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
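
With the model loaded, Transformers' get_memory_footprint() should report roughly the ~3 GB figure noted above (the exact number depends on your hardware and library versions):

print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")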

Training Details

Training Data

  • SFT stage: Mixed MCQA sets — MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K, and EPFL-curated questions.
  • DPO stage: Human preference pairs (EPFL exams + HelpSteer-style pairs).
  • Preprocessing: Examples filtered to ≤512 tokens and mapped into a unified MCQA schema (see the sketch after this list).
  • Split: 50 % train, 25 % overfit test, 10 % comparison, 15 % quantization validation.
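
As a rough illustration of the length filter, overlong examples can be dropped with the base tokenizer. This is a minimal sketch; the `text` field name is an assumption about the unified MCQA schema.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/phi-2")

def within_length_budget(example: dict, max_tokens: int = 512) -> bool:
    # Keep only examples whose full text fits in the 512-token budget.
    return len(tok(example["text"]).input_ids) <= max_tokens

# e.g. with a datasets.Dataset: dataset = dataset.filter(within_length_budget)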

Training Procedure

  • Pipeline: SFT → DPO → 8-bit quantization.
  • LoRA: rank = 16, α = 16, dropout = 0.05.
  • Batch size: 4 (SFT), 1 (DPO).
  • Learning rates: 1e-5 (public datasets), 1e-4 (EPFL data).
  • Scheduler: Cosine with warmup.
  • Frameworks: Hugging Face Transformers + TRL + PEFT + BitsAndBytes (a configuration sketch follows this list).
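
A minimal sketch of this configuration with PEFT and TRL is shown below, reusing `model` and `tok` from the quick-start for brevity (in the actual pipeline, DPO ran on the pre-quantization SFT checkpoint). The `target_modules` list, the warmup fraction, and the toy dataset are assumptions, and DPOTrainer's exact signature varies across TRL versions.

from datasets import Dataset
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# LoRA hyperparameters as reported above; target_modules is an assumption,
# since the card does not say which Phi-2 projections were adapted.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
    task_type="CAUSAL_LM",
)

# Toy stand-in for the (prompt, chosen, rejected) preference pairs.
pref_pairs = Dataset.from_dict({
    "prompt": ["### Question: What is 2 + 2?\n### Answer:"],
    "chosen": [" 4"],
    "rejected": [" 5"],
})

# DPO stage: batch size 1, EPFL learning rate, cosine schedule with warmup.
dpo_args = DPOConfig(
    output_dir="dpo_out",
    per_device_train_batch_size=1,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,  # warmup fraction is an assumption
)

trainer = DPOTrainer(
    model=model,
    args=dpo_args,
    train_dataset=pref_pairs,
    processing_class=tok,  # "tokenizer=" in TRL versions before 0.12
    peft_config=lora_cfg,
)
trainer.train()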

Evaluation Summary

  • Configuration: “Balanced-then-DPO” (M3) achieved best overall performance.
  • Accuracy: ≈0.61 on MMLU (balanced set); STEM tasks lower (~0.25). A scoring sketch follows this list.
  • Memory: Reduced to ~3 GB with minor quality loss.
  • Outcome: Best trade-off between efficiency and alignment across ShAIkespear models.
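
MCQA accuracy of this kind is typically computed by extracting the predicted letter from each generation and comparing it to the gold label. The sketch below illustrates one way to do this; the regex convention is an assumption, not the project's exact scorer.

import re

def extract_choice(generation: str) -> str | None:
    # Take the first A-D letter that follows the "### Answer:" marker.
    m = re.search(r"### Answer:\s*\(?([A-D])\)?", generation)
    return m.group(1) if m else None

def mcqa_accuracy(generations: list[str], golds: list[str]) -> float:
    preds = [extract_choice(g) for g in generations]
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)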

Technical Specifications

  • Architecture: Phi-2 (2.78 B parameters), decoder-only transformer.
  • Objective: SFT next-token prediction + DPO preference alignment.
  • Quantization: Post-training 8-bit (BitsAndBytes).
  • Precision: 8-bit integer weights (BitsAndBytes LLM.int8(), which keeps outlier activations in higher precision).
  • Software: Hugging Face Transformers, TRL, PEFT, BitsAndBytes.

Glossary

  • MCQA: Multiple-Choice Question Answering
  • SFT: Supervised Finetuning
  • DPO: Direct Preference Optimization
  • LoRA: Low-Rank Adaptation for efficient fine-tuning
  • Quantization: Reducing model precision for faster, memory-efficient inference