---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---
# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized_Alt
A **4-bit (NF4)**, **LoRA-finetuned**, **DPO-aligned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)** in **STEM and general knowledge**.
This **Alt** checkpoint is the memory-efficient counterpart to the unquantized M3 Base Alt model: it follows the same SFT → DPO training pipeline and then applies **post-training 4-bit quantization** for fast, low-VRAM inference.
---
## Model Details
* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned; **4-bit NF4** quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2
### Model Sources
* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*
---
## Uses
### Direct Use
* MCQA inference for STEM & general knowledge (MMLU/ScienceQA style).
* Educational assistants and lightweight evaluation tools on **low-VRAM GPUs**.
### Out-of-Scope Use
* Safety-critical domains (medical/legal/financial) without human oversight.
* Long-form creative writing or tasks far from MCQA.
* Any misuse involving exam integrity or confidential assessments.
---
## Bias, Risks, and Limitations
* **Quantization trade-offs:** Small accuracy drop vs. full precision, with larger memory savings than 8-bit.
* **STEM difficulty:** Multi-step reasoning can remain challenging.
* **Alignment bias:** DPO style preferences may influence verbosity/format.
### Recommendations
* Use the structured prompt format (a small prompt-builder sketch follows this list):
```
### Question ...
### Explanation ...
### Answer:
```
* Keep a human in the loop for teaching/grading.
* Prefer the **M3 Base Alt** (full precision) for further fine-tuning; use this **4-bit Alt** for deployment.
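
A minimal prompt-builder sketch for the format above; the helper name `build_prompt` is hypothetical and not part of the released code:

```python
def build_prompt(question: str, explanation: str = "") -> str:
    """Assemble the '### Question / ### Explanation / ### Answer:' prompt format."""
    return (
        f"### Question: {question}\n"
        f"### Explanation: {explanation}\n"
        f"### Answer:"
    )

prompt = build_prompt(
    "Which planet is known as the Red Planet?",
    explanation="Identify the planet with the reddish appearance.",
)
```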
---
## How to Get Started
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ShAIkespear/Phi-2_DPO_M3_Quantized_Alt"

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,    # often improves stability
    bnb_4bit_compute_dtype="bfloat16"  # or "float16" depending on your GPU
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=bnb_cfg
)

prompt = "### Question: Which planet is known as the Red Planet?\n### Explanation: Identify the planet with the reddish appearance.\n### Answer:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```
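
To read off just the generated answer, you can decode only the tokens produced after the prompt. This post-processing step is a suggestion, not part of the card's pipeline:

```python
# Decode only the newly generated tokens (everything after the prompt).
gen_tokens = out[0][inputs["input_ids"].shape[-1]:]
completion = tok.decode(gen_tokens, skip_special_tokens=True).strip()
# Keep just the first line, which should hold the selected answer.
answer = completion.splitlines()[0] if completion else completion
print(answer)
```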
---
## Training Details
### Data (SFT → DPO)
* **SFT:** Mixed MCQA (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) + EPFL MCQA; unified schema; ≤512 tokens; per-dataset caps.
* **DPO:** EPFL preference pairs + public preference data (chosen vs. rejected responses).
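
For reference, TRL's `DPOTrainer` consumes preference data as prompt / chosen / rejected fields. The record below illustrates that layout only; it is not a row from the actual training set:

```python
# Illustrative preference record in the prompt/chosen/rejected layout used by TRL's DPOTrainer.
preference_example = {
    "prompt": "### Question: Which planet is known as the Red Planet?\n### Explanation:",
    "chosen": " Mars appears red due to iron oxide on its surface.\n### Answer: Mars",
    "rejected": " Jupiter is the largest planet.\n### Answer: Jupiter",
}
```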
### Procedure & Hyperparameters
* **Pipeline:** SFT → DPO → **4-bit (NF4) quantization**.
* **LoRA:** rank=16, α=16, dropout=0.05.
* **Batch sizes:** 4 (SFT), 1 (DPO).
* **LR:** 1e-5 (public), 1e-4 (EPFL); cosine schedule w/ warmup.
* **Frameworks:** HF Transformers, TRL, PEFT (LoRA), bitsandbytes.
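
As a rough illustration of how the hyperparameters above map onto PEFT and TRL, here is a condensed DPO training sketch. It assumes a recent TRL release (`DPOConfig`/`DPOTrainer`, tokenizer passed as `processing_class`) and a tiny placeholder dataset; the warmup ratio is illustrative, and the actual scripts live in the linked repository:

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Tiny placeholder dataset in the prompt/chosen/rejected layout described under Data.
train_ds = Dataset.from_list([{
    "prompt": "### Question: 2 + 2 = ?\n### Answer:",
    "chosen": " 4",
    "rejected": " 5",
}])

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tok = AutoTokenizer.from_pretrained("microsoft/phi-2")

# LoRA settings from the list above: rank 16, alpha 16, dropout 0.05.
lora_cfg = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# DPO stage: batch size 1, cosine schedule with warmup; LR 1e-5 for public data
# (1e-4 for EPFL data per the card). The warmup ratio here is an assumption.
dpo_args = DPOConfig(
    output_dir="phi2-dpo",
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)

trainer = DPOTrainer(
    model=model,
    args=dpo_args,
    train_dataset=train_ds,
    processing_class=tok,
    peft_config=lora_cfg,
)
trainer.train()
```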
---
## Evaluation Summary
* **Configuration:** Balanced-then-DPO (**M3 Alt**).
* **Efficiency:** Fits comfortably on mid-range GPUs thanks to **4-bit** weights; faster/lighter than 8-bit with a modest accuracy trade-off vs. full precision.
* **Use case:** Best when **VRAM is tight** and you want DPO-aligned behavior with structured MCQA prompts.
---
## Technical Specifications
* **Architecture:** Phi-2 (~2.78B params), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** **4-bit NF4** (bitsandbytes) with optional double quantization; compute in bf16/fp16.
* **Precision:** Quantized 4-bit runtime.
---
## Glossary
* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Finetuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation
* **NF4:** NormalFloat-4 quantization format (bnb) for 4-bit weight quantization