---
library_name: transformers
license: mit
language:
- en
base_model:
- microsoft/phi-2
---
# Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized
A **quantized (8-bit)**, **LoRA-finetuned** variant of **microsoft/phi-2** specialized for **multiple-choice question answering (MCQA)**, particularly in **STEM and general knowledge** domains.
This model represents the final **Direct Preference Optimization (DPO)** stage of the *ShAIkespear* project, fine-tuned on both public MCQA datasets and EPFL preference-annotated data, then quantized to 8-bit for efficient inference and deployment.
---
## Model Details
* **Developed by:** ShAIkespear team
* **Shared by:** ShAIkespear team
* **Model type:** Causal LM (Phi-2) with LoRA adapters; DPO-aligned and 8-bit quantized
* **Languages:** English
* **License:** MIT
* **Finetuned from:** microsoft/phi-2
### Model Sources
* **Repository:** [2.8B-Phi-2-LLM-QA](https://github.com/EricSaikali/2.8B-Phi-2-LLM-QA)
* **Report:** *“ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”*
---
## Uses
### Direct Use
* Lightweight, low-memory MCQA reasoning for STEM and general knowledge domains.
* Educational tutoring or automated evaluation assistants following structured prompts.
* Deployment on GPUs with limited VRAM: 8-bit quantization cuts the weight footprint from ~11 GB in full precision to ~3 GB (a quick check is sketched below).
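The footprint figure can be verified directly; a minimal sketch (note that `get_memory_footprint` counts weights and buffers only, not activations, and exact numbers vary by environment):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the 8-bit checkpoint and report its weight footprint.
model = AutoModelForCausalLM.from_pretrained(
    "ShAIkespear/Phi-2_DPO_M3_Quantized",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")  # ~3 GB, vs ~11 GB at full precision
```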
### Out-of-Scope Use
* Critical decision-making (medical, legal, financial).
* Long-form reasoning or open-ended creative writing.
* Any application violating academic integrity or confidentiality of test materials.
---
## Bias, Risks, and Limitations
* **Quantization trade-off:** Slight loss in accuracy compared to full-precision base model.
* **STEM reasoning:** Difficult multi-step math/science questions may still yield near-random performance (~25 % accuracy, i.e., chance level for four-option questions).
* **Alignment drift:** DPO may slightly overfit stylistic preferences or verbosity.
### Recommendations
* Use structured prompts (`### Question → ### Explanation → ### Answer`) for best results; a small prompt-builder helper is sketched after this list.
* Include human oversight for evaluation or teaching uses.
* Avoid deployment where model-generated answers have direct consequences.
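A minimal helper for assembling that prompt format (the function name and the optional-explanation handling are illustrative, not part of the released code):

```python
def build_prompt(question: str, explanation: str = "") -> str:
    """Assemble the structured MCQA prompt the model was fine-tuned on."""
    prompt = f"### Question: {question}\n"
    if explanation:
        prompt += f"### Explanation: {explanation}\n"
    return prompt + "### Answer:"

print(build_prompt("What planet is known as the Red Planet?"))
```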
---
## How to Get Started
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the tokenizer and the model weights in 8-bit via bitsandbytes.
bnb_cfg = BitsAndBytesConfig(load_in_8bit=True)
tok = AutoTokenizer.from_pretrained("ShAIkespear/Phi-2_DPO_M3_Quantized", use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    "ShAIkespear/Phi-2_DPO_M3_Quantized", device_map="auto", quantization_config=bnb_cfg
)

# Query with the structured prompt format the model was fine-tuned on.
prompt = (
    "### Question: What planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with a reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
```
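Since the completion follows the `### Answer:` tag, a small post-processing sketch to pull out just the answer (the helper and regex are illustrative):

```python
import re

def extract_answer(completion: str) -> str:
    """Return the text following the '### Answer:' tag, if any."""
    match = re.search(r"### Answer:\s*(.+)", completion)
    return match.group(1).strip() if match else ""

print(extract_answer(tok.decode(out[0], skip_special_tokens=True)))  # e.g. "Mars"
```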
---
## Training Details
### Training Data
* **SFT stage:** Mixed MCQA sets — MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K, and EPFL-curated questions.
* **DPO stage:** Human preference pairs (EPFL exams + HelpSteer-style pairs).
* **Preprocessing:** Filtered to ≤512 tokens; unified into a single MCQA schema (an illustrative record follows this list).
* **Split:** 50 % train, 25 % overfit test, 10 % comparison, 15 % quantization validation.
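For concreteness, a record in the unified schema might look like this (field names are assumptions; the source datasets use differing native formats):

```python
example = {
    "subject": "astronomy",  # coarse domain tag
    "question": "What planet is known as the Red Planet?",
    "choices": ["Venus", "Mars", "Jupiter", "Saturn"],
    "answer": "B",           # gold choice letter
}
```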
### Training Procedure
* **Pipeline:** SFT → DPO → 8-bit quantization.
* **LoRA:** rank = 16, α = 16, dropout = 0.05.
* **Batch size:** 4 (SFT), 1 (DPO).
* **Learning rates:** 1e-5 (public datasets), 1e-4 (EPFL data).
* **Scheduler:** Cosine with warmup.
* **Frameworks:** Hugging Face Transformers + TRL + PEFT + BitsAndBytes (a configuration sketch follows).
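The hyperparameters above map onto PEFT/TRL configuration roughly as follows. This is a sketch under assumptions, not the project's training script: TRL argument names vary between releases (older versions take `tokenizer=` instead of `processing_class=`), `warmup_ratio` is an assumed value since the card only states "cosine with warmup", and `preference_pairs` stands in for the DPO-stage preference dataset.

```python
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# LoRA adapter settings from the card: rank 16, alpha 16, dropout 0.05.
lora_cfg = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

dpo_args = DPOConfig(
    per_device_train_batch_size=1,   # DPO-stage batch size
    learning_rate=1e-5,              # public-data rate; 1e-4 was used for EPFL data
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # assumed warmup fraction
    output_dir="phi2-dpo",
)

trainer = DPOTrainer(
    model=model,                     # the SFT-stage checkpoint
    args=dpo_args,
    train_dataset=preference_pairs,  # chosen/rejected pairs (EPFL exams + HelpSteer-style)
    processing_class=tok,
    peft_config=lora_cfg,
)
trainer.train()
```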
---
## Evaluation Summary
* **Configuration:** “Balanced-then-DPO” (M3) achieved best overall performance.
* **Accuracy:** ≈ 0.61 on MMLU (balanced set); STEM tasks lower (~0.25).
* **Memory:** Reduced to ~3 GB with minor quality loss.
* **Outcome:** Best trade-off between efficiency and alignment across ShAIkespear models.
---
## Technical Specifications
* **Architecture:** Phi-2 (2.78 B parameters), decoder-only transformer.
* **Objective:** SFT next-token prediction + DPO preference alignment.
* **Quantization:** Post-training 8-bit (BitsAndBytes).
* **Precision:** 8-bit integer with dynamic quantization layers.
* **Software:** Hugging Face Transformers, TRL, PEFT, BitsAndBytes.
---
## Glossary
* **MCQA:** Multiple-Choice Question Answering
* **SFT:** Supervised Fine-Tuning
* **DPO:** Direct Preference Optimization
* **LoRA:** Low-Rank Adaptation for efficient fine-tuning
* **Quantization:** Reducing model precision for faster, memory-efficient inference