---
library_name: transformers
tags:
- qlora
- quantization
- 4bit
- causal-lm
- transformers
- mcqa
- dpo
- multiple-choice
- w4a16
- hf-trained
---

# MNLP M3 - Quantized DPO + MCQA Model (W4A16, QLoRA)

This model is a quantized, QLoRA-fine-tuned version of the `albertfares/MNLP_SFT_DPO` base model. It is trained on curated stabilization data for multiple-choice question answering (MCQA) using LoRA adapters over 4-bit weights and 16-bit activations (W4A16). It was developed as part of the CS-552 Modern NLP course at EPFL and is hosted for reproducible evaluation and downstream use.

## Model Details

### Model Description

This model adapts the `MNLP_SFT_DPO` model to complex MCQA reasoning using QLoRA (4-bit weights, 16-bit activations). It was trained on the quantized dataset [`abdou-u/MNLP_M3_quantized_dataset`](https://huggingface.co/datasets/abdou-u/MNLP_M3_quantized_dataset) and aims to balance memory efficiency with downstream accuracy.

- **Developed by:** Ahmed Abdelmalek
- **Finetuned from model:** `albertfares/MNLP_SFT_DPO`
- **Model type:** Causal Language Model (decoder-only, autoregressive)
- **Language(s):** English
- **License:** Apache 2.0

### Model Sources

- **Training Code:** Private GitHub Repository
- **Datasets:** [`abdou-u/MNLP_M3_quantized_dataset`](https://huggingface.co/datasets/abdou-u/MNLP_M3_quantized_dataset)
- **Base Model:** `albertfares/MNLP_SFT_DPO`

## Uses

### Direct Use

This model can be used directly to answer multiple-choice questions (MCQA) in English, producing a short explanation alongside the answer.

### Downstream Use

The model can be used in LLM pipelines that need a lightweight MCQA reasoning model with high accuracy and low VRAM cost.

### Out-of-Scope Use

The model is not intended for open-ended, long-form generation or for modalities beyond multiple-choice QA.

## Bias, Risks, and Limitations

The model inherits biases from both the base DPO model and the MCQA dataset.
It may underperform on non-English inputs or ambiguous multi-answer tasks.

### Recommendations

Use the model as part of a controlled QA system with additional verification modules. Do not use it in high-stakes decision-making without human oversight.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_quantized_dpo_mcqa_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_quantized_dpo_mcqa_model")
```

## Training Details

### Training Data

This model was fine-tuned on `abdou-u/MNLP_M3_quantized_dataset`, a mix of formatted MCQA questions from TheoremQA, AQuA, and synthetic examples with explanations.

### Training Procedure

The model was fine-tuned using QLoRA with:

- 4-bit NF4 quantization (W4A16)
- `r=16`, `alpha=32`, and `dropout=0.05`
- 1–2 epochs on the quantized dataset

#### Training Hyperparameters

- **Precision:** FP16 with QLoRA (W4A16)
- **Epochs:** 1–2
- **Batch size:** 8 (gradient accumulation: 4)
- **LR:** 2e-5

## Evaluation

### Testing Data

The model was evaluated on a diverse set of MCQA tasks:

- **MMLU** (16 subjects including Math, Physics, Bio, CS)
- **NLP4Education**

Tasks were tested under:

- **Zero-shot settings**
- **Few-shot settings** (2-shot context)

### Metrics

- Accuracy (for multiple-choice selection)
- Log-likelihood ranking (optional)

### Results

- Strong zero-shot and few-shot MCQA performance on MMLU benchmarks
- Reasoning remains robust under minimal context

## Environmental Impact

- **Hardware Type:** NVIDIA A100 80GB x2
- **Hours Used:** ~0.5–1 h
- **Cloud Provider:** EPFL RCP
- **Region:** Switzerland
- **Carbon Emitted:** Estimated < 0.5 kg CO2

## Technical Specifications

### Model Architecture

Quantized transformer decoder using QLoRA over the DPO-finetuned SFT model.

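
As a rough illustration of what the QLoRA settings listed under Training Procedure imply, the sketch below works through the arithmetic for a single adapted layer. Only `r=16` and `alpha=32` come from this card. The layer size (`D = 1024`) is a hypothetical placeholder, not the shape of any layer in this model, and the helper names are made up for the example. The `alpha / r` scaling assumes the standard LoRA formulation, and the storage figures ignore the per-block quantization constants that NF4 also stores.

```python
# Illustrative arithmetic only: the layer shape below is a hypothetical
# placeholder, not a measurement of this model.

def lora_scaling(alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in one adapter: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Raw storage for the weights alone, ignoring quantization metadata."""
    return num_params * bits_per_weight // 8


# Values taken from the card.
R, ALPHA = 16, 32

# Hypothetical square projection layer (d_in = d_out = D).
D = 1024

scaling = lora_scaling(ALPHA, R)
adapter_params = lora_trainable_params(D, D, R)
frozen_params = D * D

print(f"LoRA scaling alpha/r: {scaling}")
print(f"Adapter params for one {D}x{D} layer: {adapter_params}")
print(f"Adapter share of that layer: {adapter_params / frozen_params:.2%}")
print(f"4-bit storage of frozen weights: {weight_bytes(frozen_params, 4)} bytes")
print(f"16-bit storage of frozen weights: {weight_bytes(frozen_params, 16)} bytes")
```

The point of the comparison is that the frozen base weights dominate memory and are stored at 4 bits, while only the small low-rank adapter matrices are trained at higher precision.
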
### Compute Infrastructure

- **Hardware:** 2x A100 80GB
- **Software:** PyTorch, Transformers, PEFT, Datasets, Hugging Face Hub

## Citation

**APA:**

Ahmed Abdelmalek. (2025). MNLP_M3_quantized_dpo_mcqa_model [Computer software]. Hugging Face.

**BibTeX:**

@misc{abdelmalek2025quantizeddpo,
  author = {Ahmed Abdelmalek},
  title = {MNLP_M3_quantized_dpo_mcqa_model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/abdou-u/MNLP_M3_quantized_dpo_mcqa_model}}
}

## Model Card Contact

For questions, contact: ahmed.abdelmalek@epfl.ch