---
library_name: transformers
tags:
- qlora
- quantization
- 4bit
- causal-lm
- transformers
- mcqa
- dpo
- multiple-choice
- w4a16
- hf-trained
---
# MNLP M3 - Quantized DPO + MCQA Model (W4A16, QLoRA)
This model is a quantized, QLoRA-fine-tuned version of the base `albertfares/MNLP_SFT_DPO` model. It was trained on curated stabilization data for multiple-choice question answering (MCQA) using LoRA adapters over 4-bit weights with 16-bit activations (W4A16).
It was developed as part of the CS-552 Multilingual NLP course at EPFL and is hosted for reproducible evaluation and downstream use.
## Model Details
### Model Description
This model adapts the `MNLP_SFT_DPO` model to handle complex MCQA reasoning using QLoRA (4-bit weights, 16-bit activations). It was trained using the quantized dataset [`abdou-u/MNLP_M3_quantized_dataset`](https://huggingface.co/datasets/abdou-u/MNLP_M3_quantized_dataset) and aims to strike a strong balance between memory efficiency and downstream accuracy.
- **Developed by:** Ahmed Abdelmalek
- **Finetuned from model:** `albertfares/MNLP_SFT_DPO`
- **Model type:** Causal Language Model (decoder-only, autoregressive)
- **Language(s):** English
- **License:** Apache 2.0
### Model Sources
- **Training Code:** Private GitHub Repository
- **Datasets:** [`abdou-u/MNLP_M3_quantized_dataset`](https://huggingface.co/datasets/abdou-u/MNLP_M3_quantized_dataset)
- **Base Model:** albertfares/MNLP_SFT_DPO
## Uses
### Direct Use
This model can be used directly for English multiple-choice question answering (MCQA), producing the selected answer together with a short explanation.
### Downstream Use
The model can serve in LLM pipelines that require lightweight MCQA reasoning with high accuracy and low VRAM cost.
### Out-of-Scope Use
Not intended for generative open-ended long-form answers or other modalities beyond multiple-choice QA.
## Bias, Risks, and Limitations
The model inherits biases from both the base DPO model and the MCQA dataset. It may underperform on non-English inputs or ambiguous multi-answer tasks.
### Recommendations
Use as part of a controlled QA system with additional verification modules. Do not use in high-stakes decision-making without human oversight.
## How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abdou-u/MNLP_M3_quantized_dpo_mcqa_model"
# device_map="auto" places the weights on GPU when one is available (requires accelerate)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
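The exact prompt template used in training is not documented here; as an illustration only, a minimal MCQA prompt builder (the letter labels and trailing `Answer:` cue are assumptions, not the model's verified format) might look like:

```python
def build_mcqa_prompt(question: str, choices: list[str]) -> str:
    """Format a question and its answer options into a single prompt string.

    The "A) / B) / ..." labels and the final "Answer:" cue are hypothetical;
    adapt them to match the template the model was actually trained on.
    """
    letters = "ABCDEFGH"
    lines = [f"Question: {question}"]
    lines += [f"{letters[i]}) {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)

prompt = build_mcqa_prompt("What is 2 + 2?", ["3", "4", "5"])
```

The resulting string can then be tokenized and passed to `model.generate` as usual.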
## Training Details
### Training Data
This model was fine-tuned using the `abdou-u/MNLP_M3_quantized_dataset`, a mix of formatted MCQA questions from TheoremQA, AQuA, and synthetic examples with explanations.
### Training Procedure
The model was fine-tuned using QLoRA with:
- 4-bit NF4 quantization (W4A16)
- `r=16`, `alpha=32`, and dropout=0.05
- 1–2 epochs on the quantized dataset
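To illustrate why 4-bit weight storage matters, here is a back-of-the-envelope weight-memory estimate. The parameter count below is a placeholder for illustration, not this model's actual size, and the estimate ignores quantization constants and the (small) LoRA adapter weights:

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB (ignores quantization-constant
    overhead and LoRA adapter parameters)."""
    return n_params * bits_per_weight / 8 / 2**30

n = 1e9  # placeholder: 1B parameters, not this model's actual count
fp16_gib = weight_memory_gib(n, 16)  # ~1.86 GiB
nf4_gib = weight_memory_gib(n, 4)    # ~0.47 GiB, a 4x reduction
```

This 4x reduction in resident weight memory is what lets QLoRA fine-tuning fit on much smaller GPUs than full-precision training.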
#### Training Hyperparameters
- **Precision:** FP16 with QLoRA (W4A16)
- **Epochs:** 1–2
- **Batch size:** 8 (gradient accumulation: 4)
- **LR:** 2e-5
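Given the per-device batch size and gradient-accumulation steps above, the effective batch size works out as follows (the two-GPU data-parallel factor is an assumption based on the listed hardware, not a documented training detail):

```python
per_device_batch = 8   # per-device batch size from the hyperparameters above
grad_accum_steps = 4   # gradient accumulation steps
n_gpus = 2             # assumption: both listed A100s used for data parallelism

effective_batch = per_device_batch * grad_accum_steps  # 32 per device
global_batch = effective_batch * n_gpus                # 64 across both GPUs
```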
## Evaluation
### Testing Data
The model was evaluated on a diverse set of MCQA tasks:
- **MMLU** (16 subjects including Math, Physics, Bio, CS)
- **NLP4Education**
Tasks were tested under:
- **Zero-shot settings**
- **Few-shot settings** (2-shot context)
### Metrics
- Accuracy (for multiple-choice selection)
- Log-likelihood ranking (optional)
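Log-likelihood ranking scores each answer option by the model's total log-probability of that option's continuation and selects the highest-scoring one. A minimal sketch with made-up scores (a real harness would also consider length normalization, since longer options accumulate more negative log-probability):

```python
def rank_choices(loglikelihoods: dict[str, float]) -> str:
    """Return the choice label with the highest total log-likelihood."""
    return max(loglikelihoods, key=loglikelihoods.get)

# Made-up per-choice log-likelihoods for one question (higher = more likely).
scores = {"A": -12.3, "B": -8.7, "C": -15.1, "D": -9.9}
pred = rank_choices(scores)  # "B"
```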
### Results
- Strong zero-shot and few-shot MCQA performance on MMLU benchmarks
- Robust reasoning with minimal in-context examples
## Environmental Impact
- **Hardware Type:** NVIDIA A100 80GB x2
- **Hours Used:** ~0.5–1h
- **Cloud Provider:** EPFL RCP
- **Region:** Switzerland
- **Carbon Emitted:** Estimated < 0.5 kg CO2
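The figure above is consistent with a simple energy-based estimate. Every constant below is an assumption for illustration (approximate A100 power draw, an assumed datacenter overhead factor, and an illustrative grid carbon intensity), not a measurement:

```python
n_gpus = 2
gpu_power_kw = 0.4      # assumed ~400 W per A100 (SXM TDP)
hours = 1.0             # upper end of the stated 0.5-1h range
pue = 1.2               # assumed datacenter power-usage-effectiveness overhead
grid_kg_per_kwh = 0.05  # assumed grid carbon intensity (illustrative value)

energy_kwh = n_gpus * gpu_power_kw * hours * pue  # 0.96 kWh
co2_kg = energy_kwh * grid_kg_per_kwh             # ~0.048 kg, well under 0.5 kg
```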
## Technical Specifications
### Model Architecture
Quantized transformer decoder using QLoRA over the DPO-finetuned SFT model.
### Compute Infrastructure
- **Hardware:** 2x A100 80GB
- **Software:** PyTorch, Transformers, PEFT, Datasets, Huggingface Hub
## Citation
**APA:**
Abdelmalek, A. (2025). *MNLP_M3_quantized_dpo_mcqa_model* [Computer software]. Hugging Face. https://huggingface.co/abdou-u/MNLP_M3_quantized_dpo_mcqa_model
**BibTeX:**
```bibtex
@misc{abdelmalek2025quantizeddpo,
  author       = {Ahmed Abdelmalek},
  title        = {MNLP\_M3\_quantized\_dpo\_mcqa\_model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/abdou-u/MNLP_M3_quantized_dpo_mcqa_model}}
}
```
## Model Card Contact
For questions, contact: ahmed.abdelmalek@epfl.ch