---
library_name: transformers
tags:
- qlora
- quantization
- 4bit
- causal-lm
- transformers
- mcqa
- dpo
- multiple-choice
- w4a16
- hf-trained
---

# MNLP M3 - Quantized DPO + MCQA Model (W4A16, QLoRA)

This model is a quantized and QLoRA-fine-tuned version of the base `albertfares/MNLP_SFT_DPO` model. It is trained on curated stabilization data for multiple-choice question answering (MCQA) using LoRA adapters over 4-bit weights and 16-bit activations (W4A16).

It was developed as part of the CS-552 Multilingual NLP course at EPFL and is hosted for reproducible evaluation and downstream use.

## Model Details

### Model Description

This model adapts the `MNLP_SFT_DPO` model to handle complex MCQA reasoning using QLoRA (4-bit weights, 16-bit activations). It was trained using the quantized dataset [`abdou-u/MNLP_M3_quantized_dataset`](https://huggingface.co/datasets/abdou-u/MNLP_M3_quantized_dataset) and aims to strike a strong balance between memory efficiency and downstream accuracy.

- **Developed by:** Ahmed Abdelmalek
- **Finetuned from model:** `albertfares/MNLP_SFT_DPO`
- **Model type:** Causal Language Model (decoder-only, autoregressive)
- **Language(s):** English
- **License:** Apache 2.0

### Model Sources

- **Training Code:** Private GitHub Repository
- **Datasets:** [`abdou-u/MNLP_M3_quantized_dataset`](https://huggingface.co/datasets/abdou-u/MNLP_M3_quantized_dataset)
- **Base Model:** albertfares/MNLP_SFT_DPO

## Uses

### Direct Use

This model can be used directly to answer English multiple-choice questions (MCQA), producing the selected option together with a short explanation.

### Downstream Use

The model can be used in LLM pipelines that need a lightweight MCQA reasoning component with high accuracy and low VRAM cost.

### Out-of-Scope Use

The model is not intended for open-ended long-form generation or for modalities beyond multiple-choice QA.

## Bias, Risks, and Limitations

The model inherits biases from both the base DPO model and the MCQA dataset. It may underperform on non-English inputs or ambiguous multi-answer tasks.

### Recommendations

Use as part of a controlled QA system with additional verification modules. Do not use in high-stakes decision-making without human oversight.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abdou-u/MNLP_M3_quantized_dpo_mcqa_model"

# device_map="auto" places the model on GPU when available (requires accelerate).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

## Training Details

### Training Data

This model was fine-tuned using the `abdou-u/MNLP_M3_quantized_dataset`, a mix of formatted MCQA questions from TheoremQA, AQuA, and synthetic examples with explanations.

### Training Procedure

The model was fine-tuned using QLoRA with:
- 4-bit NF4 quantization (W4A16)
- `r=16`, `alpha=32`, and dropout=0.05
- 1–2 epochs on the quantized dataset

#### Training Hyperparameters

- **Precision:** FP16 with QLoRA (W4A16)
- **Epochs:** 1–2
- **Batch size:** 8 (gradient accumulation: 4)
- **LR:** 2e-5
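The QLoRA setup above can be sketched with `transformers` and `peft`. This is a minimal configuration sketch using the hyperparameters stated here; the `target_modules` list is an assumption, since the actual projection names depend on the base model architecture and are not documented in this card.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 weights with FP16 compute (W4A16), as described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter hyperparameters from the training procedure.
# target_modules is illustrative -- adjust to the base model's layer names.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```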

## Evaluation

### Testing Data

The model was evaluated on a diverse set of MCQA tasks:
- **MMLU** (16 subjects including Math, Physics, Bio, CS)
- **NLP4Education**

Tasks were tested under:
- **Zero-shot settings**
- **Few-shot settings** (2-shot context)
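A few-shot prompt can be assembled by prepending worked examples to the target question. The template below is an illustrative assumption, not the exact format used during evaluation:

```python
def build_fewshot_prompt(shots, question, choices):
    """Assemble an MCQA prompt with optional in-context examples.

    `shots` is a list of (question, choices, answer_letter) tuples.
    The template is illustrative; match it to your evaluation harness.
    """
    letters = "ABCD"
    parts = []
    for q, opts, ans in shots:
        lines = [f"Question: {q}"]
        lines += [f"{letters[i]}. {o}" for i, o in enumerate(opts)]
        lines.append(f"Answer: {ans}")
        parts.append("\n".join(lines))
    # The target question ends with a bare "Answer:" for the model to complete.
    lines = [f"Question: {question}"]
    lines += [f"{letters[i]}. {o}" for i, o in enumerate(choices)]
    lines.append("Answer:")
    parts.append("\n".join(lines))
    return "\n\n".join(parts)

shot = ("What is 2 + 2?", ["3", "4", "5", "6"], "B")
prompt = build_fewshot_prompt([shot], "What is 3 * 3?", ["6", "9", "12", "16"])
print(prompt)
```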

### Metrics

- Accuracy (for multiple-choice selection)
- Log-likelihood ranking (optional)
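Log-likelihood ranking scores each candidate answer by the total log-probability the model assigns to its tokens and picks the highest-scoring choice. A minimal sketch of the scoring step, taking precomputed per-token log-probabilities as input (obtaining them from the model is omitted here):

```python
def rank_choices_by_loglik(choice_token_logprobs, length_normalize=False):
    """Rank answer choices by total (optionally length-normalized) log-likelihood.

    `choice_token_logprobs` maps each choice label to the list of per-token
    log-probabilities the model assigned to that choice's continuation.
    Returns the label with the highest score.
    """
    def score(logprobs):
        total = sum(logprobs)
        return total / len(logprobs) if length_normalize else total

    return max(choice_token_logprobs, key=lambda c: score(choice_token_logprobs[c]))

# Toy per-token log-probs: choice "B" has the highest total log-likelihood.
scores = {
    "A": [-2.3, -1.9],
    "B": [-0.4, -0.6],
    "C": [-1.5, -2.0],
    "D": [-3.0, -0.1],
}
print(rank_choices_by_loglik(scores))  # -> B
```

Length normalization can help when choices differ substantially in token count.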

### Results

- Strong zero-shot and few-shot MCQA performance on MMLU benchmarks
- Robust reasoning even with minimal in-context examples

## Environmental Impact

- **Hardware Type:** NVIDIA A100 80GB x2
- **Hours Used:** ~0.5–1h
- **Cloud Provider:** EPFL RCP
- **Region:** Switzerland
- **Carbon Emitted:** Estimated < 0.5 kg CO2

## Technical Specifications

### Model Architecture

Quantized transformer decoder using QLoRA over the DPO-finetuned SFT model.

### Compute Infrastructure

- **Hardware:** 2x A100 80GB
- **Software:** PyTorch, Transformers, PEFT, Datasets, Huggingface Hub

## Citation

**APA:**
Ahmed Abdelmalek. (2025). MNLP_M3_quantized_dpo_mcqa_model [Computer software]. Hugging Face.

**BibTeX:**
```bibtex
@misc{abdelmalek2025quantizeddpo,
  author = {Ahmed Abdelmalek},
  title = {MNLP_M3_quantized_dpo_mcqa_model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/abdou-u/MNLP_M3_quantized_dpo_mcqa_model}}
}
```

## Model Card Contact

For questions, contact: ahmed.abdelmalek@epfl.ch