---
library_name: transformers
tags:
- quantization
- qlora
- w4a16
- mcqa
- cs552
---

# Model Card for `abdou-u/MNLP_M3_quantized_model`

This model is a quantized version of the MCQA model trained on multiple-choice question answering tasks. It uses **QLoRA** with **W4A16** (4-bit weights, 16-bit activations) to minimize memory usage while maintaining high accuracy. The model is fine-tuned on a carefully selected stabilization subset of the MCQA dataset.

## Model Details

### Model Description

- **Developed by:** Ahmed Abdelmalek (EPFL CS-552 Project)
- **Model type:** Causal Language Model (Transformer-based)
- **Language(s):** English
- **License:** Apache 2.0 (inherited from base models)
- **Fine-tuned from:** `mgatti/MNLP_M3_mcqa_model`
- **Quantization:** QLoRA (W4A16), using 4-bit NF4 weights and bfloat16 activations, with LoRA adapters merged post-training

### Model Sources

- **Repository:** Private GitHub repository (training code)
- **Model Hub:** [abdou-u/MNLP_M3_quantized_model](https://huggingface.co/abdou-u/MNLP_M3_quantized_model)

## Uses

### Direct Use

This model can be used for inference on multiple-choice question answering tasks, especially when deploying in resource-constrained environments (e.g., A100, T4, or consumer GPUs). A likelihood-based scoring sketch is given in the appendix below.

### Out-of-Scope Use

- Not intended for open-ended generation.
- Not suitable for dialogue applications.

## Bias, Risks, and Limitations

- Biases from the original datasets may carry over.
- Not suitable for real-world high-stakes decision making.

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "abdou-u/MNLP_M3_quantized_model",
    device_map="auto",  # place weights on available devices automatically
)
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_quantized_model")
```

## Training Details

### Training Data

The model was fine-tuned on `abdou-u/MNLP_M3_quantized_dataset`, a 15% stabilization subset of a harmonized MCQA-style dataset consisting of curated subsets from MMLU, AQuA, and TheoremQA.

### Training Procedure

- Quantized with QLoRA W4A16 (NF4 weights, bfloat16 activations)
- Trained for 1 epoch
- Batch size: 8 (with gradient accumulation = 4)
- LoRA adapters merged post-training

A configuration sketch is given in the appendix below.

#### Hyperparameters

- `learning_rate = 2e-5`
- `num_train_epochs = 1`
- `fp16 = True`
- `lora_alpha = 32`
- `r = 16`
- `lora_dropout = 0.05`

## Evaluation

The fine-tuned model was evaluated on the internal stabilization subset using accuracy and F1 score (details in the final report).

## Environmental Impact

- **Hardware Type:** A100 (80GB)
- **Training Duration:** ~20 minutes
- **Compute Region:** Europe (EPFL cluster)
- **Estimated CO₂ emissions:** < 0.1 kg

## Technical Specifications

- Framework: PyTorch (Transformers, PEFT)
- Quantization: BitsAndBytes (4-bit NF4), merged LoRA adapters

## Citation

**APA:**

Ahmed Abdelmalek. (2025). *MNLP_M3_quantized_model (QLoRA W4A16 MCQA)*. Hugging Face.

## Model Card Contact

- Ahmed Abdelmalek — [ahmed.abdelmalek@epfl.ch](mailto:ahmed.abdelmalek@epfl.ch)
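## Appendix: Usage and Reproduction Sketches

### Scoring multiple-choice options

As a companion to the Direct Use section, here is a minimal sketch of likelihood-based MCQA scoring. The prompt template, example question, and option set are illustrative assumptions; they are not necessarily the format used during fine-tuning.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "abdou-u/MNLP_M3_quantized_model", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_quantized_model")
model.eval()

# Illustrative question and options; the actual prompt format used in
# training is not documented on this card.
question = "What is the derivative of x^2?"
options = ["2x", "x", "x^2", "2"]

scores = []
for option in options:
    prompt = f"Question: {question}\nAnswer: {option}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    # `loss` is the mean negative log-likelihood over all tokens, so this
    # scores the full prompt; a more careful variant would score only the
    # answer tokens by masking the question in the labels.
    scores.append(-outputs.loss.item())

print("Predicted option:", options[scores.index(max(scores))])
```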
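### QLoRA W4A16 configuration

This sketch illustrates the quantization setup described under Training Procedure, using the hyperparameters listed on this card and the standard `transformers`/`peft` QLoRA workflow. The `target_modules` value is an assumption (the actual adapted modules are not documented here), and dataset preparation and the training loop are omitted.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # W4: 4-bit quantized weights
    bnb_4bit_quant_type="nf4",              # NF4 quantization data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # A16: bfloat16 activations
)

base = AutoModelForCausalLM.from_pretrained(
    "mgatti/MNLP_M3_mcqa_model",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed; not documented on this card
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# ... fine-tune for 1 epoch with lr=2e-5, batch size 8, grad accumulation 4 ...

# After training, fold the LoRA adapters back into the base weights.
merged = model.merge_and_unload()
```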