---
library_name: transformers
tags:
- quantization
- qlora
- w4a16
- mcqa
- cs552
---

# Model Card for `abdou-u/MNLP_M3_quantized_model`

This model is a quantized version of the MCQA model trained on multiple-choice question answering tasks. It uses **QLoRA** with **W4A16** quantization (4-bit weights, 16-bit activations) to reduce memory usage while preserving accuracy. The model is fine-tuned on a carefully selected stabilization subset of the MCQA dataset.

## Model Details

### Model Description

- **Developed by:** Ahmed Abdelmalek (EPFL CS-552 Project)
- **Model type:** Causal Language Model (Transformer-based)
- **Language(s):** English
- **License:** Apache 2.0 (inherited from base models)
- **Fine-tuned from:** `mgatti/MNLP_M3_mcqa_model`
- **Quantization:** QLoRA (W4A16): 4-bit NF4 weights and bfloat16 activations, with LoRA adapters merged post-training (see the configuration sketch below).
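
For reference, W4A16 here means 4-bit NF4 weight storage with bfloat16 activations/compute, as provided by `bitsandbytes`. The snippet below is a minimal sketch of such a configuration; the exact flags used during training (e.g. double quantization) are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# W4A16 in this setup: 4-bit NF4 weight storage, bfloat16 activations/compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Loading the base model this way mirrors a QLoRA-style setup; the flags used
# in the actual training run may differ.
base = AutoModelForCausalLM.from_pretrained(
    "mgatti/MNLP_M3_mcqa_model",
    quantization_config=bnb_config,
    device_map="auto",
)
```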

### Model Sources

- **Repository:** Private GitHub repository (training code)
- **Model Hub:** [abdou-u/MNLP_M3_quantized_model](https://huggingface.co/abdou-u/MNLP_M3_quantized_model)

## Uses

### Direct Use

This model can be used for inference on multiple-choice question answering tasks, especially in memory-constrained deployments (e.g., on T4 or consumer GPUs); the reduced footprint also lowers memory usage on larger accelerators such as the A100.

### Out-of-Scope Use

- Not intended for open-ended generation.
- Not suitable for dialogue applications.

## Bias, Risks, and Limitations

- Biases may be present from the original datasets.
- Not suitable for real-world high-stakes decision making.

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_quantized_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_quantized_model")
```
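
A common way to run MCQA inference with a causal LM is to score each answer option by the likelihood the model assigns to it and pick the highest-scoring one. The sketch below illustrates this; the prompt format is a hypothetical example, not necessarily the format used during training.

```python
import torch
import torch.nn.functional as F

# Hypothetical prompt format and example question.
question = "What is the time complexity of binary search?"
options = ["O(n)", "O(log n)", "O(n log n)", "O(1)"]

@torch.no_grad()
def option_score(question, option):
    """Mean log-likelihood of the option tokens given the question."""
    prompt = f"Question: {question}\nAnswer: "
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids.to(model.device)
    logits = model(full_ids).logits
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)  # predicts tokens 1..T-1
    targets = full_ids[0, 1:]
    n_option = full_ids.shape[1] - prompt_len          # number of option tokens
    option_lp = log_probs[-n_option:].gather(1, targets[-n_option:, None])
    return option_lp.mean().item()

best = max(options, key=lambda o: option_score(question, o))
print("Predicted answer:", best)
```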

## Training Details

### Training Data

The model was fine-tuned on a 15% stabilization subset published as `abdou-u/MNLP_M3_quantized_dataset`, a harmonized MCQA-style dataset consisting of curated subsets of MMLU, AQuA, and TheoremQA.
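
To inspect the stabilization data, the dataset can be loaded directly from the Hub. The split name below is an assumption; check the dataset card for the actual layout.

```python
from datasets import load_dataset

# "train" is an assumed split name; see the dataset card for the actual splits.
ds = load_dataset("abdou-u/MNLP_M3_quantized_dataset", split="train")
print(ds[0])
```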

### Training Procedure

- Quantized with QLoRA W4A16 (NF4 weights, bfloat16 activations)
- Trained for 1 epoch
- Batch size: 8 (with gradient accumulation = 4)
- LoRA adapters merged post-training

#### Hyperparameters

- `learning_rate = 2e-5`
- `num_train_epochs = 1`
- `fp16 = True`
- `lora_alpha = 32`
- `r = 16`
- `lora_dropout = 0.05`
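
These values map onto a standard `peft`/`transformers` QLoRA setup roughly as sketched below. The `target_modules` choice and other defaults are assumptions, not taken from the actual training script.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA configuration matching the hyperparameters listed above;
# target_modules (attention projections) is an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = TrainingArguments(
    output_dir="mnlp_m3_quantized",
    learning_rate=2e-5,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    fp16=True,
)

# Training would wrap the 4-bit base model with these adapters
# (e.g. peft.get_peft_model(base, lora_config)), run for one epoch,
# and then merge the adapters back via model.merge_and_unload().
```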

## Evaluation

- The fine-tuned model was evaluated on an internal stabilization subset using accuracy and F1 score (details in the final report).

## Environmental Impact

- **Hardware Type:** A100 (80GB)
- **Training Duration:** ~20 minutes
- **Compute Region:** Europe (EPFL cluster)
- **Estimated CO₂ emissions:** < 0.1 kg

## Technical Specifications

- Framework: PyTorch (Transformers, PEFT)
- Quantization: BitsAndBytes (4-bit NF4), merged LoRA adapters

## Citation

**APA:**
Ahmed Abdelmalek. (2025). *MNLP_M3_quantized_model (QLoRA W4A16 MCQA)*. Hugging Face.

## Model Card Contact

- Ahmed Abdelmalek: [ahmed.abdelmalek@epfl.ch](mailto:ahmed.abdelmalek@epfl.ch)