---
library_name: transformers
tags:
- quantization
- qlora
- w4a16
- mcqa
- cs552
---

# Model Card for `abdou-u/MNLP_M3_quantized_model`

This model is a quantized version of the MCQA model trained on multiple-choice question answering tasks. It uses **QLoRA** with **W4A16** (4-bit weights, 16-bit activations) to minimize memory usage while maintaining high accuracy. The model is fine-tuned on a carefully selected stabilization subset of the MCQA dataset.

## Model Details

### Model Description

- **Developed by:** Ahmed Abdelmalek (EPFL CS-552 Project)
- **Model type:** Causal Language Model (Transformer-based)
- **Language(s):** English
- **License:** Apache 2.0 (inherited from the base model)
- **Fine-tuned from:** `mgatti/MNLP_M3_mcqa_model`
- **Quantization:** QLoRA (W4A16), using 4-bit NF4 weights and bfloat16 activations, with LoRA adapters merged post-training (see the sketch below).
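
For reference, this W4A16 setup can be expressed with a BitsAndBytes configuration along the following lines. This is a minimal sketch: settings such as double quantization are assumptions not stated in this card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 weights with bfloat16 compute, i.e. the W4A16 scheme described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,  # assumption: not stated in the card
)

base = AutoModelForCausalLM.from_pretrained(
    "mgatti/MNLP_M3_mcqa_model",
    quantization_config=bnb_config,
    device_map="auto",
)
```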

### Model Sources

- **Repository:** Private GitHub repository (training code)
- **Model Hub:** [abdou-u/MNLP_M3_quantized_model](https://huggingface.co/abdou-u/MNLP_M3_quantized_model)

## Uses

### Direct Use

This model can be used for inference on multiple-choice question answering tasks, especially when GPU memory is limited (e.g., a single A100, T4, or consumer GPU).

### Out-of-Scope Use

- Not intended for open-ended generation.
- Not suitable for dialogue applications.

## Bias, Risks, and Limitations

- The model may inherit biases present in the original training datasets.
- Not suitable for real-world high-stakes decision making.

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized MCQA model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_quantized_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_quantized_model")
```
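
Reusing the `model` and `tokenizer` loaded above, a minimal inference sketch for an MCQA-style prompt follows. The prompt template and generation settings are illustrative assumptions, not the exact format used during fine-tuning.

```python
import torch

# Hypothetical MCQA prompt; the template used during training may differ.
prompt = (
    "Question: Which planet is known as the Red Planet?\n"
    "A. Venus\nB. Mars\nC. Jupiter\nD. Saturn\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens (the predicted answer letter)
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer.strip())
```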

## Training Details

### Training Data

The model was fine-tuned on a 15% stabilization subset, published as `abdou-u/MNLP_M3_quantized_dataset`: a harmonized MCQA-style dataset consisting of curated subsets from MMLU, AQuA, and TheoremQA.
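
The dataset can be inspected directly from the Hub. A minimal sketch follows; the split and column names are whatever the dataset repository defines.

```python
from datasets import load_dataset

# Load the stabilization subset used for QLoRA fine-tuning
dataset = load_dataset("abdou-u/MNLP_M3_quantized_dataset")
print(dataset)
```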

### Training Procedure

- Quantized with QLoRA W4A16 (NF4 weights, bfloat16 activations)
- Trained for 1 epoch
- Batch size: 8 per device, with gradient accumulation of 4 (effective batch size 32)
- LoRA adapters merged post-training (see the merge sketch below)
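
A minimal sketch of the post-training merge step. The adapter path and the bfloat16 reload are assumptions; merging is typically done on an unquantized copy of the base model.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Reload the base model in bfloat16 so the LoRA deltas can be folded into its weights
base = AutoModelForCausalLM.from_pretrained(
    "mgatti/MNLP_M3_mcqa_model", torch_dtype=torch.bfloat16
)

# "qlora-adapters" is a placeholder path for the trained LoRA adapters
model = PeftModel.from_pretrained(base, "qlora-adapters")
merged = model.merge_and_unload()
merged.save_pretrained("MNLP_M3_quantized_model")
```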

#### Hyperparameters

- `learning_rate = 2e-5`
- `num_train_epochs = 1`
- `fp16 = True`
- `lora_alpha = 32`
- `r = 16`
- `lora_dropout = 0.05`
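
The values above correspond roughly to the following configuration objects. This is a sketch: `target_modules`, the output directory, and the trainer wiring are assumptions not documented in this card.

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # assumption: actual target modules not documented
)

training_args = TrainingArguments(
    output_dir="qlora-mcqa",  # placeholder output directory
    learning_rate=2e-5,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    fp16=True,
)
```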

## Evaluation

- The fine-tuned model was evaluated on an internal stabilization subset using accuracy and F1 score (details in the final report).
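
Once answer letters have been predicted (e.g., with the inference sketch above), the two reported metrics can be computed as follows. Macro-averaged F1 is shown as an assumption, since the card does not state which averaging was used.

```python
from sklearn.metrics import accuracy_score, f1_score

# Gold and predicted answer letters for the held-out stabilization examples
gold = ["B", "A", "D", "C"]  # placeholder values
pred = ["B", "A", "C", "C"]  # placeholder values

print("accuracy:", accuracy_score(gold, pred))
print("macro F1:", f1_score(gold, pred, average="macro"))
```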

## Environmental Impact

- **Hardware Type:** A100 (80GB)
- **Training Duration:** ~20 minutes
- **Compute Region:** Europe (EPFL cluster)
- **Estimated CO₂ emissions:** < 0.1 kg

## Technical Specifications

- Framework: PyTorch (Transformers, PEFT)
- Quantization: BitsAndBytes (4-bit NF4), merged LoRA adapters

## Citation

**APA:**

Abdelmalek, A. (2025). *MNLP_M3_quantized_model (QLoRA W4A16 MCQA)* [Model]. Hugging Face. https://huggingface.co/abdou-u/MNLP_M3_quantized_model

## Model Card Contact

- Ahmed Abdelmalek: [ahmed.abdelmalek@epfl.ch](mailto:ahmed.abdelmalek@epfl.ch)