---
library_name: transformers
tags:
- quantization
- qlora
- w4a16
- mcqa
- cs552
---
# Model Card for `abdou-u/MNLP_M3_quantized_model`
This model is a quantized version of the MCQA model trained on multiple-choice question answering tasks. It uses **QLoRA** with **W4A16** (4-bit weights, 16-bit activations) to minimize memory usage while maintaining high accuracy. The model is fine-tuned on a carefully selected stabilization subset from the MCQA dataset.
## Model Details
### Model Description
- **Developed by:** Ahmed Abdelmalek (EPFL CS-552 Project)
- **Model type:** Causal Language Model (Transformer-based)
- **Language(s):** English
- **License:** Apache 2.0 (inherited from base models)
- **Fine-tuned from:** `mgatti/MNLP_M3_mcqa_model`
- **Quantization:** QLoRA (W4A16), using 4-bit NF4 weights and bfloat16 activations with LoRA adapters merged post-training.
### Model Sources
- **Repository:** Private GitHub repository (training code)
- **Model Hub:** [abdou-u/MNLP_M3_quantized_model](https://huggingface.co/abdou-u/MNLP_M3_quantized_model)
## Uses
### Direct Use
This model can be used for inference on multiple-choice question answering tasks. Its reduced memory footprint makes it suitable for resource-constrained deployments (e.g., T4 or consumer GPUs), and it also runs on larger accelerators such as the A100.
### Out-of-Scope Use
- Not intended for open-ended generation.
- Not suitable for dialogue applications.
## Bias, Risks, and Limitations
- Biases may be present from the original datasets.
- Not suitable for real-world high-stakes decision making.
## How to Get Started
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_quantized_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_quantized_model")
```
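For multiple-choice inference, a common pattern is to score each candidate answer by the total log-probability the model assigns to it and pick the highest-scoring choice. The sketch below illustrates that pattern; it is not this project's actual evaluation harness, and the prompt format in the usage comment is an assumption.

```python
import torch

def choice_logprob(model, tokenizer, prompt: str, choice: str) -> float:
    """Sum of log-probabilities the model assigns to the tokens of `choice`
    when it follows `prompt`. Assumes the prompt tokens form an exact prefix
    of the tokenization of `prompt + " " + choice` (true for most tokenizers,
    but worth verifying for the one you use)."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + " " + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i + 1, so drop the last position
    # and align each choice token with the logits that predicted it.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, prompt_len:]
    return log_probs[prompt_len - 1 :].gather(1, targets.unsqueeze(1)).sum().item()

def best_choice(scores: dict) -> str:
    """Return the choice with the highest score."""
    return max(scores, key=scores.get)

# Example usage (downloads the model; the prompt format is illustrative):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model = AutoModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_quantized_model")
# tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_quantized_model")
# prompt = "Question: What is 2 + 2?\nAnswer:"
# scores = {c: choice_logprob(model, tokenizer, prompt, c) for c in ["3", "4", "5"]}
# print(best_choice(scores))
```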
## Training Details
### Training Data
The model was fine-tuned on `abdou-u/MNLP_M3_quantized_dataset`, a 15% stabilization subset of a harmonized MCQA-style dataset consisting of curated subsets from MMLU, AQuA, and TheoremQA.
### Training Procedure
- Quantized with QLoRA W4A16 (NF4 weights, bfloat16 activations)
- Trained for 1 epoch
- Batch size: 8 (with gradient accumulation = 4)
- LoRA adapters merged post-training
#### Hyperparameters
- `learning_rate = 2e-5`
- `num_train_epochs = 1`
- `fp16 = True`
- `lora_alpha = 32`
- `r = 16`
- `lora_dropout = 0.05`
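The setup above can be sketched as a QLoRA configuration. This is a reconstruction from the listed hyperparameters, not the project's actual training script; in particular, `target_modules` and `output_dir` are assumptions not documented in this card.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization with bfloat16 compute, matching the W4A16 description.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings from the hyperparameter list above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # assumption: not stated in this card
)

# Optimizer/schedule settings from the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="mnlp_m3_quantized",  # assumption: illustrative path
    learning_rate=2e-5,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    fp16=True,
)
```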
## Evaluation
- The fine-tuned model was evaluated on the internal stabilization subset using accuracy and F1 score (details in the final report).
## Environmental Impact
- **Hardware Type:** A100 (80GB)
- **Training Duration:** ~20 minutes
- **Compute Region:** Europe (EPFL cluster)
- **Estimated CO₂ emissions:** < 0.1 kg
## Technical Specifications
- Framework: PyTorch (Transformers, PEFT)
- Quantization: BitsAndBytes (4-bit NF4), merged LoRA adapters
## Citation
**APA:**
Ahmed Abdelmalek. (2025). *MNLP_M3_quantized_model (QLoRA W4A16 MCQA)*. Hugging Face.
## Model Card Contact
- Ahmed Abdelmalek — [ahmed.abdelmalek@epfl.ch](mailto:ahmed.abdelmalek@epfl.ch)