---
library_name: transformers
tags:
- quantization
- qlora
- w4a16
- mcqa
- cs552
---
# Model Card for `abdou-u/MNLP_M3_quantized_model`
This model is a quantized version of `mgatti/MNLP_M3_mcqa_model`, an MCQA model trained on multiple-choice question answering tasks. It uses **QLoRA** with **W4A16** (4-bit weights, 16-bit activations) to reduce memory usage while preserving as much accuracy as possible. It was fine-tuned on a 15% stabilization subset of the MCQA dataset.
## Model Details
### Model Description
- **Developed by:** Ahmed Abdelmalek (EPFL CS-552 Project)
- **Model type:** Causal Language Model (Transformer-based)
- **Language(s):** English
- **License:** Apache 2.0 (inherited from base models)
- **Fine-tuned from:** `mgatti/MNLP_M3_mcqa_model`
- **Quantization:** QLoRA (W4A16), using 4-bit NF4 weights and bfloat16 activations with LoRA adapters merged post-training.
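
To make the W4A16 trade-off concrete, the following back-of-the-envelope sketch estimates weight storage at different bit widths. The parameter count is a hypothetical placeholder (this card does not state the model size), and the estimate ignores quantization constants, any layers kept in higher precision, and activation memory.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical parameter count, for illustration only (not taken from this card).
NUM_PARAMS = 1_000_000_000

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # 16-bit baseline
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4-bit weights (W4)

print(f"bf16 weights: {bf16_gib:.2f} GiB")
print(f"NF4 weights:  {nf4_gib:.2f} GiB ({bf16_gib / nf4_gib:.0f}x smaller)")
```

Activations stay in 16-bit (the "A16" part), so runtime memory shrinks by less than the weight-only ratio suggests.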
### Model Sources
- **Repository:** Private GitHub repository (training code)
- **Model Hub:** [abdou-u/MNLP_M3_quantized_model](https://huggingface.co/abdou-u/MNLP_M3_quantized_model)
## Uses
### Direct Use
This model can be used for inference on multiple-choice question answering tasks, especially in memory-constrained deployments (e.g., a T4 or a consumer GPU). It also runs on larger accelerators such as the A100.
### Out-of-Scope Use
- Not intended for open-ended generation.
- Not suitable for dialogue applications.
## Bias, Risks, and Limitations
- The model may inherit biases present in its source datasets (MMLU, AQuA, and TheoremQA).
- Not suitable for real-world high-stakes decision making.
## How to Get Started
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abdou-u/MNLP_M3_quantized_model"
# Load the tokenizer and the quantized model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
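
Once `model` and `tokenizer` are loaded, one possible way to answer a multiple-choice question is to compare the next-token logits of the option letters. The sketch below is illustrative only: the prompt layout, the helper names `format_mcqa_prompt` and `predict_letter`, and the choice to score the tokens `" A"`, `" B"`, and so on are assumptions, since this card does not document the prompt format used in training.

```python
from typing import Sequence

LETTERS = "ABCDEFGHIJ"


def format_mcqa_prompt(question: str, options: Sequence[str]) -> str:
    """Build a prompt ending in 'Answer:' (hypothetical layout, not from this card)."""
    if not 2 <= len(options) <= len(LETTERS):
        raise ValueError("expected between 2 and 10 options")
    lines = [f"Question: {question.strip()}"]
    lines += [f"{LETTERS[i]}. {opt.strip()}" for i, opt in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)


def predict_letter(model, tokenizer, prompt: str, num_options: int) -> str:
    """Return the option letter whose token gets the highest next-token logit."""
    import torch  # imported lazily so the prompt helper works without PyTorch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    # Assumption: " A", " B", ... end in a token that identifies the letter.
    candidate_ids = [
        tokenizer.encode(" " + LETTERS[i], add_special_tokens=False)[-1]
        for i in range(num_options)
    ]
    scores = next_token_logits[candidate_ids]
    return LETTERS[int(scores.argmax())]


prompt = format_mcqa_prompt("What is 2 + 2?", ["3", "4", "5", "6"])
print(prompt)
```

Calling `predict_letter(model, tokenizer, prompt, num_options=4)` then returns a single letter. Scoring letters directly avoids free-form generation, which this model is not intended for.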
## Training Details
### Training Data
The model was fine-tuned on a 15% stabilization subset, `abdou-u/MNLP_M3_quantized_dataset`, a harmonized MCQA-style dataset built from curated subsets of MMLU, AQuA, and TheoremQA.
### Training Procedure
- Quantized with QLoRA W4A16 (NF4 weights, bfloat16 activations)
- Trained for 1 epoch
- Batch size: 8 (with gradient accumulation = 4)
- LoRA adapters merged post-training
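
Two of the steps above can be sketched in code: the effective batch size implied by gradient accumulation, and merging a LoRA adapter into its base model. This is a minimal sketch under stated assumptions, not the project's training script. It reads "batch size 8" as a per-device batch on a single GPU, and `adapter_dir` and `output_dir` are hypothetical paths. Merging into a bfloat16 copy of the base model is a common pattern with PEFT and may differ from what was done here.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def merge_lora_adapter(base_model_id: str, adapter_dir: str, output_dir: str) -> None:
    """Fold LoRA weights into the base model and save a standalone checkpoint."""
    # Heavy dependencies are imported lazily so the helper above runs without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(output_dir)


# Assumption: 8 is the per-device batch size and training used one GPU.
print(effective_batch_size(8, 4))
```

After merging, the checkpoint no longer depends on PEFT at inference time, which is why it loads with plain `AutoModelForCausalLM`.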
#### Hyperparameters
- `learning_rate = 2e-5`
- `num_train_epochs = 1`
- `fp16 = True`
- `lora_alpha = 32`
- `r = 16`
- `lora_dropout = 0.05`
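
The hyperparameters listed above could be expressed as `BitsAndBytesConfig` and `LoraConfig` objects roughly as follows. This is a sketch, not the original configuration: `QLoRAHyperparams` and `build_configs` are hypothetical names, and `target_modules="all-linear"` is an assumption because the card does not say which layers received adapters. The card lists `fp16 = True` alongside bfloat16 activations; the sketch keeps bfloat16 as the 4-bit compute dtype and does not try to reconcile the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QLoRAHyperparams:
    """Values copied from the hyperparameter list in this card."""
    learning_rate: float = 2e-5
    num_train_epochs: int = 1
    r: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by lora_alpha / r.
        return self.lora_alpha / self.r


def build_configs(hp: QLoRAHyperparams, target_modules="all-linear"):
    """Return (BitsAndBytesConfig, LoraConfig); needs transformers, peft, bitsandbytes."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # 4-bit NF4 weights
        bnb_4bit_compute_dtype=torch.bfloat16,  # 16-bit activations
    )
    lora_config = LoraConfig(
        r=hp.r,
        lora_alpha=hp.lora_alpha,
        lora_dropout=hp.lora_dropout,
        target_modules=target_modules,  # assumption: not stated in this card
        task_type="CAUSAL_LM",
    )
    return bnb_config, lora_config


hp = QLoRAHyperparams()
print(hp.lora_scaling)
```

With `lora_alpha = 32` and `r = 16`, the adapter update is scaled by 2.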
## Evaluation
- The fine-tuned model was evaluated on the internal stabilization subset using accuracy and F1 score (details in the final report).
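
For reference, accuracy and F1 over predicted option letters can be computed as in the sketch below. The card does not say how F1 was averaged, so macro-averaging over the labels is an assumption, and the gold and predicted answers are made-up toy data rather than results from this model.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    f1_scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data for illustration only.
gold_answers = ["A", "B", "C", "A"]
pred_answers = ["A", "B", "A", "A"]
print(accuracy(gold_answers, pred_answers), macro_f1(gold_answers, pred_answers))
```

Macro-averaging weights every option letter equally, so a label the model never predicts correctly pulls the score down even when overall accuracy is high.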
## Environmental Impact
- **Hardware Type:** A100 (80GB)
- **Training Duration:** ~20 minutes
- **Compute Region:** Europe (EPFL cluster)
- **Estimated CO₂ emissions:** < 0.1 kg
## Technical Specifications
- Framework: PyTorch (Transformers, PEFT)
- Quantization: BitsAndBytes (4-bit NF4), merged LoRA adapters
## Citation
**APA:**
Abdelmalek, A. (2025). *MNLP_M3_quantized_model (QLoRA W4A16 MCQA)*. Hugging Face. https://huggingface.co/abdou-u/MNLP_M3_quantized_model
## Model Card Contact
- Ahmed Abdelmalek: [ahmed.abdelmalek@epfl.ch](mailto:ahmed.abdelmalek@epfl.ch)