---
library_name: transformers
tags:
- quantization
- qlora
- w4a16
- mcqa
- cs552
---

# Model Card for `abdou-u/MNLP_M3_quantized_model`

This model is a quantized version of the MCQA model trained on multiple-choice question answering tasks. It uses **QLoRA** with **W4A16** (4-bit weights, 16-bit activations) to minimize memory usage while maintaining high accuracy. The model is fine-tuned on a carefully selected stabilization subset of the MCQA dataset.

## Model Details

### Model Description

- **Developed by:** Ahmed Abdelmalek (EPFL CS-552 Project)
- **Model type:** Causal Language Model (Transformer-based)
- **Language(s):** English
- **License:** Apache 2.0 (inherited from base models)
- **Fine-tuned from:** `mgatti/MNLP_M3_mcqa_model`
- **Quantization:** QLoRA (W4A16), using 4-bit NF4 weights and bfloat16 activations, with LoRA adapters merged post-training

### Model Sources

- **Repository:** Private GitHub repository (training code)
- **Model Hub:** [abdou-u/MNLP_M3_quantized_model](https://huggingface.co/abdou-u/MNLP_M3_quantized_model)

## Uses

### Direct Use

This model can be used for inference on multiple-choice question answering tasks, especially when deploying in resource-constrained environments (e.g., A100, T4, or consumer GPUs). A likelihood-based scoring sketch is given in the appendix below.

### Out-of-Scope Use

- Not intended for open-ended generation.
- Not suitable for dialogue applications.

## Bias, Risks, and Limitations

- Biases from the original datasets may carry over.
- Not suitable for real-world high-stakes decision making.

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "abdou-u/MNLP_M3_quantized_model",
    device_map="auto",  # place weights on available devices automatically
)
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_quantized_model")
```

## Training Details

### Training Data

The model was fine-tuned on `abdou-u/MNLP_M3_quantized_dataset`, a 15% stabilization subset of a harmonized MCQA-style dataset consisting of curated subsets from MMLU, AQuA, and TheoremQA.

### Training Procedure

- Quantized with QLoRA W4A16 (NF4 weights, bfloat16 activations)
- Trained for 1 epoch
- Batch size: 8 (with gradient accumulation = 4)
- LoRA adapters merged post-training

A configuration sketch is given in the appendix below.

#### Hyperparameters

- `learning_rate = 2e-5`
- `num_train_epochs = 1`
- `fp16 = True`
- `lora_alpha = 32`
- `r = 16`
- `lora_dropout = 0.05`

## Evaluation

The fine-tuned model was evaluated on the internal stabilization subset using accuracy and F1 score (details in the final report).

## Environmental Impact

- **Hardware Type:** A100 (80GB)
- **Training Duration:** ~20 minutes
- **Compute Region:** Europe (EPFL cluster)
- **Estimated CO₂ emissions:** < 0.1 kg

## Technical Specifications

- Framework: PyTorch (Transformers, PEFT)
- Quantization: BitsAndBytes (4-bit NF4), merged LoRA adapters

## Citation

**APA:**

Ahmed Abdelmalek. (2025). *MNLP_M3_quantized_model (QLoRA W4A16 MCQA)*. Hugging Face.

## Model Card Contact

- Ahmed Abdelmalek — [ahmed.abdelmalek@epfl.ch](mailto:ahmed.abdelmalek@epfl.ch)
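## Appendix: Usage and Reproduction Sketches

### Scoring multiple-choice options

As a companion to the Direct Use section, here is a minimal sketch of likelihood-based MCQA scoring. The prompt template, example question, and option set are illustrative assumptions; they are not necessarily the format used during fine-tuning.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "abdou-u/MNLP_M3_quantized_model", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_quantized_model")
model.eval()

# Illustrative question and options; the actual prompt format used in
# training is not documented on this card.
question = "What is the derivative of x^2?"
options = ["2x", "x", "x^2", "2"]

scores = []
for option in options:
    prompt = f"Question: {question}\nAnswer: {option}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    # `loss` is the mean negative log-likelihood over all tokens, so this
    # scores the full prompt; a more careful variant would score only the
    # answer tokens by masking the question in the labels.
    scores.append(-outputs.loss.item())

print("Predicted option:", options[scores.index(max(scores))])
```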
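### QLoRA W4A16 configuration

This sketch illustrates the quantization setup described under Training Procedure, using the hyperparameters listed on this card and the standard `transformers`/`peft` QLoRA workflow. The `target_modules` value is an assumption (the actual adapted modules are not documented here), and dataset preparation and the training loop are omitted.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # W4: 4-bit quantized weights
    bnb_4bit_quant_type="nf4",              # NF4 quantization data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # A16: bfloat16 activations
)

base = AutoModelForCausalLM.from_pretrained(
    "mgatti/MNLP_M3_mcqa_model",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed; not documented on this card
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# ... fine-tune for 1 epoch with lr=2e-5, batch size 8, grad accumulation 4 ...

# After training, fold the LoRA adapters back into the base weights.
merged = model.merge_and_unload()
```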