	---
	library_name: transformers
	tags:
	- quantization
	- qlora
	- w4a16
	- mcqa
	- cs552
	---

	# Model Card for `abdou-u/MNLP_M3_quantized_model`

	This model is a quantized version of `mgatti/MNLP_M3_mcqa_model`, a model trained on multiple-choice question answering (MCQA) tasks. It uses QLoRA with W4A16 (4-bit weights, 16-bit activations) to reduce memory usage while aiming to preserve accuracy. The model is fine-tuned on a 15% stabilization subset drawn from the MCQA dataset.

	## Model Details

	### Model Description

	- Developed by: Ahmed Abdelmalek (EPFL CS-552 Project)
	- Model type: Causal Language Model (Transformer-based)
	- Language(s): English
	- License: Apache 2.0 (inherited from base models)
	- Fine-tuned from: `mgatti/MNLP_M3_mcqa_model`
	- Quantization: QLoRA (W4A16), using 4-bit NF4 weights and bfloat16 activations with LoRA adapters merged post-training.

	### Model Sources

	- Repository: Private GitHub repository (training code)
	- Model Hub: [abdou-u/MNLP_M3_quantized_model](https://huggingface.co/abdou-u/MNLP_M3_quantized_model)

	## Uses

	### Direct Use

	This model can be used for inference on multiple-choice question answering tasks, especially where GPU memory is limited (e.g., a T4 or a consumer GPU). It also runs on larger cards such as the A100.
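
As a rough illustration of why W4A16 helps on smaller GPUs, the sketch below compares weight storage at 16 bits and at 4 bits per parameter. The parameter count is a hypothetical placeholder, since this card does not state the model size. The estimate also ignores quantization scaling constants, any layers left unquantized, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given bit width."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical parameter count, chosen only for illustration (not from this card).
EXAMPLE_NUM_PARAMS = 1_000_000_000

if __name__ == "__main__":
    for name, bits in [("16-bit weights", 16), ("4-bit weights", 4)]:
        print(f"{name}: {weight_memory_gib(EXAMPLE_NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the space of 16-bit weights, whatever the actual parameter count.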

	### Out-of-Scope Use

	- Not intended for open-ended generation.
	- Not suitable for dialogue applications.

	## Bias, Risks, and Limitations

	- Biases may be present from the original datasets.
	- Not suitable for real-world high-stakes decision making.

	## How to Get Started

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# The checkpoint stores 4-bit weights, so loading it is expected to require the bitsandbytes package.
	model = AutoModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_quantized_model")
	tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_quantized_model")
	```
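
Once the model and tokenizer are loaded, one possible way to answer a multiple-choice question is to compare the next-token scores of the option letters. The sketch below is an assumption, not the documented inference recipe: this card does not specify the prompt format used in training, so the `Question: ... Answer:` template and the helper names `format_mcqa_prompt` and `predict_letter` are hypothetical.

```python
from typing import Sequence

LETTERS = "ABCD"


def format_mcqa_prompt(question: str, options: Sequence[str]) -> str:
    """Build a plain-text MCQA prompt (hypothetical template, not from the card)."""
    lines = [f"Question: {question}"]
    for letter, option in zip(LETTERS, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def predict_letter(model, tokenizer, question: str, options: Sequence[str]) -> str:
    """Return the option letter with the highest next-token logit."""
    import torch  # imported lazily so the prompt helper works without PyTorch

    prompt = format_mcqa_prompt(question, options)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]

    letters = LETTERS[: len(options)]
    # Score each letter by the token that would follow "Answer:" (space + letter).
    candidate_ids = [
        tokenizer.encode(" " + letter, add_special_tokens=False)[-1]
        for letter in letters
    ]
    scores = next_token_logits[candidate_ids]
    return letters[int(scores.argmax())]
```

With the `model` and `tokenizer` objects from the snippet above, a call would look like `predict_letter(model, tokenizer, "What is 2 + 2?", ["3", "4", "5", "6"])`. Comparing letter logits rather than generating free text keeps the model within its intended MCQA use.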

	## Training Details

	### Training Data

	The model was fine-tuned on a 15% stabilization subset, published as `abdou-u/MNLP_M3_quantized_dataset`: a harmonized MCQA-style dataset consisting of curated subsets of MMLU, AQuA, and TheoremQA.

	### Training Procedure

	- Quantized with QLoRA W4A16 (NF4 weights, bfloat16 activations)
	- Trained for 1 epoch
	- Batch size: 8 with 4 gradient accumulation steps (an effective batch size of 32, if 8 is the per-step batch size)
	- LoRA adapters merged post-training

	#### Hyperparameters

	- `learning_rate = 2e-5`
	- `num_train_epochs = 1`
	- `fp16 = True`
	- `lora_alpha = 32`
	- `r = 16`
	- `lora_dropout = 0.05`
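
The values above could be wired into a QLoRA setup roughly as follows. This is a minimal sketch, not the training script (which lives in a private repository). The `target_modules` choice, the `output_dir` name, the helper name `build_configs`, and the reading of "batch size 8" as a per-device value are all assumptions. The card lists both `fp16 = True` and bfloat16 activations, so the sketch passes both through as stated.

```python
# Hyperparameters copied from this card; the batch-size key names are an assumed mapping.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 1,
    "fp16": True,
    "lora_alpha": 32,
    "r": 16,
    "lora_dropout": 0.05,
    "per_device_train_batch_size": 8,   # assumption: 8 is per device
    "gradient_accumulation_steps": 4,
}


def build_configs(output_dir: str = "qlora-out"):
    """Build quantization, LoRA, and trainer configs (needs torch, peft, transformers)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig, TrainingArguments

    # 4-bit NF4 weights with bfloat16 compute, as described in this card.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    lora_config = LoraConfig(
        r=HYPERPARAMS["r"],
        lora_alpha=HYPERPARAMS["lora_alpha"],
        lora_dropout=HYPERPARAMS["lora_dropout"],
        target_modules="all-linear",  # assumption: the card does not list target modules
        task_type="CAUSAL_LM",
    )
    # fp16=True generally needs a GPU, so this is expected to run on the training machine.
    training_args = TrainingArguments(
        output_dir=output_dir,  # hypothetical path
        learning_rate=HYPERPARAMS["learning_rate"],
        num_train_epochs=HYPERPARAMS["num_train_epochs"],
        per_device_train_batch_size=HYPERPARAMS["per_device_train_batch_size"],
        gradient_accumulation_steps=HYPERPARAMS["gradient_accumulation_steps"],
        fp16=HYPERPARAMS["fp16"],
    )
    return bnb_config, lora_config, training_args
```

The `bnb_config` would be passed as `quantization_config` when loading the base model, and `lora_config` to PEFT's `get_peft_model`.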

	## Evaluation

	- The fine-tuned model was evaluated on the internal stabilization subset using accuracy and F1 score (details are in the final report).
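
For reference, the two metrics can be computed over predicted option letters as sketched below. The card does not say which F1 averaging was used, so macro-averaging over the labels A to D is an assumption here, and the sample labels are made up for illustration rather than taken from the evaluation.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: str = "ABCD") -> float:
    """Unweighted mean of per-label F1 (assumed averaging; not stated in the card)."""
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A label with no gold or predicted instances contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    gold = ["A", "B", "C", "D"]       # made-up labels
    predicted = ["A", "B", "C", "A"]  # made-up predictions
    print(f"accuracy={accuracy(gold, predicted):.3f}")
    print(f"macro_f1={macro_f1(gold, predicted):.3f}")
```

Macro-averaging weights each answer letter equally, which matters if the gold answers are unevenly distributed across A to D.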

	## Environmental Impact

	- Hardware Type: A100 (80GB)
	- Training Duration: ~20 minutes
	- Compute Region: Europe (EPFL cluster)
	- Estimated CO₂ emissions: < 0.1 kg

	## Technical Specifications

	- Framework: PyTorch (Transformers, PEFT)
	- Quantization: BitsAndBytes (4-bit NF4), merged LoRA adapters

	## Citation

	APA:
	Abdelmalek, A. (2025). *MNLP_M3_quantized_model (QLoRA W4A16 MCQA)*. Hugging Face. https://huggingface.co/abdou-u/MNLP_M3_quantized_model

	## Model Card Contact

	- Ahmed Abdelmalek: <ahmed.abdelmalek@epfl.ch>