---
tags:
- model_hub_mixin
---

# Model Card for `abdou-u/MNLP_M3_w4a8_quantized_mcqa_model`

## Summary

This model is a W4A8 (4-bit weights, 8-bit activations) quantized version of `mgatti/MNLP_M3_mcqa_model`, produced with [Optimum-Quanto](https://huggingface.co/docs/optimum/main/en/quanto/index). It was pushed to the Hugging Face Hub using the `PyTorchModelHubMixin` interface.

## Model Details

### Model Description

- **Name**: MNLP_M3_w4a8_quantized_mcqa_model
- **Source model**: `mgatti/MNLP_M3_mcqa_model`
- **Quantization**: Optimum-Quanto W4A8 (qint4 weights, qint8 activations)
- **Usage**: Efficient inference for multiple-choice question answering (MCQA) tasks
- **Developer**: Ahmed Abdelmalek, EPFL CS-552 2025 Project M3
- **License**: MIT
- **Language(s)**: English
- **Hardware target**: Consumer and cloud GPUs with a low memory footprint

### Model Sources

- **Repository**: *Private GitHub (training script not public)*
- **Paper**: Not published
- **Docs**: This README

## Use Cases

### Direct Use

This model is optimized for fast MCQA inference under constrained VRAM budgets.

### Intended Users

Researchers and engineers who want to deploy a small, high-performance MCQA model.

## Limitations

Quantizing to W4A8 can cost a small amount of accuracy relative to the full-precision source model. The model is intended for MCQA only; it is not suitable for open-ended generation or other tasks.

## Getting Started

```python
from transformers import AutoTokenizer
from optimum.quanto import QuantizedModelForCausalLM

# Load the quantized weights and the matching tokenizer from the Hub.
model = QuantizedModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
```

A fuller, hedged inference sketch is given in the appendix at the end of this card.

## Technical Specifications

- **Quantization library**: Optimum-Quanto
- **Weights**: 4-bit (qint4)
- **Activations**: 8-bit (qint8)
- **Format**: Hugging Face Transformers-compatible

A hedged sketch of a typical W4A8 quantization pass also appears in the appendix.

## Environmental Impact

- **Hardware**: A100 80GB (used during validation)
- **Quantization**: one pass over the full model (approx. 3 minutes)
- **Carbon emissions**: negligible for the quantization step

## Citation

If you use this model, please cite:

```bibtex
@misc{abdelmalek2025mnlp,
  title={MNLP M3 Quantized MCQA Model (W4A8)},
  author={Ahmed Abdelmalek},
  year={2025},
  howpublished={\url{https://huggingface.co/abdou-u/MNLP_M3_w4a8_quantized_mcqa_model}},
  note={CS-552 Project M3}
}
```

## Contact

Ahmed Abdelmalek - ahmed.abdelmalek@epfl.ch
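
## Appendix: MCQA Inference Sketch

The snippet in Getting Started only loads the model. The sketch below extends it to a full MCQA prediction by scoring the next-token logit of each choice letter. The prompt template and the single-token letter-scoring heuristic are illustrative assumptions; this card does not document the exact prompt format used for training or evaluation.

```python
import torch
from transformers import AutoTokenizer
from optimum.quanto import QuantizedModelForCausalLM

model = QuantizedModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")

# Hypothetical prompt template; the real template is not documented here.
prompt = (
    "Question: Which planet is known as the Red Planet?\n"
    "A. Venus\nB. Mars\nC. Jupiter\nD. Saturn\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # The Quanto wrapper exposes the underlying model's forward(); take the
    # logits for the token that would follow "Answer:".
    next_token_logits = model.forward(**inputs).logits[0, -1]

# Score the first token of each choice letter and pick the highest one.
letters = ["A", "B", "C", "D"]
letter_ids = [tokenizer.encode(f" {l}", add_special_tokens=False)[0] for l in letters]
scores = {l: next_token_logits[i].item() for l, i in zip(letters, letter_ids)}
print(max(scores, key=scores.get))  # expected: "B"
```

Scoring logits directly keeps the sketch consistent with the Limitations section: no open-ended generation is involved, only a single forward pass.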
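
## Appendix: Quantization Sketch

The quantization script itself is private (see Model Sources), so the following is only a minimal sketch of what a W4A8 pass with Optimum-Quanto's functional API typically looks like. The calibration prompt and every step shown are assumptions, not the recorded procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.quanto import Calibration, freeze, qint4, qint8, quantize

base = AutoModelForCausalLM.from_pretrained("mgatti/MNLP_M3_mcqa_model")
tokenizer = AutoTokenizer.from_pretrained("mgatti/MNLP_M3_mcqa_model")

# Mark weights for qint4 and activations for qint8 quantization.
quantize(base, weights=qint4, activations=qint8)

# Activation quantization needs a calibration pass to record activation ranges;
# a few representative MCQA prompts would be run here (hypothetical sample).
with torch.no_grad(), Calibration():
    sample = tokenizer("Question: ...\nA. ...\nB. ...\nAnswer:", return_tensors="pt")
    base(**sample)

# Freezing converts the calibrated weights to their final quantized form.
freeze(base)
```

Per the Summary, the frozen model was then pushed to the Hub through the `PyTorchModelHubMixin` interface.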