---
tags:
- model_hub_mixin
---
# Model Card for `abdou-u/MNLP_M3_w4a8_quantized_mcqa_model`
## Summary
This model is a W4A8 (4-bit weights, 8-bit activations) quantized version of the `mgatti/MNLP_M3_mcqa_model`, obtained using [Optimum-Quanto](https://huggingface.co/docs/optimum/main/en/quanto/index). It has been pushed to the Hugging Face Hub using the `PyTorchModelHubMixin` interface.
## Model Details
### Model Description
- **Name**: MNLP_M3_w4a8_quantized_mcqa_model
- **Source model**: `mgatti/MNLP_M3_mcqa_model`
- **Quantization**: Optimum-Quanto W4A8 (qint4 weights, qint8 activations)
- **Usage**: Efficient inference for multiple-choice question answering (MCQA) tasks
- **Developer**: Ahmed Abdelmalek, EPFL CS-552 2025 Project M3
- **License**: MIT
- **Language(s)**: English
- **Hardware target**: Consumer and cloud GPUs with low memory footprint
### Model Sources
- **Repository**: Private GitHub repository (training and quantization scripts not public)
- **Paper**: Not published
- **Docs**: This README
## Use Cases
### Direct Use
This model is optimized for fast inference in MCQA tasks under constrained VRAM settings.
### Intended Users
Researchers and engineers looking to deploy a small, high-performance MCQA model.
## Limitations
Because the model is quantized, it may show a slight accuracy drop relative to the full-precision source model. It is intended for MCQA scoring, not open-ended text generation or tasks beyond MCQA.
## Getting Started
```python
from transformers import AutoTokenizer
from optimum.quanto import QuantizedModelForCausalLM

# Reload the quanto-serialized weights and the matching tokenizer.
model = QuantizedModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
```
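For MCQA inference, a common approach with a causal LM is to score each candidate answer by the total log-likelihood its tokens receive after the question, then pick the highest-scoring option. A hedged sketch (the `score_option` helper is illustrative, not part of this repository; it assumes `model` and `tokenizer` are loaded as above):

```python
import torch

def score_option(model, tokenizer, question: str, option: str) -> float:
    """Sum of log-probabilities the model assigns to the option's tokens."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    option_ids = tokenizer(option, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i+1, so slice the span predicting the option.
    option_logits = logits[0, prompt_ids.shape[1] - 1 : -1]
    log_probs = option_logits.log_softmax(dim=-1)
    return log_probs.gather(1, option_ids[0].unsqueeze(1)).sum().item()

# best = max(options, key=lambda o: score_option(model, tokenizer, question, o))
```

Length normalization (dividing by the option's token count) is a common variant when candidate answers differ in length.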
## Technical Specifications
- **Quantization library**: Optimum-Quanto
- **Weights**: 4-bit (qint4)
- **Activations**: 8-bit (qint8)
- **Format**: Hugging Face Transformers-compatible
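The main benefit of 4-bit weights is storage: a back-of-the-envelope estimate puts weight memory at roughly a quarter of an fp16 checkpoint (ignoring scales, embeddings kept at higher precision, and other overhead). A quick sketch (the parameter count below is illustrative; this card does not state the model's actual size):

```python
def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Approximate bytes needed to store the raw weight tensor."""
    return n_params * bits_per_weight // 8

n = 600_000_000  # hypothetical parameter count, not from this card
fp16_gb = weight_bytes(n, 16) / 1e9
w4_gb = weight_bytes(n, 4) / 1e9
print(f"fp16 weights: {fp16_gb:.2f} GB, qint4 weights: {w4_gb:.2f} GB")
```

In practice the savings are somewhat smaller because per-group scales and any non-quantized layers add overhead.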
## Environmental Impact
- **Hardware**: A100 80GB (used during validation)
- **Quantization**: single pass over the full model (approx. 3 minutes)
- **Carbon Emissions**: Negligible for quantization
## Citation
If you use this model, please cite:
```
@misc{abdelmalek2025mnlp,
  title={MNLP M3 Quantized MCQA Model (W4A8)},
  author={Ahmed Abdelmalek},
  year={2025},
  howpublished={\url{https://huggingface.co/abdou-u/MNLP_M3_w4a8_quantized_mcqa_model}},
  note={CS-552 Project M3}
}
```
## Contact
Ahmed Abdelmalek - ahmed.abdelmalek@epfl.ch