---
tags:
- model_hub_mixin
---
|
|
|
|
|
# Model Card for `abdou-u/MNLP_M3_w4a8_quantized_mcqa_model` |
|
|
|
|
|
## Summary |
|
|
|
|
|
This model is a W4A8 (4-bit weights, 8-bit activations) quantized version of the `mgatti/MNLP_M3_mcqa_model`, obtained using [Optimum-Quanto](https://huggingface.co/docs/optimum/main/en/quanto/index). It has been pushed to the Hugging Face Hub using the `PyTorchModelHubMixin` interface. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Name**: MNLP_M3_w4a8_quantized_mcqa_model |
|
|
- **Source model**: `mgatti/MNLP_M3_mcqa_model` |
|
|
- **Quantization**: Optimum-Quanto W4A8 (qint4 weights, qint8 activations) |
|
|
- **Usage**: Efficient inference for multiple-choice question answering (MCQA) tasks |
|
|
- **Developer**: Ahmed Abdelmalek, EPFL CS-552 2025 Project M3 |
|
|
- **License**: MIT |
|
|
- **Language(s)**: English |
|
|
- **Hardware target**: Consumer and cloud GPUs with low memory footprint |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
- **Repository**: *Private GitHub repository (training script not public)*
|
|
- **Paper**: Not published |
|
|
- **Docs**: This README |
|
|
|
|
|
## Use Cases |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
This model is optimized for fast inference in MCQA tasks under constrained VRAM settings. |
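
At inference time, MCQA is typically cast as scoring each candidate answer and picking the highest-scoring one. A model-agnostic sketch of that selection step; the `score` callable (e.g. a length-normalized log-likelihood from the model) is a placeholder for whatever evaluation harness you use, not something this card prescribes:

```python
def pick_answer(question, choices, score):
    """Return the choice the scorer rates highest for this question.

    `score(question, choice)` is any callable returning a scalar,
    e.g. a length-normalized log-likelihood under the model.
    """
    best_choice, best_score = None, float("-inf")
    for choice in choices:
        s = score(question, choice)
        if s > best_score:
            best_choice, best_score = choice, s
    return best_choice

# Dummy scorer for illustration only.
demo_scores = {"Paris": 0.9, "Rome": 0.2, "Berlin": 0.1}
answer = pick_answer(
    "What is the capital of France?",
    ["Paris", "Rome", "Berlin"],
    lambda q, c: demo_scores[c],
)
```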
|
|
|
|
|
### Intended Users |
|
|
|
|
|
Researchers and engineers looking to deploy a small, high-performance MCQA model. |
|
|
|
|
|
## Limitations |
|
|
|
|
|
Because the model is quantized, it may show a slight accuracy drop compared to the full-precision source model. It is intended only for MCQA and is not suited to open-ended generation or other tasks.
|
|
|
|
|
## Getting Started |
|
|
|
|
|
```python
from transformers import AutoTokenizer
from optimum.quanto.models import QuantizedModelForCausalLM

model = QuantizedModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
```
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
- **Quantization library**: Optimum-Quanto |
|
|
- **Weights**: 4-bit (qint4) |
|
|
- **Activations**: 8-bit (qint8) |
|
|
- **Format**: Hugging Face Transformers-compatible |
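
For a rough sense of the footprint: 4-bit weights take about half a byte per parameter versus two bytes in fp16, so weight storage shrinks roughly 4x (ignoring the per-group scales and zero-points quantization adds). A back-of-the-envelope calculation, with a hypothetical parameter count since the card does not state one:

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB, ignoring quantization metadata."""
    return n_params * bits_per_weight / 8 / 2**30

# Hypothetical 600M-parameter model, for illustration only.
n = 600_000_000
fp16_gib = weight_memory_gib(n, 16)
w4_gib = weight_memory_gib(n, 4)
```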
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
- **Hardware**: A100 80GB (used during validation) |
|
|
- **Quantization**: single pass over the full model (approximately 3 minutes)
|
|
- **Carbon Emissions**: Negligible for quantization |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```
@misc{abdelmalek2025mnlp,
  title={MNLP M3 Quantized MCQA Model (W4A8)},
  author={Ahmed Abdelmalek},
  year={2025},
  howpublished={\url{https://huggingface.co/abdou-u/MNLP_M3_w4a8_quantized_mcqa_model}},
  note={CS-552 Project M3}
}
```
|
|
|
|
|
## Contact |
|
|
|
|
|
Ahmed Abdelmalek - ahmed.abdelmalek@epfl.ch |