Model Card for abdou-u/MNLP_M3_w4a8_quantized_mcqa_model
Summary
This model is a W4A8 (4-bit weights, 8-bit activations) quantized version of the mgatti/MNLP_M3_mcqa_model, obtained using Optimum-Quanto. It has been pushed to the Hugging Face Hub using the PyTorchModelHubMixin interface.
Model Details
Model Description
- Name: MNLP_M3_w4a8_quantized_mcqa_model
- Source model: mgatti/MNLP_M3_mcqa_model
- Quantization: Optimum-Quanto W4A8 (qint4 weights, qint8 activations)
- Usage: Efficient inference for multiple-choice question answering (MCQA) tasks
- Developer: Ahmed Abdelmalek, EPFL CS-552 2025 Project M3
- License: MIT
- Language(s): English
- Hardware target: Consumer and cloud GPUs with low memory footprint
Model Sources
- Repository: Private GitHub (Training script not public)
- Paper: Not published
- Docs: This README
Use Cases
Direct Use
This model is optimized for fast inference in MCQA tasks under constrained VRAM settings.
Intended Users
Researchers and engineers looking to deploy a small, high-performance MCQA model.
Limitations
This model is quantized and may show a slight accuracy drop compared to the full-precision source model. It is not intended for free-form text generation or tasks beyond MCQA.
Getting Started
from transformers import AutoTokenizer
from optimum.quanto import QuantizedModelForCausalLM

# Load the quantized checkpoint and its tokenizer from the Hub.
model = QuantizedModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
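A common way to use a causal-LM checkpoint for MCQA is to score each candidate answer by its length-normalized log-likelihood under the model and pick the highest-scoring choice. The snippet below is a minimal, model-free sketch of that selection step: the per-token log-probabilities would come from the model loaded above, and all names and values here are illustrative, not the project's actual evaluation code.

```python
# Illustrative MCQA choice selection: score each choice by its mean token
# log-probability (length normalization avoids penalizing longer answers).

def select_choice(choice_logprobs):
    """choice_logprobs: dict mapping choice label -> list of token log-probs."""
    def mean(xs):
        return sum(xs) / len(xs)
    return max(choice_logprobs, key=lambda label: mean(choice_logprobs[label]))

# Dummy per-token log-probs for four answer choices (values are made up).
scores = {
    "A": [-2.1, -0.9, -1.3],
    "B": [-0.4, -0.6],
    "C": [-3.0, -2.5, -2.8, -2.2],
    "D": [-1.5, -1.5],
}
print(select_choice(scores))  # "B" has the highest mean log-probability
```

In practice the log-probabilities are obtained by running the model on the question concatenated with each choice and summing the log-softmax of the logits over the choice's tokens.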
Technical Specifications
- Quantization library: Optimum-Quanto
- Weights: 4-bit (qint4)
- Activations: 8-bit (qint8)
- Format: Hugging Face Transformers-compatible
Environmental Impact
- Hardware: A100 80GB (used during validation)
- Quantization: 1 pass, full model (approx. 3 mins)
- Carbon Emissions: Negligible for quantization
Citation
If you use this model, please cite:
@misc{abdelmalek2025mnlp,
  title={MNLP M3 Quantized MCQA Model (W4A8)},
  author={Ahmed Abdelmalek},
  year={2025},
  howpublished={\url{https://huggingface.co/abdou-u/MNLP_M3_w4a8_quantized_mcqa_model}},
  note={CS-552 Project M3}
}
Contact
Ahmed Abdelmalek - ahmed.abdelmalek@epfl.ch