Model Card for abdou-u/MNLP_M3_w4a8_quantized_mcqa_model

Summary

This model is a W4A8 (4-bit weights, 8-bit activations) quantized version of the mgatti/MNLP_M3_mcqa_model, obtained using Optimum-Quanto. It has been pushed to the Hugging Face Hub using the PyTorchModelHubMixin interface.
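
The exact quantization script is not public, but a run along the following lines reproduces the setup. This is a minimal sketch, assuming the source checkpoint loads as a standard Transformers causal LM; the details (and the omitted calibration) are assumptions, not the author's script.

# Minimal sketch of the W4A8 quantization step (not the actual script)
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4, qint8

base = AutoModelForCausalLM.from_pretrained("mgatti/MNLP_M3_mcqa_model")

# qint4 weights + qint8 activations = W4A8; quantizing activations
# normally also calls for a calibration pass on sample data (omitted here)
qmodel = QuantizedModelForCausalLM.quantize(base, weights=qint4, activations=qint8)
qmodel.save_pretrained("MNLP_M3_w4a8_quantized_mcqa_model")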

Model Details

Model Description

  • Name: MNLP_M3_w4a8_quantized_mcqa_model
  • Source model: mgatti/MNLP_M3_mcqa_model
  • Quantization: Optimum-Quanto W4A8 (qint4 weights, qint8 activations)
  • Usage: Efficient inference for multiple-choice question answering (MCQA) tasks
  • Developer: Ahmed Abdelmalek, EPFL CS-552 2025 Project M3
  • License: MIT
  • Language(s): English
  • Hardware target: Consumer and cloud GPUs with low memory footprint

Model Sources

  • Repository: Private GitHub repository (training script not public)
  • Paper: Not published
  • Docs: This README

Use Cases

Direct Use

This model is optimized for fast inference in MCQA tasks under constrained VRAM settings.
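
A common way to run MCQA with a causal LM is to score each candidate answer by its log-likelihood and keep the best one. The sketch below illustrates that pattern, with the model and tokenizer loaded as in Getting Started below; the prompt template and the score_option/pick_answer helpers are hypothetical, not the evaluation setup used for this model.

import torch

# Hypothetical helper: score one answer option by its average token
# log-likelihood under the model. The prompt format is illustrative only.
def score_option(question, option):
    text = f"Question: {question}\nAnswer: {option}"
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model.forward(input_ids=ids).logits
    # Logits at position t predict token t+1, so shift by one
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_lp = logprobs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.mean().item()

def pick_answer(question, options):
    scores = [score_option(question, o) for o in options]
    return options[scores.index(max(scores))]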

Intended Users

Researchers and engineers looking to deploy a small, high-performance MCQA model.

Limitations

This model is quantized and may show a slight accuracy drop compared to the full-precision source model. It is intended for MCQA scoring only and is not suitable for open-ended generation or tasks beyond MCQA.

Getting Started

from transformers import AutoTokenizer
from optimum.quanto.models import QuantizedModelForCausalLM

# Load the W4A8 quantized checkpoint from the Hub
model = QuantizedModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")

# The tokenizer is unchanged by quantization and is loaded from the same repository
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
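
A quick forward pass confirms the quantized model runs end to end; the prompt below is illustrative only:

import torch

prompt = "Question: What is the capital of France?\nA. Paris\nB. London\nAnswer:"
ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model.forward(input_ids=ids)
print(out.logits.shape)  # (1, sequence_length, vocab_size)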

Technical Specifications

  • Quantization library: Optimum-Quanto
  • Weights: 4-bit (qint4)
  • Activations: 8-bit (qint8)
  • Model size: ~0.5B parameters
  • Serialization: safetensors (F32 and U8 tensors)
  • Format: Hugging Face Transformers-compatible

Environmental Impact

  • Hardware: NVIDIA A100 80GB (used during validation)
  • Quantization: single pass over the full model (approx. 3 minutes)
  • Carbon emissions: negligible for the quantization step

Citation

If you use this model, please cite:

@misc{abdelmalek2025mnlp,
  title={MNLP M3 Quantized MCQA Model (W4A8)},
  author={Ahmed Abdelmalek},
  year={2025},
  howpublished={\url{https://huggingface.co/abdou-u/MNLP_M3_w4a8_quantized_mcqa_model}},
  note={CS-552 Project M3}
}

Contact

Ahmed Abdelmalek - ahmed.abdelmalek@epfl.ch
