---
tags:
- model_hub_mixin
---
|
|
|
|
|
# Model Card for `abdou-u/MNLP_M3_w4a8_quantized_mcqa_model` |
|
|
|
|
|
## Summary |
|
|
|
|
|
This model is a W4A8 (4-bit weights, 8-bit activations) quantized version of the `mgatti/MNLP_M3_mcqa_model`, obtained using [Optimum-Quanto](https://huggingface.co/docs/optimum/main/en/quanto/index). It has been pushed to the Hugging Face Hub using the `PyTorchModelHubMixin` interface. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Name**: MNLP_M3_w4a8_quantized_mcqa_model |
|
|
- **Source model**: `mgatti/MNLP_M3_mcqa_model` |
|
|
- **Quantization**: Optimum-Quanto W4A8 (qint4 weights, qint8 activations) |
|
|
- **Usage**: Efficient inference for multiple-choice question answering (MCQA) tasks |
|
|
- **Developer**: Ahmed Abdelmalek, EPFL CS-552 2025 Project M3 |
|
|
- **License**: MIT |
|
|
- **Language(s)**: English |
|
|
- **Hardware target**: Consumer and cloud GPUs with low memory footprint |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
- **Repository**: *Private GitHub repository (training script not public)*
|
|
- **Paper**: Not published |
|
|
- **Docs**: This README |
|
|
|
|
|
## Use Cases |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
This model is optimized for fast inference in MCQA tasks under constrained VRAM settings. |
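
At inference time, MCQA is typically cast as scoring each candidate answer and picking the highest-scoring one. A model-agnostic sketch of that selection step; the `score` callable (e.g. a length-normalized log-likelihood from the model) is a placeholder for whatever evaluation harness you use, not something this card prescribes:

```python
def pick_answer(question, choices, score):
    """Return the choice the scorer rates highest for this question.

    `score(question, choice)` is any callable returning a scalar,
    e.g. a length-normalized log-likelihood under the model.
    """
    best_choice, best_score = None, float("-inf")
    for choice in choices:
        s = score(question, choice)
        if s > best_score:
            best_choice, best_score = choice, s
    return best_choice

# Dummy scorer for illustration only.
demo_scores = {"Paris": 0.9, "Rome": 0.2, "Berlin": 0.1}
answer = pick_answer(
    "What is the capital of France?",
    ["Paris", "Rome", "Berlin"],
    lambda q, c: demo_scores[c],
)
```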
|
|
|
|
|
### Intended Users |
|
|
|
|
|
Researchers and engineers looking to deploy a small, high-performance MCQA model. |
|
|
|
|
|
## Limitations |
|
|
|
|
|
Because the model is quantized, it may show a slight accuracy drop compared to the full-precision source model. It is intended only for MCQA and is not suited to open-ended generation or other tasks.
|
|
|
|
|
## Getting Started |
|
|
|
|
|
```python
from transformers import AutoTokenizer
from optimum.quanto.models import QuantizedModelForCausalLM

model = QuantizedModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
```
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
- **Quantization library**: Optimum-Quanto |
|
|
- **Weights**: 4-bit (qint4) |
|
|
- **Activations**: 8-bit (qint8) |
|
|
- **Format**: Hugging Face Transformers-compatible |
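
For a rough sense of the footprint: 4-bit weights take about half a byte per parameter versus two bytes in fp16, so weight storage shrinks roughly 4x (ignoring the per-group scales and zero-points quantization adds). A back-of-the-envelope calculation, with a hypothetical parameter count since the card does not state one:

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB, ignoring quantization metadata."""
    return n_params * bits_per_weight / 8 / 2**30

# Hypothetical 600M-parameter model, for illustration only.
n = 600_000_000
fp16_gib = weight_memory_gib(n, 16)
w4_gib = weight_memory_gib(n, 4)
```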
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
- **Hardware**: A100 80GB (used during validation) |
|
|
- **Quantization**: single pass over the full model (approximately 3 minutes)
|
|
- **Carbon Emissions**: Negligible for quantization |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```
@misc{abdelmalek2025mnlp,
  title={MNLP M3 Quantized MCQA Model (W4A8)},
  author={Ahmed Abdelmalek},
  year={2025},
  howpublished={\url{https://huggingface.co/abdou-u/MNLP_M3_w4a8_quantized_mcqa_model}},
  note={CS-552 Project M3}
}
```
|
|
|
|
|
## Contact |
|
|
|
|
|
Ahmed Abdelmalek - ahmed.abdelmalek@epfl.ch |