---
tags:
- model_hub_mixin
---

# Model Card for `abdou-u/MNLP_M3_w4a8_quantized_mcqa_model`

## Summary

This model is a W4A8 (4-bit weights, 8-bit activations) quantized version of `mgatti/MNLP_M3_mcqa_model`, produced with [Optimum-Quanto](https://huggingface.co/docs/optimum/main/en/quanto/index) and pushed to the Hugging Face Hub through the `PyTorchModelHubMixin` interface.
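
For readers unfamiliar with that interface: `PyTorchModelHubMixin` (from `huggingface_hub`) equips any `nn.Module` with `save_pretrained`, `from_pretrained`, and `push_to_hub`. A minimal sketch of the pattern, with a hypothetical wrapper class that is not the author's actual code:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

# Hypothetical wrapper: inheriting from PyTorchModelHubMixin adds
# save_pretrained / from_pretrained / push_to_hub to a plain nn.Module.
class MCQAModelWrapper(nn.Module, PyTorchModelHubMixin):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)  # placeholder layer

model = MCQAModelWrapper()
# Uploads the weights (plus a config.json built from the __init__ args)
# to the Hub; requires a write-access token.
model.push_to_hub("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
```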

## Model Details

### Model Description

- **Name**: MNLP_M3_w4a8_quantized_mcqa_model
- **Source model**: `mgatti/MNLP_M3_mcqa_model`
- **Quantization**: Optimum-Quanto W4A8 (qint4 weights, qint8 activations)
- **Usage**: Efficient inference for multiple-choice question answering (MCQA) tasks
- **Developer**: Ahmed Abdelmalek, EPFL CS-552 2025 Project M3
- **License**: MIT
- **Language(s)**: English
- **Hardware target**: Consumer and cloud GPUs with a low memory footprint

### Model Sources

- **Repository**: *Private GitHub (training script not public)*
- **Paper**: Not published
- **Docs**: This README

## Use Cases

### Direct Use

This model is optimized for fast MCQA inference under tight VRAM budgets.

### Intended Users

Researchers and engineers looking to deploy a small, high-performance MCQA model.

## Limitations

Because of quantization, accuracy may drop slightly relative to the full-precision source model. The model is intended for multiple-choice question answering only and is not suitable for open-ended generation or other tasks.

## Getting Started

```python
from transformers import AutoTokenizer
from optimum.quanto.models import QuantizedModelForCausalLM

# Load the quantized checkpoint and its tokenizer from the Hub.
model = QuantizedModelForCausalLM.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
tokenizer = AutoTokenizer.from_pretrained("abdou-u/MNLP_M3_w4a8_quantized_mcqa_model")
```
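
Continuing from the snippet above, a common way to run MCQA with a causal LM is to score each candidate answer by its token log-likelihood and pick the best-scoring option. A minimal sketch: the prompt template is a placeholder (not necessarily the format the model was trained with), and `forward` is called explicitly because the Quanto wrapper may not be directly callable:

```python
import torch

question = "What is the capital of France?"
options = ["Berlin", "Madrid", "Paris", "Rome"]

scores = []
for option in options:
    text = f"Question: {question}\nAnswer: {option}"  # placeholder template
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model.forward(**inputs).logits
    # Logits at position t predict token t + 1, so shift before scoring.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = inputs["input_ids"][:, 1:]
    token_ll = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    scores.append(token_ll.sum().item())

best = max(range(len(options)), key=scores.__getitem__)
print(options[best])
```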

## Technical Specifications

- **Quantization library**: Optimum-Quanto
- **Weights**: 4-bit (qint4)
- **Activations**: 8-bit (qint8)
- **Format**: Hugging Face Transformers-compatible
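
These settings map to a single quantization call in Optimum-Quanto. A sketch of how such a checkpoint can be produced, as an illustration under the settings above rather than the author's exact script (activation quantization normally also involves a calibration pass, omitted here):

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import qint4, qint8
from optimum.quanto.models import QuantizedModelForCausalLM

# Load the full-precision source model.
base = AutoModelForCausalLM.from_pretrained("mgatti/MNLP_M3_mcqa_model")

# W4A8: qint4 weights, qint8 activations.
quantized = QuantizedModelForCausalLM.quantize(base, weights=qint4, activations=qint8)

# Serialize in the layout that from_pretrained above expects.
quantized.save_pretrained("MNLP_M3_w4a8_quantized_mcqa_model")
```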

## Environmental Impact

- **Hardware**: A100 80GB (used during validation)
- **Quantization**: one pass over the full model (approx. 3 minutes)
- **Carbon emissions**: negligible for the quantization step

## Citation

If you use this model, please cite:

```bibtex
@misc{abdelmalek2025mnlp,
  title={MNLP M3 Quantized MCQA Model (W4A8)},
  author={Ahmed Abdelmalek},
  year={2025},
  howpublished={\url{https://huggingface.co/abdou-u/MNLP_M3_w4a8_quantized_mcqa_model}},
  note={CS-552 Project M3}
}
```

## Contact

Ahmed Abdelmalek - ahmed.abdelmalek@epfl.ch