abdou-u committed on
Commit
c383ee8
·
verified ·
1 Parent(s): 1495683

Update README.md

Files changed (1)
  1. README.md +79 -4
README.md CHANGED
@@ -3,7 +3,82 @@ tags:
3
  - model_hub_mixin
4
  ---
5
 
6
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
7
- - Code: [More Information Needed]
8
- - Paper: [More Information Needed]
9
- - Docs: [More Information Needed]
6
+ # Model Card for `abdou-u/MNLP_M3_w4a8_quantized_mcqa_model`
7
+
8
+ ## Summary
9
+
10
+ This model is a W4A8 (4-bit weights, 8-bit activations) quantized version of `mgatti/MNLP_M3_mcqa_model`, produced with [Optimum-Quanto](https://huggingface.co/docs/optimum/main/en/quanto/index). It was pushed to the Hugging Face Hub using the `PyTorchModelHubMixin` integration.
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ - **Name**: MNLP_M3_w4a8_quantized_mcqa_model
17
+ - **Source model**: `mgatti/MNLP_M3_mcqa_model`
18
+ - **Quantization**: Optimum-Quanto W4A8 (qint4 weights, qint8 activations)
19
+ - **Usage**: Efficient inference for multiple-choice question answering (MCQA) tasks
20
+ - **Developer**: Ahmed Abdelmalek, EPFL CS-552 2025 Project M3
21
+ - **License**: MIT
22
+ - **Language(s)**: English
23
+ - **Hardware target**: Consumer and cloud GPUs with limited memory
24
+
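As a rough illustration of why 4-bit weights reduce the memory footprint, the sketch below estimates weight storage at different bit widths. The parameter count is a hypothetical placeholder, not a figure from this model card, and the estimate ignores quantization scales, activations, and any cache memory.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB, ignoring scales and other overhead."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# Hypothetical parameter count, chosen only for illustration.
EXAMPLE_PARAMS = 1_000_000_000

for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: {weight_memory_gib(EXAMPLE_PARAMS, bits):.2f} GiB")
```

Under these assumptions, storing weights in 4 bits takes a quarter of the space needed for 16-bit weights; real savings are somewhat smaller because scales and any non-quantized layers add overhead.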
25
+ ### Model Sources
26
+
27
+ - **Repository**: Private GitHub repository (training script not public)
28
+ - **Paper**: Not published
29
+ - **Docs**: This README
30
+
31
+ ## Use Cases
32
+
33
+ ### Direct Use
34
+
35
+ This model is optimized for fast inference in MCQA tasks under constrained VRAM settings.
36
+
37
+ ### Intended Users
38
+
39
+ Researchers and engineers who want to deploy a compact MCQA model with a small memory footprint.
40
+
41
+ ## Limitations
42
+
43
+ Because this model is quantized, it may be slightly less accurate than the full-precision source model. It is not intended for open-ended generation or for tasks beyond MCQA.
44
+
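One way to quantify such a drop is to compare accuracy on the same held-out MCQA questions. The sketch below assumes predictions are available as option letters; the answer lists are hypothetical placeholders, not results from this model.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold answer letters."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical answer letters, for illustration only.
gold = ["A", "C", "B", "D"]
full_precision_preds = ["A", "C", "B", "D"]
quantized_preds = ["A", "C", "B", "A"]

drop = accuracy(full_precision_preds, gold) - accuracy(quantized_preds, gold)
print(f"accuracy drop: {drop:.2%}")
```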
45
+ ## Getting Started
46
+
47
+ ```python
+ from transformers import AutoTokenizer
+ from optimum.quanto.models import QuantizedModelForCausalLM
+
+ # Repository ID on the Hugging Face Hub (model and tokenizer share it).
+ model_id = "abdou-u/MNLP_M3_w4a8_quantized_mcqa_model"
+
+ # Load the quantized weights and the matching tokenizer.
+ model = QuantizedModelForCausalLM.from_pretrained(model_id)
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ ```
54
+
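This model card does not specify a prompt format, so the following is only a sketch under assumptions: a lettered prompt ending in `Answer:` and a selection step that picks the highest-scoring letter. The helper names `format_mcqa_prompt` and `pick_answer`, and the scores shown, are hypothetical.

```python
from typing import Dict, List

LETTERS = "ABCD"


def format_mcqa_prompt(question: str, options: List[str]) -> str:
    """Build a lettered multiple-choice prompt ending with 'Answer:'."""
    lines = [question.strip()]
    # zip stops at the shorter input, so at most four options are used.
    for letter, option in zip(LETTERS, options):
        lines.append(f"{letter}. {option.strip()}")
    lines.append("Answer:")
    return "\n".join(lines)


def pick_answer(letter_scores: Dict[str, float]) -> str:
    """Return the option letter with the highest score."""
    return max(letter_scores, key=letter_scores.get)


prompt = format_mcqa_prompt(
    "Which data type stores the weights in a W4A8 model?",
    ["float32", "float16", "int8", "int4"],
)
print(prompt)

# Placeholder scores; in practice these would be derived from the model's
# next-token logits for each option letter.
print(pick_answer({"A": -2.1, "B": -1.7, "C": -0.9, "D": 0.4}))
```

With a loaded model, the per-letter scores could be read from the logits at the last prompt position for the token corresponding to each letter; how letters map to tokens depends on the tokenizer.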
55
+ ## Technical Specifications
56
+
57
+ - **Quantization library**: Optimum-Quanto
58
+ - **Weights**: 4-bit (qint4)
59
+ - **Activations**: 8-bit (qint8)
60
+ - **Format**: Hugging Face Transformers-compatible
61
+
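For intuition about what 4-bit and 8-bit integer types can represent, the sketch below applies a generic symmetric, per-tensor quantization scheme in plain Python. It is not Optimum-Quanto's implementation, which may use different grouping and scaling; the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integers of the given width using one shared scale."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit, 127 for 8-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit, -128 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from integers and their scale."""
    return [q * scale for q in quantized]


weights = [0.7, -0.3, 0.12, 0.0]
q4, scale4 = quantize_symmetric(weights, bits=4)
print("4-bit integers:", q4)
print("reconstructed:", dequantize(q4, scale4))
```

The reconstructed values differ slightly from the originals; that rounding error is the source of the accuracy loss that quantization can introduce.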
62
+ ## Environmental Impact
63
+
64
+ - **Hardware**: A100 80GB (used during validation)
65
+ - **Quantization**: Single pass over the full model (approx. 3 minutes)
66
+ - **Carbon Emissions**: Negligible for quantization
67
+
68
+ ## Citation
69
+
70
+ If you use this model, please cite:
71
+
72
+ ```
+ @misc{abdelmalek2025mnlp,
+   title={MNLP M3 Quantized MCQA Model (W4A8)},
+   author={Ahmed Abdelmalek},
+   year={2025},
+   howpublished={\url{https://huggingface.co/abdou-u/MNLP_M3_w4a8_quantized_mcqa_model}},
+   note={CS-552 Project M3}
+ }
+ ```
81
+
82
+ ## Contact
83
+
84
+ Ahmed Abdelmalek - ahmed.abdelmalek@epfl.ch