---
library_name: transformers
tags: []
---
# Model Card for `zay25/MNLP_M3_quantized_model`
This model is a quantized version of a multiple-choice question answering (MCQA) model fine-tuned on STEM datasets. It uses Activation-aware Weight Quantization (AWQ) to reduce model size and VRAM usage while preserving strong performance. The model is well-suited for memory- and latency-constrained environments.
---
## Model Details
- **Developed by**: Zeineb Mellouli (EPFL, CS-552 Project)
- **Base model**: `hssawhney/Best-Performing-Model` (Qwen3-0.6B-Base)
- **Quantization**: AWQ (4-bit weights, 16-bit activations)
- **Architecture**: Transformer-based Causal Language Model
- **Language**: English
- **License**: Apache 2.0
---
## Uses
### Direct Use
This model is intended for multiple-choice question answering (MCQA) tasks, particularly in science, math, and engineering education datasets. It is optimized for inference on GPUs with limited VRAM (e.g., A10, T4, or laptop GPUs).
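A minimal sketch of how an MCQA input might be formatted for this model. The exact prompt template the model was fine-tuned on is not documented in this card, so the `build_mcqa_prompt` helper below and its letter/answer layout are assumptions for illustration only:

```python
def build_mcqa_prompt(question, choices):
    """Format a multiple-choice question as a single prompt string.

    NOTE: this layout (lettered options followed by an 'Answer:' cue) is an
    assumed convention, not the documented training format of this model.
    """
    letters = "ABCDEFGH"
    lines = [f"Question: {question}"]
    for letter, choice in zip(letters, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)

prompt = build_mcqa_prompt(
    "What is the SI unit of force?",
    ["Joule", "Newton", "Watt", "Pascal"],
)
print(prompt)
```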
### Out-of-Scope Use
- Not intended for open-ended or dialog generation
- Not suitable for high-stakes decision-making or critical applications without human oversight
## Training Details
- **Quantization method**: Post-training quantization using [AWQ (Activation-aware Weight Quantization)](https://github.com/mit-han-lab/llm-awq) via the `awq` library
- **Base model**: `hssawhney/Best-Performing-Model`, fine-tuned on MCQA-style reasoning tasks
- **Quantization configuration**:
- 4-bit weights (`w_bit = 4`)
- Group size: 64
- Per-channel zero point: enabled
- **Calibration dataset**: 512 samples from `hssawhney/Reasoning-Dataset`
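The settings above can be sketched as an AutoAWQ-style quantization config. The dict keys follow the AutoAWQ convention; whether that exact library and kernel version was used here is an assumption:

```python
# Quantization config mirroring the settings listed above,
# in the dict format used by AutoAWQ (assumed tooling).
quant_config = {
    "w_bit": 4,          # 4-bit weights
    "q_group_size": 64,  # group size 64
    "zero_point": True,  # zero point enabled
    "version": "GEMM",   # AutoAWQ kernel choice (assumption)
}

# The quantization call itself requires a GPU and the base model, e.g.:
# from awq import AutoAWQForCausalLM
# model = AutoAWQForCausalLM.from_pretrained("hssawhney/Best-Performing-Model")
# model.quantize(tokenizer, quant_config=quant_config,
#                calib_data="hssawhney/Reasoning-Dataset")
print(quant_config)
```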
---
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zay25/MNLP_M3_quantized_model")
model = AutoModelForCausalLM.from_pretrained(
    "zay25/MNLP_M3_quantized_model",
    trust_remote_code=True,
)
```