---
library_name: transformers
tags: []
---

# Model Card for `zay25/MNLP_M3_quantized_model`

This model is a quantized version of a multiple-choice question answering (MCQA) model fine-tuned on STEM datasets. It uses Activation-aware Weight Quantization (AWQ) to reduce model size and VRAM usage while preserving strong performance, which makes it well suited to memory- and latency-constrained environments.

---

## Model Details

- **Developed by**: Zeineb Mellouli (EPFL, CS-552 Project)
- **Base model**: `hssawhney/Best-Performing-Model` (Qwen3-0.6B-Base)
- **Quantization**: AWQ (4-bit weights, 16-bit activations)
- **Architecture**: Transformer-based causal language model
- **Language**: English
- **License**: Apache 2.0

---

## Uses

### Direct Use

This model is intended for multiple-choice question answering (MCQA) tasks, particularly on science, math, and engineering education datasets. It is optimized for inference on GPUs with limited VRAM (e.g., A10, T4, or laptop GPUs).

### Out-of-Scope Use

- Not intended for open-ended or dialog generation
- Not suitable for high-stakes decision-making or critical applications without human oversight

---

## Training Details

- **Quantization method**: Post-training quantization using [AWQ (Activation-aware Weight Quantization)](https://github.com/mit-han-lab/llm-awq) via the `awq` library
- **Base model**: `hssawhney/Best-Performing-Model`, fine-tuned on MCQA-style reasoning tasks
- **Quantization configuration**:
  - 4-bit weights (`w_bit = 4`)
  - Group size: 64
  - Per-channel zero point: enabled
- **Calibration dataset**: 512 samples from `hssawhney/Reasoning-Dataset`

A sketch of how this configuration maps onto a quantization call is given at the end of this card.

---

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "zay25/MNLP_M3_quantized_model",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("zay25/MNLP_M3_quantized_model")
```
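
A minimal generation example follows. The prompt template is illustrative only; the exact MCQA format used during fine-tuning is not documented in this card, so adapt it to the template the base model was trained on.

```python
# Hypothetical MCQA prompt; adjust to the actual fine-tuning template.
prompt = (
    "Question: What is the SI unit of force?\n"
    "A. Joule\n"
    "B. Newton\n"
    "C. Pascal\n"
    "D. Watt\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Print only the newly generated tokens (the predicted answer letter).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) with a small `max_new_tokens` budget is usually sufficient for MCQA, since only a short answer is needed.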
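
---

## Reproducing the Quantization

The sketch below shows how the configuration listed under Training Details could map onto a quantization call with the AutoAWQ library (`pip install autoawq`). It is a sketch under stated assumptions, not the exact script used to produce this checkpoint: the `version` field, the dataset split, and the `"text"` column name are assumptions not recorded in this card.

```python
from awq import AutoAWQForCausalLM  # provided by the autoawq package
from datasets import load_dataset
from transformers import AutoTokenizer

base = "hssawhney/Best-Performing-Model"

# Mirrors the card: 4-bit weights, group size 64, zero point enabled.
# "version": "GEMM" is an assumed (common) AWQ kernel choice.
quant_config = {"w_bit": 4, "q_group_size": 64, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)

# 512 calibration samples; the split and "text" column are assumptions
# about the dataset schema.
calib = load_dataset("hssawhney/Reasoning-Dataset", split="train").select(range(512))
calib_texts = [row["text"] for row in calib]

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_texts)

model.save_quantized("MNLP_M3_quantized_model")
tokenizer.save_pretrained("MNLP_M3_quantized_model")
```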