|
|
--- |
|
|
library_name: transformers |
|
|
tags: [] |
|
|
--- |
|
|
# Model Card for `zay25/MNLP_M3_quantized_model` |
|
|
|
|
|
This model is a quantized version of a multiple-choice question answering (MCQA) model fine-tuned on STEM datasets. It uses Activation-aware Weight Quantization (AWQ) to reduce model size and VRAM usage while preserving strong performance. The model is well-suited for memory- and latency-constrained environments. |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Developed by**: Zeineb Mellouli (EPFL, CS-552 Project) |
|
|
- **Base model**: `hssawhney/Best-Performing-Model` (Qwen3-0.6B-Base) |
|
|
- **Quantization**: AWQ (4-bit weights, 16-bit activations) |
|
|
- **Architecture**: Transformer-based Causal Language Model |
|
|
- **Language**: English |
|
|
- **License**: Apache 2.0 |
|
|
|
|
|
--- |
|
|
|
|
|
## Uses |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
This model is intended for multiple-choice question answering (MCQA) tasks, particularly in science, math, and engineering education datasets. It is optimized for inference on GPUs with limited VRAM (e.g., A10, T4, or laptop GPUs). |
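Because the card does not document the exact prompt template used during fine-tuning, here is a minimal, illustrative sketch of how an MCQA question might be formatted into a single prompt string; the layout (lettered options followed by `Answer:`) is an assumption, not the confirmed training format.

```python
def format_mcqa_prompt(question, choices):
    """Format a question and its answer options into one MCQA prompt.

    Note: the exact template used during fine-tuning is not documented
    in this card; this layout is an illustrative assumption.
    """
    letters = "ABCDEFGH"
    lines = [f"Question: {question}"]
    lines += [f"{letters[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)

prompt = format_mcqa_prompt(
    "What is the derivative of x**2?",
    ["x", "2*x", "x**2", "2"],
)
print(prompt)
```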
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
- Not intended for open-ended or dialog generation |
|
|
- Not suitable for high-stakes decision-making or critical applications without human oversight |
|
|
|
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Quantization method**: Post-training quantization using [AWQ (Activation-aware Weight Quantization)](https://github.com/mit-han-lab/llm-awq) via the AutoAWQ (`autoawq`) library
|
|
- **Base model**: `hssawhney/Best-Performing-Model`, fine-tuned on MCQA-style reasoning tasks |
|
|
- **Quantization configuration**: |
|
|
- 4-bit weights (`w_bit = 4`) |
|
|
- Group size: 64 |
|
|
- Per-channel zero point: enabled |
|
|
- **Calibration dataset**: 512 samples from `hssawhney/Reasoning-Dataset` |
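For reference, the configuration above corresponds roughly to the following AutoAWQ quantization recipe. This is a hedged sketch rather than the exact script used to produce this checkpoint: the output path (`quant_path`) and the way the calibration data is passed to `quantize` are illustrative assumptions.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "hssawhney/Best-Performing-Model"
quant_path = "MNLP_M3_quantized_model"  # illustrative output directory

# Configuration matching the card: 4-bit weights, group size 64, zero points enabled
quant_config = {"zero_point": True, "q_group_size": 64, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrate and quantize (the card reports 512 samples from
# hssawhney/Reasoning-Dataset; how they were fed in here is an assumption)
model.quantize(tokenizer, quant_config=quant_config,
               calib_data="hssawhney/Reasoning-Dataset")

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

Note that this step requires a GPU and the `autoawq` package; the `version="GEMM"` kernel choice is a common default, not something the card specifies.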
|
|
|
|
|
--- |
|
|
|
|
|
## How to Use |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("zay25/MNLP_M3_quantized_model", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("zay25/MNLP_M3_quantized_model")

# Example MCQA-style prompt (illustrative; the exact training template is not documented)
prompt = "Question: What is the SI unit of force?\nA. Joule\nB. Newton\nC. Watt\nD. Pascal\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```