---
library_name: transformers
tags: []
---

# MNLP_M3_quantized_model

This model is a quantized version of the best-performing MCQA model from our CS-552 Modern NLP project (Milestone 3). It was optimized for efficient inference while maintaining strong accuracy on STEM multiple-choice question answering tasks.

## Model Summary

- **Base model**: [hssawhney/Best-Performing-Model](https://huggingface.co/hssawhney/Best-Performing-Model)
- **Quantization type**: Post-Training Quantization (PTQ)
- **Precision**: W8A8 (8-bit weights, 8-bit activations)
- **Method**: SmoothQuant + GPTQ via [LLMCompressor](https://github.com/vllm-project/llm-compressor)
- **Excluded layers**: `lm_head` (kept in full precision to preserve output logit quality)
- **Final model size**: ~717 MB

## Calibration Details

- **Calibration dataset**: 512 samples randomly selected from [`zay25/MNLP_M3_quantized_dataset`](https://huggingface.co/datasets/zay25/MNLP_M3_quantized_dataset)
- The calibration set preserves the original STEM MCQA format and was selected to cover a broad distribution of question types.

## Intended Use

This model is intended for:

- STEM-focused multiple-choice question answering
- Educational assistant systems
- Low-resource inference environments (e.g., CPU, edge devices)

It is not intended for freeform text generation or for use outside the MCQA format.

## License

This model inherits the license of the base model. See the [hssawhney/Best-Performing-Model](https://huggingface.co/hssawhney/Best-Performing-Model) repository for license terms.
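As a rough illustration of the quantization setup described above (SmoothQuant + GPTQ, W8A8, `lm_head` excluded), a recipe of this shape could be written in LLMCompressor's YAML recipe format and applied with its one-shot API. This is a minimal sketch, not the project's actual configuration: the smoothing strength is an assumed value, and the variable names are illustrative.

```python
# Sketch of a W8A8 SmoothQuant + GPTQ recipe in LLMCompressor's YAML
# recipe format. smoothing_strength is an assumed value; the project's
# actual recipe may differ.
recipe = """
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
    GPTQModifier:
      targets: [Linear]
      scheme: W8A8
      ignore: [lm_head]   # keep the output head in full precision
"""

# The recipe would then be applied with llm-compressor's one-shot
# entry point against the base model and the 512-sample calibration
# set, e.g. (not executed here):
#
#   from llmcompressor.transformers import oneshot
#   oneshot(model=model, dataset=calibration_ds, recipe=recipe,
#           num_calibration_samples=512)
```

Excluding `lm_head` from quantization is a common choice in W8A8 setups, since quantization error in the output projection directly distorts the logits used to rank answer choices.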