---
library_name: transformers
tags: []
---
# MNLP_M3_quantized_model
This model is a quantized version of the best-performing MCQA model from our CS-552 Modern NLP project (Milestone 3). It was optimized for efficient inference while maintaining strong accuracy on STEM multiple-choice question answering (MCQA) tasks.
## Model Summary
- **Base model**: [hssawhney/Best-Performing-Model](https://huggingface.co/hssawhney/Best-Performing-Model)
- **Quantization type**: Post-Training Quantization (PTQ)
- **Precision**: W8A8
- **Method**: SmoothQuant + GPTQ via [LLMCompressor](https://github.com/vllm-project/llm-compressor)
- **Excluded layers**: `lm_head` (to preserve logits quality)
- **Final model size**: ~717 MB
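To make the W8A8 precision concrete: both weights and activations are stored as signed 8-bit integers, each tensor carrying a floating-point scale. The sketch below illustrates plain symmetric per-tensor int8 round-trip quantization; it is a simplified stand-in, not the SmoothQuant + GPTQ pipeline actually used, which additionally migrates activation outliers into weights and minimizes layer-wise reconstruction error.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: q = round(x / scale)."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to float32 using the stored scale."""
    return q.astype(np.float32) * scale

# Toy weight tensor; values chosen so the round trip is exact.
w = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
```

In the real model, the `lm_head` is excluded from this process so the final logits are computed at full precision.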
## Calibration Details
- **Calibration dataset**: 512 samples randomly selected from [`zay25/MNLP_M3_quantized_dataset`](https://huggingface.co/datasets/zay25/MNLP_M3_quantized_dataset)
- The calibration set preserves the original format (STEM MCQA) and was selected to represent a broad distribution of question types.
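The calibration draw itself is simple: 512 examples sampled uniformly at random from the dataset. A minimal sketch, using a hypothetical in-memory pool in place of the actual `zay25/MNLP_M3_quantized_dataset` download:

```python
import random

NUM_CALIBRATION_SAMPLES = 512

# Hypothetical pool standing in for the MCQA calibration dataset;
# each record keeps the original question/choices format.
pool = [
    {"question": f"Question {i}", "choices": ["A", "B", "C", "D"]}
    for i in range(10_000)
]

random.seed(0)  # fix the seed so the calibration split is reproducible
calibration_set = random.sample(pool, NUM_CALIBRATION_SAMPLES)
```

Keeping the calibration samples in the same MCQA format as the evaluation data matters: PTQ scales are fitted to the activation statistics of whatever text passes through the model during calibration.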
## Intended Use
This model is intended for:
- STEM-focused multiple-choice question answering
- Educational assistant systems
- Low-resource inference environments (e.g., CPU, edge devices)
It is not intended for free-form generation or for use outside the MCQA format.
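Because the model expects MCQA-style inputs, prompts should present the question followed by lettered options. A minimal formatting sketch; the exact A/B/C/D template below is an assumption for illustration, not the verbatim template used during training:

```python
def format_mcqa(question: str, choices: list[str]) -> str:
    """Render a question and its options as a single MCQA prompt.

    NOTE: the lettered layout and trailing "Answer:" cue are an
    assumed convention, not the model's confirmed prompt template.
    """
    letters = "ABCD"
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(letters, choices)]
    lines.append("Answer:")
    return "\n".join(lines)

prompt = format_mcqa(
    "What is the SI unit of force?",
    ["Joule", "Newton", "Pascal", "Watt"],
)
```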
## License
This model inherits the license of the base model; see the [hssawhney/Best-Performing-Model](https://huggingface.co/hssawhney/Best-Performing-Model) repository for the license terms.