---
library_name: transformers
tags: []
---

# MNLP_M3_quantized_model

This model is a quantized version of the best-performing MCQA model from our CS-552 Modern NLP project (Milestone 3). It was optimized for efficient inference while maintaining strong accuracy on STEM multiple-choice question answering tasks.

## Model Summary

- **Base model**: [hssawhney/Best-Performing-Model](https://huggingface.co/hssawhney/Best-Performing-Model)
- **Quantization type**: Post-Training Quantization (PTQ)
- **Precision**: W8A8 (8-bit weights, 8-bit activations)
- **Method**: SmoothQuant + GPTQ via [LLMCompressor](https://github.com/vllm-project/llm-compressor)
- **Excluded layers**: `lm_head` (kept in full precision to preserve output logit quality)
- **Final model size**: ~717 MB

## Calibration Details

- **Calibration dataset**: 512 samples randomly selected from [`zay25/MNLP_M3_quantized_dataset`](https://huggingface.co/datasets/zay25/MNLP_M3_quantized_dataset)
- The calibration set preserves the original STEM MCQA format and was selected to cover a broad distribution of question types.

## Intended Use

This model is intended for:

- STEM-focused multiple-choice question answering
- Educational assistant systems
- Low-resource inference environments (e.g., CPU, edge devices)

It is not intended for freeform text generation or for use outside the MCQA format.

## License

This model inherits the license of the base model. See the [hssawhney/Best-Performing-Model](https://huggingface.co/hssawhney/Best-Performing-Model) repository for license terms.
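As a rough illustration of the quantization setup described above (SmoothQuant + GPTQ, W8A8, `lm_head` excluded), a recipe of this shape could be written in LLMCompressor's YAML recipe format and applied with its one-shot API. This is a minimal sketch, not the project's actual configuration: the smoothing strength is an assumed value, and the variable names are illustrative.

```python
# Sketch of a W8A8 SmoothQuant + GPTQ recipe in LLMCompressor's YAML
# recipe format. smoothing_strength is an assumed value; the project's
# actual recipe may differ.
recipe = """
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
    GPTQModifier:
      targets: [Linear]
      scheme: W8A8
      ignore: [lm_head]   # keep the output head in full precision
"""

# The recipe would then be applied with llm-compressor's one-shot
# entry point against the base model and the 512-sample calibration
# set, e.g. (not executed here):
#
#   from llmcompressor.transformers import oneshot
#   oneshot(model=model, dataset=calibration_ds, recipe=recipe,
#           num_calibration_samples=512)
```

Excluding `lm_head` from quantization is a common choice in W8A8 setups, since quantization error in the output projection directly distorts the logits used to rank answer choices.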