---
library_name: transformers
tags: []
---

# MNLP_M3_quantized_model

This model is a quantized version of the best-performing multiple-choice question answering (MCQA) model from our CS-552 Modern NLP project (Milestone 3). It was optimized for efficient inference while maintaining strong accuracy on STEM MCQA tasks.

## Model Summary
- **Base model**: [hssawhney/Best-Performing-Model](https://huggingface.co/hssawhney/Best-Performing-Model)
- **Quantization type**: Post-Training Quantization (PTQ)
- **Precision**: W8A8 (8-bit weights, 8-bit activations)
- **Method**: SmoothQuant + GPTQ via [LLMCompressor](https://github.com/vllm-project/llm-compressor); a recipe sketch follows this list
- **Excluded layers**: `lm_head` (kept unquantized to preserve logit quality)
- **Final model size**: ~717 MB
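For reference, the snippet below sketches how this kind of SmoothQuant + GPTQ recipe is typically expressed with LLMCompressor. It is a minimal sketch modeled on the library's quickstart, not the exact script used here: the smoothing strength, sequence length, and output path are illustrative assumptions, import paths vary slightly across LLMCompressor versions, and the calibration dataset may need loading and formatting before being passed in.

```python
from llmcompressor import oneshot  # older versions: from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# W8A8 PTQ recipe: SmoothQuant first migrates activation outliers into the
# weights, then GPTQ quantizes every Linear layer except lm_head.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),  # assumed value, not confirmed
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

# One-shot calibration + quantization over the calibration samples.
# LLMCompressor also accepts a preprocessed datasets.Dataset object here.
oneshot(
    model="hssawhney/Best-Performing-Model",
    dataset="zay25/MNLP_M3_quantized_dataset",
    recipe=recipe,
    max_seq_length=2048,          # assumed
    num_calibration_samples=512,  # matches the calibration details below
    output_dir="MNLP_M3_quantized_model",  # illustrative output path
)
```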
## Calibration Details
- **Calibration dataset**: 512 samples randomly selected from [`zay25/MNLP_M3_quantized_dataset`](https://huggingface.co/datasets/zay25/MNLP_M3_quantized_dataset)
- The calibration set preserves the original STEM MCQA format and was selected to represent a broad distribution of question types; a sampling sketch follows this list.
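As an illustration, a random 512-sample subset like the one described above can be drawn with the `datasets` library. The split name and seed below are assumptions for the sketch, not the exact values used.

```python
from datasets import load_dataset

# Load the calibration source (assuming a "train" split) and draw a
# random 512-sample subset.
ds = load_dataset("zay25/MNLP_M3_quantized_dataset", split="train")
calibration_set = ds.shuffle(seed=42).select(range(512))  # seed is illustrative

print(calibration_set)  # inspect the sampled STEM MCQA examples
```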
## Intended Use
This model is intended for:

- STEM-focused multiple-choice question answering
- Educational assistant systems
- Low-resource inference environments (e.g., CPU, edge devices)

It is not intended for freeform generation or use outside the MCQA format.
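As a usage illustration, the sketch below runs one MCQA prompt through the quantized checkpoint with vLLM, which supports the compressed-tensors format produced by LLMCompressor. The repository id and prompt template are placeholders: substitute the actual checkpoint name and the prompt format the model was trained with.

```python
from vllm import LLM, SamplingParams

# Hypothetical repo id for this quantized checkpoint; replace with the real one.
llm = LLM(model="<this-quantized-repo-id>")

# A minimal MCQA prompt; the real template should match training.
prompt = (
    "Question: Which particle carries a negative electric charge?\n"
    "A. Proton\nB. Neutron\nC. Electron\nD. Photon\n"
    "Answer:"
)

# Greedy, single-token decoding: the model should emit the option letter.
params = SamplingParams(temperature=0.0, max_tokens=1)
result = llm.generate([prompt], params)[0]
print(result.outputs[0].text.strip())  # expected: "C"
```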
## License

This model inherits the license of the base model. Check the [hssawhney/Best-Performing-Model](https://huggingface.co/hssawhney/Best-Performing-Model) repo for license terms.