---
library_name: transformers
tags: []
---

# Model Card for `zay25/MNLP_M3_quantized_model`

This model is a quantized version of a multiple-choice question answering (MCQA) model fine-tuned on STEM datasets. It uses Activation-aware Weight Quantization (AWQ) to reduce model size and VRAM usage while preserving strong performance, which makes it well suited to memory- and latency-constrained environments.

---

## Model Details

- **Developed by**: Zeineb Mellouli (EPFL, CS-552 Project)
- **Base model**: `hssawhney/Best-Performing-Model` (Qwen3-0.6B-Base)
- **Quantization**: AWQ (4-bit weights, 16-bit activations)
- **Architecture**: Transformer-based causal language model
- **Language**: English
- **License**: Apache 2.0

---

## Uses

### Direct Use

This model is intended for multiple-choice question answering (MCQA) tasks, particularly on science, math, and engineering education datasets. It is optimized for inference on GPUs with limited VRAM (e.g., A10, T4, or laptop GPUs).

### Out-of-Scope Use

- Not intended for open-ended or dialog generation
- Not suitable for high-stakes decision-making or critical applications without human oversight

---

## Training Details

- **Quantization method**: Post-training quantization using [AWQ (Activation-aware Weight Quantization)](https://github.com/mit-han-lab/llm-awq) via the `awq` library
- **Base model**: `hssawhney/Best-Performing-Model`, fine-tuned on MCQA-style reasoning tasks
- **Quantization configuration**:
  - 4-bit weights (`w_bit = 4`)
  - Group size: 64
  - Per-channel zero point: enabled
- **Calibration dataset**: 512 samples from `hssawhney/Reasoning-Dataset`

A sketch of how this configuration maps onto a quantization call is given at the end of this card.

---

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "zay25/MNLP_M3_quantized_model",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("zay25/MNLP_M3_quantized_model")
```
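
A minimal generation example follows. The prompt template is illustrative only; the exact MCQA format used during fine-tuning is not documented in this card, so adapt it to the template the base model was trained on.

```python
# Hypothetical MCQA prompt; adjust to the actual fine-tuning template.
prompt = (
    "Question: What is the SI unit of force?\n"
    "A. Joule\n"
    "B. Newton\n"
    "C. Pascal\n"
    "D. Watt\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Print only the newly generated tokens (the predicted answer letter).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) with a small `max_new_tokens` budget is usually sufficient for MCQA, since only a short answer is needed.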
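
---

## Reproducing the Quantization

The sketch below shows how the configuration listed under Training Details could map onto a quantization call with the AutoAWQ library (`pip install autoawq`). It is a sketch under stated assumptions, not the exact script used to produce this checkpoint: the `version` field, the dataset split, and the `"text"` column name are assumptions not recorded in this card.

```python
from awq import AutoAWQForCausalLM  # provided by the autoawq package
from datasets import load_dataset
from transformers import AutoTokenizer

base = "hssawhney/Best-Performing-Model"

# Mirrors the card: 4-bit weights, group size 64, zero point enabled.
# "version": "GEMM" is an assumed (common) AWQ kernel choice.
quant_config = {"w_bit": 4, "q_group_size": 64, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)

# 512 calibration samples; the split and "text" column are assumptions
# about the dataset schema.
calib = load_dataset("hssawhney/Reasoning-Dataset", split="train").select(range(512))
calib_texts = [row["text"] for row in calib]

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_texts)

model.save_quantized("MNLP_M3_quantized_model")
tokenizer.save_pretrained("MNLP_M3_quantized_model")
```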