---
language: en
license: apache-2.0
library_name: transformers
---
# SQFT Base Model: sqft-mistral-7b-v0.3-50-base-gptq
- Source Model: [mistralai/Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3)
- Sparse Method: [Wanda](https://github.com/locuslab/wanda)
- Sparsity: 50%
- Quantization: GPTQ-INT4
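
To illustrate what the 50% Wanda sparsity listed above means, here is a minimal pure-Python sketch of the Wanda pruning criterion for a single linear layer: each weight is scored by its magnitude times the L2 norm of the corresponding input feature over calibration data, and the lowest-scoring half of each output row is zeroed. The function name `wanda_prune` and the toy inputs are assumptions made for illustration; this is not the code used by SQFT or the Wanda repository, and it leaves out the GPTQ-INT4 quantization step.

```python
import math


def wanda_prune(weight, activations, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row (illustrative only).

    weight:      list of rows, shape (out_features, in_features)
    activations: list of calibration samples, shape (n_samples, in_features)
    """
    in_features = len(weight[0])
    # L2 norm of every input feature across the calibration samples.
    norms = [
        math.sqrt(sum(sample[j] ** 2 for sample in activations))
        for j in range(in_features)
    ]
    n_prune = int(in_features * sparsity)

    pruned = []
    for row in weight:
        # Wanda score: |weight| * input-feature norm.
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        # Weights are compared within each output row.
        drop = set(sorted(range(in_features), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy example (hypothetical values): feature 1 has large activations,
# feature 3 has very small ones.
weight = [[0.5, -0.1, 0.3, 2.0],
          [1.0, 0.2, -0.4, 0.05]]
activations = [[1.0, 10.0, 1.0, 0.1],
               [1.0, 10.0, 1.0, 0.1]]

pruned = wanda_prune(weight, activations, sparsity=0.5)
print(pruned)
```

In this toy case the large weight `2.0` is pruned because its input feature is almost always near zero, while the small weight `-0.1` survives because its input feature is large. Plain magnitude pruning would keep `2.0` and drop `-0.1` instead.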
## Model Sources
**Repository:** [https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/SQFT](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/SQFT)
**Papers:**
- [SQFT: Low-cost Model Adaptation in Low-precision Sparse Foundation Models](https://arxiv.org/abs/2410.03750)
- [Low-Rank Adapters Meet Neural Architecture Search for LLM Compression](https://arxiv.org/abs/2501.16372)
## How to get this model
To reproduce this model, refer to the commands in [SQFT/legacy/run_command/mistral-7b-v0.3/sparse_quantization.sh](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/SQFT/legacy/run_command/mistral-7b-v0.3/sparse_quantization.sh).
## Citation
```bibtex
@inproceedings{munoz-etal-2024-sqft,
title = "{SQFT}: Low-cost Model Adaptation in Low-precision Sparse Foundation Models",
author = "Munoz, Juan Pablo and
Yuan, Jinjie and
Jain, Nilesh",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.749",
pages = "12817--12832",
}
```
## Acknowledgement
Thanks to the sparse algorithm [Wanda](https://arxiv.org/abs/2306.11695) and the quantization method [GPTQ](https://arxiv.org/abs/2210.17323).
## License
Apache-2.0