# Export and Push

## Merge LoRA

- See [here](https://github.com/modelscope/ms-swift/blob/main/examples/export/merge_lora.sh).
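
As a quick illustration of what the linked script does, a minimal merge invocation might look like the following. This is a sketch: the checkpoint path is a placeholder, and the `--adapters`/`--merge_lora` arguments follow the linked `merge_lora.sh` example, which remains the maintained reference.

```shell
# Minimal sketch: merge LoRA adapter weights back into the base model.
# `output/vx-xxx/checkpoint-xxx` is a placeholder checkpoint directory.
swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --merge_lora true
```

The merged model is written next to the checkpoint directory and can then be quantized or pushed like any full model.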

## Quantization

SWIFT supports quantized exports in the AWQ, GPTQ, and BNB formats. AWQ and GPTQ require a calibration dataset, which yields better quantization quality but takes longer; BNB needs no calibration dataset and quantizes faster.

| Quantization Technique | Multimodal | Inference Acceleration | Continued Training |
| ---------------------- | ---------- | ---------------------- | ------------------ |
| GPTQ                   | ✅         | ✅                     | ✅                 |
| AWQ                    | ✅         | ✅                     | ✅                 |
| BNB                    | ❌         | ✅                     | ✅                 |

In addition to installing SWIFT, the following dependencies are required:

```shell
# For AWQ quantization:
# The autoawq version is tied to the CUDA version; choose a compatible release
# according to `https://github.com/casper-hansen/AutoAWQ`.
# If there are dependency conflicts with torch, add the `--no-deps` option.
pip install autoawq -U

# For GPTQ quantization:
# The auto_gptq version is tied to the CUDA version; choose a compatible release
# according to `https://github.com/PanQiWei/AutoGPTQ#quick-installation`.
pip install auto_gptq optimum -U

# For BNB quantization:
pip install bitsandbytes -U
```

We provide a series of scripts demonstrating SWIFT's quantization export capabilities:

- Supports [AWQ](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/awq.sh)/[GPTQ](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/gptq.sh)/[BNB](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/bnb.sh) quantized exports.
- Multimodal quantization: GPTQ and AWQ can quantize multimodal models, though AWQ supports only a limited set of them. Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/mllm).
- Additional model series: supports quantized exports for [BERT](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/bert) and [Reward Model](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/reward_model).
- Models quantized and exported with SWIFT support inference acceleration with vLLM/LMDeploy, and further SFT/RLHF with QLoRA.
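
To make the flow above concrete, a GPTQ export invocation could look like the sketch below. Treat it as an assumption-laden example rather than a reference: the model id, calibration dataset, and output directory are placeholders, and the `--quant_method`/`--quant_bits` arguments follow the linked example scripts, which should be consulted for the authoritative versions.

```shell
# Hedged sketch of a 4-bit GPTQ quantized export.
# Model id, dataset, and output path are placeholder assumptions.
CUDA_VISIBLE_DEVICES=0 swift export \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-en' \
    --quant_method gptq \
    --quant_bits 4 \
    --output_dir Qwen2.5-7B-Instruct-GPTQ-Int4
```

The calibration dataset passed via `--dataset` is only used to collect activation statistics; for BNB exports no `--dataset` is needed.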

## Push Model

SWIFT supports pushing trained or quantized models to ModelScope or Hugging Face. By default, models are pushed to ModelScope; specify `--use_hf true` to push to Hugging Face instead.

```shell
swift export \
    --model output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<model-id>' \
    --hub_token '<sdk-token>' \
    --use_hf false
```

Tips:

- You can use `--model <checkpoint-dir>` or `--adapters <checkpoint-dir>` to specify the checkpoint directory to push; the two are interchangeable in the model-pushing scenario.
- When pushing to ModelScope, make sure you have registered a ModelScope account. Your SDK token can be obtained from [this page](https://www.modelscope.cn/my/myaccesstoken), and the account associated with the token must have edit permissions for the organization in the model_id. Pushing automatically creates the model repository for the model_id if it does not already exist; use `--hub_private_repo true` to create it as a private repository.