Student0809 committed on
Commit
511365c
·
verified ·
1 Parent(s): 6586463

Add files using upload-large-folder tool

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. ms-swift/.dev_scripts/build_docs.sh +8 -0
  2. ms-swift/.github/ISSUE_TEMPLATE/bug_report.md +19 -0
  3. ms-swift/.github/workflows/lint.yaml +22 -0
  4. ms-swift/dataset_OSST.json +0 -0
  5. ms-swift/dataset_Overlap.json +0 -0
  6. ms-swift/dataset_new_3.json +0 -0
  7. ms-swift/docs/source/BestPractices/Embedding训练.md +97 -0
  8. ms-swift/docs/source/BestPractices/GRPO完整流程.md +179 -0
  9. ms-swift/docs/source/BestPractices/NPU支持.md +225 -0
  10. ms-swift/docs/source/Customization/自定义数据集.md +241 -0
  11. ms-swift/docs/source/GetStarted/SWIFT安装.md +87 -0
  12. ms-swift/docs/source/GetStarted/快速开始.md +103 -0
  13. ms-swift/docs/source/Instruction/命令行参数.md +660 -0
  14. ms-swift/docs/source/Instruction/导出与推送.md +55 -0
  15. ms-swift/docs/source/_templates/autosummary/class.rst +10 -0
  16. ms-swift/docs/source_en/BestPractices/Embedding.md +97 -0
  17. ms-swift/docs/source_en/BestPractices/GRPO-Code-Training.md +136 -0
  18. ms-swift/docs/source_en/BestPractices/GRPO-Multi-Modal-Training.md +349 -0
  19. ms-swift/docs/source_en/BestPractices/GRPO.md +298 -0
  20. ms-swift/docs/source_en/BestPractices/Rapidly-Training-VL-model.md +228 -0
  21. ms-swift/docs/source_en/Customization/Custom-dataset.md +255 -0
  22. ms-swift/docs/source_en/GetStarted/Quick-start.md +105 -0
  23. ms-swift/docs/source_en/Instruction/Export-and-push.md +57 -0
  24. ms-swift/docs/source_en/Instruction/ReleaseNote3.0.md +93 -0
  25. ms-swift/docs/source_en/_templates/classtemplate.rst +12 -0
  26. ms-swift/docs/source_en/index.rst +61 -0
  27. ms-swift/examples/app/mllm.sh +13 -0
  28. ms-swift/examples/deploy/bert/server.sh +10 -0
  29. ms-swift/examples/deploy/client/llm/base/openai_client.py +41 -0
  30. ms-swift/examples/deploy/client/llm/chat/swift_client.py +60 -0
  31. ms-swift/examples/deploy/client/mllm/openai_client.py +97 -0
  32. ms-swift/examples/deploy/client/mllm/swift_client.py +127 -0
  33. ms-swift/examples/deploy/lora/client.py +27 -0
  34. ms-swift/examples/deploy/server/README.md +9 -0
  35. ms-swift/examples/eval/llm/eval.sh +7 -0
  36. ms-swift/examples/export/merge_lora.sh +5 -0
  37. ms-swift/examples/export/quantize/mllm/gptq.sh +18 -0
  38. ms-swift/examples/export/quantize/moe/gptq.sh +13 -0
  39. ms-swift/examples/export/quantize/omni/gptq.sh +18 -0
  40. ms-swift/examples/export/quantize/reward_model/bnb.sh +12 -0
  41. ms-swift/examples/export/quantize/reward_model/gptq.sh +13 -0
  42. ms-swift/examples/infer/cli_demo.sh +6 -0
  43. ms-swift/examples/infer/demo.py +73 -0
  44. ms-swift/examples/infer/demo_agent.py +118 -0
  45. ms-swift/examples/infer/demo_bert.py +53 -0
  46. ms-swift/examples/infer/demo_hf.py +61 -0
  47. ms-swift/examples/infer/demo_lora.py +68 -0
  48. ms-swift/examples/infer/demo_mllm.py +145 -0
  49. ms-swift/examples/infer/pt/batch_ddp.sh +9 -0
  50. ms-swift/examples/infer/pt/bert.sh +8 -0
ms-swift/.dev_scripts/build_docs.sh ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ pip install -r requirements/docs.txt
2
+ cd docs
3
+ rm -rf build
4
+
5
+ # update api rst
6
+ #rm -rf source/api/
7
+ #sphinx-apidoc --module-first -o source/api/ ../modelscope/
8
+ make html
ms-swift/.github/ISSUE_TEMPLATE/bug_report.md ADDED
@@ -0,0 +1,19 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: Bug report
3
+ about: Create a report to help us improve
4
+ title: ''
5
+ labels: ''
6
+ assignees: ''
7
+
8
+ ---
9
+
10
+ **Describe the bug**
11
+ What the bug is, and how to reproduce, better with screenshots(描述bug以及复现过程,最好有截图)
12
+
13
+
14
+ **Your hardware and system info**
15
+ Write your system info like CUDA version/system/GPU/torch version here(在这里给出硬件信息和系统信息,如CUDA版本,系统,GPU型号和torch版本等)
16
+
17
+
18
+ **Additional context**
19
+ Add any other context about the problem here(在这里补充其他信息)
ms-swift/.github/workflows/lint.yaml ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ name: Lint test
2
+
3
+ on: [push, pull_request]
4
+
5
+ concurrency:
6
+ group: ${{ github.workflow }}-${{ github.ref }}
7
+ cancel-in-progress: true
8
+
9
+ jobs:
10
+ lint:
11
+ runs-on: ubuntu-latest
12
+ steps:
13
+ - uses: actions/checkout@v2
14
+ - name: Set up Python 3.10
15
+ uses: actions/setup-python@v2
16
+ with:
17
+ python-version: '3.10'
18
+ - name: Install pre-commit hook
19
+ run: |
20
+ pip install pre-commit
21
+ - name: Linting
22
+ run: pre-commit run --all-files
ms-swift/dataset_OSST.json ADDED
The diff for this file is too large to render. See raw diff
 
ms-swift/dataset_Overlap.json ADDED
The diff for this file is too large to render. See raw diff
 
ms-swift/dataset_new_3.json ADDED
The diff for this file is too large to render. See raw diff
 
ms-swift/docs/source/BestPractices/Embedding训练.md ADDED
@@ -0,0 +1,97 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Embedding训练
2
+
3
+ SWIFT已经支持Embedding模型的训练,包括纯文本和多模态两个类型。目前已经支持的模型有:
4
+
5
+ 1. modernbert embedding模型
6
+ - [ModelScope](https://modelscope.cn/models/iic/gte-modernbert-base) [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-modernbert-base)
7
+ 2. gte embedding模型
8
+ - 1.5B: [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-1.5B-instruct) [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)
9
+ - 7B: [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-7B-instruct) [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct)
10
+ 3. gme embedding模型
11
+ - 2B: [ModelScope](https://www.modelscope.cn/models/iic/gme-Qwen2-VL-2B-Instruct) [Hugging Face](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct)
12
+ - 7B: [ModelScope](https://www.modelscope.cn/models/iic/gme-Qwen2-VL-7B-Instruct) [Hugging Face](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct)
13
+
14
+ 开发者可以自行集成自己的模型,模型forward输出值需要满足:
15
+
16
+ ```json
17
+ {"last_hidden_state": some-embedding-tensor}
18
+ ```
19
+
20
+ 返回值是一个json,具有`last_hidden_state` key,value是embedding tensor即可,输入部分可以使用我们已经支持的template。
21
+
22
+ 需要注意的是,SWIFT目前支持的embedding模型均为符合纯文本或多模态LLM,目前并不支持CLIP类型的模型训练。
23
+
24
+ 此外,SWIFT支持的所有embedding模型在模型forward最后都增加了normalize,如自行增加新模型请注意增加normalize层。
25
+
26
+ ## loss
27
+
28
+ 目前SWIFT支持的Embedding模型可以使用的loss有:
29
+
30
+ - cosine_similarity: cosine相似度loss,计算两个embedding的相似度,并根据label的值拟合,实际为MSE loss
31
+ - contrastive: 可调margin的对比学习loss,label仅支持0和1两个值
32
+ - online_contrastive: 考虑hard negative和hard positive部分的contrastive loss,label仅支持0和1两个值
33
+ - infonce: 在同一个batch中不同row两两计算cosine相似度,并使row内部相似度最大,不同row相似度最小,不需要label
34
+
35
+ loss的源代码可以在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss.py)找到。
36
+
37
+ ## 数据集格式
38
+
39
+ > 注:
40
+ > 1. 下面的多模态部分<image>标签可以出现在query/response/rejected_response的任意位置,只需要标签数量和images的值数量相等即可
41
+ > 2. 标签和images的对应顺序为先对应query中的<image>标签,然后是response中的,之后按顺序解析rejected_response中的
42
+ > 3. query代表anchor sample,response代表positive sample或对比sample,rejected_response是hard negative samples
43
+ > 4. 也支持<video>, <audio>标签,即天然支持video和audio的embedding
44
+
45
+ ### cosine_similarity loss对应的格式
46
+
47
+ ```json lines
48
+ # LLM
49
+ {"query": "sentence1", "response": "sentence2", "label": 0.8}
50
+ # MLLM
51
+ {"query": "<image>", "response": "<image>sentence", "images": ["/some/images1.jpg", "/some/images2.jpg"], "label": 0.7}
52
+ {"query": "sentence1", "response": "<image>sentence2", "images": ["/some/images1.jpg"], "label": 0.7}
53
+ ```
54
+
55
+
56
+ ### contrastive/online_contrastive loss对应的格式
57
+
58
+ ```json lines
59
+ # LLM
60
+ {"query": "sentence1", "response": "sentence2", "label": 1}
61
+ # MLLM
62
+ {"query": "<image>", "response": "sentence", "images": "/some/images.jpg", "label": 1}
63
+ {"query": "<image>sentence1", "response": "sentence2", "images": "/some/images.jpg", "label": 0}
64
+ ```
65
+
66
+ 评测的指标分别是两个embedding的欧式距离、点积等的pearson系数以及spearman系数,共八个指标。
67
+
68
+ ### infonce 格式
69
+
70
+ ```json lines
71
+ # LLM
72
+ {"query": "sentence1", "response": "sentence2"}
73
+ # MLLM
74
+ {"query": "<image>", "response": "sentence", "images": "/some/images.jpg"}
75
+ {"query": "<image>sentence1", "response": "<image>sentence2", "rejected_response": ["<image>sentence1", "<image>sentence2"], "images": ["/some/images.jpg", "/some/images.jpg", "/some/images.jpg", "/some/images.jpg"]}
76
+ ```
77
+
78
+ infonce loss支持几个环境变量:
79
+ 1. INFONCE_TEMPERATURE temperature参数,不设置的话默认值是0.01
80
+ 2. INFONCE_USE_BATCH 使用sample内部的rejected_response(hard negative样例)还是使用一个batch的所有responses,默认为True代表使用batch内部的responses
81
+ 3. INFONCE_HARD_NEGATIVES hard negatives的数量,如果不设置会使用rejected_response的所有samples,由于长度未必一致,因此会采用for循环计算loss(计算会慢),如果设置为某个数值,则如果不够会对缺失数量进行随机采样,超长会选用前`INFONCE_HARD_NEGATIVES`个
82
+ 4. INFONCE_MASK_FAKE_NEGATIVE mask掉假negative。默认为False,开启时会判断positive sample的similarity+0.1,比该值大的sample的similarity会被设置为-inf,防止positive sample泄露问题
83
+
84
+ > 也可以在数据集中将hard negatives数量设置为数量相等,这样即使不设置也不会使用for循环方式,加快计算速度
85
+ > rejected_response也可以没有,这种情况下INFONCE_USE_BATCH保持为True,会使用一个batch内部的其他samples作为rejected responses
86
+
87
+ infonce loss的评测会有下面几个指标:
88
+ - mean_neg 所有hard_negative的平均值
89
+ - mean_pos 所有positive的平均值
90
+ - margin positive-max_hard_negative的平均值
91
+
92
+ ## 脚手架
93
+
94
+ SWIFT提供了两个脚手架训练脚本:
95
+
96
+ - [gte模型](https://github.com/tastelikefeet/swift/blob/main/examples/train/embedding/train_gte.sh)
97
+ - [gme模型](https://github.com/tastelikefeet/swift/blob/main/examples/train/embedding/train_gme.sh)
ms-swift/docs/source/BestPractices/GRPO完整流程.md ADDED
@@ -0,0 +1,179 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # GRPO完整实验流程
2
+
3
+ 本文从较为简单的数学任务 Countdown Game 出发,从数据集定义、奖励函数定义和GRPO训练几个步骤介绍完整的GRPO训练流程。任务定义和训练参数等参考了 [mini-deepseek-r1](https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/mini-deepseek-r1-aha-grpo.ipynb)。
4
+
5
+ ## 任务与数据集定义
6
+
7
+ Countdown Game 的任务目标是根据给定的几个数字和加减乘除四种运算,得到目标数字,因此,我们定义数据集如下:
8
+ ```python
9
+ class CoundownTaskPreprocessor(ResponsePreprocessor):
10
+
11
+ def preprocess(self, row: Dict[str, Any]) -> Dict[str, Any]:
12
+ numbers = row['nums']
13
+ target = row.pop('response', None)
14
+ query = f"""
15
+ Using the numbers {numbers}, create an equation that equals {target}.
16
+ You can use basic arithmetic operations (+, -, *, /) and each number can only be used once.
17
+ Show your work in <think> </think> tags. And return the final equation and answer in <answer> </answer> tags,
18
+ for example <answer> (1 + 2) / 3 * 4 = 4 </answer>.
19
+ """
20
+ row.update({'target': target, 'query': query})
21
+ return super().preprocess(row)
22
+
23
+ register_dataset(
24
+ DatasetMeta(
25
+ ms_dataset_id='zouxuhong/Countdown-Tasks-3to4',
26
+ subsets=['default'],
27
+ preprocess_func=CoundownTaskPreprocessor(),
28
+ tags=['math']))
29
+ ```
30
+ 通过 template, 使用 numbers 和 target 完成任务定义,并给到 query 字段供模型采样使用。同时,我们需要保留 nums 和 target 两个字段,用于后续的奖励函数计算。
31
+
32
+ ## 奖励函数定义:
33
+ 本任务使用的奖励函数有两个,一个是 Deepseek-R1 中提到的格式奖励函数,另一个是 Countdown Game 的准确性奖励函数。前者已经在swift中内置,通过 `--reward_funcs format` 可以直接使用,而后者需要我们自己定义,在这里我们使用 external_plugin 的方式定义准确性奖励函数,将代码放在`swift/examples/train/grpo/plugin/plugin.py`中。
34
+
35
+ 在这里,奖励函数的输入包括 completions、target 和 nums 三个字段,分别表示模型生成的文本、目标答案和可用的数字。每个都是list,支持多个 completion 同时计算。注意,在这里,除了 completions 之外的参数都是数据集中定义的字段透传而来,如果有任务上的变动,可以分别对数据集和奖励函数做对应的改变即可。
36
+ ```python
37
+ class CountdownORM(ORM):
38
+ def __call__(self, completions, target, nums, **kwargs) -> List[float]:
39
+ """
40
+ Evaluates completions based on Mathematical correctness of the answer
41
+ Args:
42
+ completions (list[str]): Generated outputs
43
+ target (list[str]): Expected answers
44
+ nums (list[str]): Available numbers
45
+ Returns:
46
+ list[float]: Reward scores
47
+ """
48
+ rewards = []
49
+ for completion, gt, numbers in zip(completions, target, nums):
50
+ try:
51
+ # Check if the format is correct
52
+ match = re.search(r"<answer>(.*?)<\/answer>", completion)
53
+ if match is None:
54
+ rewards.append(0.0)
55
+ continue
56
+ # Extract the "answer" part from the completion
57
+ equation = match.group(1).strip()
58
+ if '=' in equation:
59
+ equation = equation.split('=')[0]
60
+ # Extract all numbers from the equation
61
+ used_numbers = [int(n) for n in re.findall(r'\d+', equation)]
62
+ # Check if all numbers are used exactly once
63
+ if sorted(used_numbers) != sorted(numbers):
64
+ rewards.append(0.0)
65
+ continue
66
+ # Define a regex pattern that only allows numbers, operators, parentheses, and whitespace
67
+ allowed_pattern = r'^[\d+\-*/().\s]+$'
68
+ if not re.match(allowed_pattern, equation):
69
+ rewards.append(0.0)
70
+ continue
71
+ # Evaluate the equation with restricted globals and locals
72
+ result = eval(equation, {"__builtins__": None}, {})
73
+ # Check if the equation is correct and matches the ground truth
74
+ if abs(float(result) - float(gt)) < 1e-5:
75
+ rewards.append(1.0)
76
+ else:
77
+ rewards.append(0.0)
78
+ except Exception as e:
79
+ # If evaluation fails, reward is 0
80
+ rewards.append(0.0)
81
+ return rewards
82
+ orms['external_countdown'] = CountdownORM
83
+ ```
84
+
85
+ ## GRPO训练实验记录
86
+ 首先贴上GRPO的公式:
87
+
88
+ $$
89
+ \begin{aligned}
90
+ \mathcal{J}_{G R P O}(\theta) & =\mathbb{E}\left[q \sim P(Q),\left\{o_i\right\}_{i=1}^G \sim \pi_{\theta_{o l d}}(O \mid q)\right] \\
91
+ & \frac{1}{G} \sum_{i=1}^G \frac{1}{\left|o_i\right|} \sum_{t=1}^{\left|o_i\right|}\left\{\min \left[\frac{\pi_\theta\left(o_{i, t} \mid q, o_{i,<t}\right)}{\pi_{\theta_{o l d}}\left(o_{i, t} \mid q, o_{i,<t}\right)} \hat{A}_{i, t}, \operatorname{clip}\left(\frac{\pi_\theta\left(o_{i, t} \mid q, o_{i,<t}\right)}{\pi_{\theta_{o l d}}\left(o_{i, t} \mid q, o_{i,<t}\right)}, 1-\varepsilon, 1+\varepsilon\right) \hat{A}_{i, t}\right]-\beta \mathbb{D}_{K L}\left[\pi_\theta| | \pi_{r e f}\right]\right\}
92
+ \end{aligned}
93
+ $$
94
+ ### 训练参数:
95
+ 我们选取 Qwen2.5-3B-Instruct 作为基础模型进行训练,选取 Instruct 而不是基模的主要原因是可以更快地获取 format reward。我们在三卡 GPU 上进行实验,因此vllm的推理部署在最后一张卡上,而进程数设置为2,在剩下两张卡上进行梯度更新。
96
+
97
+ 由于任务较为简单,我们设置 max_completion_length 和 vllm_max_model_len 为1024,如果有更复杂的任务,可以适当加大模型输出长度,但请注意,**这两个参数越大,模型训练需要的显存越多,训练速度越慢,单个step的训练时间与max_completion_length呈现线性关系**。
98
+
99
+ 在我们的实验中,总batch_size为 $num\_processes \times per\_device\_train\_batch\_size \times gradient\_accumulation\_steps = 2 \times 8 \times 8 = 128$ 而参数设置有一个限制,即:$num\_processes \times per\_device\_train\_batch\_size$ 必须整除 $num\_generations$,其中,$num\_generations$就是GRPO公式中的 $G$,故我们设置为8。 注意,这里单卡batch_size设置也与显存息息相关,请根据显存上限设置一个合适的值。 同时,还有一个公式,即总的steps数量 :$num\_steps = epochs \times len(datasets) \times num\_generations \div batch\_size $,需要根据这个来合理规划训练的学习率和warmup设置。
100
+
101
+ 最后比较重要的设置是学习率和 beta,学习率比较好理解,而beta则是是以上公式的 $\beta$,即KL散度的梯度的权重。这两个参数设置的越大,模型收敛原则上更快,但训练往往会不稳定。经过实验,我们分别设置为 `5e-7` 和 `0.001`。在实际训练中,请根据是否出现不稳定的震荡情况适当调整这两个参数。
102
+
103
+ 对于KL散度,社区有很多的讨论,可以参考[为什么GRPO坚持用KL散度](https://zhuanlan.zhihu.com/p/25862547100)。
104
+
105
+ 其他参数的设置,没有做太多探讨,所以这里不进行详细说明。
106
+ ```bash
107
+ CUDA_VISIBLE_DEVICES=0,1,2 \
108
+ WANDB_API_KEY=your_wandb_key \
109
+ NPROC_PER_NODE=2 \
110
+ swift rlhf \
111
+ --rlhf_type grpo \
112
+ --model Qwen/Qwen2.5-3B-Instruct \
113
+ --external_plugins examples/train/grpo/plugin/plugin.py \
114
+ --reward_funcs external_countdown format \
115
+ --use_vllm true \
116
+ --vllm_device auto \
117
+ --vllm_gpu_memory_utilization 0.6 \
118
+ --train_type full \
119
+ --torch_dtype bfloat16 \
120
+ --dataset 'zouxuhong/Countdown-Tasks-3to4#50000' \
121
+ --max_length 2048 \
122
+ --max_completion_length 1024 \
123
+ --num_train_epochs 1 \
124
+ --per_device_train_batch_size 8 \
125
+ --per_device_eval_batch_size 8 \
126
+ --learning_rate 5e-7 \
127
+ --gradient_accumulation_steps 8 \
128
+ --eval_steps 500 \
129
+ --save_steps 100 \
130
+ --save_total_limit 20 \
131
+ --logging_steps 1 \
132
+ --output_dir output/GRPO_COUNTDOWN \
133
+ --warmup_ratio 0.01 \
134
+ --dataloader_num_workers 4 \
135
+ --num_generations 8 \
136
+ --temperature 1.0 \
137
+ --system 'You are a helpful assistant. You first thinks about the reasoning process in the mind and then provides the user with the answer.' \
138
+ --deepspeed zero3 \
139
+ --log_completions true \
140
+ --vllm_max_model_len 1024 \
141
+ --report_to wandb \
142
+ --beta 0.001 \
143
+ --num_iterations 1
144
+ ```
145
+ ### 实验现象
146
+ ![image.png](../../resources/grpo_countdown.png)
147
+ 从训练了2000 step的实验结果来看,有几点观察:
148
+ 1. 自定义的CountdownORM在不断增加,证明模型学习到了如何完成这一任务,最终任务成功率在0.5左右。
149
+ 2. Format Reward整体处在较高的水平,在0.9-1左右震荡。
150
+ 3. reward_std 最终在0.2-0.3左右,仍未完全收敛,模型仍然有继续提升reward的空间。
151
+ 4. 观察模型输出可以发现,相当一部分失败案例是由于模型简单加减乘除做不对,因此模型最终的收敛方向也变成了逐步运算上,如果换成更大的模型,可能有不同的收敛方向。
152
+ 5. completion length呈现先下降再增加,在200 step左右最低,然后逐渐上升到350左右。证明模型经历了思考方式的转变。
153
+ 6. 模型的思考方式在不断变化,最终模型发现做排列组合并逐步运算能更好地完成这一任务,最终的采样输出也大多是这一格式,以下是不同step的completion变化:
154
+
155
+ step 1:
156
+ ```
157
+ "<think>\nWe need to create an equation using the given numbers [70, 5, 42, 30] exactly once and make it equal to 77. \n\nFirst, let's consider the possibility of multiplication or division to get closer to 77. Notice that 77 is close to 70 and 42. If we can use these numbers together, it might work.\n\nMultiplying 70 by 1.1 would give us approximately 77, which is very close. So, let's try to make 1.1 using the remaining numbers 5 and 30. \nIf we take 30/5 = 6, and then 5/6 = 0.8333..., this isn't useful here, so let's look at other possibilities.\n\nNext, let's consider if there's another way to achieve 77 using addition, subtraction, or another combination of our numbers. Given the numbers provided, it seems that multiplication or division will be key. \n\nWe can try subtracting 5 from 30 to get 25. Then, if we divide 42 by 25, we don't get a whole number, but we can round it to the nearest whole number, 1. Now, we need to make the equation fit 77. Let's multiply 25 by 3 and add 7 (since 77 = 25*3 + 7). However, we only have one 7 available and we need to find a way to use the remaining numbers to get 7.\n\nLet's try: \n70 - 42 = 28 \nThen, 28 + 5 = 33 \nNext, 33 + 30 = 63 \nFinally, 77 - 63 = 14, which doesn't help.\n\nConsidering different combinations, one viable way is:\n(70 - 42) * (30 / 5) = 28 * 6 = 168 / 2 = 84 / 2 = 42 / 2 = 21 / 3 = 7\n\nSo, (70 - 42) * (30 / 5) = 77.\n</think>\n<answer>(70 - 42) * (30 / 5) = 77</answer>"
158
+ ```
159
+ step 200:
160
+ ```
161
+ "<think>\nWe need to use all the given numbers in a way that the resultant value of the calculation is 99. Given these restrictions, it's clear we should look into how the numbers relate and the mathematical operations that will bring them together to meet our condition.\n\nFirstly, let's evaluate the sums, differences, and multiplication/division among the numbers and their relations to see how this could lead us towards an outcome of 99.\n- Summing the three numbers: 96 + 70 + 73 = 239, which is far from 99. This shows addition isn't practical.\n- Subtracting numbers: For instance, subtracting the lowest two (i.e., 73 - 70 = 3) and combining with the highest one (96) does not work. Also, there is no other combination of subtraction that would result in the number needing to add up to 99.\n- Multiplying or dividing the numbers alone wouldn’t reach to any near solution since none of the products or division results of the numbers themselves is very close to 99. We might think more carefully about the arrangement of them using a combination of the operations provided.\nTherefore, considering the possible arrangement, a combination like taking (73 - 70 + 96) may produce an effective solution as this will sum the values in such a manner that it aligns with the goal of 99.\n\nSo we conclude that the operation: 73 - 70 + 96 gives the right sequence to equal 99.\n</think>\n\n<answer>73 - 70 + 96 = 99</answer>"
162
+ ```
163
+ step 500:
164
+ ```
165
+ "<think>\nLet's begin by exploring various possible combinations of the given numbers using the arithmetic operators we have access to.\n\n- If I add the numbers, I get: 56 + 73 + 51 = 180 which is not equal to our target number (68)\n- If I subtract one number from another I will still exceed our target value as well since the maximum difference is: |(difference between max-min):| | (73 - 51) or | (73 - 56), etc.|\n- Let's multiply the numbers individually and see if the division operation can bring us closer to the goal:\n - (73 * 51) - 56 = 3723 - 56 = 3667 which isn't the goal either.\n - The only operator left is addition and subtraction; I'll try the combination: \n ((73 - 56) + 51) gives me (17 + 51) = 68.\n \nThus, our solution fits the provided problem constraints and meets the required outcome.\n</think>\n<answer>(73 - 56) + 51 = 68</answer>"
166
+ ```
167
+
168
+ step 1000:
169
+ ```
170
+ "<think>\nWe need to form an expression using the given numbers (50, 86, and 49) such that it equals 85. Let's try different combinations:\n\n(86 - 50) + 49 = 76 + 49 = 125 (too big)\n(86 - 49) + 50 = 37 + 50 = 87 (too big)\n(86 - 50) + 49 = 37 + 49 = 86 (too big)\n(86 + 50) - 49 = 136 - 49 = 87 (too big)\n(86 - 50) + (49 - x)? Not working!\n(50 + 86) - 49: this is 136 which isn't our aim.\n50 + (86 - 49) = 50 + 37 = 87 which wasn't correct.\n50 - 86 + 49 = -16 + 49: This doesn't match up as well.\n86 - (50 + 49) = 86 - 99 which doesn't work too.\n\nWe try one more:\n(86 + 49) - 50 = 135 - 50 = 85!\n\nThat's it!\n</think>\n\n<answer>(86 + 49) - 50 = 85</answer>"
171
+ ```
172
+
173
+ step 2000:
174
+ ```
175
+ "<think>\nWe need to form an expression using the numbers 58, 9, 78, and 44 which evaluates to 73. Let's try different combinations:\n(78 - 58) + (9 + 44) = 10 + 53 = 63 (too low)\n(78 - 58) + (9 - 44) = 20 - 35 = -15 (too low)\n(78 - 58) + (44 - 9) = 20 + 35 = 55 (too low)\n(78 + 58) - (9 + 44) = 136 - 53 = 83 (too high)\n(78 + 58) - (9 - 44) = 136 + 35 = 171 (too high)\n(78 + 58) - (44 + 9) = 136 + 53 = 189 (too high)\n(78 + 9) - (58 + 44) = 87 - 102 = -15 (too low)\n(78 + 9) - (58 - 44) = 87 - 14 = 73\n\nSo our solution is: (78 + 9) - (58 - 44) = 73</think>\n\n<answer>(78 + 9) - (58 - 44) = 73</answer>"
176
+ ```
177
+
178
+ 附learning_rate和beta分别取值1e-6和0.04的不稳定实验记录,模型在step 200左右出现了震荡,format和CountdownORM瞬间走低:
179
+ ![](../../resources/grpo_countdown_1.png)
ms-swift/docs/source/BestPractices/NPU支持.md ADDED
@@ -0,0 +1,225 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # NPU支持
2
+ 作者: [chuanzhubin](https://github.com/chuanzhubin)
3
+
4
+ ## 环境准备
5
+
6
+ 实验环境:8 * 昇腾910B3 64G (设备由[@chuanzhubin](https://github.com/chuanzhubin)提供, 感谢对modelscope和swift的支持~)
7
+
8
+ ```shell
9
+ # 创建新的conda虚拟环境(可选)
10
+ conda create -n swift-npu python=3.10 -y
11
+ conda activate swift-npu
12
+
13
+ # 设置pip全局镜像 (可选,加速下载)
14
+ pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
15
+ pip install ms-swift -U
16
+
17
+ # 安装torch-npu
18
+ pip install torch-npu decorator
19
+ # 如果你想要使用deepspeed (控制显存占用,训练速度会有一定下降)
20
+ pip install deepspeed
21
+ ```
22
+
23
+ 测试环境是否安装正确,NPU能否被正常加载:
24
+ ```python
25
+ from transformers.utils import is_torch_npu_available
26
+ import torch
27
+
28
+ print(is_torch_npu_available()) # True
29
+ print(torch.npu.device_count()) # 8
30
+ print(torch.randn(10, device='npu:0'))
31
+ ```
32
+
33
+ 查看NPU的P2P连接,这里看到每个NPU都通过7条HCCS与其他NPU互联
34
+ ```shell
35
+ (valle) root@valle:~/src# npu-smi info -t topo
36
+ NPU0 NPU1 NPU2 NPU3 NPU4 NPU5 NPU6 NPU7 CPU Affinity
37
+ NPU0 X HCCS HCCS HCCS HCCS HCCS HCCS HCCS 144-167
38
+ NPU1 HCCS X HCCS HCCS HCCS HCCS HCCS HCCS 144-167
39
+ NPU2 HCCS HCCS X HCCS HCCS HCCS HCCS HCCS 96-119
40
+ NPU3 HCCS HCCS HCCS X HCCS HCCS HCCS HCCS 96-119
41
+ NPU4 HCCS HCCS HCCS HCCS X HCCS HCCS HCCS 0-23
42
+ NPU5 HCCS HCCS HCCS HCCS HCCS X HCCS HCCS 0-23
43
+ NPU6 HCCS HCCS HCCS HCCS HCCS HCCS X HCCS 48-71
44
+ NPU7 HCCS HCCS HCCS HCCS HCCS HCCS HCCS X 48-71
45
+
46
+ Legend:
47
+
48
+ X = Self
49
+ SYS = Path traversing PCIe and NUMA nodes. Nodes are connected through SMP, such as QPI, UPI.
50
+ PHB = Path traversing PCIe and the PCIe host bridge of a CPU.
51
+ PIX = Path traversing a single PCIe switch
52
+ PXB = Path traversing multiple PCIe switches
53
+ HCCS = Connection traversing HCCS.
54
+ NA = Unknown relationship.
55
+ ```
56
+
57
+ 查看NPU状态, npu-smi命令详解可以查看[官方文档](https://support.huawei.com/enterprise/zh/doc/EDOC1100079287/10dcd668)
58
+ ```shell
59
+ (valle) root@valle:~/src# npu-smi info
60
+ +------------------------------------------------------------------------------------------------+
61
+ | npu-smi 24.1.rc1.b030 Version: 24.1.rc1.b030 |
62
+ +---------------------------+---------------+----------------------------------------------------+
63
+ | NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
64
+ | Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
65
+ +===========================+===============+====================================================+
66
+ | 0 910B3 | OK | 101.8 43 0 / 0 |
67
+ | 0 | 0000:C1:00.0 | 0 0 / 0 3318 / 65536 |
68
+ +===========================+===============+====================================================+
69
+ | 1 910B3 | OK | 92.0 39 0 / 0 |
70
+ | 0 | 0000:C2:00.0 | 0 0 / 0 3314 / 65536 |
71
+ +===========================+===============+====================================================+
72
+ | 2 910B3 | OK | 102.0 40 0 / 0 |
73
+ | 0 | 0000:81:00.0 | 0 0 / 0 3314 / 65536 |
74
+ +===========================+===============+====================================================+
75
+ | 3 910B3 | OK | 99.8 40 0 / 0 |
76
+ | 0 | 0000:82:00.0 | 0 0 / 0 3314 / 65536 |
77
+ +===========================+===============+====================================================+
78
+ | 4 910B3 | OK | 98.6 45 0 / 0 |
79
+ | 0 | 0000:01:00.0 | 0 0 / 0 3314 / 65536 |
80
+ +===========================+===============+====================================================+
81
+ | 5 910B3 | OK | 99.7 44 0 / 0 |
82
+ | 0 | 0000:02:00.0 | 0 0 / 0 3314 / 65536 |
83
+ +===========================+===============+====================================================+
84
+ | 6 910B3 | OK | 103.8 45 0 / 0 |
85
+ | 0 | 0000:41:00.0 | 0 0 / 0 3314 / 65536 |
86
+ +===========================+===============+====================================================+
87
+ | 7 910B3 | OK | 98.2 44 0 / 0 |
88
+ | 0 | 0000:42:00.0 | 0 0 / 0 3315 / 65536 |
89
+ +===========================+===============+====================================================+
90
+ ```
91
+
92
+ ## 微调
93
+ 以下介绍LoRA的微调, 全参数微调设置参数`--train_type full`即可.
94
+
95
+ | 模型大小 | NPU数量 | deepspeed类型 | 最大显存占用量 |
96
+ |------|-------|-------------|-----------|
97
+ | 7B | 1 | None | 1 * 28 GB |
98
+ | 7B | 4 | None | 4 * 22 GB |
99
+ | 7B | 4 | zero2 | 4 * 28 GB |
100
+ | 7B | 4 | zero3 | 4 * 22 GB |
101
+ | 7B | 8 | None | 8 * 22 GB |
102
+ | 14B | 1 | None | 1 * 45 GB |
103
+ | 14B | 8 | None | 8 * 51 GB |
104
+ | 14B | 8 | zero2 | 8 * 49 GB |
105
+ | 14B | 8 | zero3 | 8 * 31 GB |
106
+
107
+ ### 单卡训练
108
+
109
+ 通过如下命令启动单卡微调: (注意: 如果微调期间出现nan的情况, 请设置`--torch_dtype float32`.)
110
+
111
+ ```shell
112
+ # 实验环境: 昇腾910B3
113
+ # 显存需求: 28 GB
114
+ # 运行时长: 8小时
115
+ ASCEND_RT_VISIBLE_DEVICES=0 \
116
+ swift sft \
117
+ --model Qwen/Qwen2-7B-Instruct \
118
+ --dataset AI-ModelScope/blossom-math-v2 \
119
+ --num_train_epochs 5 \
120
+ --train_type lora \
121
+ --output_dir output \
122
+ --learning_rate 1e-4 \
123
+ --gradient_accumulation_steps 16 \
124
+ --save_steps 100 \
125
+ --eval_steps 100
126
+
127
+ ```
128
+
129
+
130
+ ### 数据并行训练
131
+ 我们使用其中的4卡进行ddp训练
132
+
133
+ ```shell
134
+ # 实验环境: 4 * 昇腾910B3
135
+ # 显存需求: 4 * 22 GB
136
+ # 运行时长: 2小时
137
+ NPROC_PER_NODE=4 \
138
+ ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
139
+ swift sft \
140
+ --model Qwen/Qwen2-7B-Instruct \
141
+ --dataset AI-ModelScope/blossom-math-v2 \
142
+ --num_train_epochs 5 \
143
+ --train_type lora \
144
+ --output_dir output \
145
+ ...
146
+ ```
147
+
148
+
149
+ ### Deepspeed训练
150
+
151
+ ZeRO2:
152
+ ```shell
153
+ # 实验环境: 4 * 昇腾910B3
154
+ # 显存需求: 4 * 28GB
155
+ # 运行时长: 3.5小时
156
+ NPROC_PER_NODE=4 \
157
+ ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
158
+ swift sft \
159
+ --model Qwen/Qwen2-7B-Instruct \
160
+ --dataset AI-ModelScope/blossom-math-v2 \
161
+ --num_train_epochs 5 \
162
+ --train_type lora \
163
+ --output_dir output \
164
+ --deepspeed zero2 \
165
+ ...
166
+ ```
167
+
168
+ ZeRO3:
169
+ ```shell
170
+ # 实验环境: 4 * 昇腾910B3
171
+ # 显存需求: 4 * 22 GB
172
+ # 运行时长: 8.5小时
173
+ NPROC_PER_NODE=4 \
174
+ ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
175
+ swift sft \
176
+ --model Qwen/Qwen2-7B-Instruct \
177
+ --dataset AI-ModelScope/blossom-math-v2 \
178
+ --num_train_epochs 5 \
179
+ --train_type lora \
180
+ --output_dir output \
181
+ --deepspeed zero3 \
182
+ ...
183
+ ```
184
+
185
+
186
+ ## 推理
187
+
188
+ 原始模型:
189
+ ```shell
190
+ ASCEND_RT_VISIBLE_DEVICES=0 swift infer \
191
+ --model Qwen/Qwen2-7B-Instruct \
192
+ --stream true --max_new_tokens 2048
193
+ ```
194
+
195
+ LoRA微调后:
196
+ ```shell
197
+ ASCEND_RT_VISIBLE_DEVICES=0 swift infer \
198
+ --adapters xxx/checkpoint-xxx --load_data_args true \
199
+ --stream true --max_new_tokens 2048
200
+
201
+ # merge-lora并推理
202
+ ASCEND_RT_VISIBLE_DEVICES=0 swift export --adapters xx/checkpoint-xxx --merge_lora true
203
+
204
+ ASCEND_RT_VISIBLE_DEVICES=0 swift infer \
205
+ --model xxx/checkpoint-xxx-merged --load_data_args true \
206
+ --stream true --max_new_tokens 2048
207
+ ```
208
+
209
+
210
+ ## 部署
211
+ NPU不支持使用vllm进行推理/部署加速, 但是可以使用原生pytorch进行部署.
212
+
213
+ 原始模型:
214
+ ```shell
215
+ ASCEND_RT_VISIBLE_DEVICES=0 swift deploy --model Qwen/Qwen2-7B-Instruct --max_new_tokens 2048
216
+ ```
217
+
218
+ LoRA微调后:
219
+ ```shell
220
+ ASCEND_RT_VISIBLE_DEVICES=0 swift deploy --adapters xxx/checkpoint-xxx --max_new_tokens 2048
221
+
222
+ # merge-lora并推理
223
+ ASCEND_RT_VISIBLE_DEVICES=0 swift export --adapters xx/checkpoint-xxx --merge_lora true
224
+ ASCEND_RT_VISIBLE_DEVICES=0 swift deploy --model xxx/checkpoint-xxx-merged --max_new_tokens 2048
225
+ ```
ms-swift/docs/source/Customization/自定义数据集.md ADDED
@@ -0,0 +1,241 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 自定义数据集
2
+
3
+ 自定义数据集的接入方法有三种,对预处理函数的控制能力逐渐加强,但接入难度逐步增加。例如,方案一最为方便,但对预处理函数的控制能力最弱,需要预先对数据集进行转换,传入特定格式的数据集:
4
+ 1. 【推荐】直接使用命令行传参的方式接入,即`--dataset <dataset_path1> <dataset_path2>`。这将使用AutoPreprocessor将数据集转换为标准格式(支持4种数据集格式,具体查看下面对AutoPreprocessor的介绍)。你可以使用`--columns`进行列名转换。支持传入csv、json、jsonl、txt、文件夹(例如git clone开源数据集)。该方案不需要修改dataset_info.json,适合刚接触ms-swift的用户,下面两种方案适合对ms-swift进行拓展的开发者。
5
+ 2. 添加数据集到`dataset_info.json`中,可以参考ms-swift内置的[dataset_info.json](https://github.com/modelscope/ms-swift/blob/main/swift/llm/dataset/data/dataset_info.json)。该方案也将使用AutoPreprocessor将数据集转换为标准格式。dataset_info.json为数据集元信息的list,每一项元信息必填ms_dataset_id/hf_dataset_id/dataset_path中的一项,通过`columns`字段进行列名转换。添加到`dataset_info.json`或者注册的数据集在运行[run_dataset_info.py](https://github.com/modelscope/ms-swift/blob/main/scripts/utils/run_dataset_info.py)时将自动产生[支持的数据集文档](https://swift.readthedocs.io/zh-cn/latest/Instruction/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.html)。此外,你可以采用外接`dataset_info.json`的方式,使用`--custom_dataset_info xxx.json`解析json文件(方便pip install而非git clone的用户),然后指定`--dataset <dataset_id/dataset_dir/dataset_path>`。
6
+ 3. 手动注册数据集,具有最灵活的预处理函数定制能力,支持使用函数对数据集进行预处理,但难度较高。可以参考[内置数据集](https://github.com/modelscope/ms-swift/blob/main/swift/llm/dataset/dataset/llm.py)或者[examples](https://github.com/modelscope/swift/blob/main/examples/custom)中的样例。你可以通过指定`--custom_register_path xxx.py`解析外置注册内容(方便pip install而非git clone的用户)。
7
+ - 方案一和二在实现中借助了方案三,只是注册的过程为自动发生。
8
+
9
+ 以下将对`AutoPreprocessor`可以处理的数据集格式进行介绍:
10
+
11
+ ms-swift的标准数据集格式可接受的keys包括: 'messages'、'rejected_response'、'label'、'images'、'videos'、'audios'、'tools'和'objects'。其中'messages'是必需的key,'rejected_response'用于DPO等RLHF训练,'label'用于KTO训练和分类模型训练,'images'、'videos'、'audios'用于存储多模态数据的路径或者url,'tools'用于Agent任务,'objects'用于grounding任务。
12
+
13
+ ms-swift中存在三种核心预处理器:`MessagesPreprocessor`、`AlpacaPreprocessor`、`ResponsePreprocessor`。MessagesPreprocessor用于将类messages和sharegpt格式的数据集转换为标准格式,AlpacaPreprocessor则转换alpaca格式的数据集,ResponsePreprocessor则转换类query/response格式的数据集。`AutoPreprocessor`则自动选择合适的预处理进行处理。
14
+
15
+ 以下四种格式在`AutoPreprocessor`处理下都会转换成ms-swift标准格式中的messages字段,即都可以直接使用`--dataset <dataset-path>`接入:
16
+
17
+ messages格式(标准格式):
18
+ ```jsonl
19
+ {"messages": [{"role": "system", "content": "<system>"}, {"role": "user", "content": "<query1>"}, {"role": "assistant", "content": "<response1>"}, {"role": "user", "content": "<query2>"}, {"role": "assistant", "content": "<response2>"}]}
20
+ ```
21
+ - 注意:system部分是可选的。数据集中的system优先级高于命令行传入的`--system`,最后是定义在template中的`default_system`。
22
+
23
+ sharegpt格式:
24
+ ```jsonl
25
+ {"system": "<system>", "conversation": [{"human": "<query1>", "assistant": "<response1>"}, {"human": "<query2>", "assistant": "<response2>"}]}
26
+ ```
27
+
28
+ alpaca格式:
29
+ ```jsonl
30
+ {"system": "<system>", "instruction": "<query-inst>", "input": "<query-input>", "output": "<response>"}
31
+ ```
32
+
33
+ query-response格式:
34
+ ```jsonl
35
+ {"system": "<system>", "query": "<query2>", "response": "<response2>", "history": [["<query1>", "<response1>"]]}
36
+ ```
37
+
38
+ ## 标准数据集格式
39
+
40
+ 以下给出ms-swift的标准数据集格式,其中system字段是可选的,默认使用template中定义的`default_system`。之前介绍的4种数据集格式也可以被AutoPreprocessor处理成标准数据集格式。
41
+
42
+ ### 预训练
43
+
44
+ ```jsonl
45
+ {"messages": [{"role": "assistant", "content": "I love music"}]}
46
+ {"messages": [{"role": "assistant", "content": "教练我要打篮球"}]}
47
+ {"messages": [{"role": "assistant", "content": "西红柿鸡蛋盖饭和地三鲜盖饭哪个更权威"}]}
48
+ ```
49
+
50
+ ### 监督微调
51
+
52
+ ```jsonl
53
+ {"messages": [{"role": "system", "content": "你是个有用无害的助手"}, {"role": "user", "content": "告诉我明天的天气"}, {"role": "assistant", "content": "明天天气晴朗"}]}
54
+ {"messages": [{"role": "system", "content": "你是个有用无害的数学计算器"}, {"role": "user", "content": "1+1等于几"}, {"role": "assistant", "content": "等于2"}, {"role": "user", "content": "再加1呢"}, {"role": "assistant", "content": "等于3"}]}
55
+ ```
56
+
57
+ ### RLHF
58
+
59
+ #### DPO/ORPO/CPO/SimPO/RM
60
+
61
+ ```jsonl
62
+ {"messages": [{"role": "system", "content": "你是个有用无害的助手"}, {"role": "user", "content": "告诉我明天的天气"}, {"role": "assistant", "content": "明天天气晴朗"}], "rejected_response": "我不知道"}
63
+ {"messages": [{"role": "system", "content": "你是个有用无害的数学计算器"}, {"role": "user", "content": "1+1等于几"}, {"role": "assistant", "content": "等于2"}, {"role": "user", "content": "再加1呢"}, {"role": "assistant", "content": "等于3"}], "rejected_response": "我不知道"}
64
+ ```
65
+
66
+ #### KTO
67
+
68
+ ```jsonl
69
+ {"messages": [{"role": "system", "content": "你是个有用无害的助手"}, {"role": "user", "content": "告诉我明天的天气"}, {"role": "assistant", "content": "我不知道"}], "label": false}
70
+ {"messages": [{"role": "system", "content": "你是个有用无害的数学计算器"}, {"role": "user", "content": "1+1等于几"}, {"role": "assistant", "content": "等于2"}, {"role": "user", "content": "再加1呢"}, {"role": "assistant", "content": "等于3"}], "label": true}
71
+ ```
72
+
73
+ #### PPO/GRPO
74
+
75
+ ```jsonl
76
+ {"messages": [{"role": "system", "content": "你是个有用无害的助手"}, {"role": "user", "content": "告诉我明天的天气"}]}
77
+ {"messages": [{"role": "system", "content": "你是个有用无害的数学计算器"}, {"role": "user", "content": "1+1等于几"}, {"role": "assistant", "content": "等于2"}, {"role": "user", "content": "再加1呢"}]}
78
+ {"messages": [{"role": "user", "content": "你的名字是什么"}]}
79
+ ```
80
+ - 注意:GRPO会透传所有额外的字段内容给ORM,而不像其他训练方法,默认将额外的字段删除。例如: 你可以额外传入'solution'。自定义的ORM需要包含一个位置参数completions,其他为关键词参数,由数据集额外字段透传。
81
+
82
+
83
+ ### 序列分类
84
+
85
+ **单标签任务**:
86
+ ```jsonl
87
+ {"messages": [{"role": "user", "content": "今天天气真好呀"}], "label": 1}
88
+ {"messages": [{"role": "user", "content": "今天真倒霉"}], "label": 0}
89
+ {"messages": [{"role": "user", "content": "好开心"}], "label": 1}
90
+ ```
91
+ **多标签任务**:
92
+ ```jsonl
93
+ {"messages": [{"role": "user", "content": "<sentence>"}], "label": [1, 3, 5]}
94
+ ```
95
+
96
+ **单回归任务**:
97
+ ```jsonl
98
+ {"messages": [{"role": "user", "content": "求两句话的相似度,范围为0-1。\nsentence1: <sentence1>\nsentence2: <sentence2>"}], "label": 0.8}
99
+ ```
100
+ **多回归任务**:
101
+
102
+ ```jsonl
103
+ {"messages": [{"role": "user", "content": "<sentence>"}], "label": [1.2, -0.6, 0.8]}
104
+ ```
105
+
106
+ ### Embedding
107
+
108
+ 请参考[embedding训练文档](../BestPractices/Embedding训练.md#数据集格式)
109
+
110
+ ### 多模态
111
+
112
+ 对于多模态数据集,和上述任务的格式相同。区别在于增加了`images`, `videos`, `audios`几个key,分别代表多模态资源的url或者path(推荐使用绝对路径),`<image>` `<video>` `<audio>`标签代表了插入图片/视频/音频的位置,ms-swift支持多图片/视频/音频的情况。这些特殊tokens将在预处理的时候进行替换,参考[这里](https://github.com/modelscope/ms-swift/blob/main/swift/llm/template/template/qwen.py#L198)。下面给出的四条示例分别展示了纯文本,以及包含图像、视频和音频数据的数据格式。
113
+
114
+ 预训练:
115
+ ```
116
+ {"messages": [{"role": "assistant", "content": "预训练的文本在这里"}]}
117
+ {"messages": [{"role": "assistant", "content": "<image>是一只小狗,<image>是一只小猫"}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
118
+ {"messages": [{"role": "assistant", "content": "<audio>描述了今天天气真不错"}], "audios": ["/xxx/x.wav"]}
119
+ {"messages": [{"role": "assistant", "content": "<image>是一个大象,<video>是一只狮子在跑步"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
120
+ ```
121
+
122
+ 微调:
123
+ ```jsonl
124
+ {"messages": [{"role": "user", "content": "浙江的省会在哪?"}, {"role": "assistant", "content": "浙江的省会在杭州。"}]}
125
+ {"messages": [{"role": "user", "content": "<image><image>两张图片有什么区别"}, {"role": "assistant", "content": "前一张是小猫,后一张是小狗"}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
126
+ {"messages": [{"role": "user", "content": "<audio>语音说了什么"}, {"role": "assistant", "content": "今天天气真好呀"}], "audios": ["/xxx/x.mp3"]}
127
+ {"messages": [{"role": "system", "content": "你是个有用无害的助手"}, {"role": "user", "content": "<image>图片中是什么,<video>视频中是什么"}, {"role": "assistant", "content": "图片中是一个大象,视频中是一只小狗在草地上奔跑"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
128
+ ```
129
+
130
+ 多模态模型的RLHF和序列分类的数据格式可以参考纯文本大模型的格式,并在此基础上增加`images`等字段。
131
+
132
+ #### grounding
133
+
134
+ 如果是grounding(物体检测)任务,SWIFT支持两种方式:
135
+ 1. 直接使用对应模型grounding任务的数据集格式,例如qwen2-vl的格式如下:
136
+
137
+ ```jsonl
138
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>描述图像"}, {"role": "assistant", "content": "<|object_ref_start|>一只狗<|object_ref_end|><|box_start|>(221,423),(569,886)<|box_end|>和<|object_ref_start|>一个女人<|object_ref_end|><|box_start|>(451,381),(733,793)<|box_end|>正在沙滩上玩耍"}], "images": ["/xxx/x.jpg"]}
139
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>找到图像中的<|object_ref_start|>羊<|object_ref_end|>"}, {"role": "assistant", "content": "<|box_start|>(101,201),(150,266)<|box_end|><|box_start|>(401,601),(550,666)<|box_end|>"}], "images": ["/xxx/x.jpg"]}
140
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>帮我打开谷歌浏览器"}, {"role": "assistant", "content": "Action: click(start_box='<|box_start|>(246,113)<|box_end|>')"}], "images": ["/xxx/x.jpg"]}
141
+ ```
142
+ 使用这种类型的数据需要注意:
143
+ - 不同模型grounding任务的特殊字符和数据集格式不同。
144
+ - 不同模型对bbox是否归一化的处理不同。例如:qwen2.5-vl使用绝对坐标,而qwen2-vl、internvl2.5需要对bbox的坐标进行千分位坐标归一化。
145
+
146
+ 2. 使用SWIFT的grounding数据格式:
147
+
148
+ ```jsonl
149
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>描述图像"}, {"role": "assistant", "content": "<ref-object><bbox>和<ref-object><bbox>正在沙滩上玩耍"}], "images": ["/xxx/x.jpg"], "objects": {"ref": ["一只狗", "一个女人"], "bbox": [[331.5, 761.4, 853.5, 1594.8], [676.5, 685.8, 1099.5, 1427.4]]}}
150
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>找到图像中的<ref-object>"}, {"role": "assistant", "content": "<bbox><bbox>"}], "images": ["/xxx/x.jpg"], "objects": {"ref": ["羊"], "bbox": [[90.9, 160.8, 135, 212.8], [360.9, 480.8, 495, 532.8]]}}
151
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>帮我打开谷歌浏览器"}, {"role": "assistant", "content": "Action: click(start_box='<bbox>')"}], "images": ["/xxx/x.jpg"], "objects": {"ref": [], "bbox": [[615, 226]]}}
152
+ ```
153
+ 该格式将自动转换数据集格式为对应模型的grounding任务格式,且选择对应模型的bbox归一化方式。该格式比通用格式多了objects字段,该字段包含的字段有:
154
+ - ref: 用于替换`<ref-object>`。
155
+ - bbox: 用于替换`<bbox>`。若bbox中每个box长度为2,则代表x和y坐标,若box长度为4,则代表2个点的x和y坐标。
156
+ - bbox_type: 可选项为'real','norm1'。默认为'real',即bbox为真实bbox值。若是'norm1',则bbox已经归一化为0~1。
157
+ - image_id: 该参数只有当bbox_type为'real'时生效。代表bbox对应的图片是第几张,用于缩放bbox。索引从0开始,默认全为第0张。
158
+
159
+ ### 文生图格式
160
+
161
+ ```jsonl
162
+ {"messages": [{"role": "system", "content": "你是个有用无害的助手"}, {"role": "user", "content": "给我画出一个苹果"}, {"role": "assistant", "content": "<image>"}], "images": ["/xxx/x.jpg"]}
163
+ ```
164
+
165
+ ### Agent格式
166
+ 这里分别提供了纯文本Agent和多模态Agent的示例数据样本:
167
+ ```jsonl
168
+ {"tools": ["{\"type\": \"function\", \"function\": {\"name\": \"realtime_aqi\", \"description\": \"天气预报。获取实时空气质量。当前空气质量,PM2.5,PM10信息\", \"parameters\": {\"type\": \"object\", \"properties\": {\"city\": {\"type\": \"string\", \"description\": \"城市名,例如:上海\"}}, \"required\": [\"city\"]}}}"], "messages": [{"role": "user", "content": "北京和上海今天的天气情况"}, {"role": "tool_call", "content": "{\"name\": \"realtime_aqi\", \"arguments\": {\"city\": \"北京\"}}"}, {"role": "tool_call", "content": "{\"name\": \"realtime_aqi\", \"arguments\": {\"city\": \"上海\"}}"}, {"role": "tool_response", "content": "{\"city\": \"北京\", \"aqi\": \"10\", \"unit\": \"celsius\"}"}, {"role": "tool_response", "content": "{\"city\": \"上海\", \"aqi\": \"72\", \"unit\": \"fahrenheit\"}"}, {"role": "assistant", "content": "根据天气预报工具,北京今天的空气质量指数为10,属于良好水平;上海今天的空气质量指数为72,属于轻度污染水平。"}]}
169
+ {"tools": ["{\"type\": \"function\", \"function\": {\"name\": \"click\", \"description\": \"点击屏幕中的某个位置\", \"parameters\": {\"type\": \"object\", \"properties\": {\"x\": {\"type\": \"integer\", \"description\": \"横坐标,表示屏幕上的水平位置\"}, \"y\": {\"type\": \"integer\", \"description\": \"纵坐标,表示屏幕上的垂直位置\"}}, \"required\": [\"x\", \"y\"]}}}"], "messages": [{"role": "user", "content": "<image>现在几点了?"}, {"role": "assistant", "content": "<think>\n我可以通过打开日历App来获取当前时间。\n</think>\n"}, {"role": "tool_call", "content": "{\"name\": \"click\", \"arguments\": {\"x\": 105, \"y\": 132}}"}, {"role": "tool_response", "content": "{\"images\": \"<image>\", \"status\": \"success\"}"}, {"role": "assistant", "content": "成功打开日历App,现在的时间为中午11点"}], "images": ["desktop.png", "calendar.png"]}
170
+ ```
171
+ - agent_template为"react_en", "hermes"等情况下,该格式适配所有模型Agent训练,可以轻松在不同模型间切换。
172
+ - 其中tools是一个`List[str]`,其中每一个tool需要是json字符串,messages中role为'tool_call'和'tool_response/tool'的content部分都需要是json字符串。
173
+ - tools字段将在训练/推理时和`{"role": "system", ...}`部分组合,根据agent_template组成完整的system部分。
174
+ - `{"role": "tool_call", ...}`部分将根据agent_template自动转成对应格式的`{"role": "assistant", ...}`,多条连续的`{"role": "assistant", ...}`将拼接在一起组成完整的assistant_content。
175
+ - `{"role": "tool_response", ...}`也可以写成`{"role": "tool", ...}`,这两种写法是等价的。该部分也将根据`agent_template`自动转换格式。该部分在训练时将不进行损失的计算,角色类似于`{"role": "user", ...}`。
176
+ - 该格式支持并行调用工具,例子参考第一条数据样本。多模态Agent数据样本中`<image>`标签数量应与"images"长度相同,其标签位置代表图像特征的插入位置。当然也支持其他模态,例如audios, videos。
177
+ - 更多请参考[Agent文档](../Instruction/Agent支持.md)。
178
+
179
+ ## dataset_info.json
180
+
181
+ 可以参考ms-swift内置的[dataset_info.json](https://github.com/modelscope/ms-swift/blob/main/swift/llm/dataset/data/dataset_info.json)。该方案使用AutoPreprocessor预处理函数将数据集转换为标准格式。dataset_info.json文件中包含了数据集元信息的list,以下为一些例子:
182
+
183
+ ```json
184
+ [
185
+ {
186
+ "ms_dataset_id": "xxx/xxx"
187
+ },
188
+ {
189
+ "dataset_path": "<dataset_dir/dataset_path>"
190
+ },
191
+ {
192
+ "ms_dataset_id": "<dataset_id>",
193
+ "subsets": ["v1"],
194
+ "split": ["train", "validation"],
195
+ "columns": {
196
+ "input": "query",
197
+ "output": "response"
198
+ }
199
+ },
200
+ {
201
+ "ms_dataset_id": "<dataset_id>",
202
+ "hf_dataset_id": "<hf_dataset_id>",
203
+ "subsets": [{
204
+ "subset": "subset1",
205
+ "columns": {
206
+ "problem": "query",
207
+ "content": "response"
208
+ }
209
+ },
210
+ {
211
+ "subset": "subset2",
212
+ "columns": {
213
+ "messages": "_",
214
+ "new_messages": "messages"
215
+ }
216
+ }]
217
+ }
218
+ ]
219
+ ```
220
+
221
+ 支持以下参数:
222
+ - ms_dataset_id: 参考DatasetMeta参数。
223
+ - hf_dataset_id: 参考DatasetMeta参数。
224
+ - dataset_path: 参考DatasetMeta参数。
225
+ - dataset_name: 参考DatasetMeta参数。
226
+ - subsets: 参考DatasetMeta参数。
227
+ - split: 参考DatasetMeta参数。
228
+ - columns: 在数据集进行预处理前,对数据集进行列名转换。
229
+
230
+
231
+ ## 数据集注册
232
+
233
+ register_dataset会在`DATASET_MAPPING`中注册数据集,调用函数`register_dataset(dataset_meta)`即可完成数据集注册,其中dataset_meta将存储模型的元信息。DatasetMeta的参数列表如下:
234
+ - ms_dataset_id: ModelScope的dataset_id,默认为None。
235
+ - hf_dataset_id: HuggingFace的dataset_id,默认为None。
236
+ - dataset_path: dataset的本地路径(推荐使用绝对路径)。默认为None。
237
+ - dataset_name: 数据集别名,可以通过`--dataset <dataset_name>`指定数据集,这在dataset_path很长时很方便。默认为None。
238
+ - subsets: 子数据集的名字列表或者`SubsetDataset`对象的列表,默认为`['default']`。(只有dataset_id或者dataset_dir(git clone开源数据集)有子数据集和split的概念)。
239
+ - split: 默认为`['train']`。
240
+ - preprocess_func: 预处理函数或可调用对象,默认为`AutoPreprocessor()`。该预处理函数接口为传入`HfDataset`,并返回满足标准格式的`HfDataset`。
241
+ - load_function: 默认为`DatasetLoader.load`。若需要自定义载入函数,则该载入函数需返回满足标准格式的`HfDataset`,这将抛弃ms-swift的数据集载入机制,提供给用户最大的自由度。通常该参数不需要进行修改。
ms-swift/docs/source/GetStarted/SWIFT安装.md ADDED
@@ -0,0 +1,87 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # SWIFT安装
2
+
3
+ ## Wheel包安装
4
+
5
+ 可以使用pip进行安装:
6
+
7
+ ```shell
8
+ pip install 'ms-swift'
9
+ # 使用评测
10
+ pip install 'ms-swift[eval]' -U
11
+ # 全能力
12
+ pip install 'ms-swift[all]' -U
13
+ ```
14
+
15
+ ## 源代码安装
16
+
17
+ ```shell
18
+ # pip install git+https://github.com/modelscope/ms-swift.git
19
+
20
+ # 全能力
21
+ # pip install "git+https://github.com/modelscope/ms-swift.git#egg=ms-swift[all]"
22
+
23
+ git clone https://github.com/modelscope/ms-swift.git
24
+ cd ms-swift
25
+ pip install -e .
26
+
27
+ # 全能力
28
+ # pip install -e '.[all]'
29
+ ```
30
+
31
+ ## 旧版本
32
+
33
+ SWIFT在3.0版本开始进行了不兼容式重构,如果需要使用2.x旧版本,请执行如下命令进行安装:
34
+ ```shell
35
+ pip install ms-swift==2.*
36
+ ```
37
+
38
+ ## 镜像
39
+
40
+ ```
41
+ # vllm0.8.3 (该版本vllm可能导致部分GRPO训练卡住,GRPO建议优先使用vllm0.7.3)
42
+ modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.6.0-vllm0.8.3-modelscope1.25.0-swift3.3.0.post1
43
+ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.6.0-vllm0.8.3-modelscope1.25.0-swift3.3.0.post1
44
+
45
+ # vllm0.7.3
46
+ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.5.1-modelscope1.25.0-swift3.2.2
47
+ ```
48
+
49
+ 更多镜像可以查看[这里](https://modelscope.cn/docs/intro/environment-setup#%E6%9C%80%E6%96%B0%E9%95%9C%E5%83%8F)
50
+
51
+ ## 支持的硬件
52
+
53
+ | 硬件环境 | 备注 |
54
+ | --------------- | --------------------------- |
55
+ | A10/A100/H100 | |
56
+ | RTX20/30/40系列 | |
57
+ | T4/V100 | 部分模型出现NAN |
58
+ | Ascend NPU | 部分模型出现NAN或算子不支持 |
59
+ | MPS | |
60
+ | CPU | |
61
+
62
+
63
+ ## 运行环境
64
+
65
+ | | 范围 | 推荐 | 备注 |
66
+ | ------ |--------------| ---- | --|
67
+ | python | >=3.9 | 3.10 ||
68
+ | cuda | | cuda12 |使用cpu、npu、mps则无需安装|
69
+ | torch | >=2.0 | ||
70
+ | transformers | >=4.33 | 4.51 ||
71
+ | modelscope | >=1.23 | ||
72
+ | peft | >=0.11,<0.16 | ||
73
+ | trl | >=0.13,<0.18 | 0.17 |RLHF|
74
+ | deepspeed | >=0.14 | 0.14.5 |训练|
75
+ | vllm | >=0.5.1 | 0.7.3/0.8 |推理/部署/评测|
76
+ | lmdeploy | >=0.5 | 0.8 |推理/部署/评测|
77
+ | evalscope | >=0.11 | |评测|
78
+
79
+ 更多可选依赖可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/requirements/install_all.sh)。
80
+
81
+ ## Notebook环境
82
+
83
+ Swift支持训练的绝大多数模型都可以在`A10`显卡上使用,用户可以使用ModelScope官方提供的免费显卡资源:
84
+
85
+ 1. 进入[ModelScope](https://www.modelscope.cn)官方网站并登录
86
+ 2. 点击左侧的`我的Notebook`并开启一个免费GPU实例
87
+ 3. 愉快地薅A10显卡羊毛
ms-swift/docs/source/GetStarted/快速开始.md ADDED
@@ -0,0 +1,103 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 快速开始
2
+
3
+ ms-swift是魔搭社区提供的大模型与多模态大模型训练部署框架,现已支持500+大模型与200+多模态大模型的训练(预训练、微调、人类对齐)、推理、评测、量化与部署。模型开发者可以在ms-swift框架中一站式完成围绕大模型的各类需求。目前ms-swift的主要能力包含:
4
+
5
+ - 🍎 模型类型:支持500+纯文本大模型、200+多模态大模型以及All-to-All全模态模型、序列分类模型、Embedding模型训练到部署全流程。
6
+ - 数据集类型:内置150+预训练、微调、人类对齐、多模态等各种类型的数据集,并支持自定义数据集。
7
+ - 硬件支持:CPU、RTX系列、T4/V100、A10/A100/H100、Ascend NPU、MPS等。
8
+ - 🍊 轻量训练:支持了LoRA、QLoRA、DoRA、LoRA+、ReFT、RS-LoRA、LLaMAPro、Adapter、GaLore、Q-Galore、LISA、UnSloth、Liger-Kernel等轻量微调方式。
9
+ - 分布式训练:支持分布式数据并行(DDP)、device_map简易模型并行、DeepSpeed ZeRO2 ZeRO3、FSDP等分布式训练技术。
10
+ - 量化训练:支持对BNB、AWQ、GPTQ、AQLM、HQQ、EETQ量化模型进行训练。
11
+ - RLHF训练:支持纯文本大模型和多模态大模型的DPO、GRPO、RM、PPO、KTO、CPO、SimPO、ORPO等人类对齐训练方法。
12
+ - 🍓 多模态训练:支持对图像、视频和语音不同模态模型进行训练,支持VQA、Caption、OCR、Grounding任务的训练。
13
+ - 界面训练:以界面的方式提供训练、推理、评测、量化的能力,完成大模型的全链路。
14
+ - 插件化与拓展:支持自定义模型和数据集拓展,支持对loss、metric、trainer、loss-scale、callback、optimizer等组件进行自定义。
15
+ - 🍉 工具箱能力:除了对大模型和多模态大模型的训练支持外,还支持其推理、评测、量化和部署全流程。
16
+ - 推理加速:支持PyTorch、vLLM、LmDeploy推理加速引擎,并提供OpenAI接口,为推理、部署和评测模块提供加速。
17
+ - 模型评测:以EvalScope作为评测后端,支持100+评测数据集对纯文本和多模态模型进行评测。
18
+ - 模型量化:支持AWQ、GPTQ和BNB的量化导出,导出的模型支持使用vLLM/LmDeploy推理加速,并支持继续训练。
19
+
20
+
21
+ ## 安装
22
+
23
+ ms-swift的安装请参考[安装文档](./SWIFT安装.md)。
24
+
25
+ ## 使用样例
26
+
27
+ 10分钟在单卡3090上对Qwen2.5-7B-Instruct进行自我认知微调:
28
+ ```shell
29
+ # 22GB
30
+ CUDA_VISIBLE_DEVICES=0 \
31
+ swift sft \
32
+ --model Qwen/Qwen2.5-7B-Instruct \
33
+ --train_type lora \
34
+ --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
35
+ 'AI-ModelScope/alpaca-gpt4-data-en#500' \
36
+ 'swift/self-cognition#500' \
37
+ --torch_dtype bfloat16 \
38
+ --num_train_epochs 1 \
39
+ --per_device_train_batch_size 1 \
40
+ --per_device_eval_batch_size 1 \
41
+ --learning_rate 1e-4 \
42
+ --lora_rank 8 \
43
+ --lora_alpha 32 \
44
+ --target_modules all-linear \
45
+ --gradient_accumulation_steps 16 \
46
+ --eval_steps 50 \
47
+ --save_steps 50 \
48
+ --save_total_limit 2 \
49
+ --logging_steps 5 \
50
+ --max_length 2048 \
51
+ --output_dir output \
52
+ --system 'You are a helpful assistant.' \
53
+ --warmup_ratio 0.05 \
54
+ --dataloader_num_workers 4 \
55
+ --model_author swift \
56
+ --model_name swift-robot
57
+ ```
58
+
59
+ 小贴士:
60
+ - 如果要使用自定义数据集进行训练,你可以参考[这里](../Customization/自定义数据集.md)组织数据集格式,并指定`--dataset <dataset_path>`。
61
+ - `--model_author`和`--model_name`参数只有当数据集中包含`swift/self-cognition`时才生效。
62
+ - 如果要使用其他模型进行训练,你只需要修改`--model <model_id/model_path>`即可。
63
+ - 默认使用ModelScope进行模型和数据集的下载。如果要使用HuggingFace,指定`--use_hf true`即可。
64
+
65
+ 训练完成后,使用以下命令对训练后的权重进行推理:
66
+ - 这里的`--adapters`需要替换成训练生成的last checkpoint文件夹。由于adapters文件夹中包含了训练的参数文件`args.json`,因此不需要额外指定`--model`,`--system`,swift会自动读取这些参数。如果要关闭此行为,可以设置`--load_args false`。
67
+
68
+ ```shell
69
+ # 使用交互式命令行进行推理
70
+ CUDA_VISIBLE_DEVICES=0 \
71
+ swift infer \
72
+ --adapters output/vx-xxx/checkpoint-xxx \
73
+ --stream true \
74
+ --temperature 0 \
75
+ --max_new_tokens 2048
76
+
77
+ # merge-lora并使用vLLM进行推理加速
78
+ CUDA_VISIBLE_DEVICES=0 \
79
+ swift infer \
80
+ --adapters output/vx-xxx/checkpoint-xxx \
81
+ --stream true \
82
+ --merge_lora true \
83
+ --infer_backend vllm \
84
+ --max_model_len 8192 \
85
+ --temperature 0 \
86
+ --max_new_tokens 2048
87
+ ```
88
+
89
+ 最后,使用以下命令将模型推送到ModelScope:
90
+ ```shell
91
+ CUDA_VISIBLE_DEVICES=0 \
92
+ swift export \
93
+ --adapters output/vx-xxx/checkpoint-xxx \
94
+ --push_to_hub true \
95
+ --hub_model_id '<your-model-id>' \
96
+ --hub_token '<your-sdk-token>' \
97
+ --use_hf false
98
+ ```
99
+
100
+ ## 了解更多
101
+
102
+ - 更多Shell脚本:[https://github.com/modelscope/ms-swift/tree/main/examples](https://github.com/modelscope/ms-swift/tree/main/examples)
103
+ - 使用Python:[https://github.com/modelscope/ms-swift/blob/main/examples/notebook/qwen2_5-self-cognition/self-cognition-sft.ipynb](https://github.com/modelscope/ms-swift/blob/main/examples/notebook/qwen2_5-self-cognition/self-cognition-sft.ipynb)
ms-swift/docs/source/Instruction/命令行参数.md ADDED
@@ -0,0 +1,660 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 命令行参数
2
+
3
+ 命令行参数的介绍会分为基本参数,原子参数、集成参数和特定模型参数。命令行最终使用的参数列表为集成参数。集成参数继承自基本参数和一些原子参数。特定模型参数是针对于具体模型的参数,可以通过`--model_kwargs'`或者环境变量进行设置。Megatron-SWIFT命令行参数介绍可以在[Megatron-SWIFT训练文档](./Megatron-SWIFT训练.md)中找到。
4
+
5
+ 提示:
6
+ - 命令行传入list使用空格隔开即可。例如:`--dataset <dataset_path1> <dataset_path2>`。
7
+ - 命令行传入dict使用json。例如:`--model_kwargs '{"fps_max_frames": 12}'`。
8
+ - 带🔥的参数为重要参数,刚熟悉ms-swift的用户可以先关注这些命令行参数。
9
+
10
+ ## 基本参数
11
+
12
+ - 🔥tuner_backend: 可选为'peft','unsloth'。默认为'peft'。
13
+ - 🔥train_type: 可选为: 'lora'、'full'、'longlora'、'adalora'、'llamapro'、'adapter'、'vera'、'boft'、'fourierft'、'reft'。默认为'lora'。
14
+ - 🔥adapters: 用于指定adapter的id/path的list,默认为`[]`。
15
+ - external_plugins: 外部plugin py文件列表,这些文件会被注册进plugin模块中,例子请参见[这里](https://github.com/modelscope/ms-swift/tree/main/examples/train/grpo/plugin/run_external_rm.sh)。
16
+ - seed: 默认为42。
17
+ - model_kwargs: 特定模型可传入的额外参数,该参数列表会在训练推理时打印日志进行提示。例如`--model_kwargs '{"fps_max_frames": 12}'`。
18
+ - load_args: 当指定`--resume_from_checkpoint`、`--model`、`--adapters`会读取保存文件中的`args.json`,将默认为None的`基本参数`(除去数据参数和生成参数)进行赋值(可通过手动传入进行覆盖)。推理和导出时默认为True,训练时默认为False。
19
+ - load_data_args: 如果将该参数设置为True,则会额外读取`args.json`中的数据参数。默认为False。
20
+ - use_hf: 控制模型下载、数据集下载、模型推送使用ModelScope还是HuggingFace。默认为False,使用ModelScope。
21
+ - hub_token: hub token. modelscope的hub token可以查看[这里](https://modelscope.cn/my/myaccesstoken)。
22
+ - custom_register_path: 自定义模型、对话模板和数据集注册的`.py`文件路径的list。默认为`[]`。
23
+
24
+ ### 模型参数
25
+ - 🔥model: 模型id或模型本地路径。如果是自定义模型请配合`model_type`和`template`使用,具体可以参考[自定义模型](../Customization/自定义模型.md)。
26
+ - model_type: 模型类型。相同的模型架构、template、模型加载过程被定义为一个model_type。默认为None,根据`--model`的后缀和config.json中的architectures属性进行自动选择。
27
+ - model_revision: 模型版本,默认为None。
28
+ - task_type: 默认为'causal_lm'。可选为'causal_lm'、'seq_cls'、'embedding'。seq_cls的例子可以查看[这里](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls),embedding的例子查看[这里](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding)。
29
+ - 🔥torch_dtype: 模型权重的数据类型,支持`float16`,`bfloat16`,`float32`。默认为None,从config.json文件中读取。
30
+ - attn_impl: attention类型,可选项为`flash_attn`, `sdpa`, `eager`。默认使用sdpa,若不支持则使用eager。
31
+ - 注意:这三种实现并不一定都支持,这取决于对应模型的支持情况。
32
+ - num_labels: 分类模型(即`--task_type seq_cls`)需要指定该参数。代表标签数量,默认为None。
33
+ - problem_type: 分类模型(即`--task_type seq_cls`)需要指定该参数。可选为'regression', 'single_label_classification', 'multi_label_classification'。默认为None,根据num_labels和数据集类型进行自动设置。
34
+ - rope_scaling: rope类型,支持`linear`和`dynamic`,请配合`max_length`共同使用。默认为None。
35
+ - device_map: 模型使用的device_map配置,例如:'auto'、'cpu'、json字符串、json文件路径。默认为None,根据设备和分布式训练情况自动设置。
36
+ - max_memory: device_map设置为'auto'或者'sequential'时,会根据max_memory进行模型权重的device分配,例如:`--max_memory '{0: "20GB", 1: "20GB"}'`。默认为None。
37
+ - local_repo_path: 部分模型在加载时依赖于github repo。为了避免`git clone`时遇到网络问题,可以直接使用本地repo。该参数需要传入本地repo的路径, 默认为`None`。
38
+ - init_strategy: 加载模型时,初始化模型中所有未初始化的参数。可选为'zero', 'uniform', 'normal', 'xavier_uniform', 'xavier_normal', 'kaiming_uniform', 'kaiming_normal', 'orthogonal'。默认为None。
39
+
40
+
41
+ ### 数据参数
42
+ - 🔥dataset: 数据集id或路径的list。默认为`[]`。每个数据集的传入格式为:`数据集id or 数据集路径:子数据集#采样数量`,其中子数据集和取样数据可选。本地数据集支持jsonl、csv、json、文件夹等。开源数据集可以通过git clone到本地并将文件夹传入而离线使用。自定义数据集格式可以参考[自定义数据集](../Customization/自定义数据集.md)。你可以传入`--dataset <dataset1> <dataset2>`来使用多个数据集。
43
+ - 子数据集: 该参数只有当dataset为ID或者文件夹时生效。若注册时指定了subsets,且只有一个子数据集,则默认选择注册时指定的子数据集,否则默认为'default'。你可以使用`/`来选择多个子数据集,例如:`<dataset_id>:subset1/subset2`。你也可以使用'all'来选择所有的子数据集,例如:`<dataset_id>:all`。
44
+ - 采样数量: 默认使用完整的数据集。若采样数少于数据样本总数,则进行随机选择(不重复采样)。若采样数高于数据样本总数,则只额外随机采样`采样数%数据样本总数`的样本,数据样本重复采样`采样数//数据样本总数`次。注意:流式数据集只进行顺序采样。若设置`--dataset_shuffle false`,则非流式数据集也进行顺序采样。
45
+ - 🔥val_dataset: 验证集id或路径的list。默认为`[]`。
46
+ - 🔥split_dataset_ratio: 不指定val_dataset时如何拆分训练集和验证集,默认为0.01。若不需要切分验证集,设置为0即可。
47
+ - data_seed: 数据集随机种子,默认为42。
48
+ - 🔥dataset_num_proc: 数据集预处理的进程数,默认为1。
49
+ - 🔥load_from_cache_file: 是否从缓存中加载数据集,默认为False。
50
+ - dataset_shuffle: 是否对dataset进行随机操作。默认为True。
51
+ - 注意:CPT/SFT的随机包括两个部分:数据集的随机,由`dataset_shuffle`控制;train_dataloader中的随机,由`train_dataloader_shuffle`控制。
52
+ - val_dataset_shuffle: 是否对val_dataset进行随机操作。默认为False。
53
+ - 🔥streaming: 流式读取并处理数据集,默认False。通常在处理大型数据集时,设置为True。
54
+ - 注意:需要额外设置`--max_steps`,因为流式数据集无法获得其长度。你可以通过设置`--save_strategy epoch`并设置较大的max_steps来实现与`--num_train_epochs`等效的训练。或者,你也可以设置`max_epochs`确保训练到对应epochs时退出训练,并对权重进行验证和保存。
55
+ - interleave_prob: 默认值为 None。在组合多个数据集时,默认使用 `concatenate_datasets` 函数;如果设置了该参数,则会使用 `interleave_datasets` 函数。该参数通常用于流式数据集的组合,并会作为参数传入 `interleave_datasets` 函数中。
56
+ - stopping_strategy: 可选为"first_exhausted", "all_exhausted",默认为"first_exhausted"。传入interleave_datasets函数中。
57
+ - shuffle_buffer_size: 该参数用于指定流式数据集的随机buffer大小,默认为1000。
58
+ - download_mode: 数据集下载模式,包含`reuse_dataset_if_exists`和`force_redownload`,默认为reuse_dataset_if_exists。
59
+ - columns: 用于对数据集进行列映射,使数据集满足AutoPreprocessor可以处理的样式,具体查看[这里](../Customization/自定义数据集.md)。你可以传入json字符串,例如:`'{"text1": "query", "text2": "response"}'`,默认为None。
60
+ - strict: 如果为True,则数据集只要某行有问题直接抛错,否则会丢弃出错数据样本。默认False。
61
+ - remove_unused_columns: 是否删除数据集中不被使用的列,默认为True。
62
+ - 🔥model_name: 仅用于自我认知任务,只对`swift/self-cognition`数据集生效,替换掉数据集中的`{{NAME}}`通配符。传入模型中文名和英文名,以空格分隔,例如:`--model_name 小黄 'Xiao Huang'`。默认为None。
63
+ - 🔥model_author: 仅用于自我认知任务,只对`swift/self-cognition`数据集生效,替换掉数据集中的`{{AUTHOR}}`通配符。传入模型作者的中文名和英文名,以空格分隔,例如:`--model_author '魔搭' 'ModelScope'`。默认为None。
64
+ - custom_dataset_info: 自定义数据集注册的json文件路径,参考[自定义数据集](../Customization/自定义数据集.md)。默认为`[]`。
65
+
66
+ ### 模板参数
67
+ - 🔥template: 对话模板类型。默认为None,自动选择对应model的template类型。
68
+ - 🔥system: 自定义system字段,可以传入字符串或者txt文件路径。默认为None,使用template的默认system。
69
+ - 注意:数据集中的system优先级最高,然后是`--system`,最后是定义在template中的`default_system`。
70
+ - 🔥max_length: 单样本的tokens最大长度。默认为None,设置为模型支持的tokens最大长度(max_model_len)。
71
+ - 注意:PPO、GRPO和推理情况下,max_length代表max_prompt_length。
72
+ - truncation_strategy: 如果单样本的tokens超过`max_length`如何处理,支持`delete`, `left`和`right`,代表删除、左侧裁剪和右侧裁剪,默认为'delete'。
73
+ - 🔥max_pixels: 多模态模型输入图片的最大像素数(H\*W),将超过该限制的图像进行缩放。默认为None,不限制最大像素数。
74
+ - 🔥agent_template: Agent模板,确定如何将工具列表转换成system,如何从模型回复中提取toolcall,以及确定`{"role": "tool_call", "content": "xxx"}`, `{"role": "tool_response", "content": "xxx"}`的模板格式。可选为"react_en", "hermes", "glm4", "qwen_en", "toolbench"等,更多请查看[这里](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/agent_template/__init__.py)。默认为None,根据模型类型进行选择。
75
+ - response_prefix: response的前缀字符,例如QwQ-32B将response_prefix设置为`'<think>\n'`。默认为None,根据模型自动设置。
76
+ - 注意:若对deepseek-r1/qwq模型使用不包含`<think>...</think>`的数据集进行训练,请加在推理训练后模型时额外传入`--response_prefix ''`。
77
+ - padding_side: 当训练`batch_size>=2`时的padding_side,可选值为'left'、'right',默认为'right'。(推理时的batch_size>=2时,只进行左padding)。
78
+ - loss_scale: 训练tokens的loss权重设置。默认为`'default'`,代表所有response(含history)以1计算交叉熵损失,忽略对应agent_template的`tool_response`的损失。可选值为'default'、'last_round'、'all'、'ignore_empty_think',以及agent需要的loss_scale: 'react'、'hermes'、'qwen'、'agentflan'、'alpha_umi'。agent部分可以查看[插件化](../Customization/插件化.md)和[Agent文档](./Agent支持.md)。
79
+ - 'last_round': 只计算最后一轮response的损失。
80
+ - 'all': 计算所有tokens的损失。
81
+ - 'ignore_empty_think': 在`'default'`的基础上,忽略空的`'<think>\n\n</think>\n\n'`损失计算,具体请参考[此issue](https://github.com/modelscope/ms-swift/issues/4030)。
82
+ - 'react', 'hermes', 'qwen': 在`'default'`的基础上,将`tool_call`部分的loss权重调整为2。
83
+ - sequence_parallel_size: 序列并行大小,默认是1。当前支持pt/sft/dpo。训练脚本参考[这里](https://github.com/modelscope/ms-swift/tree/main/examples/train/long_text/sequence_parallel.sh)。
84
+ - use_chat_template: 使用chat模板或generation模板,默认为`True`。`swift pt`会自动设置为generation模板。
85
+ - template_backend: 选择template后端,可选为'swift'、'jinja',默认为'swift'。如果使用jinja,则使用transformers的`apply_chat_template`。
86
+ - 注意:jinja的template后端只支持推理,不支持训练。
87
+
88
+ ### 生成参数
89
+ 参考[generation_config](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig)文档。
90
+
91
+ - 🔥max_new_tokens: 推理最大生成新tokens的数量。默认为None,无限制。
92
+ - temperature: 温度参数。默认为None,读取generation_config.json。
93
+ - 注意:do_sample参数在本版本中移除了,请将temperature配置为0来达到相同效果。
94
+ - top_k: top_k参数,默认为None。读取generation_config.json。
95
+ - top_p: top_p参数,默认为None。读取generation_config.json。
96
+ - repetition_penalty: 重复惩罚项。默认为None,读取generation_config.json。
97
+ - num_beams: beam search的并行保留数量,默认为1。
98
+ - 🔥stream: 流式输出,默认为`False`。
99
+ - stop_words: 除了eos_token外额外的停止词,默认为`[]`。
100
+ - 注意:eos_token会在输出response中被删除,额外停止词会在输出中保留。
101
+ - logprobs: 是否输出logprobs,默认为False。
102
+ - top_logprobs: 输出top_logprobs的数量,默认为None。
103
+
104
+ ### 量化参数
105
+ 以下为加载模型时量化的参数,具体含义可以查看[量化](https://huggingface.co/docs/transformers/main/en/main_classes/quantization)文档。这里不包含`swift export`中涉及的`gptq`、`awq`量化参数。
106
+
107
+ - 🔥quant_method: 加载模型时采用的量化方法,可选项为`bnb`、`hqq`、`eetq`。
108
+ - 🔥quant_bits: 量化bits数,默认为None。
109
+ - hqq_axis: hqq量化axis,默认为None。
110
+ - bnb_4bit_compute_dtype: bnb量化计算类型,可选为`float16`、`bfloat16`、`float32`。默认为None,设置为`torch_dtype`。
111
+ - bnb_4bit_quant_type: bnb量化类型,支持`fp4`和`nf4`,默认为`nf4`。
112
+ - bnb_4bit_use_double_quant: 是否使用双重量化,默认为`True`。
113
+ - bnb_4bit_quant_storage: bnb量化存储类型,默认为None。
114
+
115
+
116
+ ## 原子参数
117
+
118
+ ### Seq2SeqTrainer参数
119
+
120
+ 该参数列表继承自transformers `Seq2SeqTrainingArguments`,ms-swift对其默认值进行了覆盖。未列出的请参考[HF官方文档](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Seq2SeqTrainingArguments)。
121
+
122
+ - 🔥output_dir: 默认为None,设置为`output/<model_name>`。
123
+ - 🔥gradient_checkpointing: 是否使用gradient_checkpointing,默认为True。
124
+ - 🔥deepspeed: 默认为None。可以设置为'zero0', 'zero1', 'zero2', 'zero3', 'zero2_offload', 'zero3_offload'来使用ms-swift内置的deepspeed配置文件。
125
+ - zero_hpz_partition_size: 默认为None,这个参数是ZeRO++的特性,即node内模型分片,node间数据分片,如果遇到grad_norm NaN,请尝试使用`--torch_dtype float16`。
126
+ - 🔥per_device_train_batch_size: 默认值1。
127
+ - 🔥per_device_eval_batch_size: 默认值1。
128
+ - 🔥gradient_accumulation_steps: 梯度累加,默认为None,即设置gradient_accumulation_steps使得total_batch_size>=16。total_batch_size等于`per_device_train_batch_size * gradient_accumulation_steps * world_size`。
129
+ - weight_decay: weight衰减系数,默认值0.1。
130
+ - adam_beta2: 默认为0.95。
131
+ - 🔥learning_rate: 学习率,全参数默认为1e-5,LoRA等tuners为1e-4。
132
+ - lr_scheduler_type: lr_scheduler类型,默认为'cosine'。
133
+ - lr_scheduler_kwargs: lr_scheduler其他参数。默认为None。
134
+ - 🔥gradient_checkpointing_kwargs: 传入`torch.utils.checkpoint`中的参数。例如设置为`--gradient_checkpointing_kwargs '{"use_reentrant": false}'`。默认为None。
135
+ - full_determinism: 确保训练中获得可重现的结果,注意:这会对性能产生负面影响。默认为False。
136
+ - 🔥report_to: 默认值为`tensorboard`。你也可以指定`--report_to tensorboard wandb swanlab`、`--report_to all`。
137
+ - logging_first_step: 是否记录第一个step的日志,默认为True。
138
+ - logging_steps: 日志打印间隔,默认为5。
139
+ - predict_with_generate: 验证时使用生成式的方式,默认为False。
140
+ - metric_for_best_model: 默认为None,即当`predict_with_generate`设置为False时,设置为'loss',否则设置为'rouge-l'(在PPO训练时,不进行默认值设置;GRPO训练设置为'reward')。
141
+ - greater_is_better: 默认为None,即当`metric_for_best_model`含'loss'时,设置为False,否则设置为True。
142
+ - max_epochs: 训练到`max_epochs`时强制退出训练,并对权重进行验证和保存。该参数在使用流式数据集时很有用。默认为None。
143
+
144
+ 其他重要参数:
145
+ - 🔥num_train_epochs: 训练的epoch数,默认为3。
146
+ - 🔥save_strategy: 保存模型的策略,可选为'no'、'steps'、'epoch',默认为'steps'。
147
+ - 🔥save_steps: 默认为500。
148
+ - 🔥eval_strategy: 评估策略。默认为None,跟随`save_strategy`的策略。
149
+ - 🔥eval_steps: 默认为None,如果存在评估数据集,则跟随`save_steps`的策略。
150
+ - 🔥save_total_limit: 最多保存的checkpoint数,会将过期的checkpoint进行删除。默认为None,保存所有的checkpoint。
151
+ - max_steps: 最大训练的steps数。在数据集为流式时需要被设置。默认为-1。
152
+ - 🔥warmup_ratio: 默认为0.。
153
+ - save_on_each_node: 默认为False。在多机训练时需要被考虑。
154
+ - save_only_model: 是否只保存模型权重而不包含优化器状态,随机种子状态等内容。默认为False。
155
+ - 🔥resume_from_checkpoint: 断点续训参数,传入checkpoint路径。默认为None。
156
+ - 注意: resume_from_checkpoint会读取模型权重,优化器权重,随机种子,并从上次训练的steps继续开始训练。你可以指定`--resume_only_model`只读取模型权重。
157
+ - 🔥ddp_backend: 可选为"nccl"、"gloo"、"mpi"、"ccl"、"hccl" 、"cncl"、"mccl"。默认为None,进行自动选择。
158
+ - 🔥ddp_find_unused_parameters: 默认为None。
159
+ - ddp_timeout: 默认为1800,单位为秒。
160
+ - 🔥dataloader_num_workers: 默认为None,若是windows平台,则设置为0,否则设置为1。
161
+ - dataloader_pin_memory: 默认为True。
162
+ - dataloader_persistent_workers: 默认为False。
163
+ - dataloader_prefetch_factor: 默认为None,若`dataloader_num_workers > 0`,设置为10。
164
+ - train_dataloader_shuffle: CPT/SFT训练的dataloader是否随机,默认为True。该参数对IterableDataset无效。IterableDataset采用顺序的方式读取。
165
+ - 🔥neftune_noise_alpha: neftune添加的噪声系数, 默认为0,通常可以设置为5、10、15。
166
+ - 🔥use_liger_kernel: 是否启用[Liger](https://github.com/linkedin/Liger-Kernel)内核加速训练并减少显存消耗。默认为False。示例shell参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/liger)。
167
+ - average_tokens_across_devices: 是否在设备之间进行token数平均。如果设置为True,将使用all_reduce同步`num_tokens_in_batch`以进行精确的损失计算。默认为False。
168
+ - max_grad_norm: 梯度裁剪。默认为1.。
169
+ - push_to_hub: 推送checkpoint到hub。默认为False。
170
+ - hub_model_id: 默认为None。
171
+ - hub_private_repo: 默认为False。
172
+
173
+ ### Tuner参数
174
+ - 🔥freeze_llm: 该参数只对多模态模型生效,可用于全参和LoRA,但含义不同。若是全参数训练,将freeze_llm设置为True将会将llm部分权重进行冻结,若是LoRA训练且`target_modules`设置为'all-linear',将freeze_llm设置为True将会取消在llm部分添加LoRA模块。该参数默认为False。
175
+ - 🔥freeze_vit: 该参数只对多模态模型生效,可用于全参和LoRA,含义参考`freeze_llm`。默认为True。
176
+ - 🔥freeze_aligner: 该参数只对多模态模型生效,可用于全参和LoRA,含义参考`freeze_llm`。默认为True。
177
+ - 🔥target_modules: 指定lora模块, 默认为`all-linear`. 在LLM和多模态LLM中,其行为有所不同. 若是LLM则自动寻找除lm_head外的linear并附加tuner,若是多模态LLM,则默认只在LLM上附加tuner,该行为可以被`freeze_llm`、`freeze_vit`、`freeze_aligner`控制。该参数不限于LoRA,可用于其他tuners。
178
+ - 🔥target_regex: 指定lora模块的regex表达式,默认为`None`。如果该值传入,则target_modules参数失效。该参数不限于LoRA,可用于其他tuners。
179
+ - init_weights: 初始化weights的方法,LoRA可以指定为`true`、`false`、`gaussian`、`pissa`、`pissa_niter_[number of iters]`,Bone可以指定为`true`、`false`、`bat`。默认值`true`。
180
+ - 🔥modules_to_save: 在已附加tuner后,额外指定一部分原模型模块参与训练和存储。默认为`[]`. 该参数不限于LoRA,可用于其他tuners。
181
+
182
+ #### 全参
183
+ - freeze_parameters: 需要被冻结参数的前缀,默认为`[]`。
184
+ - freeze_parameters_regex: 需要被冻结参数的正则表达式,默认为None。
185
+ - freeze_parameters_ratio: 从下往上冻结的参数比例,默认为0。可设置为1将所有参数冻结,结合`trainable_parameters`设置可训练参数。
186
+ - trainable_parameters: 额外可训练参数的前缀,默认为`[]`。
187
+ - trainable_parameters_regex: 匹配额外可训练参数的正则表达式,默认为None。
188
+   - 备注:`trainable_parameters`、`trainable_parameters_regex`的优先级高于`freeze_parameters`、`freeze_parameters_regex`和`freeze_parameters_ratio`。当指定全参数训练时,会将所有模块设置为可训练的状态,随后根据`freeze_parameters`、`freeze_parameters_regex`、`freeze_parameters_ratio`将部分参数冻结,最后根据`trainable_parameters`、`trainable_parameters_regex`重新打开部分参数参与训练。
189
+
190
+ #### LoRA
191
+ - 🔥lora_rank: 默认为`8`。
192
+ - 🔥lora_alpha: 默认为`32`。
193
+ - lora_dropout: 默认为`0.05`。
194
+ - lora_bias: 默认为`'none'`,可以选择的值: 'none'、'all'。如果你要将bias全都设置为可训练,你可以设置为`'all'`。
195
+ - lora_dtype: 指定lora模块的dtype类型。支持'float16'、'bfloat16'、'float32'。默认为None,跟随原模型类型。
196
+ - 🔥use_dora: 默认为`False`,是否使用`DoRA`。
197
+ - use_rslora: 默认为`False`,是否使用`RS-LoRA`。
198
+ - 🔥lorap_lr_ratio: LoRA+参数,默认值`None`,建议值`10~16`。使用lora时指定该参数可使用lora+。
199
+
200
+ ##### LoRA-GA
201
+ - lora_ga_batch_size: 默认值为 `2`。在 LoRA-GA 中估计梯度以进行初始化时使用的批处理大小。
202
+ - lora_ga_iters: 默认值为 `2`。在 LoRA-GA 中估计梯度以进行初始化时的迭代次数。
203
+ - lora_ga_max_length: 默认值为 `1024`。在 LoRA-GA 中估计梯度以进行初始化时的最大输入长度。
204
+ - lora_ga_direction: 默认值为 `ArB2r`。在 LoRA-GA 中使用估计梯度进行初始化时的初始方向。允许的值有:`ArBr`、`A2rBr`、`ArB2r` 和 `random`。
205
+ - lora_ga_scale: 默认值为 `stable`。LoRA-GA 的初始化缩放方式。允许的值有:`gd`、`unit`、`stable` 和 `weightS`。
206
+ - lora_ga_stable_gamma: 默认值为 `16`。当初始化时选择 `stable` 缩放时的 gamma 值。
207
+
208
+ #### FourierFt
209
+
210
+ FourierFt使用`target_modules`, `target_regex`, `modules_to_save`三个参数.
211
+
212
+ - fourier_n_frequency: 傅里叶变换的频率数量, `int`类型, 类似于LoRA中的`r`. 默认值`2000`.
213
+ - fourier_scaling: W矩阵的缩放值, `float`类型, 类似LoRA中的`lora_alpha`. 默认值`300.0`.
214
+
215
+ #### BOFT
216
+
217
+ BOFT使用`target_modules`, `target_regex`, `modules_to_save`三个参数.
218
+
219
+ - boft_block_size: BOFT块尺寸, 默认值4.
220
+ - boft_block_num: BOFT块数量, 不能和`boft_block_size`同时使用.
221
+ - boft_dropout: boft的dropout值, 默认0.0.
222
+
223
+ #### Vera
224
+
225
+ Vera使用`target_modules`, `target_regex`, `modules_to_save`三个参数.
226
+
227
+ - vera_rank: Vera Attention的尺寸, 默认值256.
228
+ - vera_projection_prng_key: 是否存储Vera映射矩阵, 默认为True.
229
+ - vera_dropout: Vera的dropout值, 默认`0.0`.
230
+ - vera_d_initial: Vera的d矩阵的初始值, 默认`0.1`.
231
+
232
+ #### GaLore
233
+
234
+ - 🔥use_galore: 默认值False, 是否使用GaLore.
235
+ - galore_target_modules: 默认值None, 不传的情况下对attention和mlp应用GaLore.
236
+ - galore_rank: 默认值128, GaLore的rank值.
237
+ - galore_update_proj_gap: 默认值50, 分解矩阵的更新间隔.
238
+ - galore_scale: 默认值1.0, 矩阵权重系数.
239
+ - galore_proj_type: 默认值`std`, GaLore矩阵分解类型.
240
+ - galore_optim_per_parameter: 默认值False, 是否给每个Galore目标Parameter设定一个单独的optimizer.
241
+ - galore_with_embedding: 默认值False, 是否对embedding应用GaLore.
242
+ - galore_quantization: 是否使用q-galore. 默认值`False`.
243
+ - galore_proj_quant: 是否对SVD分解矩阵做量化, 默认`False`.
244
+ - galore_proj_bits: SVD量化bit数.
245
+ - galore_proj_group_size: SVD量化分组数.
246
+ - galore_cos_threshold: 投影矩阵更新的cos相似度阈值. 默认值0.4.
247
+ - galore_gamma_proj: 在投影矩阵逐渐相似后会拉长更新间隔, 本参数为每次拉长间隔的系数, 默认值2.
248
+ - galore_queue_size: 计算投影矩阵相似度的队列长度, 默认值5.
249
+
250
+ #### LISA
251
+
252
+ 注意:LISA仅支持全参数,即`--train_type full`.
253
+
254
+ - 🔥lisa_activated_layers: 默认值`0`, 代表不使用LISA,改为非0代表需要激活的layers个数,建议设置为2或8.
255
+ - lisa_step_interval: 默认值`20`, 多少iter切换可反向传播的layers.
256
+
257
+ #### UNSLOTH
258
+
259
+ 🔥unsloth无新增参数,对已有参数进行调节即可支持:
260
+
261
+ ```
262
+ --tuner_backend unsloth
263
+ --train_type full/lora
264
+ --quant_bits 4
265
+ ```
266
+
267
+ #### LLAMAPRO
268
+
269
+ - 🔥llamapro_num_new_blocks: 默认值`4`, 插入的新layers总数.
270
+ - llamapro_num_groups: 默认值`None`, 分为多少组插入new_blocks, 如果为`None`则等于`llamapro_num_new_blocks`, 即每个新的layer单独插入原模型.
271
+
272
+ #### AdaLoRA
273
+
274
+ 以下参数`train_type`设置为`adalora`时生效. adalora的`target_modules`等参数继承于lora的对应参数, 但`lora_dtype`参数不生效.
275
+
276
+ - adalora_target_r: 默认值`8`, adalora的平均rank.
277
+ - adalora_init_r: 默认值`12`, adalora的初始rank.
278
+ - adalora_tinit: 默认值`0`, adalora的初始warmup.
279
+ - adalora_tfinal: 默认值`0`, adalora的final warmup.
280
+ - adalora_deltaT: 默认值`1`, adalora的step间隔.
281
+ - adalora_beta1: 默认值`0.85`, adalora的EMA参数.
282
+ - adalora_beta2: 默认值`0.85`, adalora的EMA参数.
283
+ - adalora_orth_reg_weight: 默认值`0.5`, adalora的正则化参数.
284
+
285
+ #### ReFT
286
+
287
+ 以下参数`train_type`设置为`reft`时生效.
288
+
289
+ > 1. ReFT无法合并tuner
290
+ > 2. ReFT和gradient_checkpointing不兼容
291
+ > 3. 如果使用DeepSpeed遇到问题请暂时卸载DeepSpeed
292
+
293
+ - 🔥reft_layers: ReFT应用于哪些层上, 默认为`None`, 代表所有层, 可以输入层号的list, 例如`reft_layers 1 2 3 4`
294
+ - 🔥reft_rank: ReFT矩阵的rank, 默认为`4`.
295
+ - reft_intervention_type: ReFT的类型, 支持'NoreftIntervention', 'LoreftIntervention', 'ConsreftIntervention', 'LobireftIntervention', 'DireftIntervention', 'NodireftIntervention', 默认为`LoreftIntervention`.
296
+ - reft_args: ReFT Intervention中的其他支持参数, 以json-string格式输入.
297
+
298
+ ### LMDeploy参数
299
+ 参数含义可以查看[lmdeploy文档](https://lmdeploy.readthedocs.io/en/latest/api/pipeline.html#turbomindengineconfig)。
300
+
301
+ - 🔥tp: tensor并行度。默认为`1`。
302
+ - session_len: 默认为`None`。
303
+ - cache_max_entry_count: 默认为`0.8`。
304
+ - quant_policy: 默认为`0`。
305
+ - vision_batch_size: 默认为`1`。
306
+
307
+ ### vLLM参数
308
+ 参数含义可以查看[vllm文档](https://docs.vllm.ai/en/latest/serving/engine_args.html)。
309
+
310
+ - 🔥gpu_memory_utilization: 默认值`0.9`。
311
+ - 🔥tensor_parallel_size: 默认为`1`。
312
+ - pipeline_parallel_size: 默认为`1`。
313
+ - max_num_seqs: 默认为`256`。
314
+ - 🔥max_model_len: 默认为`None`。
315
+ - disable_custom_all_reduce: 默认为`False`。
316
+ - enforce_eager: vllm使用pytorch eager模式还是建立cuda graph,默认为`False`。设置为True可以节约显存,但会影响效率。
317
+ - 🔥limit_mm_per_prompt: 控制vllm使用多图,默认为`None`。例如传入`--limit_mm_per_prompt '{"image": 5, "video": 2}'`。
318
+ - vllm_max_lora_rank: 默认为`16`。vllm对于lora支持的参数。
319
+ - vllm_quantization: vllm可以在内部量化模型,参数支持的值详见[这里](https://docs.vllm.ai/en/latest/serving/engine_args.html)。
320
+ - enable_prefix_caching: 开启vllm的自动前缀缓存,节约重复查询前缀的处理时间。默认为`False`。
321
+
322
+
323
+ ### 合并参数
324
+
325
+ - 🔥merge_lora: 是否合并lora,本参数支持lora、llamapro、longlora,默认为False。例子参数[这里](https://github.com/modelscope/ms-swift/blob/main/examples/export/merge_lora.sh)。
326
+ - safe_serialization: 是否存储safetensors,默认为True。
327
+ - max_shard_size: 单存储文件最大大小,默认'5GB'。
328
+
329
+
330
+ ## 集成参数
331
+
332
+ ### 训练参数
333
+ 训练参数除包含[基本参数](#基本参数)、[Seq2SeqTrainer参数](#Seq2SeqTrainer参数)、[tuner参数](#tuner参数)外,还包含下面的部分:
334
+
335
+ - add_version: 在output_dir上额外增加目录`'<版本号>-<时间戳>'`防止权重覆盖,默认为True。
336
+ - resume_only_model: 默认为False。如果在指定resume_from_checkpoint的基础上,将该参数设置为True,则仅resume模型权重。
337
+ - check_model: 检查本地模型文件有损坏或修改并给出提示,默认为True。如果是断网环境,请设置为False。
338
+ - 🔥create_checkpoint_symlink: 额外创建checkpoint软链接,方便书写自动化训练脚本。best_model和last_model的软链接路径分别为f'{output_dir}/best'和f'{output_dir}/last'。
339
+ - loss_type: loss类型。默认为None,使用模型自带损失函数。
340
+ - 🔥packing: 是否使用序列packing提升计算效率,默认为False。
341
+ - 注意:使用packing请结合`--attn_impl flash_attn`使用且"transformers>=4.44",具体查看[该PR](https://github.com/huggingface/transformers/pull/31629)。
342
+ - 🔥lazy_tokenize: 是否使用lazy_tokenize。若该参数设置为False,则在训练之前对所有的数据集样本进行tokenize(多模态模型则包括从磁盘中读取图片)。该参数在LLM训练中默认设置为False,而MLLM训练默认为True,节约内存。
343
+ - acc_strategy: 训练和验证时计算acc的策略。可选为`seq`和`token`级别的acc,默认为`token`。
344
+ - max_new_tokens: 覆盖生成参数。predict_with_generate=True时的最大生成token数量,默认64。
345
+ - temperature: 覆盖生成参数。predict_with_generate=True时的temperature,默认0。
346
+ - optimizer: plugin的自定义optimizer名称,默认为None。
347
+ - metric: plugin的自定义metric名称。默认为None,即在predict_with_generate=False的情况下设置为'acc',在predict_with_generate=True的情况下设置为'nlg'。
348
+ - eval_use_evalscope: 是否使用evalscope进行训练时评测,需要设置该参数来开启评测,具体使用参考[示例](../Instruction/评测.md#训练中评测)。
349
+ - eval_datasets: 评测数据集,可设置多个数据集,用空格分割。
350
+ - eval_datasets_args: 评测数据集参数,json格式,可设置多个数据集的参数。
351
+ - eval_limit: 评测数据集采样数。
352
+ - eval_generation_config: 评测时模型推理配置,json格式,默认为`{'max_tokens': 512}`。
353
+
354
+ ### RLHF参数
355
+ RLHF参数继承于[训练参数](#训练参数)。
356
+
357
+ - 🔥rlhf_type: 人类对齐算法类型,支持`dpo`、`orpo`、`simpo`、`kto`、`cpo`、`rm`、`ppo`和`grpo`。默认为'dpo'。
358
+ - ref_model: 采用dpo、kto、ppo、grpo算法且使用全参数训练时需要传入。默认为None。
359
+ - ref_model_type: 同model_type。默认为None。
360
+ - ref_model_revision: 同model_revision。默认为None。
361
+ - 🔥beta: KL正则项系数,默认为`None`,即`simpo`算法默认为`2.`,GRPO默认为`0.04`,其他算法默认为`0.1`。具体参考[文档](./人类对齐.md)。
362
+ - label_smoothing: 是否使用DPO smoothing,默认值为`0`。
363
+ - 🔥rpo_alpha: 控制DPO中加入sft_loss的权重,默认为`1`。最后的loss为`KL_loss + rpo_alpha * sft_loss`。
364
+ - cpo_alpha: CPO/SimPO loss 中 nll loss的系数, 默认为`1.`。
365
+ - simpo_gamma: SimPO算法中的reward margin项,论文建议设置为0.5-1.5,默认为`1.`。
366
+ - desirable_weight: KTO算法中对desirable response的loss权重 $\lambda_D$,默认为`1.`。
367
+ - undesirable_weight: KTO算法中对undesirable response的loss权重 $\lambda_U$,默认为`1.`。
368
+ - loss_scale: 覆盖模板参数,默认为'last_round'。
369
+ - temperature: 默认为0.9,该参数将在PPO、GRPO中使用。
370
+
371
+ #### Reward模型参数
372
+ reward模型参数将在PPO、GRPO中使用。
373
+
374
+ - reward_model: 默认为None。
375
+ - reward_adapters: 默认为`[]`。
376
+ - reward_model_type: 默认为None。
377
+ - reward_model_revision: 默认为None。
378
+
379
+ #### PPO参数
380
+
381
+ 以下参数含义可以参考[这里](https://huggingface.co/docs/trl/main/ppo_trainer)。
382
+ - num_ppo_epochs: 默认为4。
383
+ - whiten_rewards: 默认为False。
384
+ - kl_coef: 默认为0.05。
385
+ - cliprange: 默认为0.2。
386
+ - vf_coef: 默认为0.1。
387
+ - cliprange_value: 默认为0.2。
388
+ - gamma: 默认为1.0。
389
+ - lam: 默认为0.95。
390
+ - num_mini_batches: 默认为1。
391
+ - local_rollout_forward_batch_size: 默认为64。
392
+ - num_sample_generations: 默认为10。
393
+ - response_length: 默认为512。
394
+ - missing_eos_penalty: 默认为None。
395
+
396
+
397
+ #### GRPO参数
398
+ - per_device_train_batch_size: 每个设备训练批量大小,在GRPO中,指 completion 的批次大小。
399
+ - per_device_eval_batch_size: 每个设备评估批量大小,在GRPO中,指 completion 的批次大小。
400
+ - num_generations: 每个prompt采样的数量,论文中的G值,需要被 per_device_batch_size * gradient_accumulation_steps * nproc_per_node 整除,默认为8。
401
+ - max_completion_length: GRPO算法中的最大生成长度,默认为512。
402
+ - ds3_gather_for_generation: 该参数适用于DeepSpeed ZeRO-3。如果启用,策略模型权重将被收集用于生成,从而提高生成速度。然而,禁用此选项允许训练超出单个GPU VRAM的模型,尽管生成速度会变慢。禁用此选项与vLLM生成不兼容。默认为True。
403
+ - reward_funcs: GRPO算法奖励函数,可选项为`accuracy`、`format`、`cosine` 和 `repetition`,见swift/plugin/orm.py。你也可以在plugin中自定义自己的奖励函数。默认为`[]`。
404
+ - reward_weights: 每个奖励函数的权重。必须与奖励函数和奖励模型的总数量匹配。如果为 None,则所有奖励的权重都相等,为`1.0`。
405
+ - 提示:如果GRPO训练中包含`--reward_model`,则其加在奖励函数的最后位置。
406
+ - reward_model_plugin: 奖励模型逻辑,默认为orm逻辑, 详细见[自定义奖励模型](./GRPO.md#自定义奖励模型)。
407
+ - dataset_shuffle: 是否对dataset进行随机操作,默认为True。
408
+ - loss_type: loss 归一化的类型,可选项为['grpo', 'bnpo', 'dr_grpo'], 默认为'grpo', 具体查看该[pr](https://github.com/huggingface/trl/pull/3256#discussion_r2033213348)。
409
+ - log_completions: 是否记录训练中的模型生成内容,搭配 `--report_to wandb` 使用。默认为False。
410
+ - 提示:若没有设置`--report_to wandb`,则会在checkpoint中创建`completions.jsonl`来存储生成内容。
411
+ - use_vllm: 是否使用vLLM作为GRPO生成的infer_backend,默认为False。
412
+ - num_infer_workers: 每个node上推理worker数量,仅对vllm或者lmdeploy时有效。
413
+ - vllm_device: 设置vLLM部署的设备,可以设置为`auto`,代表按照num_infer_workers数量使用最后的几张卡,否则请传入和num_infer_workers相等数量的设备,例如`--vllm_device cuda:1 cuda:2`。
414
+ - vllm_gpu_memory_utilization: vllm透传参数,默认为0.9。
415
+ - vllm_max_model_len: vllm透传参数,默认为None。
416
+ - vllm_max_num_seqs: vllm透传参数,默认为256。
417
+ - vllm_enforce_eager: vllm透传参数,默认为False。
418
+ - vllm_limit_mm_per_prompt: vllm透传参数,默认为None。
419
+ - vllm_enable_prefix_caching: vllm透传参数,默认为True。
420
+ - vllm_server_host:vLLM server host地址,默认为None,使用外部vLLM server时使用。
421
+ - vllm_server_port: vLLM server 服务端口,默认为8000。
422
+ - vllm_server_timeout: 连接vLLM server的超时时间,默认为120s。
423
+ - top_k: 默认为50。
424
+ - top_p: 默认为0.9。
425
+ - repetition_penalty: 重复惩罚项。默认为1.。
426
+ - num_iterations: 每个批次的迭代更新次数,默认为1。
427
+ - epsilon: clip 系数,默认为0.2。
428
+ - epsilon_high: upper clip 系数,默认为None,设置后与epsilon共同构成[epsilon, epsilon_high]裁剪范围。
429
+ - async_generate: 异步rollout以提高训练速度,默认`false`。
430
+ - sleep_level: vllm特有参数,在训练和rollout复用卡的时候,可以选择vllm进行offload。
431
+ - move_model_batches: 在模型向vLLM/LMDeploy等快速推理框架移动参数时,将layers分为多少个batch. 默认为None, 代表整个模型不进行拆分,否则拆分为move_model_batches+1(非layer参数)+1(多模态部分参数)个。
432
+ - offload_optimizer: 是否在vLLM/LMDeploy推理时offload optimizer参数,默认为False。
433
+ - offload_model: 是否在vLLM/LMDeploy推理时offload 模型本身,默认为False。
434
+ - 注意:若该参数设置为True,训练时grad_norm一直为0,请安装`vllm==0.7.3`。
435
+ - gc_collect_after_offload: 是否在offload结束时进行gc(python gc和GPU gc),默认为False。
436
+ - multi_turn_func: 多轮GRPO参数, 传入对应的plugin名称, 同时在plugin/multi_turn.py中添加好对应的实现。
437
+ - dynamic_sample:筛除group内奖励标准差为0的数据,额外采样新数据,默认为False。
438
+ - max_resample_times:dynamic_sample设置下限制重采样次数,默认3次。
439
+ - overlong_filter:跳过超长截断的样本,不参与loss计算,默认为False。
440
+
441
+ cosine 奖励参数
442
+ - cosine_min_len_value_wrong:cosine 奖励函数参数,生成错误答案时,最小长度对应的奖励值。默认值为0.0。
443
+ - cosine_max_len_value_wrong:生成错误答案时,最大长度对应的奖励值。默认值为-0.5。
444
+ - cosine_min_len_value_correct:生成正确答案时,最小长度对应的奖励值。默认值为1.0。
445
+ - cosine_max_len_value_correct:生成正确答案时,最大长度对应的奖励值。默认值为0.5。
446
+ - cosine_max_len:生成文本的最大长度限制。默认等于 max_completion_length。
447
+
448
+ repetition 奖励参数
449
+ - repetition_n_grams:用于检测重复的 n-gram 大小。默认值为3。
450
+ - repetition_max_penalty:最大惩罚值,用于控制惩罚的强度。默认值为-1.0。
451
+
452
+ soft overlong 奖励参数
453
+ - soft_max_length: 论文中的L_max,模型的最大生成长度,默认等于max_completion_length。
454
+ - soft_cache_length: 论文中的L_cache,控制长度惩罚区间,区间为[soft_max_length-soft_cache_length, soft_max_length]。
455
+
456
+ #### SWANLAB
457
+
458
+ - swanlab_token: SwanLab的api-key。
459
+ - swanlab_project: swanlab的project,需要在页面中预先创建好:[https://swanlab.cn/space/~](https://swanlab.cn/space/~)。
460
+ - swanlab_workspace: 默认为None,会使用api-key对应的username。
461
+ - swanlab_exp_name: 实验名,可以为空,为空时默认传入--output_dir的值。
462
+ - swanlab_mode: 可选cloud和local,云模式或者本地模式。
463
+
464
+ ### 推理参数
465
+
466
+ 推理参数除包含[基本参数](#基本参数)、[合并参数](#合并参数)、[vLLM参数](#vllm参数)、[LMDeploy参数](#LMDeploy参数)外,还包含下面的部分:
467
+
468
+ - 🔥infer_backend: 推理加速后端,支持'pt'、'vllm'、'lmdeploy'三种推理引擎。默认为'pt'。
469
+ - 🔥max_batch_size: 指定infer_backend为pt时生效,用于批量推理,默认为1。
470
+ - ddp_backend: 指定infer_backend为pt时生效,用于指定多卡推理时的分布式后端,默认为None,进行自动选择。多卡推理例子可以查看[这里](https://github.com/modelscope/ms-swift/tree/main/examples/infer/pt)。
471
+ - 🔥result_path: 推理结果存储路径(jsonl),默认为None,保存在checkpoint目录(含args.json文件)或者'./result'目录,最终存储路径会在命令行中打印。
472
+ - metric: 对推理的结果进行评估,目前支持'acc'和'rouge'。默认为None,即不进行评估。
473
+ - val_dataset_sample: 推理数据集采样数,默认为None。
474
+
475
+
476
+ ### 部署参数
477
+
478
+ 部署参数继承于[推理参数](#推理参数)。
479
+
480
+ - host: 服务host,默认为'0.0.0.0'。
481
+ - port: 端口号,默认为8000。
482
+ - api_key: 访问需要使用的api_key,默认为None。
483
+ - owned_by: 默认为`swift`。
484
+ - 🔥served_model_name: 提供服务的模型名称,默认使用model的后缀。
485
+ - verbose: 打印详细日志,默认为True。
486
+ - 注意:在`swift app`或者`swift eval`时,默认为False。
487
+ - log_interval: tokens/s统计值打印间隔,默认20秒。设置为-1则不打印。
488
+ - max_logprobs: 最多返回客户端的logprobs数量,默认为20。
489
+ - use_async_engine: vLLM backend下是否使用async engine,默认为True。
490
+
491
+ ### Web-UI参数
492
+ - server_name: web-ui的host,默认为'0.0.0.0'。
493
+ - server_port: web-ui的port,默认为7860。
494
+ - share: 默认为False。
495
+ - lang: web-ui的语言,可选为'zh', 'en'。默认为'zh'。
496
+
497
+
498
+ ### App参数
499
+
500
+ App参数继承于[部署参数](#部署参数), [Web-UI参数](#Web-UI参数)。
501
+ - base_url: 模型部署的base_url,例如`http://localhost:8000/v1`。默认为`None`,使用本地部署。
502
+ - studio_title: studio的标题。默认为None,设置为模型名。
503
+ - is_multimodal: 是否启动多模态版本的app。默认为None,自动根据model判断,若无法判断,设置为False。
504
+ - lang: 覆盖Web-UI参数,默认为'en'。
505
+
506
+ ### 评测参数
507
+
508
+ 评测参数继承于[部署参数](#部署参数)。
509
+
510
+ - 🔥eval_backend: 评测后端,默认为'Native',也可以指定为'OpenCompass'或'VLMEvalKit'。
511
+ - 🔥eval_dataset: 评测数据集,请查看[评测文档](./评测.md)。
512
+ - eval_limit: 每个评测集的采样数,默认为None。
513
+ - eval_output_dir: 评测存储结果的文件夹,默认为'eval_output'。
514
+ - temperature: 覆盖生成参数,默认为0。
515
+ - eval_num_proc: 评测时客户端最大并发数,默认为16。
516
+ - eval_url: 评测url,例如`http://localhost:8000/v1`。例子可以查看[这里](https://github.com/modelscope/ms-swift/tree/main/examples/eval/eval_url)。默认为None,采用本地部署评估。
517
+ - eval_generation_config: 评测时模型推理配置,需传入json字符串格式,例如:`'{"max_new_tokens": 512}'`;默认为None。
518
+ - extra_eval_args: 额外评测参数,需传入json字符串格式,默认为空。仅对Native评测有效,更多参数说明请查看[这里](https://evalscope.readthedocs.io/zh-cn/latest/get_started/parameters.html)
519
+ - local_dataset: 部分评测集,如`CMB`无法直接运行,需要下载额外数据包才可以使用。设置本参数为`true`可以自动下载全量数据包,并在当前目录下创建`data`文件夹并开始评测。数据包仅会下载一次,后续会使用缓存。该参数默认为`false`。
520
+ - 注意:默认评测会使用`~/.cache/opencompass`下的数据集,在指定本参数后会直接使用当前目录下的data文件夹。
521
+
522
+ ### 导出参数
523
+
524
+ 导出参数除包含[基本参数](#基本参数)和[合并参数](#合并参数)外,还包含下面的部分:
525
+
526
+ - 🔥output_dir: 导出结果存储路径。默认为None,会自动设置合适后缀的路径。
527
+ - exist_ok: 如果output_dir存在,不抛出异常,进行覆盖。默认为False。
528
+ - 🔥quant_method: 可选为'gptq'、'awq'、'bnb',默认为None。例子参考[这里](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize)。
529
+ - quant_n_samples: gptq/awq的校验集采样数,默认为256。
530
+ - max_length: 校准集的max_length, 默认值2048。
531
+ - quant_batch_size: 量化batch_size,默认为1。
532
+ - group_size: 量化group大小,默认为128。
533
+ - to_ollama: 产生ollama所需的Modelfile文件。默认为False。
534
+ - 🔥to_mcore: HF格式权重转成Megatron格式。默认为False。
535
+ - to_hf: Megatron格式权重转成HF格式。默认为False。
536
+ - mcore_model: mcore格式模型路径。默认为None。
537
+ - thread_count: `--to_mcore true`时的模型切片数。默认为None,根据模型大小自动设置,使得最大分片小于10GB。
538
+ - 🔥test_convert_precision: 测试HF和Megatron格式权重转换的精度误差。默认为False。
539
+ - 🔥push_to_hub: 是否推送hub,默认为False。例子参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/export/push_to_hub.sh)。
540
+ - hub_model_id: 推送的model_id,默认为None。
541
+ - hub_private_repo: 是否是private repo,默认为False。
542
+ - commit_message: 提交信息,默认为'update files'。
543
+
544
+ ### 采样参数
545
+
546
+ - prm_model: 过程奖励模型的类型,可以是模型id(以pt方式拉起),或者plugin中定义的prm key(自定义推理过程)。
547
+ - orm_model: 结果奖励模型的类型,通常是通配符或测试用例等,一般定义在plugin中。
548
+ - sampler_type:采样类型,目前支持 sample, mcts,未来会支持 dvts。
549
+ - sampler_engine:支持`pt`, `lmdeploy`, `vllm`, `client`, `no`,默认为`pt`,采样模型的推理引擎。
550
+ - sampler_type:采样类型,目前支持sample(do_sample方式),未来会支持mcts和dvts。
551
+ - sampler_engine:支持`pt`, `lmdeploy`, `vllm`, `no`,默认为`pt`,采样模型的推理引擎。
552
+ - output_dir:输出目录,默认为`sample_output`。
553
+ - output_file:输出文件名称,默认为`None`使用时间戳作为文件名。传入时不需要传入目录,仅支持jsonl格式。
554
+ - override_exist_file:如`output_file`存在,是否覆盖。
555
+ - num_sampling_per_gpu_batch_size:每次采样的batch_size。
556
+ - num_sampling_per_gpu_batches:共采样多少batch。
557
+ - n_best_to_keep:返回多少最佳sequences。
558
+ - data_range:本采样处理数据集的分片。传入格式为`2 3`,代表数据集分为3份处理(这意味着通常有三个`swift sample`在并行处理),本实例正在处理第3个分片。
559
+ - temperature:在这里默认为1.0。
560
+ - prm_threshold:PRM阈值,低于该阈值的结果会被过滤掉,默认值为`0`。
561
+ - easy_query_threshold:单个query的所有采样中,ORM评估如果正确,大于该比例的query会被丢弃,防止过于简单的query出现在结果中,默认为`None`,代表不过滤。
562
+ - engine_kwargs:传入sampler_engine的额外参数,以json string传入,例如`{"cache_max_entry_count":0.7}`。
563
+ - num_return_sequences:采样返回的原始sequence数量。默认为64,本参数对`sample`采样有效。
564
+ - cache_files:为避免同时加载prm和generator造成显存OOM,可以分两步进行采样,第一步将prm和orm置为`None`,则所有结果都会输出到文件中,第二次运行采样将sampler_engine置为`no`并传入`--cache_files`为上次采样的输出文件,则会使用上次输出的结果进行prm和orm评估并输出最终结果。
565
+ - 注意:使用cache_files时,`--dataset`仍然需要传入,这是因为cache_files的id是由原始数据计算的md5,需要把两部分信息结合使用。
566
+
567
+ #### MCTS
568
+ - rollout_depth:rollout 时的最大深度,默认为 `5`。
569
+ - rollout_start_depth:开始 rollout 时的深度,低于此深度的节点只会进行 expand 操作,默认为 `3`。
570
+ - max_iterations:mcts 的最大迭代次数,默认为 `100`。
571
+ - process_reward_rate:select 中计算 value 时 process reward 占的比例,默认为 `0.0`,即不使用 PRM。
572
+ - exploration_rate:UCT 算法中的探索参数,值越大越照顾探索次数较小的节点,默认为 `0.5`。
573
+ - api_key:使用 client 作为推理引擎时需要,默认为 `EMPTY`。
574
+ - base_url:使用 client 作为推理引擎时需要,默认为 'https://dashscope.aliyuncs.com/compatible-mode/v1'
575
+
576
+
577
+ ## 特定模型参数
578
+ 特定模型参数可以通过`--model_kwargs`或者环境变量进行设置,例如: `--model_kwargs '{"fps_max_frames": 12}'`或者`FPS_MAX_FRAMES=12`。
579
+
580
+ ### qwen2_vl, qvq, qwen2_5_vl
581
+ 参数含义同`qwen_vl_utils`或者`qwen_omni_utils`库,可以查看[这里](https://github.com/QwenLM/Qwen2.5-VL/blob/main/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L24)。
582
+
583
+ - IMAGE_FACTOR: 默认为28。
584
+ - MIN_PIXELS: 默认为`4 * 28 * 28`。
585
+ - 🔥MAX_PIXELS: 默认为`16384 * 28 * 28`,参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/ocr.sh#L3)。
586
+ - MAX_RATIO: 默认为200。
587
+ - VIDEO_MIN_PIXELS: 默认为`128 * 28 * 28`。
588
+ - 🔥VIDEO_MAX_PIXELS: 默认为`768 * 28 * 28`,参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/video.sh#L7)。
589
+ - VIDEO_TOTAL_PIXELS: 默认为`24576 * 28 * 28`。
590
+ - FRAME_FACTOR: 默认为2。
591
+ - FPS: 默认为2.0。
592
+ - FPS_MIN_FRAMES: 默认为4。
593
+ - 🔥FPS_MAX_FRAMES: 默认为768,参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/video.sh#L8)。
594
+
595
+ ### qwen2_audio
596
+ - SAMPLING_RATE: 默认为16000。
597
+
598
+ ### qwen2_5_omni
599
+ qwen2_5_omni除了包含qwen2_5_vl和qwen2_audio的模型特定参数外,还包含以下参数:
600
+ - USE_AUDIO_IN_VIDEO: 默认为False。
601
+ - 🔥ENABLE_AUDIO_OUTPUT: 默认为True。若使用zero3进行训练,请设置为False。
602
+
603
+ ### internvl, internvl_phi3
604
+ 参数含义可以查看[这里](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)。
605
+ - MAX_NUM: 默认为12。
606
+ - INPUT_SIZE: 默认为448。
607
+
608
+ ### internvl2, internvl2_phi3, internvl2_5, internvl3
609
+ 参数含义可以查看[这里](https://modelscope.cn/models/OpenGVLab/InternVL2_5-2B)。
610
+ - MAX_NUM: 默认为12。
611
+ - INPUT_SIZE: 默认为448。
612
+ - VIDEO_MAX_NUM: 默认为1。视频的MAX_NUM。
613
+ - VIDEO_SEGMENTS: 默认为8。
614
+
615
+
616
+ ### minicpmv2_6, minicpmo2_6
617
+ - MAX_SLICE_NUMS: 默认为9,参考[这里](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/file/view/master?fileName=config.json&status=1)。
618
+ - VIDEO_MAX_SLICE_NUMS: 默认为1,视频的MAX_SLICE_NUMS,参考[这里](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6)。
619
+ - MAX_NUM_FRAMES: 默认为64,参考[这里](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6)。
620
+
621
+ ### minicpmo2_6
622
+ - INIT_TTS: 默认为False。
623
+ - INIT_AUDIO: 默认为False。
624
+
625
+ ### ovis1_6, ovis2
626
+ - MAX_PARTITION: 默认为9,参考[这里](https://github.com/AIDC-AI/Ovis/blob/d248e34d755a95d24315c40e2489750a869c5dbc/ovis/model/modeling_ovis.py#L312)。
627
+
628
+ ### mplug_owl3, mplug_owl3_241101
629
+ - MAX_NUM_FRAMES: 默认为16,参考[这里](https://modelscope.cn/models/iic/mPLUG-Owl3-7B-240728)。
630
+
631
+ ### xcomposer2_4khd
632
+ - HD_NUM: 默认为55,参考[这里](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b)。
633
+
634
+ ### xcomposer2_5
635
+ - HD_NUM: 图片数量为1时,默认值为24。大于1,默认为6。参考[这里](https://modelscope.cn/models/AI-ModelScope/internlm-xcomposer2d5-7b/file/view/master?fileName=modeling_internlm_xcomposer2.py&status=1#L254)。
636
+
637
+ ### video_cogvlm2
638
+ - NUM_FRAMES: 默认为24,参考[这里](https://github.com/THUDM/CogVLM2/blob/main/video_demo/inference.py#L22)。
639
+
640
+ ### phi3_vision
641
+ - NUM_CROPS: 默认为4,参考[这里](https://modelscope.cn/models/LLM-Research/Phi-3.5-vision-instruct)。
642
+
643
+ ### llama3_1_omni
644
+ - N_MELS: 默认为128,参考[这里](https://github.com/ictnlp/LLaMA-Omni/blob/544d0ff3de8817fdcbc5192941a11cf4a72cbf2b/omni_speech/infer/infer.py#L57)。
645
+
646
+ ### video_llava
647
+ - NUM_FRAMES: 默认为16。
648
+
649
+
650
+ ## 其他环境变量
651
+ - CUDA_VISIBLE_DEVICES: 控制使用哪些GPU卡。默认使用所有卡。
652
+ - ASCEND_RT_VISIBLE_DEVICES: 控制使用哪些NPU卡(ASCEND卡生效)。默认使用所有卡。
653
+ - MODELSCOPE_CACHE: 控制缓存路径。
654
+ - NPROC_PER_NODE: torchrun中`--nproc_per_node`的参数透传。默认为1。若设置了`NPROC_PER_NODE`或者`NNODES`环境变量,则使用torchrun启动训练或推理。
655
+ - MASTER_PORT: torchrun中`--master_port`的参数透传。默认为29500。
656
+ - MASTER_ADDR: torchrun中`--master_addr`的参数透传。
657
+ - NNODES: torchrun中`--nnodes`的参数透传。
658
+ - NODE_RANK: torchrun中`--node_rank`的参数透传。
659
+ - LOG_LEVEL: 日志的level,默认为'INFO',你可以设置为'WARNING', 'ERROR'等。
660
+ - SWIFT_DEBUG: 在`engine.infer(...)`时,若设置为'1',则会打印input_ids和generate_ids的内容。
ms-swift/docs/source/Instruction/导出与推送.md ADDED
@@ -0,0 +1,55 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 导出与推送
2
+
3
+
4
+ ## Merge LoRA
5
+
6
+ - 查看[这里](https://github.com/modelscope/ms-swift/blob/main/examples/export/merge_lora.sh)。
7
+
8
+ ## 量化
9
+
10
+ SWIFT支持AWQ、GPTQ、BNB模型的量化导出。其中使用AWQ、GPTQ需使用校准数据集,量化性能较好但量化耗时较长;而BNB无需校准数据集,量化耗时较短。
11
+
12
+ | 量化技术 | 多模态 | 推理加速 | 继续训练 |
13
+ | -------- | ------ | -------- | -------- |
14
+ | GPTQ | ✅ | ✅ | ✅ |
15
+ | AWQ | ✅ | ✅ | ✅ |
16
+ | BNB | ❌ | ✅ | ✅ |
17
+
18
+
19
+ 除SWIFT安装外,需要安装以下额外依赖:
20
+ ```shell
21
+ # 使用awq量化:
22
+ # autoawq和cuda版本有对应关系,请按照`https://github.com/casper-hansen/AutoAWQ`选择版本
23
+ # 如果出现torch依赖冲突,请额外增加指令`--no-deps`
24
+ pip install autoawq -U
25
+
26
+ # 使用gptq量化:
27
+ # auto_gptq和cuda版本有对应关系,请按照`https://github.com/PanQiWei/AutoGPTQ#quick-installation`选择版本
28
+ pip install auto_gptq optimum -U
29
+
30
+ # 使用bnb量化:
31
+ pip install bitsandbytes -U
32
+ ```
33
+
34
+ 我们提供了一系列脚本展现SWIFT的量化导出能力:
35
+ - 支持[AWQ](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/awq.sh)/[GPTQ](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/gptq.sh)/[BNB](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/bnb.sh)量化导出。
36
+ - 多模态量化: 支持使用GPTQ和AWQ对多模态模型进行量化,其中AWQ支持的多模态模型有限。参考[这里](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/mllm)。
37
+ - 更多系列模型的支持: 支持[Bert](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/bert),[Reward Model](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/reward_model)的量化导出。
38
+ - 使用SWIFT量化导出的模型支持使用vllm/lmdeploy进行推理加速;也支持使用QLoRA继续进行SFT/RLHF。
39
+
40
+
41
+ ## 推送模型
42
+
43
+ SWIFT支持将训练/量化的模型重新推送到ModelScope/HuggingFace。默认推送到ModelScope,你可以指定`--use_hf true`推送到HuggingFace。
44
+ ```shell
45
+ swift export \
46
+ --model output/vx-xxx/checkpoint-xxx \
47
+ --push_to_hub true \
48
+ --hub_model_id '<model-id>' \
49
+ --hub_token '<sdk-token>' \
50
+ --use_hf false
51
+ ```
52
+
53
+ 小贴士:
54
+ - 你可以使用`--model <checkpoint-dir>`或者`--adapters <checkpoint-dir>`指定需要推送的checkpoint目录,这两种写法在推送模型场景没有差异。
55
+ - 推送到ModelScope时,你需要确保你已经注册了魔搭账号,你的SDK token可以在[该页面](https://www.modelscope.cn/my/myaccesstoken)中获取。推送模型需确保sdk token的账号具有model_id对应组织的编辑权限。推送模型将自动创建对应model_id的模型仓库(如果该模型仓库不存在),你可以使用`--hub_private_repo true`来自动创建私有的模型仓库。
ms-swift/docs/source/_templates/autosummary/class.rst ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ .. currentmodule:: {{ module }}
2
+
3
+
4
+ {{ name | underline}}
5
+
6
+ .. autoclass:: {{ name }}
7
+ :inherited-members:
8
+ :members:
9
+
10
+ .. autogenerated from source/_templates/autosummary/class.rst
ms-swift/docs/source_en/BestPractices/Embedding.md ADDED
@@ -0,0 +1,97 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Embedding Training
2
+
3
+ SWIFT has already supported the training of Embedding models, including both pure text and multimodal types. The currently supported models are:
4
+
5
+ 1. **ModernBERT Embedding Model**
6
+ - [ModelScope](https://modelscope.cn/models/iic/gte-modernbert-base) | [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-modernbert-base)
7
+ 2. **GTE Embedding Models**
8
+ - **1.5B**: [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-1.5B-instruct) | [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)
9
+ - **7B**: [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-7B-instruct) | [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct)
10
+ 3. **GME Embedding Models**
11
+ - **2B**: [ModelScope](https://www.modelscope.cn/models/iic/gme-Qwen2-VL-2B-Instruct) | [Hugging Face](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct)
12
+ - **7B**: [ModelScope](https://www.modelscope.cn/models/iic/gme-Qwen2-VL-7B-Instruct) | [Hugging Face](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct)
13
+
14
+ Developers can integrate their own models independently. The `forward` output of the model needs to satisfy:
15
+
16
+ ```json
17
+ {"last_hidden_state": some-embedding-tensor}
18
+ ```
19
+
20
+ The return value should be a JSON with the key `last_hidden_state`, and the value should be the embedding tensor. For the input part, you can use the templates we have already supported.
21
+
22
+ **Note:** Currently, SWIFT supports embedding models that conform to pure text or multimodal LLMs. It does not support the training of CLIP-type models at this time.
23
+
24
+ Besides, all Embedding models supported by SWIFT include a normalize layer at the end; consider adding one when you integrate a new model.
25
+
26
+ ## Loss
27
+
28
+ The Embedding models supported by SWIFT currently can use the following loss functions:
29
+
30
+ - **cosine_similarity**: Cosine similarity loss, which calculates the similarity between two embeddings and fits based on the label value. It is effectively an MSE loss.
31
+ - **contrastive**: Contrastive learning loss with adjustable margin. Labels are only supported as 0 and 1.
32
+ - **online_contrastive**: Contrastive loss considering hard negatives and hard positives. Labels are only supported as 0 and 1.
33
+ - **infonce**: Computes pairwise cosine similarities between different rows within the same batch, maximizing similarity within rows and minimizing similarity between different rows. No labels are required.
34
+
35
+ The source code for the loss functions can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss.py).
36
+
37
+ ## Dataset Format
38
+
39
+ > **Note:**
40
+ > 1. The `<image>` tag in the multimodal section below can appear in any position within `query`, `response`, or `rejected_response`. It is only required that the number of tags matches the number of values in `images`.
41
+ > 2. The correspondence between tags and `images` follows the order: first matching the `<image>` tags in `query`, then those in `response`, and finally parsing the `<image>` tags in `rejected_response` sequentially.
42
+ > 3. `query` represents the anchor sample, `response` represents the positive or contrastive sample, and `rejected_response` corresponds to hard negative samples.
43
+ > 4. The `<video>` and `<audio>` tags are also supported, enabling native support for video and audio embeddings.
44
+
45
+ ### Format for Cosine Similarity Loss
46
+
47
+ ```json lines
48
+ # LLM
49
+ {"query": "sentence1", "response": "sentence2", "label": 0.8}
50
+ # MLLM
51
+ {"query": "<image>", "response": "<image>sentence", "images": ["/some/images1.jpg", "/some/images2.jpg"], "label": 0.7}
52
+ {"query": "sentence1", "response": "<image>sentence2", "images": ["/some/images1.jpg"], "label": 0.7}
53
+ ```
54
+
55
+ The eval metrics are the Pearson and Spearman's rank correlation coefficients computed over the embeddings' euclidean distance, dot product, and similar measures — 8 values in total.
56
+
57
+ ### Format for Contrastive/Online Contrastive Loss
58
+
59
+ ```json lines
60
+ # LLM
61
+ {"query": "sentence1", "response": "sentence2", "label": 1}
62
+ # MLLM
63
+ {"query": "<image>", "response": "sentence", "images": "/some/images.jpg", "label": 1}
64
+ {"query": "<image>sentence1", "response": "sentence2", "images": "/some/images.jpg", "label": 0}
65
+ ```
66
+
67
+ ### Format for InfoNCE
68
+
69
+ ```json lines
70
+ # LLM
71
+ {"query": "sentence1", "response": "sentence2"}
72
+ # MLLM
73
+ {"query": "<image>", "response": "sentence", "images": "/some/images.jpg"}
74
+ {"query": "<image>sentence1", "response": "<image>sentence2", "rejected_response": ["<image>sentence1", "<image>sentence2"], "images": ["/some/images.jpg", "/some/images.jpg", "/some/images.jpg", "/some/images.jpg"]}
75
+ ```
76
+
77
+ InfoNCE loss supports the following environment variables:
78
+ 1. `INFONCE_TEMPERATURE`: The temperature parameter. If not set, the default value is 0.01.
79
+ 2. `INFONCE_USE_BATCH`: Determines whether to use `rejected_response` within the sample (hard negative samples) or to use all `responses` within a batch. The default is `True`, which means using responses within the batch.
80
+ 3. `INFONCE_HARD_NEGATIVES`: The number of hard negatives. If not set, all samples in `rejected_response` will be used. Since the lengths may not be consistent, a for loop will be used to compute the loss (which is slower). If set to a specific number, and there are not enough samples, the missing number will be randomly sampled. If there are excess samples, the first `INFONCE_HARD_NEGATIVES` will be selected.
81
+ 4. `INFONCE_MASK_FAKE_NEGATIVE`: Masks out fake negatives. The default is set to False. When enabled, it checks if a sample's similarity is greater than the positive sample's similarity plus 0.1. If so, the sample's similarity is set to -inf to prevent the leakage of the positive sample.
82
+
83
+ > It is also possible to set the number of hard negatives to be equal in the dataset, so that even if not set, the for loop method will not be used, thereby speeding up computation.
84
+ >
85
+ > `rejected_response` can also be omitted. In this case, `INFONCE_USE_BATCH` remains `True` and will use other samples within the batch as rejected responses.
86
+
87
+ The evaluation of InfoNCE loss includes the following metrics:
88
+ - mean_neg: The average of all hard negatives
89
+ - mean_pos: The average of all positives
90
+ - margin: The average of (positive - max hard negative)
91
+
92
+ ## Scaffolding
93
+
94
+ SWIFT provides two scaffold training scripts:
95
+
96
+ - [GTE Model](https://github.com/tastelikefeet/swift/blob/main/examples/train/embedding/train_gte.sh)
97
+ - [GME Model](https://github.com/tastelikefeet/swift/blob/main/examples/train/embedding/train_gme.sh)
ms-swift/docs/source_en/BestPractices/GRPO-Code-Training.md ADDED
@@ -0,0 +1,136 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Code Training with GRPO
2
+ This document explains how to use GRPO to train models for code tasks.
3
+
4
+
5
+ Model: [Qwen/Qwen2.5-7B-Instruct](https://www.modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct)
6
+
7
+ Dataset: [open-r1/verifiable-coding-problems-python-10k](https://www.modelscope.cn/datasets/open-r1/verifiable-coding-problems-python-10k/dataPreview)
8
+
9
+ dataset example
10
+ ```json
11
+ {
12
+ "problem": "Solve the following coding problem using the programming language python: Polycarp has $n$ different binary words. A word called binary if it contains only characters '0' and '1'. For example, these words are binary: \"0001\", \"11\", \"0\" and \"0011100\". Polycarp wants to offer his set of $n$ binary words to play a game \"words\". In this game, players name words and each next word (starting from the second) must start with the last character of the previous word. The first word can be any. For example, these sequence of words can be named during the game: \"0101\", \"1\", \"10\", \"00\", \"00001\". Word reversal is the operation of reversing the order of the characters. For example, the word \"0111\" after the reversal becomes \"1110\", the word \"11010\" after the reversal becomes \"01011\". Probably, Polycarp has such a set of words that there is no way to put them in the order correspondent to the game rules. In this situation, he wants to reverse some words from his set so that: the final set of $n$ words still contains different words (i.e. all words are unique); there is a way to put all words of the final set of words in the order so that the final sequence of $n$ words is consistent with the game rules. Polycarp wants to reverse minimal number of words. Please, help him. -----Input----- The first line of the input contains one integer $t$ ($1 \\le t \\le 10^4$) — the number of test cases in the input. Then $t$ test cases follow. The first line of a test case contains one integer $n$ ($1 \\le n \\le 2\\cdot10^5$) — the number of words in the Polycarp's set. Next $n$ lines contain these words. All of $n$ words aren't empty and contains only characters '0' and '1'. The sum of word lengths doesn't exceed $4\\cdot10^6$. All words are different. Guaranteed, that the sum of $n$ for all test cases in the input doesn't exceed $2\\cdot10^5$. Also, guaranteed that the sum of word lengths for all test cases in the input doesn't exceed $4\\cdot10^6$. 
-----Output----- Print answer for all of $t$ test cases in the order they appear. If there is no answer for the test case, print -1. Otherwise, the first line of the output should contain $k$ ($0 \\le k \\le n$) — the minimal number of words in the set which should be reversed. The second line of the output should contain $k$ distinct integers — the indexes of the words in the set which should be reversed. Words are numerated from $1$ to $n$ in the order they appear. If $k=0$ you can skip this line (or you can print an empty line). If there are many answers you can print any of them. -----Example----- Input 4 4 0001 1000 0011 0111 3 010 101 0 2 00000 00001 4 01 001 0001 00001 Output 1 3 -1 0 2 1 2 The input will be stdin and you should print your solution to stdout Now solve the problem and return the code.",
13
+ "verification_info": {
14
+ "language": "python",
15
+ "test_cases": [
16
+ {
17
+ "input": "4\n4\n0001\n1000\n0011\n0111\n3\n010\n101\n0\n2\n00000\n00001\n4\n01\n001\n0001\n00001\n",
18
+ "output": "1\n3 \n-1\n0\n\n2\n1 2 \n",
19
+ "type": "stdin_stdout"
20
+ }
21
+ ]
22
+ }
23
+ }
24
+ ```
25
+
26
+ `verification_info` provides the programming language as well as test cases, which include input and expected output.
27
+
28
+
29
+ ## Reward Functions
30
+
31
+ The training process utilizes two reward functions: `code_reward` and `code_format`. For implementation details, refer to the [code](../../../examples/train/grpo/plugin/plugin.py).
32
+
33
+
34
+ - `code_reward` Executes the generated code using [e2b](https://e2b.dev/) or [judge0](https://judge0.com/). Validates the code against the test cases in the dataset and assigns a reward value based on correctness.
35
+ - `code_format` Requires the model to produce formatted responses that include code blocks.
36
+
37
+ Note: Currently, executing code through E2B only supports the Python language. If you need to execute code in other languages, you can use Judge0([judge0 supported languages](https://github.com/judge0/judge0?tab=readme-ov-file#supported-languages)).
38
+
39
+ ## Training Script
40
+
41
+ - Register on [e2b](https://e2b.dev/dashboard) to obtain your E2B_API_KEY and set it as an environment variable.
42
+ - Add `external_code_reward` as a reward function with `--reward_funcs`.
43
+ - Set `--external_plugins` to the path of plugin.py.
44
+
45
+ ```bash
46
+ E2B_API_KEY=xxx \
47
+ WANDB_API_KEY=xxx \
48
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
49
+ NPROC_PER_NODE=7 \
50
+ swift rlhf \
51
+ --rlhf_type grpo \
52
+ --model Qwen/Qwen2.5-7B-Instruct \
53
+ --external_plugins examples/train/grpo/plugin/plugin.py \
54
+ --reward_funcs external_code_reward external_code_format \
55
+ --reward_weights 1.0 0.1 \
56
+ --use_vllm true \
57
+ --vllm_device auto \
58
+ --vllm_gpu_memory_utilization 0.7 \
59
+ --vllm_max_model_len 8192 \
60
+ --train_type lora \
61
+ --torch_dtype bfloat16 \
62
+ --dataset 'open-r1/verifiable-coding-problems-python-10k' \
63
+ --max_completion_length 2048 \
64
+ --num_train_epochs 1 \
65
+ --per_device_train_batch_size 2 \
66
+ --per_device_eval_batch_size 2 \
67
+ --learning_rate 1e-6 \
68
+ --gradient_accumulation_steps 1 \
69
+ --eval_steps 200 \
70
+ --save_steps 200 \
71
+ --save_total_limit 2 \
72
+ --logging_steps 5 \
73
+ --max_length 4096 \
74
+ --output_dir output \
75
+ --warmup_ratio 0.05 \
76
+ --dataloader_num_workers 4 \
77
+ --dataset_num_proc 4 \
78
+ --num_generations 14 \
79
+ --temperature 0.9 \
80
+ --num_infer_workers 1 \
81
+ --system 'examples/train/grpo/prompt.txt' \
82
+ --deepspeed zero2 \
83
+ --log_completions true \
84
+ --report_to wandb
85
+ ```
86
+ ### judge0
87
+ - Set environment variables:
88
+ - (Required) JUDGE0_ENDPOINT: The endpoint address for accessing Judge0.
89
+ - (Optional) JUDGE0_X_AUTH_TOKEN: The access token for Judge0.
90
+ - Add `external_code_reward_by_judge0` as a reward function with `--reward_funcs`.
91
+ - Set `--external_plugins` to the path of `plugin.py`.
92
+
93
+ ```bash
94
+ JUDGE0_ENDPOINT=xxx \
95
+ JUDGE0_X_AUTH_TOKEN=xxx \
96
+ WANDB_API_KEY=xxx \
97
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
98
+ NPROC_PER_NODE=7 \
99
+ swift rlhf \
100
+ --rlhf_type grpo \
101
+ --model Qwen/Qwen2.5-7B-Instruct \
102
+ --external_plugins examples/train/grpo/plugin/plugin.py \
103
+ --reward_funcs external_code_reward_by_judge0 external_code_format \
104
+ --reward_weights 1.0 0.1 \
105
+ --use_vllm true \
106
+ --vllm_device auto \
107
+ --vllm_gpu_memory_utilization 0.7 \
108
+ --vllm_max_model_len 8192 \
109
+ --train_type lora \
110
+ --torch_dtype bfloat16 \
111
+ --dataset 'open-r1/verifiable-coding-problems-python-10k' \
112
+ --max_completion_length 2048 \
113
+ --num_train_epochs 1 \
114
+ --per_device_train_batch_size 2 \
115
+ --per_device_eval_batch_size 2 \
116
+ --learning_rate 1e-6 \
117
+ --gradient_accumulation_steps 1 \
118
+ --eval_steps 200 \
119
+ --save_steps 200 \
120
+ --save_total_limit 2 \
121
+ --logging_steps 5 \
122
+ --max_length 4096 \
123
+ --output_dir output \
124
+ --warmup_ratio 0.05 \
125
+ --dataloader_num_workers 4 \
126
+ --dataset_num_proc 4 \
127
+ --num_generations 14 \
128
+ --temperature 0.9 \
129
+ --num_infer_workers 1 \
130
+ --system 'examples/train/grpo/prompt.txt' \
131
+ --deepspeed zero2 \
132
+ --log_completions true \
133
+ --report_to wandb
134
+ ```
135
+ Training Reward Curve
136
+ ![Training Reward Curve](../../resources/grpo_code.png)
ms-swift/docs/source_en/BestPractices/GRPO-Multi-Modal-Training.md ADDED
@@ -0,0 +1,349 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Complete Multimodal GRPO Experiment Workflow
2
+
3
+ This document explains how to use SWIFT GRPO for training multimodal models and tasks. The goal is to train on multiple multimodal tasks to improve task accuracy. Task definitions, training parameters, etc., refer to [R1-V](https://github.com/Deep-Agent/R1-V.git) and [open-r1-multimodal](https://github.com/EvolvingLMMs-Lab/open-r1-multimodal.git).
4
+
5
+ ---
6
+
7
+ ## **ClevrCount Task**
8
+
9
+ ### **Task and Dataset Definition**
10
+
11
+ This task is based on the `clevr_cogen_a_train` dataset. The model's goal is to output the number of objects in the image. Therefore, we define the dataset as follows:
12
+
13
+ ```python
14
+ class ClevrPreprocessor(ResponsePreprocessor):
15
+
16
+ def preprocess(self, row: Dict[str, Any]) -> Dict[str, Any]:
17
+ query = row.get('query', '')
18
+ query = f"""{query} Output the thinking process in <think> </think> and
19
+ final answer (number) in <answer> </answer> tags."""
20
+ row.update({'query': query})
21
+ return super().preprocess(row)
22
+
23
+
24
+ register_dataset(
25
+ DatasetMeta(
26
+ ms_dataset_id='okwinds/clevr_cogen_a_train',
27
+ subsets=[
28
+ SubsetDataset(
29
+ name='default',
30
+ subset='default',
31
+ split=['train'],
32
+ ),
33
+ ],
34
+ preprocess_func=ClevrPreprocessor(),
35
+ tags=['qa', 'math']))
36
+ ```
37
+
38
+ The purpose of redefining the dataset preprocessor here is to modify the query. A sample dataset entry is as follows, including `messages`, `images`, and `solution` fields. The `solution` is used in the reward function, while `messages` and `images` serve as model input.
39
+ - Note: `{'role': 'assistant', 'content': '<answer> 3 </answer>'}` will be removed in GRPOTrainer and can be ignored. The 'solution' field will be passed directly into the ORM. When creating a custom dataset, the 'images' field should be organized as `["image_path1", "image_path2"]`.
40
+
41
+ ```json
42
+ {
43
+ "images": ["image_path1", "image_path2"],
44
+ "messages": [
45
+ {
46
+ "role": "user",
47
+ "content": "How many items are there in the image? Output the thinking process in <think> </think> and\n final answer (number) in <answer> </answer> tags."
48
+ }
49
+ ],
50
+ "solution": "<answer> 3 </answer>"
51
+ }
52
+
53
+ ```
54
+
55
+ ---
56
+
57
+ ## **Reward Function Definition**
58
+
59
+ This task uses two reward functions: one is the format reward function mentioned in `Deepseek-R1`, and the other is the accuracy reward function for ClevrCount. The former is built into SWIFT and can be used directly with `--reward_funcs format`. The latter needs to be custom-defined. Here, we use the `external_plugin` method to define the accuracy reward function by placing the code in `swift/examples/train/grpo/plugin/plugin.py`.
60
+
61
+ The reward function's input includes `completions` and `solution` fields, representing the model-generated text and ground truth, respectively. Each is a list, allowing the computation of multiple completions simultaneously. Note that the `solution` field is passed through directly from the dataset definition. If there are task changes, corresponding modifications can be made to the dataset and reward function.
62
+
63
+ ```python
64
+ class MultiModalAccuracyORM(ORM):
65
+
66
+ def __call__(self, completions, solution, **kwargs) -> List[float]:
67
+ """
68
+ Reward function that checks if the completion is correct.
69
+ Args:
70
+ completions (list[str]): Generated outputs
71
+ solution (list[str]): Ground Truths.
72
+
73
+ Returns:
74
+ list[float]: Reward scores
75
+ """
76
+ rewards = []
77
+ from math_verify import parse, verify
78
+ for content, sol in zip(completions, solution):
79
+ reward = 0.0
80
+ # Try symbolic verification first
81
+ try:
82
+ answer = parse(content)
83
+ if float(verify(answer, parse(sol))) > 0:
84
+ reward = 1.0
85
+ except Exception:
86
+ pass # Continue to next verification method if this fails
87
+
88
+ # If symbolic verification failed, try string matching
89
+ if reward == 0.0:
90
+ try:
91
+ # Extract answer from solution if it has think/answer tags
92
+ sol_match = re.search(r'<answer>(.*?)</answer>', sol)
93
+ ground_truth = sol_match.group(1).strip() if sol_match else sol.strip()
94
+
95
+ # Extract answer from content if it has think/answer tags
96
+ content_match = re.search(r'<answer>(.*?)</answer>', content)
97
+ student_answer = content_match.group(1).strip() if content_match else content.strip()
98
+
99
+ # Compare the extracted answers
100
+ if student_answer == ground_truth:
101
+ reward = 1.0
102
+ except Exception:
103
+ pass # Keep reward as 0.0 if both methods fail
104
+ rewards.append(reward)
105
+ return rewards
106
+ orms['external_r1v_acc'] = MultiModalAccuracyORM
107
+ ```
108
+
109
+ ---
110
+
111
+ ### **GRPO Training Experiment Log**
112
+
113
+ #### **Training Parameters**
114
+
115
+ We selected `Qwen2.5-VL-3B-Instruct` as the base model for training. The main reason for choosing the `Instruct` model over the base model is to rapidly achieve format rewards. Experiments were conducted on 8 GPUs. SWIFT GRPO training supports multi-GPU deployment to accelerate rollouts, so we set `num_infer_workers` to 2 and processes to 6 (2 GPUs for deployment, 6 GPUs for training). If you encounter deployment errors for `qwen2.5-vl` on `vllm`, refer to [this issue](https://github.com/vllm-project/vllm/issues/13285).
116
+
117
+ Since the task is simple, we set `max_completion_length` to 1024 and selected `external_r1v_acc` and `format` as reward functions. The learning rate and beta are set to `1e-6` and `0.001`, respectively. Other configurations are as follows. The settings for `batch_size` and `num_generations` can be referenced from [Complete GRPO Experiment Process](./GRPO.md).
118
+
119
+ ```shell
120
+ WANDB_API_KEY=your_wandb_api_key \
121
+ NPROC_PER_NODE=6 \
122
+ swift rlhf \
123
+ --rlhf_type grpo \
124
+ --model Qwen/Qwen2.5-VL-3B-Instruct \
125
+ --external_plugins examples/train/grpo/plugin/plugin.py \
126
+ --reward_funcs external_r1v_acc format \
127
+ --use_vllm true \
128
+ --vllm_device auto \
129
+ --vllm_gpu_memory_utilization 0.6 \
130
+ --train_type full \
131
+ --torch_dtype bfloat16 \
132
+ --dataset 'okwinds/clevr_cogen_a_train' \
133
+ --vllm_max_model_len 8192 \
134
+ --max_completion_length 1024 \
135
+ --num_train_epochs 1 \
136
+ --per_device_train_batch_size 8 \
137
+ --per_device_eval_batch_size 8 \
138
+ --learning_rate 1e-6 \
139
+ --gradient_accumulation_steps 2 \
140
+ --save_strategy 'steps' \
141
+ --eval_strategy 'steps' \
142
+ --eval_steps 1000 \
143
+ --save_steps 1000 \
144
+ --save_total_limit 10 \
145
+ --logging_steps 1 \
146
+ --output_dir output/GRPO_CLEVR_COUNTDOWN \
147
+ --warmup_ratio 0.01 \
148
+ --dataloader_num_workers 4 \
149
+ --num_generations 24 \
150
+ --temperature 1.0 \
151
+ --system 'examples/train/grpo/prompt.txt' \
152
+ --deepspeed zero3 \
153
+ --log_completions true \
154
+ --report_to wandb \
155
+ --num_iterations 1 \
156
+ --num_infer_workers 2 \
157
+ --async_generate false \
158
+ --beta 0.001 \
159
+ ```
160
+
161
+ #### **Experimental Observations**
162
+
163
+ ![image.png](../../resources/grpo_clevr_count.png)
164
+
165
+ - Given the simplicity of the dataset and task, the model converged after 500 epochs. Key observations:
166
+ 1. The custom `ClevrORM` reward steadily increased, proving the model learned how to complete the task. The task success rate climbed from an initial 0.4 to nearly 1.
167
+ 2. The `Format Reward` remained stable at 1, likely due to the consistent query format across all dataset samples.
168
+ 3. The `reward_std` stabilized below 0.1.
169
+ 4. The `completion length` eventually stabilized between 60-80 tokens, with the model learning a fixed output pattern for item-by-item counting.
170
+
171
+ ---
172
+ For additional tasks like Geometric QA and Open R1 Multimodal datasets, refer to their respective sections in the full experiment documentation.
173
+
174
+ ## **Geometric QA Task**
175
+
176
+ ### **Task and Dataset Definition**
177
+
178
+ This task is a Geometric QA task, where the task description is: given a geometric figure, answer mathematical questions related to the figure. The original data comes from [this paper](https://arxiv.org/pdf/2312.11370), and [R1-V](https://github.com/Deep-Agent/R1-V.git) has preprocessed the data into a `problem-solution` format while retaining the images in the `image` field. Therefore, we do not need to redefine the dataset and can directly use `--dataset AI-ModelScope/GEOQA_R1V_Train_8K`.
179
+
180
+ ---
181
+
182
+ ### **Reward Function**
183
+
184
+ As this is also a mathematical problem, and the answers are already processed into final results, we directly use the previously defined `MultiModalAccuracyORM` reward function.
185
+
186
+ ---
187
+
188
+ ### **GRPO Training Experiment Log**
189
+
190
+ #### **Training Parameters**
191
+
192
+ The selected model and most hyperparameters are similar to the previous experiment, with two main differences:
193
+ 1. **SWIFT now supports the `--num_iteration` parameter**, allowing multiple updates during a single rollout. We set it to 2.
194
+ 2. During the experiment, we found that training might become unstable in mathematical problems, causing the model to collapse. This is characterized by a sharp drop in all rewards, a rapid increase in loss, `grad_norm`, and KL divergence, with no subsequent recovery. To prevent this, we set `--max_grad_norm 0.5` to ensure stable training. Note that this instability can have some randomness.
195
+
196
+ ```shell
197
+ WANDB_API_KEY=your_wandb_api_key \
198
+ MAX_PIXELS=401408 \
199
+ NPROC_PER_NODE=6 \
200
+ swift rlhf \
201
+ --rlhf_type grpo \
202
+ --model Qwen/Qwen2.5-VL-3B-Instruct \
203
+ --external_plugins examples/train/grpo/plugin/plugin.py \
204
+ --reward_funcs external_r1v_acc format \
205
+ --use_vllm true \
206
+ --vllm_device auto \
207
+ --vllm_gpu_memory_utilization 0.6 \
208
+ --train_type full \
209
+ --torch_dtype bfloat16 \
210
+ --dataset 'AI-ModelScope/GEOQA_R1V_Train_8K' \
211
+ --vllm_max_model_len 8192 \
212
+ --max_completion_length 1024 \
213
+ --num_train_epochs 1 \
214
+ --per_device_train_batch_size 8 \
215
+ --per_device_eval_batch_size 8 \
216
+ --learning_rate 1e-6 \
217
+ --gradient_accumulation_steps 2 \
218
+ --save_strategy 'steps' \
219
+ --eval_strategy 'steps' \
220
+ --eval_steps 400 \
221
+ --save_steps 400 \
222
+ --save_total_limit 10 \
223
+ --logging_steps 1 \
224
+ --output_dir output/GRPO_GEOQA \
225
+ --warmup_ratio 0.05 \
226
+ --dataloader_num_workers 4 \
227
+ --num_generations 8 \
228
+ --temperature 1.0 \
229
+ --repetition_penalty 1.1 \
230
+ --system 'examples/train/grpo/prompt.txt' \
231
+ --deepspeed zero3 \
232
+ --log_completions true \
233
+ --report_to wandb \
234
+ --num_iterations 2 \
235
+ --num_infer_workers 2 \
236
+ --async_generate false \
237
+ --beta 0.001 \
238
+ --max_grad_norm 0.5 \
239
+ ```
240
+
241
+ ---
242
+
243
+ #### **Experimental Observations**
244
+
245
+ ![image.png](../../resources/grpo_geoqa.png)
246
+
247
+ The training curve is shown above.
248
+
249
+ 1. Compared to the **Count task**, the **Geometric QA task** is significantly more challenging. As a result, the **Accuracy Reward** converges much more slowly and has larger oscillations.
250
+ 2. The `completion_length` exhibits a clear trend, eventually stabilizing at around 250 tokens.
251
+ 3. The reward converges, with the standard deviation stabilizing at around 0.2 after a total of 1300 steps.
252
+
253
+ Below is an example of the model's completion after training:
254
+
255
+ ```plaintext
256
+ User:
257
+ "If AB is the tangent to circle O with the tangent point A, and BO intersects circle O at point C, and point D is located on circle O, what is the measure of angle ADC if the degree of angle ABO is 32.0?"
258
+ Assistant:
259
+ "<think>\nTo find the measure of angle ADC, we first need to understand the properties of the tangent and the circle. Since AB is a tangent to the circle at point A, the line segment OA (which is perpendicular to the tangent AB) is perpendicular to AB. This means that angle OAB is 90 degrees.\n\nGiven that angle ABO is 32 degrees, we can find angle BAO by subtracting angle ABO from 90 degrees:\n∠BAO = 90° - ∠ABO = 90° - 32° = 58°\n\nSince angle BAO is an angle in the minor arc AD, the angle ADC, which is an angle in the minor arc AD, is half the angle BAO. Therefore, angle ADC is:\n∠ADC = 1/2 × ∠BAO = 1/2 × 58° = 29°\n\nSo, the measure of angle ADC is 29 degrees.\n</think>\n<answer>\nThe measure of angle ADC is 29 degrees.\n</answer>"
260
+ ```
261
+
262
+ ---
263
+
264
+ ## **Multimodal Open R1 Dataset Experiment**
265
+
266
+ ### **Task and Dataset Definition**
267
+
268
+ This task is based on the experiments in [open-r1-multimodal](https://github.com/EvolvingLMMs-Lab/open-r1-multimodal.git), using the dataset: [lmms-lab/multimodal-open-r1-8k-verified](https://www.modelscope.cn/datasets/lmms-lab/multimodal-open-r1-8k-verified). This dataset focuses on multimodal mathematical reasoning tasks, with data generated by GPT4o based on the `Math360K` and `Geo170K` datasets. It includes reasoning paths and verifiable answers. The dataset already contains `image`, `problem`, and `solution` fields, so no additional prompt modifications are required, and there is no need to redefine the dataset.
269
+
270
+ ---
271
+
272
+ ### **Reward Function**
273
+
274
+ We directly use the previously defined `MultiModalAccuracyORM` reward function.
275
+
276
+ ---
277
+
278
+ ### **GRPO Training Experiment Log**
279
+
280
+ #### **Training Parameters**
281
+
282
+ The selected model and most hyperparameters are similar to the previous experiment. Due to an **OOM (Out of Memory) issue**, we set `MAX_PIXELS=262144` to reduce memory usage.
283
+
284
+ ```shell
285
+ WANDB_API_KEY=your_wandb_api_key \
286
+ MAX_PIXELS=262144 \
287
+ MASTER_PORT=29600 \
288
+ NPROC_PER_NODE=6 \
289
+ swift rlhf \
290
+ --rlhf_type grpo \
291
+ --model Qwen/Qwen2.5-VL-3B-Instruct \
292
+ --external_plugins examples/train/grpo/plugin/plugin.py \
293
+ --reward_funcs external_r1v_acc format \
294
+ --use_vllm true \
295
+ --vllm_device auto \
296
+ --vllm_gpu_memory_utilization 0.6 \
297
+ --train_type full \
298
+ --torch_dtype bfloat16 \
299
+ --dataset 'lmms-lab/multimodal-open-r1-8k-verified' \
300
+ --vllm_max_model_len 8192 \
301
+ --max_completion_length 1024 \
302
+ --num_train_epochs 1 \
303
+ --per_device_train_batch_size 8 \
304
+ --per_device_eval_batch_size 8 \
305
+ --learning_rate 1e-6 \
306
+ --gradient_accumulation_steps 2 \
307
+ --save_strategy 'steps' \
308
+ --eval_strategy 'steps' \
309
+ --eval_steps 400 \
310
+ --save_steps 400 \
311
+ --save_total_limit 10 \
312
+ --logging_steps 1 \
313
+ --output_dir output/GRPO_GEOQA \
314
+ --warmup_ratio 0.05 \
315
+ --dataloader_num_workers 4 \
316
+ --num_generations 8 \
317
+ --temperature 1.0 \
318
+ --repetition_penalty 1.1 \
319
+ --system 'examples/train/grpo/prompt.txt' \
320
+ --deepspeed zero3 \
321
+ --log_completions true \
322
+ --report_to wandb \
323
+ --num_iterations 2 \
324
+ --num_infer_workers 2 \
325
+ --async_generate false \
326
+ --beta 0.001 \
327
+ --max_grad_norm 0.5 \
328
+ ```
329
+
330
+ ---
331
+
332
+ #### **Experimental Observations**
333
+
334
+ ![image.png](../../resources/grpo_openr1_multimodal.png)
335
+
336
+ The training curve is shown above.
337
+
338
+ - With a total of 8k data samples, the training reached **1268 steps**, with the **Accuracy Reward** converging around 0.5.
339
+ - The `completion_length` stabilized around 200 tokens.
340
+ - The `reward_std` eventually oscillated around 0.2.
341
+
342
+ Below is an example of the model's completion after training:
343
+
344
+ ```plaintext
345
+ User:
346
+ "Based on the map, which state falls into the lowest percentage range of lots, and what is that percentage range?"
347
+ Assistant:
348
+ "<think>\nThe image provided shows a color-coded map of the USA indicating different ranges of lot percentages across various states. According to the legend at the bottom, the lowest percentage range (6.0% - 6.5%) is represented by white. In the image, Alabama (AL) is shaded in white, which corresponds to the 6.0% - 6.5% category. Therefore, based on the map, the state that falls into the lowest percentage range of lots is Alabama, with the percentage range of 6.0% - 6.5%.\nTherefore, the answer is 6.0% - 6.5%.\n</think>\n<answer>Alabama</answer>"
349
+ ```
ms-swift/docs/source_en/BestPractices/GRPO.md ADDED
@@ -0,0 +1,298 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Complete GRPO Experiment Process
2
+
3
+ This article starts with the relatively simple mathematical task "Countdown Game" and introduces the complete GRPO training process through several steps: dataset definition, reward function definition, and GRPO training. The task definition and training parameters are based on [mini-deepseek-r1](https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/mini-deepseek-r1-aha-grpo.ipynb).
4
+
5
+ ## Task and Dataset Definition
6
+
7
+ The goal of the Countdown Game task is to reach a target number using the given numbers and the four basic arithmetic operations. Therefore, we define the dataset as follows:
8
+
9
+ ```python
10
+ class CoundownTaskPreprocessor(ResponsePreprocessor):
11
+
12
+ def preprocess(self, row: Dict[str, Any]) -> Dict[str, Any]:
13
+ numbers = row['nums']
14
+ target = row.pop('response', None)
15
+ query = f"""
16
+ Using the numbers {numbers}, create an equation that equals {target}.
17
+ You can use basic arithmetic operations (+, -, *, /) and each number can only be used once.
18
+ Show your work in <think> </think> tags. And return the final equation and answer in <answer> </answer> tags,
19
+ for example <answer> (1 + 2) / 3 * 4 = 4 </answer>.
20
+ """
21
+ row.update({'target': target, 'query': query})
22
+ return super().preprocess(row)
23
+
24
+ register_dataset(
25
+ DatasetMeta(
26
+ ms_dataset_id='zouxuhong/Countdown-Tasks-3to4',
27
+ subsets=['default'],
28
+ preprocess_func=CoundownTaskPreprocessor(),
29
+ tags=['math']))
30
+ ```
31
+
32
+ Through a template, numbers and the target are used to define the task, and a `query` field is provided for model sampling. At the same time, we need to retain the `nums` and `target` fields for subsequent reward function calculation.
33
+
34
+ ## Reward Function Definition
35
+
36
+ Two reward functions are used for this task: one is the format reward function mentioned in Deepseek-R1, and the other is the accuracy reward function for the Countdown Game. The former is already built into Swift and can be used directly with `--reward_funcs format`, while the latter requires custom definition. Here, we use the `external_plugin` method to define the accuracy reward function, placing the code in `swift/examples/train/grpo/plugin/plugin.py`.
37
+
38
+ The input to the reward function includes three fields: `completions`, `target`, and `nums`, representing the model-generated text, the target answer, and the available numbers, respectively. Each is a list, supporting simultaneous computation of multiple completions. Note that, except for `completions`, the other parameters are transparently passed from the fields defined in the dataset. If there are changes to the task, adjustments can be made to both the dataset and the reward function as needed.
39
+
40
+ ```python
41
+ class CountdownORM(ORM):
42
+ def __call__(self, completions, target, nums, **kwargs) -> List[float]:
43
+ """
44
+ Evaluates completions based on Mathematical correctness of the answer
45
+ Args:
46
+ completions (list[str]): Generated outputs
47
+ target (list[str]): Expected answers
48
+ nums (list[str]): Available numbers
49
+ Returns:
50
+ list[float]: Reward scores
51
+ """
52
+ rewards = []
53
+ for completion, gt, numbers in zip(completions, target, nums):
54
+ try:
55
+ # Check if the format is correct
56
+ match = re.search(r"<answer>(.*?)<\/answer>", completion)
57
+ if match is None:
58
+ rewards.append(0.0)
59
+ continue
60
+ # Extract the "answer" part from the completion
61
+ equation = match.group(1).strip()
62
+ if '=' in equation:
63
+ equation = equation.split('=')[0]
64
+ # Extract all numbers from the equation
65
+ used_numbers = [int(n) for n in re.findall(r'\d+', equation)]
66
+ # Check if all numbers are used exactly once
67
+ if sorted(used_numbers) != sorted(numbers):
68
+ rewards.append(0.0)
69
+ continue
70
+ # Define a regex pattern that only allows numbers, operators, parentheses, and whitespace
71
+ allowed_pattern = r'^[\d+\-*/().\s]+$'
72
+ if not re.match(allowed_pattern, equation):
73
+ rewards.append(0.0)
74
+ continue
75
+ # Evaluate the equation with restricted globals and locals
76
+ result = eval(equation, {"__builtins__": None}, {})
77
+ # Check if the equation is correct and matches the ground truth
78
+ if abs(float(result) - float(gt)) < 1e-5:
79
+ rewards.append(1.0)
80
+ else:
81
+ rewards.append(0.0)
82
+ except Exception as e:
83
+ # If evaluation fails, reward is 0
84
+ rewards.append(0.0)
85
+ return rewards
86
+ orms['external_countdown'] = CountdownORM
87
+ ```
88
+
89
+ ## GRPO Training Experiment Record
90
+
91
+ We first present the GRPO formula:
92
+
93
+ $$
94
+ \begin{aligned}
95
+ \mathcal{J}_{G R P O}(\theta) & =\mathbb{E}\left[q \sim P(Q),\left\{o_i\right\}_{i=1}^G \sim \pi_{\theta_{o l d}}(O \mid q)\right] \\
96
+ & \frac{1}{G} \sum_{i=1}^G \frac{1}{\left|o_i\right|} \sum_{t=1}^{\left|o_i\right|}\left\{\min \left[\frac{\pi_\theta\left(o_{i, t} \mid q, o_{i,<t}\right)}{\pi_{\theta_{o l d}}\left(o_{i, t} \mid q, o_{i,<t}\right)} \hat{A}_{i, t}, \operatorname{clip}\left(\frac{\pi_\theta\left(o_{i, t} \mid q, o_{i,<t}\right)}{\pi_{\theta_{o l d}}\left(o_{i, t} \mid q, o_{i,<t}\right)}, 1-\varepsilon, 1+\varepsilon\right) \hat{A}_{i, t}\right]-\beta \mathbb{D}_{K L}\left[\pi_\theta| | \pi_{r e f}\right]\right\}
97
+ \end{aligned}
98
+ $$
99
+
100
+ ### Training Parameters
101
+
102
+ We selected Qwen2.5-3B-Instruct as the base model for training, as using an instruct-tuned model allows for faster acquisition of format rewards. The experiment was conducted on three GPUs, with vLLM inference deployed on the last GPU and two processes set on the remaining GPUs for gradient updates.
103
+
104
+ Since the task is relatively simple, we set both `max_completion_length` and `vllm_max_model_len` to 1024. For more complex tasks, the model output length can be increased appropriately, but note that **the larger these parameters, the more GPU memory is required, and the slower the training speed**. The training time per step is linearly related to `max_completion_length`.
105
+
106
+ In our experiment, the total batch size is:
107
+
108
+ $$
109
+ \text{total batch size} = \text{num\_processes} \times \text{per\_device\_train\_batch\_size} \times \text{gradient\_accumulation\_steps} = 2 \times 8 \times 8 = 128
110
+ $$
111
+
112
+ There is a constraint:
113
+
114
+ $$
115
+ \text{num\_processes} \times \text{per\_device\_train\_batch\_size} \text{ must be divisible by } \text{num\_generations},
116
+ $$
117
+
118
+ where `num_generations` corresponds to $G$ in the GRPO formula. Therefore, we set it to 8. Note that the single-GPU batch size is also closely related to GPU memory capacity, so set an appropriate value based on memory limits. Additionally, the total number of steps can be calculated as:
119
+
120
+ $$
121
+ \text{num\_steps} = \text{epochs} \times \text{len(datasets)} \times \text{num\_generations} \div \text{batch\_size}
122
+ $$
123
+
124
+ This formula should guide the planning of learning rate and warmup settings.
125
+
126
+ Finally, two important parameters are learning rate and $\beta$. The learning rate is straightforward, while $\beta$ is the weight of the KL divergence gradient in the formula. Increasing these parameters accelerates convergence but may lead to instability. After experimentation, we set them to `5e-7` and `0.001`, respectively. During training, adjust these parameters appropriately if instability or oscillations occur.
127
+
128
+ For KL divergence, the community has extensive discussions, such as [Why GRPO Adheres to KL Divergence](https://zhuanlan.zhihu.com/p/25862547100).
129
+
130
+ Other parameter settings were not explored in detail and will not be discussed here.
131
+
132
+ ```bash
133
+ CUDA_VISIBLE_DEVICES=0,1,2 \
134
+ WANDB_API_KEY=your_wandb_key \
135
+ NPROC_PER_NODE=2 \
136
+ swift rlhf \
137
+ --rlhf_type grpo \
138
+ --model Qwen/Qwen2.5-3B-Instruct \
139
+ --external_plugins examples/train/grpo/plugin/plugin.py \
140
+ --reward_funcs external_countdown format \
141
+ --use_vllm true \
142
+ --vllm_device auto \
143
+ --vllm_gpu_memory_utilization 0.6 \
144
+ --train_type full \
145
+ --torch_dtype bfloat16 \
146
+ --dataset 'zouxuhong/Countdown-Tasks-3to4#50000' \
147
+ --max_length 2048 \
148
+ --max_completion_length 1024 \
149
+ --num_train_epochs 1 \
150
+ --per_device_train_batch_size 8 \
151
+ --per_device_eval_batch_size 8 \
152
+ --learning_rate 5e-7 \
153
+ --gradient_accumulation_steps 8 \
154
+ --eval_steps 500 \
155
+ --save_steps 100 \
156
+ --save_total_limit 20 \
157
+ --logging_steps 1 \
158
+ --output_dir output/GRPO_COUNTDOWN \
159
+ --warmup_ratio 0.01 \
160
+ --dataloader_num_workers 4 \
161
+ --num_generations 8 \
162
+ --temperature 1.0 \
163
+ --system 'You are a helpful assistant. You first thinks about the reasoning process in the mind and then provides the user with the answer.' \
164
+ --deepspeed zero3 \
165
+ --log_completions true \
166
+ --vllm_max_model_len 1024 \
167
+ --report_to wandb \
168
+ --beta 0.001 \
169
+ --num_iterations 1
170
+ ```
171
+
172
+ ### Experimental Observations
173
+
174
+ After training for 2000 steps, the following observations were made:
175
+
176
+ 1. **Custom CountdownORM Rewards**: The custom `CountdownORM` reward is steadily increasing, which proves that the model is learning to accomplish the task. The final task success rate is around 50%.
177
+ 2. **Format Reward**: The format reward remains at a relatively high level, fluctuating between 0.9 and 1.
178
+ 3. **Reward Standard Deviation**: The reward standard deviation stabilizes around 0.2-0.3 but has not fully converged, indicating that the model still has room to improve its reward.
179
+ 4. **Failure Cases**: By examining the model's output, many failure cases are due to the model making simple arithmetic errors. Therefore, the model's final convergence direction involves solving the problem step by step. Using a larger model may lead to different convergence directions.
180
+ 5. **Completion Length**: The completion length initially decreases and then increases, reaching its lowest point around 200 steps and later rising to approximately 350. This suggests the model underwent a change in reasoning approach.
181
+ 6. **Reasoning Patterns**: The model's reasoning pattern evolves over time. Eventually, the model realizes that performing permutations and step-by-step calculations is more effective for completing this task. The final sampled outputs mostly follow this format. Below are examples of how completions change across different steps:
182
+
183
+ #### Completion Examples at Different Steps
184
+
185
+ **Step 1:**
186
+
187
+ ```plaintext
188
+ <think>
189
+ We need to create an equation using the given numbers [70, 5, 42, 30] exactly once and make it equal to 77.
190
+
191
+ First, let's consider the possibility of multiplication or division to get closer to 77. Notice that 77 is close to 70 and 42. If we can use these numbers together, it might work.
192
+
193
+ Multiplying 70 by 1.1 would give us approximately 77, which is very close. So, let's try to make 1.1 using the remaining numbers 5 and 30.
194
+ If we take 30/5 = 6, and then 5/6 = 0.8333..., this isn't useful here, so let's look at other possibilities.
195
+
196
+ Next, let's consider if there's another way to achieve 77 using addition, subtraction, or another combination of our numbers. Given the numbers provided, it seems that multiplication or division will be key.
197
+
198
+ We can try subtracting 5 from 30 to get 25. Then, if we divide 42 by 25, we don't get a whole number, but we can round it to the nearest whole number, 1. Now, we need to make the equation fit 77. Let's multiply 25 by 3 and add 7 (since 77 = 25*3 + 7). However, we only have one 7 available and we need to find a way to use the remaining numbers to get 7.
199
+
200
+ Let's try:
201
+ 70 - 42 = 28
202
+ Then, 28 + 5 = 33
203
+ Next, 33 + 30 = 63
204
+ Finally, 77 - 63 = 14, which doesn't help.
205
+
206
+ Considering different combinations, one viable way is:
207
+ (70 - 42) * (30 / 5) = 28 * 6 = 168 / 2 = 84 / 2 = 42 / 2 = 21 / 3 = 7
208
+
209
+ So, (70 - 42) * (30 / 5) = 77.
210
+ </think>
211
+ <answer>(70 - 42) * (30 / 5) = 77</answer>
212
+ ```
213
+
214
+ **Step 200:**
215
+
216
+ ```plaintext
217
+ <think>
218
+ We need to use all the given numbers in a way that the resultant value of the calculation is 99. Given these restrictions, it's clear we should look into how the numbers relate and the mathematical operations that will bring them together to meet our condition.
219
+
220
+ Firstly, let's evaluate the sums, differences, and multiplication/division among the numbers and their relations to see how this could lead us towards an outcome of 99.
221
+ - Summing the three numbers: 96 + 70 + 73 = 239, which is far from 99. This shows addition isn't practical.
222
+ - Subtracting numbers: For instance, subtracting the lowest two (i.e., 73 - 70 = 3) and combining with the highest one (96) does not work. Also, there is no other combination of subtraction that would result in the number needing to add up to 99.
223
+ - Multiplying or dividing the numbers alone wouldn’t reach any near solution since none of the products or division results of the numbers themselves is very close to 99. We might think more carefully about the arrangement of them using a combination of the operations provided.
224
+ Therefore, considering the possible arrangement, a combination like taking (73 - 70 + 96) may produce an effective solution as this will sum the values in such a manner that it aligns with the goal of 99.
225
+
226
+ So we conclude that the operation: 73 - 70 + 96 gives the right sequence to equal 99.
227
+ </think>
228
+ <answer>73 - 70 + 96 = 99</answer>
229
+ ```
230
+
231
+ **Step 500:**
232
+
233
+ ```plaintext
234
+ <think>
235
+ Let's begin by exploring various possible combinations of the given numbers using the arithmetic operators we have access to.
236
+
237
+ - If I add the numbers, I get: 56 + 73 + 51 = 180 which is not equal to our target number (68)
238
+ - If I subtract one number from another I will still exceed our target value as well since the maximum difference is: |(difference between max-min):| | (73 - 51) or | (73 - 56), etc.|
239
+ - Let's multiply the numbers individually and see if the division operation can bring us closer to the goal:
240
+ - (73 * 51) - 56 = 3723 - 56 = 3667 which isn't the goal either.
241
+ - The only operator left is addition and subtraction; I'll try the combination:
242
+ ((73 - 56) + 51) gives me (17 + 51) = 68.
243
+
244
+ Thus, our solution fits the provided problem constraints and meets the required outcome.
245
+ </think>
246
+ <answer>(73 - 56) + 51 = 68</answer>
247
+ ```
248
+
249
+ **Step 1000:**
250
+
251
+ ```plaintext
252
+ <think>
253
+ We need to form an expression using the given numbers (50, 86, and 49) such that it equals 85. Let's try different combinations:
254
+
255
+ (86 - 50) + 49 = 76 + 49 = 125 (too big)
256
+ (86 - 49) + 50 = 37 + 50 = 87 (too big)
257
+ (86 - 50) + 49 = 37 + 49 = 86 (too big)
258
+ (86 + 50) - 49 = 136 - 49 = 87 (too big)
259
+ (86 - 50) + (49 - x)? Not working!
260
+ (50 + 86) - 49: this is 136 which isn't our aim.
261
+ 50 + (86 - 49) = 50 + 37 = 87 which wasn't correct.
262
+ 50 - 86 + 49 = -16 + 49: This doesn't match up as well.
263
+ 86 - (50 + 49) = 86 - 99 which doesn't work too.
264
+
265
+ We try one more:
266
+ (86 + 49) - 50 = 135 - 50 = 85!
267
+
268
+ That's it!
269
+ </think>
270
+ <answer>(86 + 49) - 50 = 85</answer>
271
+ ```
272
+
273
+ **Step 2000:**
274
+
275
+ ```plaintext
276
+ <think>
277
+ We need to form an expression using the numbers 58, 9, 78, and 44 which evaluates to 73. Let's try different combinations:
278
+ (78 - 58) + (9 + 44) = 10 + 53 = 63 (too low)
279
+ (78 - 58) + (9 - 44) = 20 - 35 = -15 (too low)
280
+ (78 - 58) + (44 - 9) = 20 + 35 = 55 (too low)
281
+ (78 + 58) - (9 + 44) = 136 - 53 = 83 (too high)
282
+ (78 + 58) - (9 - 44) = 136 + 35 = 171 (too high)
283
+ (78 + 58) - (44 + 9) = 136 + 53 = 189 (too high)
284
+ (78 + 9) - (58 + 44) = 87 - 102 = -15 (too low)
285
+ (78 + 9) - (58 - 44) = 87 - 14 = 73
286
+
287
+ So our solution is: (78 + 9) - (58 - 44) = 73
288
+ </think>
289
+ <answer>(78 + 9) - (58 - 44) = 73</answer>
290
+ ```
291
+
292
+ ---
293
+
294
+ ### Unstable Experiment Records
295
+
296
+ For learning rate set to `1e-6` and $\beta$ set to `0.04`, the model showed oscillations around step 200. Both the format reward and CountdownORM reward dropped significantly:
297
+
298
+ ![Unstable Experiment Graph](../../resources/grpo_countdown_1.png)
ms-swift/docs/source_en/BestPractices/Rapidly-Training-VL-model.md ADDED
@@ -0,0 +1,228 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Best Practices for Rapidly Training Vision-Language (VL) Models
2
+
3
+ This document provides best practices for quickly training vision-language (VL) models from scratch.
4
+
5
+ Model Links
6
+ - [Qwen2.5-VL-7B-Instruct](https://www.modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct)
7
+ - [Qwen3-8B](https://www.modelscope.cn/models/Qwen/Qwen3-8B)
8
+
9
+ Trained Model Link
10
+ - [Simple-VL-8B](https://www.modelscope.cn/models/swift/Simple-VL-8B/summary)
11
+
12
+
13
+ The training workflow builds upon the Qwen2.5-VL-7B-Instruct model architecture by replacing its internal large language model (LLM) component with the weights from Qwen3-8B, thereby enhancing the model's visual understanding capabilities. The process involves the following steps:
14
+
15
+ 1. Modify the original model’s configuration file config.json to align with Qwen3-8B.
16
+ 2. Initialize and load new model weights, saving them as a new model.
17
+ 3. Fine-tune the new model in two stages:
18
+ 1. Stage 1 : Train only the vision-to-language alignment module (aligner), freezing the ViT and LLM components.
19
+ 2. Stage 2 : Unfreeze all modules and perform joint fine-tuning to improve overall performance.
20
+
21
+
22
+ ## Model Modification
23
+
24
+ ### Config File (config.json) Update
25
+ Due to structural differences between Qwen2.5-7B-Instruct and Qwen3-8B (e.g., number of layers, hidden dimensions), create a new config.json based on the Qwen2.5-VL-7B-Instruct config and update the following parameters to match Qwen3-8B:
26
+
27
+
28
+ ```
29
+ Modified Parameters
30
+ 1. hidden_size 3584->4096
31
+ 2. intermediate_size: 18944->12288
32
+ 3. num_attention_heads: 28->32
33
+ 4. num_key_value_heads: 4->8
34
+ 5. num_hidden_layers: 28->32
35
+ 6. vocab_size:152064->151936
36
+ 7. max_window_layers:28->36
37
+
38
+ Newly Added Parameter
39
+ 1. head_dim: 128
40
+ ```
41
+
42
+ ### Model Weight Initialization and Replacement
43
+ Use the following Python script to initialize, replace, and save the model weights:
44
+ ```python
45
+ import torch
46
+ from modelscope import Qwen2_5_VLForConditionalGeneration, AutoModelForCausalLM, AutoConfig
47
+ from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import Qwen2_5_VLPatchMerger, Qwen2_5_VLModel
48
+ from accelerate import Accelerator
49
+
50
+ # Load original VL model and Qwen3-8B model
51
+ qwen2_5_vl_7b_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
52
+ "Qwen/Qwen2.5-VL-7B-Instruct",
53
+ device_map="cuda",
54
+ torch_dtype=torch.bfloat16
55
+ )
56
+ device = qwen2_5_vl_7b_model.device
57
+
58
+ qwen3_8b_model = AutoModelForCausalLM.from_pretrained(
59
+ "Qwen/Qwen3-8B",
60
+ device_map=device,
61
+ torch_dtype=torch.bfloat16
62
+ )
63
+
64
+ # Load configurations
65
+ old_config = AutoConfig.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
66
+ new_config = AutoConfig.from_pretrained("/path/to/new_config_dir") # Path to new config directory
67
+
68
+ # Replace merger (aligner) layer
69
+ new_merger = Qwen2_5_VLPatchMerger(
70
+ dim=new_config.vision_config.out_hidden_size,
71
+ context_dim=new_config.vision_config.hidden_size,
72
+ spatial_merge_size=new_config.vision_config.spatial_merge_size,
73
+ ).to(device).to(torch.bfloat16)
74
+ qwen2_5_vl_7b_model.visual.merger = new_merger
75
+
76
+ # Replace LLM part of the VL model
77
+ new_llm_model = Qwen2_5_VLModel(new_config).to(device).to(torch.bfloat16)
78
+
79
+ for name, param in qwen3_8b_model.model.named_parameters():
80
+ if name in new_llm_model.state_dict():
81
+ new_llm_model.state_dict()[name].copy_(param)
82
+
83
+ qwen2_5_vl_7b_model.model = new_llm_model
84
+ qwen2_5_vl_7b_model.lm_head = qwen3_8b_model.lm_head
85
+
86
+ # Save modified model
87
+ accelerator = Accelerator()
88
+ accelerator.save_model(
89
+ model=qwen2_5_vl_7b_model,
90
+ save_directory="/path/to/save/Qwen3-VL-Model",
91
+ max_shard_size="4GB",
92
+ safe_serialization=True
93
+ )
94
+ ```
95
+
96
+
97
+ ## Training
98
+ To simplify the process, we skip pre-training and proceed directly to supervised fine-tuning (SFT). The training is divided into two stages:
99
+
100
+ ### Stage 1: Train Aligner Layer
101
+ Train only the vision-to-language alignment module while freezing the ViT and LLM parts:
102
+ ```bash
103
+ NNODES=$WORLD_SIZE \
104
+ NODE_RANK=$RANK \
105
+ NPROC_PER_NODE=8 \
106
+ MAX_PIXELS=1003520 \
107
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
108
+ swift sft \
109
+ --model /path/to/new_vl_model \
110
+ --model_type qwen2_5_vl \
111
+ --train_type full \
112
+ --dataset xxx \
113
+ --torch_dtype bfloat16 \
114
+ --attn_impl flash_attn \
115
+ --freeze_vit true \
116
+ --freeze_llm true \
117
+ --freeze_aligner false \
118
+ --num_train_epochs 3 \
119
+ --per_device_train_batch_size 2 \
120
+ --learning_rate 5e-6 \
121
+ --gradient_accumulation_steps 8 \
122
+ --eval_steps -1 \
123
+ --save_steps 1000 \
124
+ --save_total_limit 10 \
125
+ --logging_steps 5 \
126
+ --max_length 8192 \
127
+ --output_dir output \
128
+ --warmup_ratio 0.05 \
129
+ --dataloader_num_workers 4 \
130
+ --dataset_num_proc 8 \
131
+ --deepspeed zero2
132
+ ```
133
+
134
+ ### Stage 2: Full Model Training
135
+
136
+ Unfreeze all modules and jointly train to enhance the model's visual understanding:
137
+
138
+ ```bash
139
+ NNODES=$WORLD_SIZE \
140
+ NODE_RANK=$RANK \
141
+ NPROC_PER_NODE=8 \
142
+ MAX_PIXELS=1003520 \
143
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
144
+ swift sft \
145
+ --model /path/to/stage1_checkpoint \
146
+ --model_type qwen2_5_vl \
147
+ --train_type full \
148
+ --dataset xxx \
149
+ --torch_dtype bfloat16 \
150
+ --attn_impl flash_attn \
151
+ --freeze_vit false \
152
+ --freeze_llm false \
153
+ --freeze_aligner false \
154
+ --num_train_epochs 3 \
155
+ --per_device_train_batch_size 2 \
156
+ --learning_rate 5e-6 \
157
+ --gradient_accumulation_steps 8 \
158
+ --eval_steps -1 \
159
+ --save_steps 1000 \
160
+ --save_total_limit 10 \
161
+ --logging_steps 5 \
162
+ --max_length 8192 \
163
+ --output_dir output \
164
+ --warmup_ratio 0.05 \
165
+ --dataloader_num_workers 4 \
166
+ --dataset_num_proc 8 \
167
+ --deepspeed zero2
168
+ ```
169
+
170
+ ## Inference / Deployment / Evaluation
171
+
172
+ ### Inference
173
+ Perform inference using `swift infer`:
174
+ ```bash
175
+ swift infer \
176
+ --model /path/to/stage2_checkpoint
177
+ ```
178
+
179
+ ### Deployment
180
+ Accelerate model serving with vLLM:
181
+ ```bash
182
+ CUDA_VISIBLE_DEVICES=0 \
183
+ MAX_PIXELS=1003520 \
184
+ VIDEO_MAX_PIXELS=50176 \
185
+ FPS_MAX_FRAMES=12 \
186
+ swift deploy \
187
+ --model /path/to/stage2_checkpoint \
188
+ --infer_backend vllm \
189
+ --gpu_memory_utilization 0.9 \
190
+ --max_model_len 8192 \
191
+ --max_new_tokens 2048 \
192
+ --limit_mm_per_prompt '{"image": 5, "video": 2}' \
193
+ --served_model_name Qwen3-VL
194
+ ```
195
+
196
+ ### Evaluation
197
+ Evaluate the trained VL model using [EvalScope](https://github.com/modelscope/evalscope/).
198
+
199
+ Example Evaluation Using MMMU Benchmark
200
+ ```python
201
+ from evalscope import TaskConfig, run_task
202
+
203
+ task_cfg_dict = TaskConfig(
204
+ work_dir='outputs',
205
+ eval_backend='VLMEvalKit',
206
+ eval_config={
207
+ 'data': ['MMMU_DEV_VAL'],
208
+ 'mode': 'all',
209
+ 'model': [
210
+ {
211
+ 'api_base': 'http://localhost:8000/v1/chat/completions',
212
+ 'key': 'EMPTY',
213
+ 'name': 'CustomAPIModel',
214
+ 'temperature': 0.6,
215
+ 'type': 'Qwen3-VL',
216
+ 'img_size': -1,
217
+ 'video_llm': False,
218
+ 'max_tokens': 512,
219
+ }
220
+ ],
221
+ 'reuse': False,
222
+ 'nproc': 64,
223
+ 'judge': 'exact_matching'
224
+ },
225
+ )
226
+
227
+ run_task(task_cfg=task_cfg_dict)
228
+ ```
ms-swift/docs/source_en/Customization/Custom-dataset.md ADDED
@@ -0,0 +1,255 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Custom Dataset
2
+
3
+ There are three methods for accessing custom datasets, each offering progressively greater control over preprocessing functions but also increasing in complexity. For example, Solution 1 is the most convenient but offers the least control over preprocessing functions, requiring prior conversion of the dataset into a specific format:
4
+
5
+ 1. **Recommended**: Directly use the command line parameter to access the dataset with `--dataset <dataset_path1> <dataset_path2>`. This will use `AutoPreprocessor` to convert your dataset into a standard format (supporting four dataset formats; see the introduction to AutoPreprocessor below). You can use `--columns` to transform column names. The supported input formats include csv, json, jsonl, txt, and folders (e.g. git clone open-source datasets). This solution does not require modifying `dataset_info.json` and is suitable for users new to ms-swift. The following two solutions are suitable for developers looking to extend ms-swift.
6
+ 2. Add the dataset to `dataset_info.json`, which you can refer to in the built-in [dataset_info.json](https://github.com/modelscope/ms-swift/blob/main/swift/llm/dataset/data/dataset_info.json) of ms-swift. This solution also uses AutoPreprocessor to convert the dataset to a standard format. `dataset_info.json` is a list of metadata for datasets, and one of the fields ms_dataset_id/hf_dataset_id/dataset_path must be filled. Column name transformation can be done through the `columns` field. Datasets added to `dataset_info.json` or registered ones will automatically generate [supported dataset documentation](https://swift.readthedocs.io/en/latest/Instruction/Supported-models-and-datasets.html) when running [run_dataset_info.py](https://github.com/modelscope/ms-swift/blob/main/scripts/utils/run_dataset_info.py). In addition, you can use the external `dataset_info.json` approach by parsing the JSON file with `--custom_dataset_info xxx.json` (to facilitate users who prefer `pip install` over `git clone`), and then specify `--dataset <dataset_id/dataset_dir/dataset_path>`.
7
+ 3. Manually register the dataset to have the most flexible customization capability for preprocessing functions, allowing the use of functions to preprocess datasets, but it is more difficult. You can refer to the [built-in datasets](https://github.com/modelscope/ms-swift/blob/main/swift/llm/dataset/dataset/llm.py) or [examples](https://github.com/modelscope/swift/blob/main/examples/custom). You can specify `--custom_register_path xxx.py` to parse external registration content (convenient for users who use pip install instead of git clone).
8
+ - Solutions one and two leverage solution three under the hood, where the registration process occurs automatically.
9
+
10
+ The following is an introduction to the dataset formats that `AutoPreprocessor` can handle:
11
+
12
+ The standard dataset format for ms-swift accepts keys such as: 'messages', 'rejected_response', 'label', 'images', 'videos', 'audios', 'tools', and 'objects'. Among these, 'messages' is a required key. 'rejected_response' is used for DPO and other RLHF training, 'label' is used for KTO training and classification model training. The keys 'images', 'videos', and 'audios' are used to store paths or URLs for multimodal data, 'tools' is used for Agent tasks, and 'objects' is used for grounding tasks.
13
+
14
+ There are three core preprocessors in ms-swift: `MessagesPreprocessor`, `AlpacaPreprocessor`, and `ResponsePreprocessor`. `MessagesPreprocessor` is used to convert datasets in the messages and sharegpt format into the standard format. `AlpacaPreprocessor` converts datasets in the alpaca format, while `ResponsePreprocessor` converts datasets in the query/response format. `AutoPreprocessor` automatically selects the appropriate preprocessor for the task.
15
+
16
+ The following four formats will all be converted into the `messages` field of the ms-swift standard format under the processing of `AutoPreprocessor`, meaning they can all be directly used with `--dataset <dataset-path>`:
17
+
18
+ Messages format (standard format):
19
+ ```jsonl
20
+ {"messages": [{"role": "system", "content": "<system>"}, {"role": "user", "content": "<query1>"}, {"role": "assistant", "content": "<response1>"}, {"role": "user", "content": "<query2>"}, {"role": "assistant", "content": "<response2>"}]}
21
+ ```
22
+ - Note: The system part is optional. The system in the dataset has a higher priority than the `--system` passed through the command line, followed by the `default_system` defined in the template.
23
+
24
+ ShareGPT format:
25
+ ```jsonl
26
+ {"system": "<system>", "conversation": [{"human": "<query1>", "assistant": "<response1>"}, {"human": "<query2>", "assistant": "<response2>"}]}
27
+ ```
28
+
29
+ Alpaca format:
30
+ ```jsonl
31
+ {"system": "<system>", "instruction": "<query-inst>", "input": "<query-input>", "output": "<response>"}
32
+ ```
33
+
34
+ Query-Response format:
35
+ ```jsonl
36
+ {"system": "<system>", "query": "<query2>", "response": "<response2>", "history": [["<query1>", "<response1>"]]}
37
+ ```
38
+
39
+ ## Standard Dataset Format
40
+
41
+ The following outlines the standard dataset format for ms-swift, where the "system" field is optional and uses the "default_system" defined in the template by default. The four dataset formats introduced earlier can also be processed by AutoPreprocessor into the standard dataset format.
42
+
43
+ ### Pre-training
44
+
45
+ ```jsonl
46
+ {"messages": [{"role": "assistant", "content": "I love music"}]}
47
+ {"messages": [{"role": "assistant", "content": "Coach, I want to play basketball"}]}
48
+ {"messages": [{"role": "assistant", "content": "Which is more authoritative, tomato and egg rice or the third fresh stir-fry?"}]}
49
+ ```
50
+
51
+ ### Supervised Fine-tuning
52
+
53
+ ```jsonl
54
+ {"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "Tell me tomorrow's weather"}, {"role": "assistant", "content": "Tomorrow's weather will be sunny"}]}
55
+ {"messages": [{"role": "system", "content": "You are a useful and harmless math calculator"}, {"role": "user", "content": "What is 1 + 1?"}, {"role": "assistant", "content": "It equals 2"}, {"role": "user", "content": "What about adding 1?"}, {"role": "assistant", "content": "It equals 3"}]}
56
+ ```
57
+
58
+ ### RLHF
59
+
60
+ #### DPO/ORPO/CPO/SimPO/RM
61
+
62
+ ```jsonl
63
+ {"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "Tell me tomorrow's weather"}, {"role": "assistant", "content": "Tomorrow's weather will be sunny"}], "rejected_response": "I don't know"}
64
+ {"messages": [{"role": "system", "content": "You are a useful and harmless math calculator"}, {"role": "user", "content": "What is 1 + 1?"}, {"role": "assistant", "content": "It equals 2"}, {"role": "user", "content": "What about adding 1?"}, {"role": "assistant", "content": "It equals 3"}], "rejected_response": "I don't know"}
65
+ ```
66
+
67
+ #### KTO
68
+
69
+ ```jsonl
70
+ {"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "Tell me tomorrow's weather"}, {"role": "assistant", "content": "I don't know"}], "label": false}
71
+ {"messages": [{"role": "system", "content": "You are a useful and harmless math calculator"}, {"role": "user", "content": "What is 1 + 1?"}, {"role": "assistant", "content": "It equals 2"}, {"role": "user", "content": "What about adding 1?"}, {"role": "assistant", "content": "It equals 3"}], "label": true}
72
+ ```
73
+
74
+ #### PPO/GRPO
75
+
76
+ ```jsonl
77
+ {"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "Tell me tomorrow's weather"}]}
78
+ {"messages": [{"role": "system", "content": "You are a useful and harmless math calculator"}, {"role": "user", "content": "What is 1 + 1?"}, {"role": "assistant", "content": "It equals 2"}, {"role": "user", "content": "What about adding 1?"}]}
79
+ {"messages": [{"role": "user", "content": "What is your name?"}]}
80
+ ```
81
+ - Note: GRPO will pass through all additional field content to the ORM, unlike other training methods that, by default, delete extra fields. For example, you can additionally pass in 'solution'. The custom ORM needs to include a positional argument called `completions`, with other arguments as keyword arguments passed through from the additional dataset fields.
82
+
83
+ ### Sequence Classification
84
+
85
+ **Single-label Task**:
86
+ ```jsonl
87
+ {"messages": [{"role": "user", "content": "The weather is really nice today"}], "label": 1}
88
+ {"messages": [{"role": "user", "content": "Today is really unlucky"}], "label": 0}
89
+ {"messages": [{"role": "user", "content": "So happy"}], "label": 1}
90
+ ```
91
+
92
+ **Multi-label Task**:
93
+
94
+ ```jsonl
95
+ {"messages": [{"role": "user", "content": "<sentence>"}], "label": [1, 3, 5]}
96
+ ```
97
+
98
+ **Single Regression Task**:
99
+
100
+ ```jsonl
101
+ {"messages": [{"role": "user", "content": "Calculate the similarity between two sentences, with a range of 0-1.\nsentence1: <sentence1>\nsentence2: <sentence2>"}], "label": 0.8}
102
+ ```
103
+
104
+ **Multi Regression Task**:
105
+
106
+ ```jsonl
107
+ {"messages": [{"role": "user", "content": "<sentence>"}], "label": [1.2, -0.6, 0.8]}
108
+ ```
109
+
110
+ ### Embedding
111
+
112
+ Please refer to the [Embedding training documentation](../BestPractices/Embedding.md#dataset-format).
113
+
114
+ ### Multimodal
115
+
116
+ For multimodal datasets, the format is the same as the aforementioned tasks. The difference lies in the addition of several keys: `images`, `videos`, and `audios`, which represent the URLs or paths (preferably absolute paths) of multimodal resources. The tags `<image>`, `<video>`, and `<audio>` indicate where to insert images, videos, or audio. MS-Swift supports multiple images, videos, and audio files. These special tokens will be replaced during preprocessing, as referenced [here](https://github.com/modelscope/ms-swift/blob/main/swift/llm/template/template/qwen.py#L198). The four examples below respectively demonstrate the data format for plain text, as well as formats containing image, video, and audio data.
117
+
118
+
119
+ Pre-training:
120
+ ```jsonl
121
+ {"messages": [{"role": "assistant", "content": "Pre-trained text goes here"}]}
122
+ {"messages": [{"role": "assistant", "content": "<image>is a puppy, <image>is a kitten"}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
123
+ {"messages": [{"role": "assistant", "content": "<audio>describes how nice the weather is today"}], "audios": ["/xxx/x.wav"]}
124
+ {"messages": [{"role": "assistant", "content": "<image>is an elephant, <video>is a lion running"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
125
+ ```
126
+
127
+ Supervised Fine-tuning:
128
+
129
+ ```jsonl
130
+ {"messages": [{"role": "user", "content": "Where is the capital of Zhejiang?"}, {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."}]}
131
+ {"messages": [{"role": "user", "content": "<image><image>What is the difference between the two images?"}, {"role": "assistant", "content": "The first one is a kitten, and the second one is a puppy."}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
132
+ {"messages": [{"role": "user", "content": "<audio>What did the audio say?"}, {"role": "assistant", "content": "The weather is really nice today."}], "audios": ["/xxx/x.mp3"]}
133
+ {"messages": [{"role": "system", "content": "You are a helpful and harmless assistant."}, {"role": "user", "content": "<image>What is in the image, <video>What is in the video?"}, {"role": "assistant", "content": "The image shows an elephant, and the video shows a puppy running on the grass."}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
134
+ ```
135
+
136
+ The data format for RLHF and sequence classification of multimodal models can reference the format of pure text large models, with additional fields such as `images` added on top of that.
137
+
138
+ #### Grounding
139
+
140
+ For grounding (object detection) tasks, SWIFT supports two methods:
141
+
142
+ 1. Directly use the data format of the grounding task corresponding to the model. For example, the format for qwen2-vl is as follows:
143
+
144
+ ```
145
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>Describe the image."}, {"role": "assistant", "content": "<|object_ref_start|>a dog<|object_ref_end|><|box_start|>(221,423),(569,886)<|box_end|> and <|object_ref_start|>a woman<|object_ref_end|><|box_start|>(451,381),(733,793)<|box_end|> are playing on the beach"}], "images": ["/xxx/x.jpg"]}
146
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>Find the <|object_ref_start|>sheep<|object_ref_end|> in the image"}, {"role": "assistant", "content": "<|box_start|>(101,201),(150,266)<|box_end|><|box_start|>(401,601),(550,666)<|box_end|>"}], "images": ["/xxx/x.jpg"]}
147
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>Help me open Google Chrome"}, {"role": "assistant", "content": "Action: click(start_box='<|box_start|>(246,113)<|box_end|>')"}], "images": ["/xxx/x.jpg"]}
148
+ ```
149
+
150
+ When using this type of data, please note:
151
+
152
+ - Different models have different special characters and data format for the grounding task.
153
+ - The handling of bounding box normalization varies across different models: for example, qwen2.5-vl uses absolute coordinates, while qwen2-vl and internvl2.5 require bounding box coordinates to be normalized to the thousandth scale.
154
+
155
+ 2. Use SWIFT's grounding data format:
156
+
157
+ ```
158
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>Describe the image."}, {"role": "assistant", "content": "<ref-object><bbox> and <ref-object><bbox> are playing on the beach"}], "images": ["/xxx/x.jpg"], "objects": {"ref": ["a dog", "a woman"], "bbox": [[331.5, 761.4, 853.5, 1594.8], [676.5, 685.8, 1099.5, 1427.4]]}}
159
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>Find the <ref-object> in the image"}, {"role": "assistant", "content": "<bbox><bbox>"}], "images": ["/xxx/x.jpg"], "objects": {"ref": ["sheep"], "bbox": [[90.9, 160.8, 135, 212.8], [360.9, 480.8, 495, 532.8]]}}
160
+ {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>Help me open Google Chrome"}, {"role": "assistant", "content": "Action: click(start_box='<bbox>')"}], "images": ["/xxx/x.jpg"], "objects": {"ref": [], "bbox": [[615, 226]]}}
161
+ ```
162
+
163
+ The format will automatically convert the dataset format to the corresponding model's grounding task format and select the appropriate model's bbox normalization method. Compared to the general format, this format includes an additional "objects" field, which contains the following subfields:
164
+
165
+ - ref: Used to replace `<ref-object>`.
166
+ - bbox: Used to replace `<bbox>`. If the length of each box in the bbox is 2, it represents the x and y coordinates. If the box length is 4, it represents the x and y coordinates of two points.
167
+ - bbox_type: Optional values are 'real' and 'norm1'. The default is 'real', meaning the bbox represents the actual bounding box value. If set to 'norm1', the bbox is normalized to the range 0~1.
168
+ - image_id: This parameter is only effective when bbox_type is 'real'. It indicates the index of the image corresponding to the bbox, used for scaling the bbox. The index starts from 0, and the default is 0 for all.
169
+
170
+ ### Text-to-Image Format
171
+
172
+ ```jsonl
173
+ {"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "Draw me an apple"}, {"role": "assistant", "content": "<image>"}], "images": ["/xxx/x.jpg"]}
174
+ ```
175
+
176
+ ### Agent Format
177
+ Here are example data samples for a text-only Agent and a multimodal Agent:
178
+ ```jsonl
179
+ {"tools": ["{\"type\": \"function\", \"function\": {\"name\": \"realtime_aqi\", \"description\": \"Weather forecast. Get real-time air quality, including current air quality, PM2.5, and PM10 information.\", \"parameters\": {\"type\": \"object\", \"properties\": {\"city\": {\"type\": \"string\", \"description\": \"City name, e.g., Shanghai\"}}, \"required\": [\"city\"]}}}"], "messages": [{"role": "user", "content": "What is the weather like in Beijing and Shanghai today?"}, {"role": "tool_call", "content": "{\"name\": \"realtime_aqi\", \"arguments\": {\"city\": \"Beijing\"}}"}, {"role": "tool_call", "content": "{\"name\": \"realtime_aqi\", \"arguments\": {\"city\": \"Shanghai\"}}"}, {"role": "tool_response", "content": "{\"city\": \"Beijing\", \"aqi\": \"10\", \"unit\": \"celsius\"}"}, {"role": "tool_response", "content": "{\"city\": \"Shanghai\", \"aqi\": \"72\", \"unit\": \"fahrenheit\"}"}, {"role": "assistant", "content": "According to the weather forecast tool, the air quality index (AQI) in Beijing is 10, which indicates good air quality; whereas in Shanghai, the AQI is 72, indicating mild pollution."}]}
180
+ {"tools": ["{\"type\": \"function\", \"function\": {\"name\": \"click\", \"description\": \"Click on a position on the screen\", \"parameters\": {\"type\": \"object\", \"properties\": {\"x\": {\"type\": \"integer\", \"description\": \"X-coordinate representing the horizontal position on the screen\"}, \"y\": {\"type\": \"integer\", \"description\": \"Y-coordinate representing the vertical position on the screen\"}}, \"required\": [\"x\", \"y\"]}}}"], "messages": [{"role": "user", "content": "<image>What time is it now?"}, {"role": "assistant", "content": "<think>\nI can check the current time by opening the calendar app.\n</think>\n"}, {"role": "tool_call", "content": "{\"name\": \"click\", \"arguments\": {\"x\": 105, \"y\": 132}}"}, {"role": "tool_response", "content": "{\"images\": \"<image>\", \"status\": \"success\"}"}, {"role": "assistant", "content": "Successfully opened the calendar app. The current time is 11 o'clock in the morning."}], "images": ["desktop.png", "calendar.png"]}
181
+ ```
182
+ - When the `agent_template` is set to "react_en", "hermes", etc., this format is compatible with training for all model Agents and allows easy switching between different models.
183
+ - Here, `tools` is a `List[str]`, where each tool needs to be a JSON string. Additionally, the `content` part of the messages where the role is `'tool_call'` or `'tool_response/tool'` must also be in JSON string format.
184
+ - The `tools` field will be combined with the `{"role": "system", ...}` section during training/inference according to the `agent_template`, forming a complete system section.
185
+ - The `{"role": "tool_call", ...}` part will automatically be converted into corresponding formats of `{"role": "assistant", ...}` based on the `agent_template`. Multiple consecutive `{"role": "assistant", ...}` entries will be concatenated to form a complete assistant_content.
186
+ - The `{"role": "tool_response", ...}` can also be written as `{"role": "tool", ...}`, these two forms are equivalent. This part will also be automatically converted according to the `agent_template`. During training, this part does not participate in loss calculations, similar to `{"role": "user", ...}`.
187
+ - This format supports parallel tool calls; refer to the first data sample for an example. In multimodal Agent data samples, the number of `<image>` tags should match the length of "images", and their positions indicate where the image features are inserted. It also supports other modalities, such as audios and videos.
188
+ - For more details, please refer to [Agent Documentation](../Instruction/Agent-support.md).
189
+
190
+
191
+ ## dataset_info.json
192
+
193
+ You can refer to the ms-swift built-in [dataset_info.json](https://github.com/modelscope/ms-swift/blob/main/swift/llm/dataset/data/dataset_info.json). This approach uses the AutoPreprocessor function to convert the dataset into a standard format. The dataset_info.json file contains a list of metadata about the dataset. Here are some examples:
194
+
195
+
196
+ ```json
197
+ [
198
+ {
199
+ "ms_dataset_id": "xxx/xxx"
200
+ },
201
+ {
202
+ "dataset_path": "<dataset_dir/dataset_path>"
203
+ },
204
+ {
205
+ "ms_dataset_id": "<dataset_id>",
206
+ "subsets": ["v1"],
207
+ "split": ["train", "validation"],
208
+ "columns": {
209
+ "input": "query",
210
+ "output": "response"
211
+ }
212
+ },
213
+ {
214
+ "ms_dataset_id": "<dataset_id>",
215
+ "hf_dataset_id": "<hf_dataset_id>",
216
+ "subsets": [{
217
+ "subset": "subset1",
218
+ "columns": {
219
+ "problem": "query",
220
+ "content": "response"
221
+ }
222
+ },
223
+ {
224
+ "subset": "subset2",
225
+ "columns": {
226
+ "messages": "_",
227
+ "new_messages": "messages"
228
+ }
229
+ }]
230
+ }
231
+ ]
232
+ ```
233
+
234
+ The following parameters are supported:
235
+
236
+ - ms_dataset_id: Refers to the DatasetMeta parameter.
237
+ - hf_dataset_id: Refers to the DatasetMeta parameter.
238
+ - dataset_path: Refers to the DatasetMeta parameter.
239
+ - dataset_name: Refers to the DatasetMeta parameter.
240
+ - subsets: Refers to the DatasetMeta parameter.
241
+ - split: Refers to the DatasetMeta parameter.
242
+ - columns: Transforms column names before preprocessing the dataset.
243
+
244
+ ## Dataset Registration
245
+
246
+ `register_dataset` will register the dataset in `DATASET_MAPPING`. You can call the function `register_dataset(dataset_meta)` to complete the dataset registration, where `dataset_meta` will store the metadata of the model. The parameter list for DatasetMeta is as follows:
247
+
248
+ - ms_dataset_id: The dataset_id for ModelScope, default is None.
249
+ - hf_dataset_id: The dataset_id for HuggingFace, default is None.
250
+ - dataset_path: The local path to the dataset (an absolute path is recommended), default is None.
251
+ - dataset_name: The alias of the dataset, which can be specified via `--dataset <dataset_name>`. This is very convenient when the dataset_path is long. The default value is None.
252
+ - subsets: A list of subdataset names or a list of `SubsetDataset` objects, default is `['default']`. (The concepts of subdatasets and splits only exist for dataset_id or dataset_dir (open source datasets cloned via git)).
253
+ - split: Defaults to `['train']`.
254
+ - preprocess_func: A preprocessing function or callable object, default is `AutoPreprocessor()`. This preprocessing function takes an `HfDataset` as input and returns an `HfDataset` in the standard format.
255
+ - load_function: Defaults to `DatasetLoader.load`. If a custom loading function is needed, it should return an `HfDataset` in the standard format, allowing users maximum flexibility while bypassing the ms-swift dataset loading mechanism. This parameter usually does not need to be modified.
ms-swift/docs/source_en/GetStarted/Quick-start.md ADDED
@@ -0,0 +1,105 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Quick Start
2
+
3
+ ms-swift is a comprehensive training and deployment framework for large language models and multimodal large models, provided by the ModelScope Community. It currently supports the training (CPT, SFT, RLHF), inference, evaluation, quantization, and deployment of 500+ LLM and 200+ MLLM. Model developers can fulfill all kinds of needs related to large models in a single platform within the ms-swift framework. The main capabilities of ms-swift include:
4
+
5
+ - 🍎 Model Types: Supports 500+ pure text large models, 200+ multi-modal large models, as well as All-to-All multi-modal models, sequence classification models, and embedding models, covering the entire process from training to deployment.
6
+ - Dataset Types: Comes with more than 150 pre-built datasets for pre-training, fine-tuning, human alignment, multimodal, and supports custom datasets.
7
+ - Hardware Support: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, MPS and others.
8
+ - 🍊 Lightweight Training: Supports lightweight fine-tuning methods like LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, Liger-Kernel, and more.
9
+ - Distributed Training: Supports distributed data parallel (DDP), simple model parallelism via device_map, DeepSpeed ZeRO2 ZeRO3, FSDP, and other distributed training technologies.
10
+ - Quantization Training: Provides training for quantized models like BNB, AWQ, GPTQ, AQLM, HQQ, EETQ.
11
+ - RLHF Training: Supports human alignment training methods like DPO, GRPO, RM, PPO, KTO, CPO, SimPO, ORPO for both text-based and multimodal large models.
12
+ - 🍓 Multimodal Training: Capable of training models for different modalities such as images, videos, and audios; supports tasks like VQA (Visual Question Answering), Captioning, OCR (Optical Character Recognition), and Grounding.
13
+ - Interface-driven Training: Offers training, inference, evaluation, and quantization capabilities through an interface, enabling a complete workflow for large models.
14
+ - Plugins and Extensions: Allows customization and extension of models and datasets, and supports customizations for components like loss, metric, trainer, loss-scale, callback, optimizer, etc.
15
+ - 🍉 Toolbox Capabilities: Offers not only training support for large models and multi-modal large models but also covers the entire process of inference, evaluation, quantization, and deployment.
16
+ - Inference Acceleration: Supports inference acceleration engines like PyTorch, vLLM, LmDeploy, and provides OpenAI interface, accelerating inference, deployment, and evaluation modules.
17
+ - Model Evaluation: Uses EvalScope as the evaluation backend and supports evaluation of text-based and multimodal models with over 100 evaluation datasets.
18
+ - Model Quantization: Supports the export of quantized models in AWQ, GPTQ, and BNB formats, which can be accelerated using vLLM/LmDeploy for inference and support continued training.
19
+
20
+ ## Installation
21
+
22
+ For the installation of ms-swift, please refer to the [installation documentation](./SWIFT-installation.md).
23
+
24
+ ## Usage Example
25
+
26
+ 10 minutes of self-cognition fine-tuning of Qwen2.5-7B-Instruct on a single 3090 GPU:
27
+
28
+ ```shell
29
+ # 22GB
30
+ CUDA_VISIBLE_DEVICES=0 \
31
+ swift sft \
32
+ --model Qwen/Qwen2.5-7B-Instruct \
33
+ --train_type lora \
34
+ --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
35
+ 'AI-ModelScope/alpaca-gpt4-data-en#500' \
36
+ 'swift/self-cognition#500' \
37
+ --torch_dtype bfloat16 \
38
+ --num_train_epochs 1 \
39
+ --per_device_train_batch_size 1 \
40
+ --per_device_eval_batch_size 1 \
41
+ --learning_rate 1e-4 \
42
+ --lora_rank 8 \
43
+ --lora_alpha 32 \
44
+ --target_modules all-linear \
45
+ --gradient_accumulation_steps 16 \
46
+ --eval_steps 50 \
47
+ --save_steps 50 \
48
+ --save_total_limit 2 \
49
+ --logging_steps 5 \
50
+ --max_length 2048 \
51
+ --output_dir output \
52
+ --system 'You are a helpful assistant.' \
53
+ --warmup_ratio 0.05 \
54
+ --dataloader_num_workers 4 \
55
+ --model_author swift \
56
+ --model_name swift-robot
57
+ ```
58
+
59
+ Tips:
60
+
61
+ - If you want to train with a custom dataset, you can refer to [this guide](../Customization/Custom-dataset.md) to organize your dataset format and specify `--dataset <dataset_path>`.
62
+ - The `--model_author` and `--model_name` parameters are only effective when the dataset includes `swift/self-cognition`.
63
+ - To train with a different model, simply modify `--model <model_id/model_path>`.
64
+ - By default, ModelScope is used for downloading models and datasets. If you want to use HuggingFace, simply specify `--use_hf true`.
65
+
66
+ After training is complete, use the following command to infer with the trained weights:
67
+
68
+ - Here, `--adapters` should be replaced with the last checkpoint folder generated during training. Since the adapters folder contains the training parameter file `args.json`, there is no need to specify `--model`, `--system` separately; Swift will automatically read these parameters. To disable this behavior, you can set `--load_args false`.
69
+
70
+ ```shell
71
+ # Using an interactive command line for inference.
72
+ CUDA_VISIBLE_DEVICES=0 \
73
+ swift infer \
74
+ --adapters output/vx-xxx/checkpoint-xxx \
75
+ --stream true \
76
+ --temperature 0 \
77
+ --max_new_tokens 2048
78
+
79
+ # merge-lora and use vLLM for inference acceleration
80
+ CUDA_VISIBLE_DEVICES=0 \
81
+ swift infer \
82
+ --adapters output/vx-xxx/checkpoint-xxx \
83
+ --stream true \
84
+ --merge_lora true \
85
+ --infer_backend vllm \
86
+ --max_model_len 8192 \
87
+ --temperature 0 \
88
+ --max_new_tokens 2048
89
+ ```
90
+
91
+ Finally, use the following command to push the model to ModelScope:
92
+
93
+ ```shell
94
+ CUDA_VISIBLE_DEVICES=0 \
95
+ swift export \
96
+ --adapters output/vx-xxx/checkpoint-xxx \
97
+ --push_to_hub true \
98
+ --hub_model_id '<your-model-id>' \
99
+ --hub_token '<your-sdk-token>' \
100
+ --use_hf false
101
+ ```
102
+
103
+ ## Learn More
104
+ - More Shell scripts: [https://github.com/modelscope/ms-swift/tree/main/examples](https://github.com/modelscope/ms-swift/tree/main/examples)
105
+ - Using Python: [https://github.com/modelscope/ms-swift/blob/main/examples/notebook/qwen2_5-self-cognition/self-cognition-sft.ipynb](https://github.com/modelscope/ms-swift/blob/main/examples/notebook/qwen2_5-self-cognition/self-cognition-sft.ipynb)
ms-swift/docs/source_en/Instruction/Export-and-push.md ADDED
@@ -0,0 +1,57 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Export and Push
2
+
3
+ ## Merge LoRA
4
+
5
+ - See [here](https://github.com/modelscope/ms-swift/blob/main/examples/export/merge_lora.sh).
6
+
7
+ ## Quantization
8
+
9
+ SWIFT supports quantization exports for AWQ, GPTQ, and BNB models. AWQ and GPTQ require a calibration dataset, which yields better quantization performance but takes longer to quantize. On the other hand, BNB does not require a calibration dataset and is quicker to quantize.
10
+
11
+ | Quantization Technique | Multimodal | Inference Acceleration | Continued Training |
12
+ | ---------------------- | ---------- | ---------------------- | ------------------ |
13
+ | GPTQ | ✅ | ✅ | ✅ |
14
+ | AWQ | ✅ | ✅ | ✅ |
15
+ | BNB | ❌ | ✅ | ✅ |
16
+
17
+ In addition to the SWIFT installation, the following additional dependencies need to be installed:
18
+
19
+ ```shell
20
+ # For AWQ quantization:
21
+ # The versions of autoawq and CUDA are correlated; please choose the version according to `https://github.com/casper-hansen/AutoAWQ`.
22
+ # If there are dependency conflicts with torch, please add the `--no-deps` option.
23
+ pip install autoawq -U
24
+
25
+ # For GPTQ quantization:
26
+ # The versions of auto_gptq and CUDA are correlated; please choose the version according to `https://github.com/PanQiWei/AutoGPTQ#quick-installation`.
27
+ pip install auto_gptq optimum -U
28
+
29
+ # For BNB quantization:
30
+ pip install bitsandbytes -U
31
+ ```
32
+
33
+ We provide a series of scripts to demonstrate SWIFT's quantization export capabilities:
34
+
35
+ - Supports [AWQ](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/awq.sh)/[GPTQ](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/gptq.sh)/[BNB](https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/bnb.sh) quantization exports.
36
+ - Multimodal quantization: Supports quantizing multimodal models using GPTQ and AWQ, with limited multimodal models supported by AWQ. Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/mllm).
37
+ - Support for more model series: Supports quantization exports for [BERT](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/bert) and [Reward Model](https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize/reward_model).
38
+ - Models exported with SWIFT's quantization support inference acceleration using vllm/lmdeploy; they also support further SFT/RLHF using QLoRA.
39
+
40
+
41
+ ## Push Model
42
+
43
+ SWIFT supports re-pushing trained/quantized models to ModelScope/Hugging Face. By default, it pushes to ModelScope, but you can specify `--use_hf true` to push to Hugging Face.
44
+
45
+ ```shell
46
+ swift export \
47
+ --model output/vx-xxx/checkpoint-xxx \
48
+ --push_to_hub true \
49
+ --hub_model_id '<model-id>' \
50
+ --hub_token '<sdk-token>' \
51
+ --use_hf false
52
+ ```
53
+
54
+ Tips:
55
+
56
+ - You can use `--model <checkpoint-dir>` or `--adapters <checkpoint-dir>` to specify the checkpoint directory to be pushed. There is no difference between these two methods in the model pushing scenario.
57
+ - When pushing to ModelScope, you need to make sure you have registered for a ModelScope account. Your SDK token can be obtained from [this page](https://www.modelscope.cn/my/myaccesstoken). Ensure that the account associated with the SDK token has edit permissions for the organization corresponding to the model_id. The model pushing process will automatically create a model repository corresponding to the model_id (if it does not already exist), and you can use `--hub_private_repo true` to automatically create a private model repository.
ms-swift/docs/source_en/Instruction/ReleaseNote3.0.md ADDED
@@ -0,0 +1,93 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # ReleaseNote 3.0
2
+
3
+ > If you encounter any issues while using version 3.x, please submit an issue to us. If something works in version 2.x but not in 3.x, please use version 2.x temporarily while we complete the fixes.
4
+
5
+ ## New Features
6
+
7
+ 1. Dataset module refactoring. The dataset loading speed has improved by 2-20 times, and encoding speed has improved by 2-4 times, with support for streaming mode.
8
+ - Removed the dataset_name mechanism; now use dataset_id, dataset_dir, or dataset_path to specify the dataset.
9
+ - Use `--dataset_num_proc` to support multi-process acceleration.
10
+ - Use `--streaming` to support streaming loading of hub and local datasets.
11
+ - Support `--packing` command for more stable training efficiency.
12
+ - Use `--dataset <dataset_dir>` to support local loading of open-source datasets.
13
+
14
+ 2. Model refactoring:
15
+ - Removed model_type mechanism; use `--model <model_id>/<model_path>` for training and inference.
16
+ - For new models, directly use `--model <model_id>/<model_path> --template xxx --model_type xxx` without needing to write a Python script for model registration.
17
+
18
+ 3. Template module refactoring:
19
+ - Use `--template_backend jinja` for Jinja mode inference.
20
+ - Utilize messages format as the input parameter interface.
21
+
22
+ 4. Supported plugin mechanism for customizing the training process. Current plugins include:
23
+ - callback to customize training callbacks,
24
+ - loss to customize the loss method,
25
+ - loss_scale to customize the weight of each token,
26
+ - metric to customize cross-validation metrics,
27
+ - optimizer to customize the optimizer and lr_scheduler used in training,
28
+ - tools to customize agent training system format,
29
+ - tuner to customize new tuners.
30
+
31
+ 5. Training module refactoring:
32
+ - Supports a single command to launch multi-machine training. See details [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-node/deepspeed/README.md).
33
+ - Supports PreTraining for all multi-modal LLMs.
34
+ - In training, predict_with_generate now uses the infer module, supporting multi-modal LLM and multi-card setups.
35
+ - Human alignment KTO algorithm supports multi-modal LLMs.
36
+
37
+ 6. Inference and deployment module refactoring:
38
+ - Supports batch inference under pt backend and multi-card inference.
39
+ - Inference and deployment modules are unified using the OpenAI format interface.
40
+ - Supports asynchronous inference interface.
41
+
42
+ 7. Merged app-ui into web-ui, with app-ui supporting multi-modal inference.
43
+
44
+ 8. Supports All-to-All models, such as Emu3-Gen and Janus for text-to-image or all-modal model training and deployment.
45
+
46
+ 9. Enhanced the functionality of the examples, so that they can now fully reflect the capabilities of SWIFT and have stronger usability.
47
+
48
+ 10. Use `--use_hf true/false` to switch between downloading/uploading datasets and models from HuggingFace and ModelScope communities.
49
+
50
+ 11. Improved support for training and inference through code. The code structure is clearer, and extensive code comments have been added.
51
+
52
+ ## Breaking Changes
53
+
54
+ This document lists the breaking changes between versions 3.x and 2.x. Developers should take note of these differences when upgrading.
55
+
56
+ ### Parameter Differences
57
+
58
+ - Version 3.0 only requires specifying --model. The model_type only needs to be specified additionally when the model is not supported by SWIFT.
59
+ - sft_type is renamed to train_type.
60
+ - model_id_or_path is renamed to model.
61
+ - template_type is renamed to template.
62
+ - quantization_bit is renamed to quant_bits.
63
+ - check_model_is_latest is renamed to check_model.
64
+ - batch_size is renamed to per_device_train_batch_size, following the transformers naming convention.
65
+ - eval_batch_size is renamed to per_device_eval_batch_size, following the transformers naming convention.
66
+ - tuner_backend has removed the swift option.
67
+ - use_flash_attn is renamed to attn_impl.
68
+ - bnb_4bit_comp_dtype is renamed to bnb_4bit_compute_dtype.
69
+ - Removed train_dataset_sample and val_dataset_sample.
70
+ - The term 'dtype' has been renamed to 'torch_dtype', and the option names have been changed from 'bf16' to the standard 'bfloat16', 'fp16' to 'float16', and 'fp32' to 'float32'.
71
+ - Removed eval_human option.
72
+ - The dataset option has removed the HF:: usage; use the new --use_hf to control downloading and uploading.
73
+ - Removed the do_sample option, and now use temperature for control.
74
+ - add_output_dir_suffix is renamed to add_version.
75
+ - Removed eval_token; API key is now supported.
76
+ - target_modules (lora_target_modules) ALL is changed to all-linear, retaining the same meaning.
77
+ - The parameters --ckpt_dir have been removed from infer/deploy/export, and control is now done using --model and --adapters.
78
+
79
+ The parameters marked as compatible in version 2.0 have been entirely removed.
80
+
81
+ ### Functionality
82
+
83
+ 1. For pre-training, please use the swift pt command. This command will default to using the generation template, while the swift sft command will default to the template preset by model_type.
84
+
85
+ 2. Completely removed the examples directory from version 2.x, and added new examples categorized by functionality.
86
+
87
+ 3. The dataset format is now fully compatible with messages format; query/response/history formats are no longer supported.
88
+
89
+ 4. The storage directory for merge_lora can be specified using `--output_dir`, and merge_lora and quantization cannot be executed in the same command; at least two commands are required.
90
+
91
+ 5. Use `swift app --model xxx` to launch the app-ui interface, which supports multimodal interface inference.
92
+
93
+ 6. Removed dependencies for AIGC along with corresponding examples and training code.
ms-swift/docs/source_en/_templates/classtemplate.rst ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ .. currentmodule:: {{ module }}
2
+
3
+
4
+ {{ name | underline}}
5
+
6
+ .. autoclass:: {{ name }}
7
+ :members:
8
+ :special-members: __init__, __call__
9
+
10
+ ..
11
+ autogenerated from source/_templates/classtemplate.rst
12
+ note it does not have :inherited-members:
ms-swift/docs/source_en/index.rst ADDED
@@ -0,0 +1,61 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ .. swift documentation file,
2
+ You can adapt this file completely to your liking, but it should at least
3
+ contain the root `toctree` directive.
4
+
5
+ Swift DOCUMENTATION
6
+ ========================
7
+
8
+ .. toctree::
9
+ :maxdepth: 2
10
+ :caption: Get Started
11
+
12
+ GetStarted/SWIFT-installation.md
13
+ GetStarted/Quick-start.md
14
+ GetStarted/Web-UI.md
15
+
16
+ .. toctree::
17
+ :maxdepth: 2
18
+ :caption: Instruction
19
+
20
+ Instruction/Command-line-parameters.md
21
+ Instruction/Pre-training-and-Fine-tuning.md
22
+ Instruction/RLHF.md
23
+ Instruction/Inference-and-deployment.md
24
+ Instruction/Megatron-SWIFT-Training.md
25
+ Instruction/Sample.md
26
+ Instruction/Evaluation.md
27
+ Instruction/Export-and-push.md
28
+ Instruction/Reinforced-Fine-tuning.md
29
+ Instruction/GRPO.md
30
+ Instruction/Agent-support.md
31
+ Instruction/Supported-models-and-datasets.md
32
+ Instruction/Use-tuners.md
33
+ Instruction/Frequently-asked-questions.md
34
+
35
+
36
+ .. toctree::
37
+ :maxdepth: 2
38
+ :caption: Customization
39
+
40
+ Customization/Custom-model.md
41
+ Customization/Custom-dataset.md
42
+ Customization/Pluginization.md
43
+
44
+
45
+ .. toctree::
46
+ :maxdepth: 2
47
+ :caption: Best Practices
48
+
49
+ BestPractices/GRPO.md
50
+ BestPractices/GRPO-Multi-Modal-Training.md
51
+ BestPractices/GRPO-Code-Training.md
52
+ BestPractices/Embedding.md
53
+ BestPractices/Rapidly-Training-VL-model.md
54
+ BestPractices/NPU-support.md
55
+ BestPractices/More-Best-Practices.md
56
+
57
+ Indices and tables
58
+ ==================
59
+ * :ref:`genindex`
60
+ * :ref:`modindex`
61
+ * :ref:`search`
ms-swift/examples/app/mllm.sh ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ CUDA_VISIBLE_DEVICES=0 \
2
+ MAX_PIXELS=1003520 \
3
+ VIDEO_MAX_PIXELS=50176 \
4
+ FPS_MAX_FRAMES=12 \
5
+ swift app \
6
+ --model Qwen/Qwen2.5-VL-7B-Instruct \
7
+ --stream true \
8
+ --infer_backend vllm \
9
+ --gpu_memory_utilization 0.9 \
10
+ --max_model_len 8192 \
11
+ --max_new_tokens 2048 \
12
+ --limit_mm_per_prompt '{"image": 5, "video": 2}' \
13
+ --lang zh
ms-swift/examples/deploy/bert/server.sh ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ # Since `swift/test_lora` is trained by swift and contains an `args.json` file,
2
+ # there is no need to explicitly set `--model`, `--system`, etc., as they will be automatically read.
3
+ CUDA_VISIBLE_DEVICES=0 swift deploy \
4
+ --host 0.0.0.0 \
5
+ --port 8000 \
6
+ --adapters swift/test_bert \
7
+ --served_model_name bert-base-chinese \
8
+ --infer_backend pt \
9
+ --truncation_strategy right \
10
+ --max_length 512
ms-swift/examples/deploy/client/llm/base/openai_client.py ADDED
@@ -0,0 +1,41 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Alibaba, Inc. and its affiliates.
2
+ import os
3
+
4
+ from openai import OpenAI
5
+
6
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
7
+
8
+
9
+ def infer(client, model: str, messages):
10
+ query = messages[0]['content']
11
+ print(f'query: {query}')
12
+ resp = client.completions.create(model=model, prompt=query, max_tokens=64, temperature=0)
13
+ response = resp.choices[0].text
14
+ print(f'response: {response}')
15
+ # or (The two calling methods are equivalent.)
16
+ resp = client.chat.completions.create(model=model, messages=messages, max_tokens=64, temperature=0)
17
+ response = resp.choices[0].message.content
18
+ print(f'response: {response}')
19
+ return response
20
+
21
+
22
+ def run_client(host: str = '127.0.0.1', port: int = 8000):
23
+ client = OpenAI(
24
+ api_key='EMPTY',
25
+ base_url=f'http://{host}:{port}/v1',
26
+ )
27
+ model = client.models.list().data[0].id
28
+ print(f'model: {model}')
29
+
30
+ messages = [{'role': 'user', 'content': '浙江 -> 杭州\n安徽 -> 合肥\n四川 ->'}]
31
+ infer(client, model, messages)
32
+
33
+
34
+ if __name__ == '__main__':
35
+ from swift.llm import run_deploy, DeployArguments
36
+ # NOTE: In a real deployment scenario, please comment out the context of run_deploy.
37
+ with run_deploy(
38
+ DeployArguments(
39
+ model='Qwen/Qwen2.5-1.5B', verbose=False, log_interval=-1, infer_backend='pt',
40
+ use_chat_template=False)) as port:
41
+ run_client(port=port)
ms-swift/examples/deploy/client/llm/chat/swift_client.py ADDED
@@ -0,0 +1,60 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Alibaba, Inc. and its affiliates.
2
+ import os
3
+ from typing import List
4
+
5
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
6
+
7
+
8
+ def infer_batch(engine: 'InferEngine', infer_requests: List['InferRequest']):
9
+ request_config = RequestConfig(max_tokens=512, temperature=0)
10
+ metric = InferStats()
11
+
12
+ resp_list = engine.infer(infer_requests, request_config, metrics=[metric])
13
+ # # The asynchronous interface below is equivalent to the synchronous interface above.
14
+ # async def _run():
15
+ # tasks = [engine.infer_async(infer_request, request_config) for infer_request in infer_requests]
16
+ # return await asyncio.gather(*tasks)
17
+ # resp_list = asyncio.run(_run())
18
+
19
+ query0 = infer_requests[0].messages[0]['content']
20
+ print(f'query0: {query0}')
21
+ print(f'response0: {resp_list[0].choices[0].message.content}')
22
+ print(f'metric: {metric.compute()}')
23
+
24
+
25
+ def infer_stream(engine: 'InferEngine', infer_request: 'InferRequest'):
26
+ request_config = RequestConfig(max_tokens=512, temperature=0, stream=True)
27
+ metric = InferStats()
28
+ gen_list = engine.infer([infer_request], request_config, metrics=[metric])
29
+ query = infer_request.messages[0]['content']
30
+ print(f'query: {query}\nresponse: ', end='')
31
+ for resp in gen_list[0]:
32
+ if resp is None:
33
+ continue
34
+ print(resp.choices[0].delta.content, end='', flush=True)
35
+ print()
36
+ print(f'metric: {metric.compute()}')
37
+
38
+
39
+ def run_client(host: str = '127.0.0.1', port: int = 8000):
40
+ engine = InferClient(host=host, port=port)
41
+ print(f'models: {engine.models}')
42
+ # Here, `load_dataset` is used for convenience; `infer_batch` does not require creating a dataset.
43
+ dataset = load_dataset(['AI-ModelScope/alpaca-gpt4-data-zh#1000'], seed=42)[0]
44
+ print(f'dataset: {dataset}')
45
+ infer_requests = [InferRequest(**data) for data in dataset]
46
+ infer_batch(engine, infer_requests)
47
+
48
+ messages = [{'role': 'user', 'content': 'who are you?'}]
49
+ infer_stream(engine, InferRequest(messages=messages))
50
+
51
+
52
+ if __name__ == '__main__':
53
+ from swift.llm import (InferEngine, InferRequest, InferClient, RequestConfig, load_dataset, run_deploy,
54
+ DeployArguments)
55
+ from swift.plugin import InferStats
56
+ # NOTE: In a real deployment scenario, please comment out the context of run_deploy.
57
+ with run_deploy(
58
+ DeployArguments(model='Qwen/Qwen2.5-1.5B-Instruct', verbose=False, log_interval=-1,
59
+ infer_backend='vllm')) as port:
60
+ run_client(port=port)
ms-swift/examples/deploy/client/mllm/openai_client.py ADDED
@@ -0,0 +1,97 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Alibaba, Inc. and its affiliates.
2
+ import os
3
+ from typing import Literal
4
+
5
+ from openai import OpenAI
6
+
7
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
8
+
9
+
10
def infer(client, model: str, messages):
    """Run one non-streaming chat completion and echo the query/response pair.

    Args:
        client: An OpenAI-compatible client instance.
        model: Name of the served model to query.
        messages: OpenAI-format message list; the first entry's content is
            logged as the user query.

    Returns:
        The assistant reply text.
    """
    user_query = messages[0]['content']
    completion = client.chat.completions.create(model=model, messages=messages, max_tokens=512, temperature=0)
    answer = completion.choices[0].message.content
    print(f'query: {user_query}')
    print(f'response: {answer}')
    return answer
17
+
18
+
19
+ # streaming
20
+ def infer_stream(client, model: str, messages):
21
+ gen = client.chat.completions.create(model=model, messages=messages, stream=True, temperature=0)
22
+ print(f'messages: {messages}\nresponse: ', end='')
23
+ for chunk in gen:
24
+ print(chunk.choices[0].delta.content, end='', flush=True)
25
+ print()
26
+
27
+
28
+ def get_message(mm_type: Literal['text', 'image', 'video', 'audio']):
29
+ if mm_type == 'text':
30
+ message = {'role': 'user', 'content': 'who are you?'}
31
+ elif mm_type == 'image':
32
+ message = {
33
+ 'role':
34
+ 'user',
35
+ 'content': [{
36
+ 'type': 'image',
37
+ 'image': 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png'
38
+ }, {
39
+ 'type': 'text',
40
+ 'text': 'How many sheep are there in the picture?'
41
+ }]
42
+ }
43
+
44
+ elif mm_type == 'video':
45
+ # # use base64
46
+ # import base64
47
+ # with open('baby.mp4', 'rb') as f:
48
+ # vid_base64 = base64.b64encode(f.read()).decode('utf-8')
49
+ # video = f'data:video/mp4;base64,{vid_base64}'
50
+
51
+ # use url
52
+ video = 'https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4'
53
+ message = {
54
+ 'role': 'user',
55
+ 'content': [{
56
+ 'type': 'video',
57
+ 'video': video
58
+ }, {
59
+ 'type': 'text',
60
+ 'text': 'Describe this video.'
61
+ }]
62
+ }
63
+ elif mm_type == 'audio':
64
+ message = {
65
+ 'role':
66
+ 'user',
67
+ 'content': [{
68
+ 'type': 'audio',
69
+ 'audio': 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav'
70
+ }, {
71
+ 'type': 'text',
72
+ 'text': 'What does this audio say?'
73
+ }]
74
+ }
75
+ return message
76
+
77
+
78
+ def run_client(host: str = '127.0.0.1', port: int = 8000):
79
+ client = OpenAI(
80
+ api_key='EMPTY',
81
+ base_url=f'http://{host}:{port}/v1',
82
+ )
83
+ model = client.models.list().data[0].id
84
+ print(f'model: {model}')
85
+
86
+ query = 'who are you?'
87
+ messages = [{'role': 'user', 'content': query}]
88
+ response = infer(client, model, messages)
89
+ messages.append({'role': 'assistant', 'content': response})
90
+ messages.append(get_message(mm_type='video'))
91
+ infer_stream(client, model, messages)
92
+
93
+
94
+ if __name__ == '__main__':
95
+ from swift.llm import run_deploy, DeployArguments
96
+ with run_deploy(DeployArguments(model='Qwen/Qwen2.5-VL-3B-Instruct', verbose=False, log_interval=-1)) as port:
97
+ run_client(port=port)
ms-swift/examples/deploy/client/mllm/swift_client.py ADDED
@@ -0,0 +1,127 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Alibaba, Inc. and its affiliates.
2
+ import os
3
+ from typing import List, Literal
4
+
5
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
6
+
7
+
8
def infer_batch(engine: 'InferEngine', infer_requests: List['InferRequest']):
    """Run synchronous batch inference and print the first query/response plus metrics."""
    stats = InferStats()
    cfg = RequestConfig(max_tokens=512, temperature=0)
    responses = engine.infer(infer_requests, cfg, metrics=[stats])
    first_query = infer_requests[0].messages[0]['content']
    print(f'query0: {first_query}')
    print(f'response0: {responses[0].choices[0].message.content}')
    print(f'metric: {stats.compute()}')
16
+
17
+
18
+ def infer_stream(engine: 'InferEngine', infer_request: 'InferRequest'):
19
+ request_config = RequestConfig(max_tokens=512, temperature=0, stream=True)
20
+ metric = InferStats()
21
+ gen_list = engine.infer([infer_request], request_config, metrics=[metric])
22
+ query = infer_request.messages[0]['content']
23
+ print(f'query: {query}\nresponse: ', end='')
24
+ for resp in gen_list[0]:
25
+ if resp is None:
26
+ continue
27
+ print(resp.choices[0].delta.content, end='', flush=True)
28
+ print()
29
+ print(f'metric: {metric.compute()}')
30
+
31
+
32
+ def get_message(mm_type: Literal['text', 'image', 'video', 'audio']):
33
+ if mm_type == 'text':
34
+ message = {'role': 'user', 'content': 'who are you?'}
35
+ elif mm_type == 'image':
36
+ message = {
37
+ 'role':
38
+ 'user',
39
+ 'content': [
40
+ {
41
+ 'type': 'image',
42
+ # url or local_path or PIL.Image or base64
43
+ 'image': 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png'
44
+ },
45
+ {
46
+ 'type': 'text',
47
+ 'text': 'How many sheep are there in the picture?'
48
+ }
49
+ ]
50
+ }
51
+
52
+ elif mm_type == 'video':
53
+ # # use base64
54
+ # import base64
55
+ # with open('baby.mp4', 'rb') as f:
56
+ # vid_base64 = base64.b64encode(f.read()).decode('utf-8')
57
+ # video = f'data:video/mp4;base64,{vid_base64}'
58
+
59
+ # use url
60
+ video = 'https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4'
61
+ message = {
62
+ 'role': 'user',
63
+ 'content': [{
64
+ 'type': 'video',
65
+ 'video': video
66
+ }, {
67
+ 'type': 'text',
68
+ 'text': 'Describe this video.'
69
+ }]
70
+ }
71
+ elif mm_type == 'audio':
72
+ message = {
73
+ 'role':
74
+ 'user',
75
+ 'content': [{
76
+ 'type': 'audio',
77
+ 'audio': 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav'
78
+ }, {
79
+ 'type': 'text',
80
+ 'text': 'What does this audio say?'
81
+ }]
82
+ }
83
+ return message
84
+
85
+
86
+ def get_data(mm_type: Literal['text', 'image', 'video', 'audio']):
87
+ data = {}
88
+ if mm_type == 'text':
89
+ messages = [{'role': 'user', 'content': 'who are you?'}]
90
+ elif mm_type == 'image':
91
+ # The number of <image> tags must be the same as len(images).
92
+ messages = [{'role': 'user', 'content': '<image>How many sheep are there in the picture?'}]
93
+ # Support URL/Path/base64/PIL.Image
94
+ data['images'] = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png']
95
+ elif mm_type == 'video':
96
+ messages = [{'role': 'user', 'content': '<video>Describe this video.'}]
97
+ data['videos'] = ['https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4']
98
+ elif mm_type == 'audio':
99
+ messages = [{'role': 'user', 'content': '<audio>What does this audio say?'}]
100
+ data['audios'] = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav']
101
+ data['messages'] = messages
102
+ return data
103
+
104
+
105
+ def run_client(host: str = '127.0.0.1', port: int = 8000):
106
+ engine = InferClient(host=host, port=port)
107
+ print(f'models: {engine.models}')
108
+ # Here, `load_dataset` is used for convenience; `infer_batch` does not require creating a dataset.
109
+ dataset = load_dataset(['AI-ModelScope/LaTeX_OCR:small#1000'], seed=42)[0]
110
+ print(f'dataset: {dataset}')
111
+ infer_requests = [InferRequest(**data) for data in dataset]
112
+ infer_batch(engine, infer_requests)
113
+
114
+ infer_stream(engine, InferRequest(messages=[get_message(mm_type='video')]))
115
+ # This writing is equivalent to the above writing.
116
+ infer_stream(engine, InferRequest(**get_data(mm_type='video')))
117
+
118
+
119
+ if __name__ == '__main__':
120
+ from swift.llm import (InferEngine, InferRequest, InferClient, RequestConfig, load_dataset, run_deploy,
121
+ DeployArguments)
122
+ from swift.plugin import InferStats
123
+ # NOTE: In a real deployment scenario, please comment out the context of run_deploy.
124
+ with run_deploy(
125
+ DeployArguments(model='Qwen/Qwen2.5-VL-3B-Instruct', verbose=False, log_interval=-1,
126
+ infer_backend='vllm')) as port:
127
+ run_client(port=port)
ms-swift/examples/deploy/lora/client.py ADDED
@@ -0,0 +1,27 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from swift.llm import InferClient, InferRequest, RequestConfig
2
+
3
+
4
+ def infer_multilora(engine: InferClient, infer_request: InferRequest):
5
+ # Dynamic LoRA
6
+ models = engine.models
7
+ print(f'models: {models}')
8
+ request_config = RequestConfig(max_tokens=512, temperature=0)
9
+
10
+ # use lora1
11
+ resp_list = engine.infer([infer_request], request_config, model=models[1])
12
+ response = resp_list[0].choices[0].message.content
13
+ print(f'lora1-response: {response}')
14
+ # origin model
15
+ resp_list = engine.infer([infer_request], request_config, model=models[0])
16
+ response = resp_list[0].choices[0].message.content
17
+ print(f'response: {response}')
18
+ # use lora2
19
+ resp_list = engine.infer([infer_request], request_config, model=models[2])
20
+ response = resp_list[0].choices[0].message.content
21
+ print(f'lora2-response: {response}')
22
+
23
+
24
+ if __name__ == '__main__':
25
+ engine = InferClient(host='127.0.0.1', port=8000)
26
+ infer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
27
+ infer_multilora(engine, infer_request)
ms-swift/examples/deploy/server/README.md ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ Please refer to the examples in [examples/infer](../../infer/) and change `swift infer` to `swift deploy` to start the service. (You need to additionally remove `--val_dataset`)
2
+
3
+ e.g.
4
+ ```shell
5
+ CUDA_VISIBLE_DEVICES=0 \
6
+ swift deploy \
7
+ --model Qwen/Qwen2.5-7B-Instruct \
8
+ --infer_backend vllm
9
+ ```
ms-swift/examples/eval/llm/eval.sh ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ CUDA_VISIBLE_DEVICES=0 \
2
+ swift eval \
3
+ --model Qwen/Qwen2.5-1.5B-Instruct \
4
+ --eval_backend OpenCompass \
5
+ --infer_backend vllm \
6
+ --eval_limit 100 \
7
+ --eval_dataset gsm8k
ms-swift/examples/export/merge_lora.sh ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ # Since `output/vx-xxx/checkpoint-xxx` is trained by swift and contains an `args.json` file,
2
+ # there is no need to explicitly set `--model`, `--system`, etc., as they will be automatically read.
3
+ swift export \
4
+ --adapters output/vx-xxx/checkpoint-xxx \
5
+ --merge_lora true
ms-swift/examples/export/quantize/mllm/gptq.sh ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # OMP_NUM_THREADS=14 please Check issue: https://github.com/AutoGPTQ/AutoGPTQ/issues/439
2
+ OMP_NUM_THREADS=14 \
3
+ CUDA_VISIBLE_DEVICES=0 \
4
+ MAX_PIXELS=1003520 \
5
+ VIDEO_MAX_PIXELS=50176 \
6
+ FPS_MAX_FRAMES=12 \
7
+ swift export \
8
+ --model Qwen/Qwen2.5-VL-3B-Instruct \
9
+ --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
10
+ 'AI-ModelScope/alpaca-gpt4-data-en#500' \
11
+ 'modelscope/coco_2014_caption:validation#500' \
12
+ 'swift/VideoChatGPT:Generic#500' \
13
+ --quant_n_samples 256 \
14
+ --quant_batch_size 1 \
15
+ --max_length 2048 \
16
+ --quant_method gptq \
17
+ --quant_bits 4 \
18
+ --output_dir Qwen2.5-VL-3B-Instruct-GPTQ-Int4
ms-swift/examples/export/quantize/moe/gptq.sh ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 2 * 80GB
2
+ OMP_NUM_THREADS=14 \
3
+ CUDA_VISIBLE_DEVICES=0,1 \
4
+ swift export \
5
+ --model Qwen/Qwen2-57B-A14B-Instruct \
6
+ --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#1000' \
7
+ 'AI-ModelScope/alpaca-gpt4-data-en#1000' \
8
+ --quant_n_samples 512 \
9
+ --quant_batch_size 1 \
10
+ --max_length 4096 \
11
+ --quant_method gptq \
12
+ --quant_bits 4 \
13
+ --output_dir Qwen2-57B-A14B-Instruct-GPTQ-Int4
ms-swift/examples/export/quantize/omni/gptq.sh ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # OMP_NUM_THREADS=14 please Check issue: https://github.com/AutoGPTQ/AutoGPTQ/issues/439
2
+ OMP_NUM_THREADS=14 \
3
+ CUDA_VISIBLE_DEVICES=0 \
4
+ MAX_PIXELS=1003520 \
5
+ VIDEO_MAX_PIXELS=50176 \
6
+ FPS_MAX_FRAMES=12 \
7
+ swift export \
8
+ --model Qwen/Qwen2.5-Omni-7B \
9
+ --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
10
+ 'AI-ModelScope/alpaca-gpt4-data-en#500' \
11
+ 'modelscope/coco_2014_caption:validation#500' \
12
+ 'swift/VideoChatGPT:Generic#500' \
13
+ --quant_n_samples 256 \
14
+ --quant_batch_size 1 \
15
+ --max_length 2048 \
16
+ --quant_method gptq \
17
+ --quant_bits 4 \
18
+ --output_dir Qwen2.5-Omni-7B-GPTQ-Int4
ms-swift/examples/export/quantize/reward_model/bnb.sh ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # bnb quantize
2
+ CUDA_VISIBLE_DEVICES=0 swift export \
3
+ --model Shanghai_AI_Laboratory/internlm2-1_8b-reward \
4
+ --output_dir output/internlm2-1_8b-reward-bnb-int4 \
5
+ --quant_bits 4 \
6
+ --quant_method bnb
7
+
8
+ # infer
9
+ CUDA_VISIBLE_DEVICES=0 swift infer \
10
+ --model output/internlm2-1_8b-reward-bnb-int4 \
11
+ --val_dataset 'AI-ModelScope/alpaca-gpt4-data-zh#1000' \
12
+ --max_batch_size 16
ms-swift/examples/export/quantize/reward_model/gptq.sh ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # gptq quantize
2
+ CUDA_VISIBLE_DEVICES=0 swift export \
3
+ --model Shanghai_AI_Laboratory/internlm2-1_8b-reward \
4
+ --output_dir output/internlm2-1_8b-reward-gptq-int4 \
5
+ --quant_bits 4 \
6
+ --quant_method gptq \
7
+ --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#1000' 'AI-ModelScope/alpaca-gpt4-data-en#1000'
8
+
9
+ # infer
10
+ CUDA_VISIBLE_DEVICES=0 swift infer \
11
+ --model output/internlm2-1_8b-reward-gptq-int4 \
12
+ --val_dataset 'AI-ModelScope/alpaca-gpt4-data-zh#1000' \
13
+ --max_batch_size 16
ms-swift/examples/infer/cli_demo.sh ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ CUDA_VISIBLE_DEVICES=0 \
2
+ swift infer \
3
+ --model Qwen/Qwen2.5-1.5B-Instruct \
4
+ --infer_backend pt \
5
+ --stream true \
6
+ --max_new_tokens 2048
ms-swift/examples/infer/demo.py ADDED
@@ -0,0 +1,73 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Alibaba, Inc. and its affiliates.
2
+ import asyncio
3
+ import os
4
+ from typing import List
5
+
6
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
7
+
8
+
9
+ def infer_batch(engine: 'InferEngine', infer_requests: List['InferRequest']):
10
+ request_config = RequestConfig(max_tokens=512, temperature=0)
11
+ metric = InferStats()
12
+ resp_list = engine.infer(infer_requests, request_config, metrics=[metric])
13
+ query0 = infer_requests[0].messages[0]['content']
14
+ print(f'query0: {query0}')
15
+ print(f'response0: {resp_list[0].choices[0].message.content}')
16
+ print(f'metric: {metric.compute()}')
17
+ # metric.reset() # reuse
18
+
19
+
20
+ def infer_async_batch(engine: 'InferEngine', infer_requests: List['InferRequest']):
21
+ # The asynchronous interface below is equivalent to the synchronous interface above.
22
+ request_config = RequestConfig(max_tokens=512, temperature=0)
23
+
24
+ async def _run():
25
+ tasks = [engine.infer_async(infer_request, request_config) for infer_request in infer_requests]
26
+ return await asyncio.gather(*tasks)
27
+
28
+ resp_list = asyncio.run(_run())
29
+
30
+ query0 = infer_requests[0].messages[0]['content']
31
+ print(f'query0: {query0}')
32
+ print(f'response0: {resp_list[0].choices[0].message.content}')
33
+
34
+
35
+ def infer_stream(engine: 'InferEngine', infer_request: 'InferRequest'):
36
+ request_config = RequestConfig(max_tokens=512, temperature=0, stream=True)
37
+ metric = InferStats()
38
+ gen_list = engine.infer([infer_request], request_config, metrics=[metric])
39
+ query = infer_request.messages[0]['content']
40
+ print(f'query: {query}\nresponse: ', end='')
41
+ for resp in gen_list[0]:
42
+ if resp is None:
43
+ continue
44
+ print(resp.choices[0].delta.content, end='', flush=True)
45
+ print()
46
+ print(f'metric: {metric.compute()}')
47
+
48
+
49
+ if __name__ == '__main__':
50
+ from swift.llm import InferEngine, InferRequest, PtEngine, RequestConfig, load_dataset
51
+ from swift.plugin import InferStats
52
+ model = 'Qwen/Qwen2.5-1.5B-Instruct'
53
+ infer_backend = 'pt'
54
+
55
+ if infer_backend == 'pt':
56
+ engine = PtEngine(model, max_batch_size=64)
57
+ elif infer_backend == 'vllm':
58
+ from swift.llm import VllmEngine
59
+ engine = VllmEngine(model, max_model_len=8192)
60
+ elif infer_backend == 'lmdeploy':
61
+ from swift.llm import LmdeployEngine
62
+ engine = LmdeployEngine(model)
63
+
64
+ # Here, `load_dataset` is used for convenience; `infer_batch` does not require creating a dataset.
65
+ dataset = load_dataset(['AI-ModelScope/alpaca-gpt4-data-zh#1000'], seed=42)[0]
66
+ print(f'dataset: {dataset}')
67
+ infer_requests = [InferRequest(**data) for data in dataset]
68
+ # if infer_backend in {'vllm', 'lmdeploy'}:
69
+ # infer_async_batch(engine, infer_requests)
70
+ infer_batch(engine, infer_requests)
71
+
72
+ messages = [{'role': 'user', 'content': 'who are you?'}]
73
+ infer_stream(engine, InferRequest(messages=messages))
ms-swift/examples/infer/demo_agent.py ADDED
@@ -0,0 +1,118 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Alibaba, Inc. and its affiliates.
2
+ import os
3
+
4
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
5
+ # os.environ['SWIFT_DEBUG'] = '1'
6
+
7
+
8
def infer(engine: 'InferEngine', infer_request: 'InferRequest'):
    """Run a two-turn tool-calling exchange and print each step.

    Mutates `infer_request.messages` by appending the assistant turn and the
    canned tool observation before the second generation.
    """
    # Stop on the agent template's observation keyword (compat react_en).
    stop_words = [engine.default_template.agent_template.keyword.observation]
    cfg = RequestConfig(max_tokens=512, temperature=0, stop=stop_words)

    first = engine.infer([infer_request], cfg)[0]
    answer = first.choices[0].message.content
    print(f"query: {infer_request.messages[0]['content']}")
    print(f'response: {answer}')
    print(f'tool_calls: {first.choices[0].message.tool_calls}')

    # Feed a canned tool result back and let the model continue.
    tool = '{"temperature": 32, "condition": "Sunny", "humidity": 50}'
    print(f'tool_response: {tool}')
    infer_request.messages += [{'role': 'assistant', 'content': answer}, {'role': 'tool', 'content': tool}]
    second = engine.infer([infer_request], cfg)[0]
    print(f'response2: {second.choices[0].message.content}')
24
+
25
+
26
def infer_stream(engine: 'InferEngine', infer_request: 'InferRequest'):
    """Streaming variant of `infer`: prints deltas for both turns of the tool exchange."""
    stop_words = [engine.default_template.agent_template.keyword.observation]
    cfg = RequestConfig(max_tokens=512, temperature=0, stream=True, stop=stop_words)

    stream = engine.infer([infer_request], cfg)[0]
    answer = ''
    print(f"query: {infer_request.messages[0]['content']}\nresponse: ", end='')
    for chunk in stream:
        if chunk is None:
            continue
        piece = chunk.choices[0].delta.content
        answer += piece
        print(piece, end='', flush=True)
    print()
    # `chunk` is the last item yielded by the stream at this point.
    print(f'tool_calls: {chunk.choices[0].delta.tool_calls}')

    tool = '{"temperature": 32, "condition": "Sunny", "humidity": 50}'
    print(f'tool_response: {tool}\nresponse2: ', end='')
    infer_request.messages += [{'role': 'assistant', 'content': answer}, {'role': 'tool', 'content': tool}]
    for chunk in engine.infer([infer_request], cfg)[0]:
        if chunk is not None:
            print(chunk.choices[0].delta.content, end='', flush=True)
    print()
51
+
52
+
53
def get_infer_request():
    """Build a weather question together with the `get_current_weather` tool schema."""
    weather_tool = {
        'name': 'get_current_weather',
        'description': 'Get the current weather in a given location',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'The city and state, e.g. San Francisco, CA'
                },
                'unit': {
                    'type': 'string',
                    'enum': ['celsius', 'fahrenheit']
                }
            },
            'required': ['location']
        }
    }
    user_turn = {'role': 'user', 'content': "How's the weather in Beijing today?"}
    return InferRequest(messages=[user_turn], tools=[weather_tool])
77
+
78
+
79
def infer_continue_generate(engine):
    """Resume generation after a partial assistant message.

    The trailing `{'role': 'assistant', 'content': None}` message marks that the
    previous assistant turn should be continued rather than a new turn started.
    """
    messages = [
        {'role': 'user', 'content': 'How is the weather today?'},
        {'role': 'assistant', 'content': 'It is sunny today, '},
        {'role': 'assistant', 'content': None},
    ]
    cfg = RequestConfig(max_tokens=512, temperature=0)
    result = engine.infer([InferRequest(messages=messages)], cfg)[0]
    print(f'response: {result.choices[0].message.content}')
95
+
96
+
97
if __name__ == '__main__':
    from swift.llm import InferEngine, InferRequest, PtEngine, RequestConfig
    from swift.plugin import agent_templates

    model_id = 'Qwen/Qwen2.5-1.5B-Instruct'
    backend = 'pt'  # one of: pt / vllm / lmdeploy

    if backend == 'pt':
        engine = PtEngine(model_id, max_batch_size=64)
    elif backend == 'vllm':
        from swift.llm import VllmEngine
        engine = VllmEngine(model_id, max_model_len=8192)
    elif backend == 'lmdeploy':
        from swift.llm import LmdeployEngine
        engine = LmdeployEngine(model_id)

    # Optionally switch the agent template (react_en/qwen_en/qwen_en_parallel):
    # engine.default_template.agent_template = agent_templates['hermes']()

    infer(engine, get_infer_request())
    infer_stream(engine, get_infer_request())

    # infer_continue_generate(engine)
ms-swift/examples/infer/demo_bert.py ADDED
@@ -0,0 +1,53 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Alibaba, Inc. and its affiliates.
2
+ import os
3
+ from typing import List
4
+
5
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
6
+
7
+
8
def infer_batch(engine: 'InferEngine', infer_requests: List['InferRequest']):
    """Classify a batch of requests and print the first two query/response pairs."""
    resp_list = engine.infer(infer_requests)
    for idx in (0, 1):
        print(f"query{idx}: {infer_requests[idx].messages[0]['content']}")
        print(f'response{idx}: {resp_list[idx].choices[0].message.content}')
16
+
17
+
18
if __name__ == '__main__':
    # Example: BERT sequence classification served with a LoRA adapter.
    from swift.llm import InferEngine, InferRequest, PtEngine, load_dataset, safe_snapshot_download, BaseArguments
    from swift.tuners import Swift

    adapter_path = safe_snapshot_download('swift/test_bert')
    args = BaseArguments.from_pretrained(adapter_path)
    args.max_length = 512
    args.truncation_strategy = 'right'

    # method1: load base model + adapter manually, then wrap in an engine.
    model, processor = args.get_model_processor()
    model = Swift.from_pretrained(model, adapter_path)
    template = args.get_template(processor)
    engine = PtEngine.from_model_template(model, template, max_batch_size=64)

    # method2: let PtEngine attach the adapter itself.
    # engine = PtEngine(args.model, adapters=[adapter_path], max_batch_size=64,
    #                   task_type=args.task_type, num_labels=args.num_labels)
    # template = args.get_template(engine.processor)
    # engine.default_template = template

    # `load_dataset` is used only for convenience; `infer_batch` accepts any request list.
    dataset = load_dataset(['DAMO_NLP/jd:cls#1000'], seed=42)[0]
    print(f'dataset: {dataset}')
    infer_batch(engine, [InferRequest(messages=row['messages']) for row in dataset])

    infer_batch(engine, [
        InferRequest(messages=[{'role': 'user', 'content': '今天天气真好呀'}]),
        InferRequest(messages=[{'role': 'user', 'content': '真倒霉'}]),
    ])
ms-swift/examples/infer/demo_hf.py ADDED
@@ -0,0 +1,61 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
def infer_hf():
    """Greedy inference with raw transformers + peft; returns the decoded response."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel
    from modelscope import snapshot_download

    base_dir = snapshot_download('Qwen/Qwen2.5-7B-Instruct')
    lora_dir = snapshot_download('swift/test_lora')
    model = AutoModelForCausalLM.from_pretrained(
        base_dir, torch_dtype='auto', device_map='auto', trust_remote_code=True)
    model = PeftModel.from_pretrained(model, lora_dir)
    tokenizer = AutoTokenizer.from_pretrained(base_dir, trust_remote_code=True)

    messages = [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'who are you?'},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([prompt], return_tensors='pt', add_special_tokens=False).to(model.device)

    out_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Strip the prompt tokens so only the newly generated tokens are decoded.
    new_ids = [full[len(src):] for src, full in zip(inputs.input_ids, out_ids)]
    response = tokenizer.batch_decode(new_ids, skip_special_tokens=True)[0]
    print(f'response: {response}')
    return response
31
+
32
+
33
def infer_swift():
    """Same greedy inference via swift's PtEngine; returns the decoded response."""
    from swift.llm import get_model_tokenizer, get_template, InferRequest, RequestConfig, PtEngine
    from modelscope import snapshot_download
    from swift.tuners import Swift

    base_dir = snapshot_download('Qwen/Qwen2.5-7B-Instruct')
    lora_dir = snapshot_download('swift/test_lora')
    model, tokenizer = get_model_tokenizer(base_dir, device_map='auto')
    model = Swift.from_pretrained(model, lora_dir)
    template = get_template(model.model_meta.template, tokenizer)
    engine = PtEngine.from_model_template(model, template)

    messages = [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'who are you?'},
    ]
    cfg = RequestConfig(max_tokens=512, temperature=0)
    resp = engine.infer([InferRequest(messages=messages)], request_config=cfg)[0]
    response = resp.choices[0].message.content
    print(f'response: {response}')
    return response
56
+
57
+
58
if __name__ == '__main__':
    # Both code paths run greedy decoding, so their outputs must match exactly.
    hf_response = infer_hf()
    swift_response = infer_swift()
    assert hf_response == swift_response
ms-swift/examples/infer/demo_lora.py ADDED
@@ -0,0 +1,68 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from typing import Literal
3
+
4
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
5
+
6
+
7
def infer_multilora(infer_request: 'InferRequest', infer_backend: Literal['vllm', 'pt']):
    """Serve one base model and hot-swap two LoRA adapters per request (dynamic LoRA)."""
    lora1_path = safe_snapshot_download('swift/test_lora')
    lora2_path = safe_snapshot_download('swift/test_lora2')
    args = BaseArguments.from_pretrained(lora1_path)
    if infer_backend == 'pt':
        engine = PtEngine(args.model)
    elif infer_backend == 'vllm':
        from swift.llm import VllmEngine
        engine = VllmEngine(args.model, enable_lora=True, max_loras=1, max_lora_rank=16)
    template = get_template(args.template, engine.processor, args.system)
    cfg = RequestConfig(max_tokens=512, temperature=0)
    lora1 = AdapterRequest('lora1', lora1_path)
    lora2 = AdapterRequest('lora2', lora2_path)

    # With the first adapter.
    resp = engine.infer([infer_request], cfg, template=template, adapter_request=lora1)[0]
    print(f'lora1-response: {resp.choices[0].message.content}')
    # Base model, no adapter.
    resp = engine.infer([infer_request], cfg)[0]
    print(f'response: {resp.choices[0].message.content}')
    # With the second adapter.
    resp = engine.infer([infer_request], cfg, template=template, adapter_request=lora2)[0]
    print(f'lora2-response: {resp.choices[0].message.content}')
34
+
35
+
36
def infer_lora(infer_request: 'InferRequest'):
    """Run a single request through the base model with a LoRA adapter attached."""
    cfg = RequestConfig(max_tokens=512, temperature=0)
    adapter_path = safe_snapshot_download('swift/test_lora')
    args = BaseArguments.from_pretrained(adapter_path)

    # method1: let PtEngine attach the adapter itself.
    # engine = PtEngine(args.model, adapters=[adapter_path])
    # template = get_template(args.template, engine.tokenizer, args.system)
    # engine.default_template = template

    # method2: load via BaseArguments helpers.
    # model, processor = args.get_model_processor()
    # model = Swift.from_pretrained(model, adapter_path)
    # template = args.get_template(processor)
    # engine = PtEngine.from_model_template(model, template)

    # method3: load model/tokenizer manually, then wrap in an engine.
    model, tokenizer = get_model_tokenizer(args.model)
    model = Swift.from_pretrained(model, adapter_path)
    template = get_template(args.template, tokenizer, args.system)
    engine = PtEngine.from_model_template(model, template)

    resp = engine.infer([infer_request], cfg)[0]
    print(f'lora-response: {resp.choices[0].message.content}')
+ print(f'lora-response: {response}')
60
+
61
+
62
if __name__ == '__main__':
    from swift.llm import (PtEngine, RequestConfig, AdapterRequest, get_template, BaseArguments, InferRequest,
                           safe_snapshot_download, get_model_tokenizer)
    from swift.tuners import Swift

    request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
    # infer_lora(request)
    infer_multilora(request, 'pt')
ms-swift/examples/infer/demo_mllm.py ADDED
@@ -0,0 +1,145 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Alibaba, Inc. and its affiliates.
2
+ import os
3
+ from typing import List, Literal
4
+
5
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
6
+
7
+
8
def infer_batch(engine: 'InferEngine', infer_requests: List['InferRequest']):
    """Run batch inference and print the first query/response pair plus metrics."""
    cfg = RequestConfig(max_tokens=512, temperature=0)
    stats = InferStats()
    resp_list = engine.infer(infer_requests, cfg, metrics=[stats])
    print(f"query0: {infer_requests[0].messages[0]['content']}")
    print(f'response0: {resp_list[0].choices[0].message.content}')
    print(f'metric: {stats.compute()}')
    # stats.reset()  # call this before reusing the metric object
+
18
+
19
def infer_stream(engine: 'InferEngine', infer_request: 'InferRequest'):
    """Stream a single response, printing deltas as they arrive, then report metrics."""
    cfg = RequestConfig(max_tokens=512, temperature=0, stream=True)
    stats = InferStats()
    stream = engine.infer([infer_request], cfg, metrics=[stats])[0]
    print(f"query: {infer_request.messages[0]['content']}\nresponse: ", end='')
    for chunk in stream:
        # The engine may yield None placeholders in streaming mode; skip them.
        if chunk is not None:
            print(chunk.choices[0].delta.content, end='', flush=True)
    print()
    print(f'metric: {stats.compute()}')
+
32
+
33
def get_message(mm_type: Literal['text', 'image', 'video', 'audio']):
    """Return a single chat message dict for the requested modality.

    For multimodal types the content is a list of typed parts; media entries
    accept a URL, a local path, a PIL.Image, or base64 data.

    Args:
        mm_type: one of 'text', 'image', 'video', 'audio'.

    Returns:
        A message dict with 'role' and 'content' keys.

    Raises:
        ValueError: if `mm_type` is not a supported modality.
        (Previously an unsupported value fell through and raised an opaque
        UnboundLocalError on `return message`.)
    """
    if mm_type == 'text':
        return {'role': 'user', 'content': 'who are you?'}
    if mm_type == 'image':
        return {
            'role':
            'user',
            'content': [
                {
                    'type': 'image',
                    # url or local_path or PIL.Image or base64
                    'image': 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png'
                },
                {
                    'type': 'text',
                    'text': 'How many sheep are there in the picture?'
                }
            ]
        }
    if mm_type == 'video':
        return {
            'role':
            'user',
            'content': [{
                'type': 'video',
                'video': 'https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4'
            }, {
                'type': 'text',
                'text': 'Describe this video.'
            }]
        }
    if mm_type == 'audio':
        return {
            'role':
            'user',
            'content': [{
                'type': 'audio',
                'audio': 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav'
            }, {
                'type': 'text',
                'text': 'What does this audio say?'
            }]
        }
    raise ValueError(f"Unsupported mm_type: {mm_type!r}; expected 'text', 'image', 'video' or 'audio'.")
+ return message
78
+
79
+
80
def get_data(mm_type: Literal['text', 'image', 'video', 'audio']):
    """Build a dataset-style sample dict ('messages' plus optional media lists).

    The number of <image>/<video>/<audio> tags in the prompt must equal the
    length of the corresponding media list. Media entries support
    URL/Path/base64/PIL.Image.
    """
    sample = {}
    if mm_type == 'text':
        prompt = 'who are you?'
    elif mm_type == 'image':
        prompt = '<image>How many sheep are there in the picture?'
        sample['images'] = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png']
    elif mm_type == 'video':
        prompt = '<video>Describe this video.'
        sample['videos'] = ['https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4']
    elif mm_type == 'audio':
        prompt = '<audio>What does this audio say?'
        sample['audios'] = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav']
    sample['messages'] = [{'role': 'user', 'content': prompt}]
    return sample
+ return data
97
+
98
+
99
if __name__ == '__main__':
    # For inference with a trained model, see:
    # https://github.com/modelscope/ms-swift/tree/main/examples/notebook
    from swift.llm import InferEngine, InferRequest, PtEngine, RequestConfig, load_dataset
    from swift.plugin import InferStats

    infer_backend = 'pt'  # one of: pt / vllm / lmdeploy

    if infer_backend == 'pt':
        engine = PtEngine('Qwen/Qwen2-Audio-7B-Instruct', max_batch_size=64)
        mm_type = 'audio'
    elif infer_backend == 'vllm':
        # test env: vllm==0.7.3, transformers==4.49.*
        # The meaning of these environment variables can be found at:
        # https://swift.readthedocs.io/zh-cn/latest/Instruction/%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0.html#id17
        from swift.llm import VllmEngine
        os.environ['MAX_PIXELS'] = '1003520'
        os.environ['VIDEO_MAX_PIXELS'] = '50176'
        os.environ['FPS_MAX_FRAMES'] = '12'
        # If you encounter insufficient GPU memory, reduce `max_model_len` and set `max_num_seqs=5`.
        engine = VllmEngine('Qwen/Qwen2.5-VL-3B-Instruct', max_model_len=8192,
                            limit_mm_per_prompt={'image': 5, 'video': 2})
        mm_type = 'image'  # or 'video'
    elif infer_backend == 'lmdeploy':
        # test env: lmdeploy==0.7.1
        from swift.llm import LmdeployEngine
        engine = LmdeployEngine('OpenGVLab/InternVL2_5-1B', vision_batch_size=8)
        mm_type = 'image'  # or 'video'

    # Pick an evaluation dataset matching the modality.
    dataset_name = {
        'audio': 'speech_asr/speech_asr_aishell1_trainsets:validation#1000',
        'image': 'AI-ModelScope/LaTeX_OCR:small#1000',
        'video': 'swift/VideoChatGPT:Generic#100',
    }[mm_type]

    # `load_dataset` is used only for convenience; `infer_batch` accepts any request list.
    dataset = load_dataset([dataset_name], seed=42)[0]
    print(f'dataset: {dataset}')
    infer_batch(engine, [InferRequest(**row) for row in dataset])

    infer_stream(engine, InferRequest(messages=[get_message(mm_type)]))
    # Equivalent to the call above, using the dataset-style dict form.
    infer_stream(engine, InferRequest(**get_data(mm_type)))
+ infer_stream(engine, InferRequest(**get_data(mm_type)))
ms-swift/examples/infer/pt/batch_ddp.sh ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ # 18GB
2
+ NPROC_PER_NODE=4 \
3
+ CUDA_VISIBLE_DEVICES=0,1,2,3 \
4
+ swift infer \
5
+ --model Qwen/Qwen2.5-1.5B-Instruct \
6
+ --infer_backend pt \
7
+ --val_dataset AI-ModelScope/alpaca-gpt4-data-zh#1000 \
8
+ --max_batch_size 16 \
9
+ --max_new_tokens 512
ms-swift/examples/infer/pt/bert.sh ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ # Since `swift/test_lora` is trained by swift and contains an `args.json` file,
2
+ # there is no need to explicitly set `--model`, `--system`, etc., as they will be automatically read.
3
+ # To disable this behavior, please set `--load_args false`.
4
+ CUDA_VISIBLE_DEVICES=0 \
5
+ swift infer \
6
+ --adapters swift/test_bert \
7
+ --truncation_strategy right \
8
+ --max_length 512