
Custom Model

The models built into ms-swift can be used directly by specifying either model_id or model_path: --model <model_id_or_path>. ms-swift determines the model_type based on the suffix of model_id/model_path and the config.json file.

Each model_type has a unique model structure, template, and loading method. Of course, you can also manually override these by passing --model_type and --template. You can check the supported model_type and template values in the Supported Models and Datasets documentation.

The following introduces how to register a new model and its corresponding template. For best practices, refer to Best Practices for Registering Multimodal Models.

Model Registration

Custom models are typically implemented via model registration. You can refer to the built-in models, the built-in dialogue templates, or the example code in examples. You can pass --external_plugins xxx.py to load externally registered content, which is convenient for users who install ms-swift via pip rather than git clone.

The register_model function registers a model in the MODEL_MAPPING. You can complete the model registration by calling the function register_model(model_meta), where model_meta will store the model's metadata. The parameter list for ModelMeta is as follows:

  • model_type: Required. The model type, which is also the unique ID.
  • model_groups: Required. Lists the ModelScope/HuggingFace model IDs and local paths. Running the run_model_info.py file automatically generates the supported models documentation, and the model_type is automatically matched based on the --model suffix.
  • loader: The loader for the model and tokenizer (or processor, for multimodal models). Defaults to swift.model.ModelLoader.
  • template: The default template type when --template is not additionally specified in the command line. Defaults to None.
  • model_arch: The model architecture. Defaults to None. Multi-modal model training requires setting this parameter to determine the prefix for llm/vit/aligner.
  • architectures: The architectures item in config.json, used to automatically match the model with its model_type. Defaults to [].
  • additional_saved_files: Files that need to be additionally saved during full parameter training and merge-lora. Defaults to [].
  • torch_dtype: The default dtype when torch_dtype is not passed during model loading. Defaults to None, in which case it is read from config.json.
  • is_multimodal: Indicates whether the model is multi-modal. Defaults to False.
  • ignore_patterns: File patterns to be ignored when downloading from the hub. Defaults to [].
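Putting these fields together, a registration call might look like the sketch below. Since this example should run without ms-swift installed, it uses a minimal stand-in ModelMeta dataclass and register_model function that mirror the documented fields; in a real plugin file you would import ModelMeta and register_model from ms-swift instead. The model ID 'MyOrg/my-llm-7b-chat', the architecture name 'MyLlmForCausalLM', and the template name 'my-template' are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Minimal stand-ins mirroring the documented API; in a real plugin you
# would import ModelMeta and register_model from ms-swift instead.
@dataclass
class ModelMeta:
    model_type: str                       # required: the unique ID
    model_groups: list                    # required: model IDs / local paths
    template: Optional[str] = None        # default template type
    model_arch: Optional[str] = None      # needed for multimodal training
    architectures: list = field(default_factory=list)  # matched against config.json
    additional_saved_files: list = field(default_factory=list)
    torch_dtype: Optional[str] = None     # None -> read from config.json
    is_multimodal: bool = False
    ignore_patterns: list = field(default_factory=list)

MODEL_MAPPING: dict = {}

def register_model(model_meta: ModelMeta) -> None:
    # Registration simply stores the metadata under its unique model_type.
    MODEL_MAPPING[model_meta.model_type] = model_meta

# All names below are hypothetical, for illustration only.
register_model(ModelMeta(
    model_type='my-llm',
    model_groups=[['MyOrg/my-llm-7b-chat']],
    template='my-template',
    architectures=['MyLlmForCausalLM'],
))

print(sorted(MODEL_MAPPING))  # → ['my-llm']
```

Placed in a plugin file, a registration like this becomes visible to ms-swift via --external_plugins, and the model can then be selected with --model or --model_type.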

The register_template function registers a dialogue template in TEMPLATE_MAPPING. To complete the registration of the dialogue template, simply call the function register_template(template_meta), where template_meta will store the metadata of the template. The parameter list for TemplateMeta is as follows:

  • template_type: Required. The type of dialogue template, which also serves as a unique ID.
  • prefix: Required. The prefix of the dialogue template, usually containing parts such as the system and bos_token; it is generated outside the multi-turn dialogue loop. For example, the prefix for qwen is [].
  • prompt: Required. Represents the dialogue portion before {{RESPONSE}}. We use {{QUERY}} as a placeholder for the user's inquiry part. For example, the prompt for qwen is ['<|im_start|>user\n{{QUERY}}<|im_end|>\n<|im_start|>assistant\n'].
  • chat_sep: Required. The separator for each turn in multi-turn dialogues. If set to None, the template does not support multi-turn dialogue. For example, the chat_sep for qwen is ['<|im_end|>\n'].
  • suffix: Defaults to [['eos_token_id']]. The suffix part of the dialogue template, generated independently of multi-turn dialogue loops, usually the eos_token. For example, the suffix for qwen is ['<|im_end|>'].
  • template_cls: Defaults to Template. Customization is generally required when defining templates for multimodal models, particularly in customizing the _encode, _post_encode, and _data_collator functions.
  • system_prefix: Defaults to None. The prefix for dialogue templates with a system. We use {{SYSTEM}} as a placeholder for the system. For example, the system_prefix for qwen is ['<|im_start|>system\n{{SYSTEM}}<|im_end|>\n'].
    • Note: If the prefix itself includes {{SYSTEM}} and still works when the system is empty, you can write the prefix as a system-aware prefix without setting system_prefix.
    • If the prefix does not include {{SYSTEM}} and system_prefix is not set, the template does not support a system.
  • default_system: Defaults to None. The default system used when --system is not provided. For example, the default_system for qwen is 'You are a helpful assistant.'.
  • stop_words: Defaults to []. Additional stop words besides eos_token and suffix[-1]. For example, the stop_words for qwen is ['<|endoftext|>'].
    • Note: During inference, the output response will be filtered by eos_token and suffix[-1], but additional stop_words will be retained.
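As a self-contained illustration of how these pieces interact, the qwen values above can be stitched together as follows. This is a simplified sketch, not ms-swift's actual Template implementation, which operates on token IDs and also handles labels and loss masks; the render helper is hypothetical.

```python
# Documented qwen template pieces (ChatML format).
system_prefix = '<|im_start|>system\n{{SYSTEM}}<|im_end|>\n'
prompt = '<|im_start|>user\n{{QUERY}}<|im_end|>\n<|im_start|>assistant\n'
chat_sep = '<|im_end|>\n'
suffix = '<|im_end|>'

def render(system, turns):
    """turns: list of (query, response); a None response means the model
    generates from that point on."""
    out = system_prefix.replace('{{SYSTEM}}', system)
    for i, (query, response) in enumerate(turns):
        out += prompt.replace('{{QUERY}}', query)
        if response is None:
            break  # generation continues here until eos/suffix/stop word
        out += response
        # chat_sep joins turns; suffix closes the final completed turn.
        out += chat_sep if i < len(turns) - 1 else suffix
    return out

text = render('You are a helpful assistant.',
              [('Hi', 'Hello!'), ('Who are you?', None)])
print(text)
```

Rendering the two turns above produces the system block from system_prefix, the first user/assistant exchange closed by chat_sep, and the second query ending in the open '<|im_start|>assistant\n' tag, which is exactly where generation resumes.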