# ==========================================================================
# Architecture / loading metadata for the PHASE-Tree pretrained hypermod.
#
# The released warm-start checkpoint is
# phase_tree_models/phase_tree_pretrained/hypermod.pt
# (= the it_20000 snapshot of the original pretraining run).
#
# Only the fields read by `load_hypermod_checkpoint` (path resolution and
# hypermod architecture) are kept here. The dataset lists and the original
# training schedule are intentionally omitted: the PHASE-Tree SFT runs
# warm-start from these weights and take every training hyperparameter
# from `train_phase_tree_qwen_7b.sh` instead.
# ==========================================================================
# ── Paths ─────────────────────────────────────────────────────────────────
model_dir: Qwen/Qwen2.5-7B-Instruct
emb_model: Qwen/Qwen3-Embedding-4B
mt_lora_path: null
# ── Task setup ────────────────────────────────────────────────────────────
training_task: sft
exp_setup: hyper_lora
sft_mode: completion
encoder_type: linear
# ── Task-embedding mode ───────────────────────────────────────────────────
use_hypernet: true
use_per_task_emb: true
use_one_hot_task_emb: false
use_inp_as_desc: false
use_per_sample_desc: false
use_default_desc: false
# ── Hypermod architecture ─────────────────────────────────────────────────
head_in_size: 2048
head_use_bias: false
hypernet_latent_size: 1024
delta_w_scaling: 100
pred_z_score: true
factorized: false
shared_AB_head: false
autoreg_gen: false
learnable_pos_emb: false
learnable_AB_offset: false
# ── Fusion (disabled; kept for loader compatibility) ──────────────────────
use_conv_fusion: false
conv_fusion_type: 1d
conv_fusion_kernel_size: 3
conv_fusion_num_layers: 2
conv_fusion_channels: 64
conv_fusion_dropout: 0.1
use_attention_fusion: false
attention_fusion_type: self
attention_num_heads: 8
attention_num_layers: 2
attention_dropout: 0.1
# ── Target LoRA modules and context window ────────────────────────────────
target_modules:
- q_proj
- v_proj
inp_max_len: 1024
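`target_modules` lists the attention projections that receive a generated LoRA adapter. The sketch below works out the adapter shapes the hypermod would have to produce for Qwen/Qwen2.5-7B-Instruct, assuming it emits standard LoRA factors A (rank × in_features) and B (out_features × rank). The model dimensions come from the public Qwen2.5-7B config rather than from this file, and the rank of 8 is hypothetical because the LoRA rank is not set here; `lora_shapes` and `params_per_adapter` are illustrative helpers, not part of the released code.

```python
from dataclasses import dataclass
from typing import Tuple

# ASSUMPTION: dimensions from the public Qwen2.5-7B config.json, not from this
# file; check them against the model you actually load.
HIDDEN_SIZE = 3584
NUM_HEADS = 28
NUM_KV_HEADS = 4
NUM_LAYERS = 28
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 128

# Mirrors `target_modules` in the YAML config.
TARGET_MODULES = ("q_proj", "v_proj")


@dataclass(frozen=True)
class LoraShapes:
    a: Tuple[int, int]  # A factor: (rank, in_features)
    b: Tuple[int, int]  # B factor: (out_features, rank)


def projection_dims(module: str) -> Tuple[int, int]:
    """Return (in_features, out_features) for a Qwen2-style attention projection."""
    if module == "q_proj":
        return HIDDEN_SIZE, NUM_HEADS * HEAD_DIM
    if module in ("k_proj", "v_proj"):
        # Grouped-query attention: keys/values use fewer heads than queries.
        return HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM
    raise ValueError(f"unsupported module: {module}")


def lora_shapes(module: str, rank: int) -> LoraShapes:
    """Shapes of the A and B factors for one target module in one layer."""
    in_features, out_features = projection_dims(module)
    return LoraShapes(a=(rank, in_features), b=(out_features, rank))


def params_per_adapter(rank: int) -> int:
    """Count the LoRA parameters generated for one task across all layers."""
    per_layer = 0
    for module in TARGET_MODULES:
        s = lora_shapes(module, rank)
        per_layer += s.a[0] * s.a[1] + s.b[0] * s.b[1]
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    rank = 8  # HYPOTHETICAL: the LoRA rank is not set in this file.
    for module in TARGET_MODULES:
        print(module, lora_shapes(module, rank))
    print("parameters per adapter:", params_per_adapter(rank))
```

Under these assumptions a rank-8 adapter comes to 2,523,136 generated parameters per task. Because Qwen2.5-7B uses grouped-query attention (4 key/value heads against 28 query heads), `v_proj` maps 3584 features down to 512, so its B factor is seven times smaller than the one for `q_proj`.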
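The header comment splits responsibilities: the architecture fields have to match the weights stored in `hypermod.pt`, while the SFT script supplies the training hyperparameters. A minimal sketch of that split, assuming plain dicts (for example, the result of `yaml.safe_load` on this file) and a hypothetical `merge_run_config` helper: overrides may add or replace training keys freely, but changing an architecture key is rejected. Which keys count as architecture (`ARCH_KEYS`) is an assumption drawn from the config's section headers, not from the loader itself.

```python
from typing import Any, Dict

# ASSUMPTION: keys treated as fixed by the pretrained checkpoint, taken from the
# "Paths", "Hypermod architecture" and "Target LoRA modules" sections.
ARCH_KEYS = frozenset({
    "model_dir", "emb_model", "head_in_size", "head_use_bias",
    "hypernet_latent_size", "delta_w_scaling", "pred_z_score", "factorized",
    "shared_AB_head", "autoreg_gen", "learnable_pos_emb",
    "learnable_AB_offset", "target_modules",
})


def merge_run_config(arch_cfg: Dict[str, Any],
                     overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: layer SFT overrides on top of the hypermod config.

    Training hyperparameters may be added or replaced freely; architecture
    keys may only be repeated with their original value, because the
    warm-start weights were trained with exactly those settings.
    """
    clashes = sorted(
        key for key, value in overrides.items()
        if key in ARCH_KEYS and value != arch_cfg.get(key)
    )
    if clashes:
        raise ValueError(f"overrides change architecture fields: {clashes}")
    merged = dict(arch_cfg)  # copy so the caller's config is not mutated
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    arch = {"head_in_size": 2048, "delta_w_scaling": 100,
            "target_modules": ["q_proj", "v_proj"]}
    # "learning_rate" is a hypothetical training key used for illustration.
    print(merge_run_config(arch, {"learning_rate": 1e-4}))
    try:
        merge_run_config(arch, {"head_in_size": 1024})
    except ValueError as err:
        print("rejected:", err)
```

Failing loudly on an architecture mismatch is cheaper than discovering it later as a shape error, or as silently wrong adapters, when the checkpoint weights are loaded.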