---
license: cc-by-nc-4.0
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: peft
language:
- en
- zh
tags:
- hypernetwork
- hyper-lora
- lora
- role-play
- character-impersonation
- pretraining
- phase-tree
datasets:
- IAAR-Shanghai/phase_tree_data
---

# PHASE-Tree Pretrained Hypermod

Hypernetwork pretrained on the PHASE-Tree character-dialogue corpus on top of
[`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
This is the **warm-start checkpoint** consumed by the SFT runs released under
`phase_tree_models/sft/hyper_lora/`. It is *not* intended as a stand-alone
inference checkpoint; for character-conditioned generation, the SFT runs are
recommended.

> The pretraining-stage training schedule (full dataset list, optimizer
> schedule, etc.) is not bundled with this release. Only the fields required
> by `load_hypermod_checkpoint` (path resolution + hypermod architecture) are
> retained in `args.yaml`; the SFT runs in `phase_tree_models/sft/hyper_lora/`
> carry the complete training configurations for their respective fine-tuning
> stages.

## What is a hypermod?

A **hypermod** (hyper-modulator) is a hypernetwork that, conditioned on a
character profile embedding, emits a low-rank LoRA delta `ΔW = AB` for each
target layer of the base model on the fly. The base model weights themselves
are never updated; only the hypernet is trained. At inference time the
hypernet generates a personalised LoRA per character, giving one model that
covers an open-ended set of personas without needing to store per-character
adapters.

## Files

| File | Purpose |
|------|---------|
| `hypermod.pt` | The released pretrained hypermod (it_20000 of the original pretraining run). Use this as the entry point. |
| `args.yaml` | Architecture and loader metadata (no training schedule: this checkpoint is meant to be consumed, not resumed). |
| `adapter_config.json` | LoRA target-module stub (rank 8, alpha 16, `q_proj` + `v_proj`). |

## How to load

```python
from huggingface_hub import snapshot_download
from hyper_llm_modulator.hyper_modulator import load_hypermod_checkpoint

ckpt_dir = snapshot_download("/PHASE-Tree-pretrained-hypermod")
(
    args,
    hypermod,
    base_model,
    tokenizer,
    emb_model,
    emb_tokenizer,
    task_desc_format_fn,
    pooling_fn,
) = load_hypermod_checkpoint(f"{ckpt_dir}/hypermod.pt", device="cuda")
```

The loader reads `args.yaml` and `adapter_config.json` from the same directory
as `hypermod.pt` automatically; you do not need to pass them explicitly. The
full inference pipeline (profile → embedding → per-layer LoRA → generation)
lives in the PHASE-Tree codebase.

## Architecture

| Component | Value |
|-----------|-------|
| Base model | `Qwen/Qwen2.5-7B-Instruct` |
| Task encoder | `Qwen/Qwen3-Embedding-4B` |
| Target modules | `q_proj`, `v_proj` |
| LoRA rank `r` | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Hypernet latent size | 1024 |
| Hypernet head input size | 2048 |
| `delta_w` scaling | 100 |

## Use as warm-start

SFT runs whose `args.yaml` sets

```yaml
init_hypermod_from: phase_tree_models/phase_tree_pretrained/hypermod.pt
```

consume this checkpoint as the initial hypernet weights. This is the
warm-start used by the released anchor SFT run under
`phase_tree_models/sft/hyper_lora/`.

## Limitations

- This is a **pretraining** checkpoint; downstream SFT is required for
  competitive character-fidelity scores.
- Persona conditioning is mediated entirely by the profile embedding fed into
  the task encoder; the model has no other persona-control mechanism.
- Generations may reproduce stylistic biases of the source corpora and are
  intended for research evaluation only.
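
## Toy illustration of the hypermod idea

As a reading aid for the hypermod description above, the following is a minimal, dependency-free sketch of how a hypernetwork can turn a profile embedding into a low-rank delta `ΔW = AB` for one frozen layer. It is not the PHASE-Tree implementation: `ToyHypermod`, `lora_factors`, `adapted_weight`, the single linear map, and all sizes are hypothetical. The sketch applies the conventional LoRA scale `alpha / r` (the toy values keep the card's ratio of 16 / 8 = 2) and does not model the `delta_w` scaling of 100 listed in the architecture table.

```python
import random


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


class ToyHypermod:
    """Hypothetical stand-in: one linear map from a profile embedding
    to the flattened LoRA factors (A, B) of a single target layer."""

    def __init__(self, emb_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        n_params = rank * (d_out + d_in)
        # The only trainable parameters live here, not in the base model.
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)]
                       for _ in range(n_params)]

    def lora_factors(self, profile_emb):
        """Map one profile embedding to A (d_out x r) and B (r x d_in)."""
        flat = [sum(w * e for w, e in zip(row, profile_emb)) for row in self.weight]
        r, split = self.rank, self.d_out * self.rank
        a = [flat[i * r:(i + 1) * r] for i in range(self.d_out)]
        b = [flat[split + i * self.d_in:split + (i + 1) * self.d_in]
             for i in range(r)]
        return a, b


def adapted_weight(base_w, a, b, alpha, rank):
    """Return W + (alpha / rank) * A @ B without mutating base_w."""
    scale = alpha / rank
    delta = matmul(a, b)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base_w, delta)]


# Toy sizes (assumptions); alpha / rank = 2 matches the card's 16 / 8.
EMB_DIM, D_OUT, D_IN, RANK, ALPHA = 4, 6, 5, 2, 4

toy_hypermod = ToyHypermod(EMB_DIM, D_OUT, D_IN, RANK)
base_w = [[1.0] * D_IN for _ in range(D_OUT)]  # frozen base weight

persona_a = [1.0, 0.0, 0.0, 0.0]  # made-up profile embeddings
persona_b = [0.0, 1.0, 0.0, 0.0]

w_a = adapted_weight(base_w, *toy_hypermod.lora_factors(persona_a), ALPHA, RANK)
w_b = adapted_weight(base_w, *toy_hypermod.lora_factors(persona_b), ALPHA, RANK)
```

Both personas share `base_w` and the same hypernetwork parameters; only the generated factors differ, which is why no per-character adapter has to be stored.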